April 21, 2017 at 10:51 am
We recently experienced some email anomalies that I'm trying to figure out. Up until about 2 weeks ago, we were running a SQL Agent job which dispatched 400 emails, one every 4 seconds, using sp_send_dbmail. The process was designed before I got here, and about a year ago I found out (through some trial & error) that the 4-second interval was the magic number. If you tried to send emails on a shorter interval, SQL Server would perceive some of the emails sent after the 25th-50th email as failures, re-queue them, and send them again. The problem was, the initial email wasn't failing. It was making it through to the recipient, so sending emails on a shorter interval actually resulted in duplicate emails. My best description for this behavior is that once SQL Server starts re-queuing emails, you quickly fall into a self-reinforcing loop, with new email requests competing in the queue against the growing number of emails marked as failed. I have no explanation for why the emails themselves are actually being delivered, though, despite SQL Server thinking that they've failed.
So, two weeks ago we started experiencing duplicate emails even on the 4-second period. We've figured out that by pushing things out to a 20-second period, we get the breathing room we need between emails. We've brought in the entire team, and no one can think of anything significant happening on the particular day that everything went awry.
I was able to inspect the email process, before & after this event, via sysmail_allitems. (Just as a note, I'm focusing only on non-duplicate emails, i.e. emails that SQL Server doesn't perceive as having failed.) Before the issue, the difference between send_request_date and sent_date was 0-4 seconds. After the issue, I still see some 0-4 second intervals, but just over half of the emails now show a common interval of 30 seconds.
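For anyone who wants to run the same comparison, here is a rough sketch of the kind of query that surfaces this pattern. It assumes that filtering on sent_status = 'sent' is a fair stand-in for "emails SQL Server doesn't perceive as having failed", and the 30-day window is an arbitrary choice made so that days on both sides of the change show up.

```sql
-- Sketch only: count sent emails per day, grouped by the delay (in seconds)
-- between the send request and the recorded send time.
-- Assumptions: sent_status = 'sent' approximates "not perceived as failed";
-- the 30-day lookback is arbitrary and should be adjusted to cover the event.
SELECT
    CAST(send_request_date AS date)                AS request_day,
    DATEDIFF(SECOND, send_request_date, sent_date) AS delay_seconds,
    COUNT(*)                                       AS email_count
FROM msdb.dbo.sysmail_allitems
WHERE sent_status = 'sent'
  AND send_request_date >= DATEADD(DAY, -30, GETDATE())
GROUP BY
    CAST(send_request_date AS date),
    DATEDIFF(SECOND, send_request_date, sent_date)
ORDER BY request_day, delay_seconds;
```

If the 30-second delay is real, it should appear as a second cluster of rows near delay_seconds = 30 starting on the day the duplicates began.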
There's a suspicious Exchange receive connector setting that defaults to 30 seconds ("MaxAcknowledgementDelay"), and apparently we can disable it by setting it to 0. We're in the process of acquiring some packet inspection tools first, though, to see if there's something network related happening.
Has anyone seen this before, and if so, can you offer up your solution?
Thanks,
--=Chuck
April 21, 2017 at 2:54 pm
I know of a couple of people who worked around similar issues with the timeout setting mentioned in the comments. It seems you would also need the right combination of retry and timeout settings in the Database Mail configuration, so you might need to test that part out in a non-prod environment. You may also want to try setting the logging level to verbose in Database Mail - change system parameters.
A lot of times though, these can be more Exchange issues than Database Mail issues. I would think there would be information logged in one of the Exchange logs. SMTP traffic has specific logs, although I don't remember the names. And there are other tools on the Exchange side for troubleshooting.
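If you'd rather script those changes than click through the wizard, something along these lines should work. Treat it as a sketch to try in non-prod: the retry values shown are placeholders for illustration, not recommendations.

```sql
-- Sketch only: review and adjust Database Mail system parameters.

-- Show the current system parameter values before changing anything.
EXEC msdb.dbo.sysmail_help_configure_sp;

-- Set logging to verbose (LoggingLevel: 1 = normal, 2 = extended, 3 = verbose).
EXEC msdb.dbo.sysmail_configure_sp
    @parameter_name  = N'LoggingLevel',
    @parameter_value = N'3';

-- Retry settings. The values below are placeholders (assumptions), to be tuned
-- by testing; AccountRetryDelay is expressed in seconds.
EXEC msdb.dbo.sysmail_configure_sp
    @parameter_name  = N'AccountRetryAttempts',
    @parameter_value = N'1';
EXEC msdb.dbo.sysmail_configure_sp
    @parameter_name  = N'AccountRetryDelay',
    @parameter_value = N'60';

-- Review the most recent Database Mail log entries after a test run.
SELECT TOP (100) log_date, event_type, mailitem_id, description
FROM msdb.dbo.sysmail_event_log
ORDER BY log_date DESC;
```

Verbose logging can generate a lot of rows, so it is probably worth setting LoggingLevel back once you have captured a failing run.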
Sue
April 25, 2017 at 4:01 pm
We tried changing that timeout parameter, and it didn't make a difference. Bummer.
I'm still waiting on the packet inspection to happen. Apparently we needed to purchase some additional hardware to make it work. Our current approach wasn't gathering all of the email traffic. I'll report back whether or not we find anything.
--=Chuck