July 31, 2013 at 4:41 pm
I'm having a very strange problem related to Database Mail. I wonder if anyone can help me with.
I have a SQL Server 2008 database that supports our help desk. Users create trouble tickets, which send out various emails.
To send an email, my app inserts a record into a ticket email queue table. The table contains a createdate field autopopulated with getdate(), and a senddate field that's initially null. Meanwhile I have a SQL job that runs once a minute, its job being to execute a stored procedure. The SP queries for all queue records having a null send date then loops thru the result set, sending each email in turn. Our helpdesk is pretty busy, so typically multiple records will be inserted into the queue in a minute. Normally, you will see that each went out within a minute or two of being put in the queue.
This morning, however, the email job history suddenly began reporting that the job failed. The reason given was that I needed to specify @recipient or @copy_recipents or @blind_copy_recipients. My SP specifies @recipients.
Then I noticed that even tho the job was failing, some emails were going out. Then I noticed that the time between the email queue record's createdate and senddate was growing. The difference grew from the usual 1-2 minutes to, at this moment, three and a half hours. I could see too from the senddate field that exactly 1 email is being sent per minute. So I'm thinking that the SP starts working thru its result set, sends the first email OK, then dies, causing the job to report failure.
I have made no changes to the SP, job, or anything else in ages. I have no idea how it happened or how to fix it. Restarting SQL Server did not help.
Can anyone shed some light?
August 1, 2013 at 6:28 am
UPDATE: The lag between createdate and senddate continued to grow overnight till it was about 9 hours. We rebooted the server this morning, but again, no change.
I noticed that if I manually execute the SP, it gives me a bunch of errors but actually sends out all the queued emails. The lag time between createdate and senddate returns to the 1-2 minute range, but grows again every few minutes. As long as I devote myself to manually running the job 24/7, everything will be great!
Here is the message I get when I manually run the SP:
Msg 14624, Level 16, State 1, Procedure sp_send_dbmail, Line 260
At least one of the following parameters must be specified. "@recipients, @copy_recipients, @blind_copy_recipients".
Msg 220, Level 16, State 2, Procedure SendEmail, Line 52
Arithmetic overflow error for data type tinyint, value = 256.
The statement has been terminated.
I created a new job and SP with the same settings thinking something may have got corrupted, but I get the same problem. I can't think of anything else to try.
August 1, 2013 at 6:41 am
Just now came across the solution:
http://www.sqlservercentral.com/Forums/Topic1031950-391-1.aspx
August 1, 2013 at 6:44 am
Sounds to me like you have no good indexes on the mail queue (table) and so the query to find new records and the query to update the ones you send is taking longer as the mail queue grows bigger and bigger. What is your indexing scheme? More than likely your query to find new rows is something like: "Where senddate is NULL" which can end up being a table scan and can escalate to a table lock if the table gets big enough, locking out other processing. Without seeing your code it is hard to tell what might be happening.
It would be better to have a separate (non-null integer) status column on your table that can be indexed effectively. Set to 0 on initial insert, updated to 1 on successful processing or 2 for failure etc. along with the senddate. I assume you have a unique ID (PK) on each message in the queue also so your updates will be efficient and use row locking only. Assuming the senddate and the status column are the only ones you are updating the updates will be in-place.
If you are implementing a cursor to pull all the rows consider first making a snapshot table of IDs of pending rows and work through that. You might also consider aging off completed records to a log table to keep the work queue small as well, or perhaps set up a partitioning scheme.
The probability of survival is inversely proportional to the angle of arrival.
August 1, 2013 at 6:49 am
Thanks very much for the reply. I will look at applying your suggestions. I'm self-taught and it sometimes shows.
Just a moment ago I came across the solution to my immediate problem: I had a few records with a blank send-to line. Deleting those cleared the problem up.
August 1, 2013 at 6:53 am
dzoll (8/1/2013)
Here is the message I get when I manually run the SP:Msg 14624, Level 16, State 1, Procedure sp_send_dbmail, Line 260
At least one of the following parameters must be specified. "@recipients, @copy_recipients, @blind_copy_recipients".
Msg 220, Level 16, State 2, Procedure SendEmail, Line 52
Arithmetic overflow error for data type tinyint, value = 256.
The statement has been terminated.
I created a new job and SP with the same settings thinking something may have got corrupted, but I get the same problem. I can't think of anything else to try.
I think the main error is in your procedure SendEmail; it's failing at line 52; i would not be surprised if that failure passes null to sp_send_dbmail, resulting in a cascading kind of error.
the error in SendEmail is clear, a tinyint variable received a large value...changing the data type would probably fix it.
post the actual procedure if you want a little peer reviewe on it, we love that kind of stuff.
Lowell
August 1, 2013 at 6:54 am
dzoll (8/1/2013)
Just a moment ago I came across the solution to my immediate problem: I had a few records with a blank send-to line. Deleting those cleared the problem up.
You might want to add a check constraint on the SendTo column to check for Empty, or invalid email addresses.
Glad you found the issue, good luck.
The probability of survival is inversely proportional to the angle of arrival.
August 1, 2013 at 6:56 am
Thanks for the reply, Lowell. I'm happy at the moment after finding that my main problem was triggered by a blank send-to line in one of the records.
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply