January 23, 2015 at 8:50 am
I have a job that has run without problem for months. It runs on both our test/dev server and our production server. The databases and jobs are identical in both environments.
A few days ago, the job stopped working in the production environment. It still works fine in the test/dev environment.
The job runs four stored procedures. I was first alerted to the problem when someone notified me that their query wouldn't work. I found that the job, which normally takes about 12 minutes, had been running for 3.5 hours. It had hung on the third SP. The job runs every 3 hours and had run fine at the previous scheduled time.
I killed the job and, since then, cannot even run the first SP. The first SP has four steps:
1. Retrieve data from a linked server into a temp table (works fine).
2. Delete data from the regular table where the date range matches what was retrieved from the linked server.
3. Insert the retrieved data into the regular table.
4. Rebuild indexes.
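In rough outline (every object, column, and linked-server name below is a placeholder, not the actual schema), the procedure looks something like this:

CREATE PROCEDURE dbo.usp_RefreshRegularTable  -- hypothetical name
AS
BEGIN
    SET NOCOUNT ON;

    -- 1. Retrieve data from the linked server into a temp table
    SELECT SomeKey, SomeDate, SomeValue
    INTO #Staging
    FROM LINKEDSRV.RemoteDb.dbo.SourceTable
    WHERE SomeDate >= DATEADD(DAY, -7, GETDATE());

    -- 2. Delete the matching date range from the regular table
    DELETE rt
    FROM dbo.RegularTable AS rt
    WHERE rt.SomeDate >= (SELECT MIN(SomeDate) FROM #Staging)
      AND rt.SomeDate <= (SELECT MAX(SomeDate) FROM #Staging);

    -- 3. Insert the retrieved rows into the regular table
    INSERT INTO dbo.RegularTable (SomeKey, SomeDate, SomeValue)
    SELECT SomeKey, SomeDate, SomeValue
    FROM #Staging;

    -- 4. Rebuild indexes
    ALTER INDEX ALL ON dbo.RegularTable REBUILD;
END;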
The data retrieval from the linked server works fine, and I can see the data in the temp table, query it, etc. However, when it gets to the delete step, nothing seems to happen. It uses CPU, but a step that normally takes 3-5 seconds (based on history) seems to run forever. I've let it go as long as 45 minutes before cancelling the process.
DBCC CHECKDB shows no errors. I have restarted SQL Server, then rebooted the physical server, checked hard disk space, rebuilt all indexes, examined memory, and so forth. I have no clue where to go next.
After rebooting the physical server, I immediately retried the first SP before anyone or any other processes could cause a contention issue and still had the same results.
Any suggestions?
Thank you.
January 23, 2015 at 8:56 am
The query plan has probably changed, due to an increase in the number of rows.
Try to examine the execution plan for the query that appears to run forever and see if an index can help.
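For example, you can capture the estimated plan for just the delete without waiting for it to finish, and then check whether the date predicate is supported by an index (all names and dates below are placeholders):

SET SHOWPLAN_XML ON;   -- return the estimated plan instead of executing
GO
DELETE dbo.RegularTable
WHERE SomeDate >= '20150101' AND SomeDate < '20150108';
GO
SET SHOWPLAN_XML OFF;
GO

-- If the plan shows a full scan to locate the date range, a supporting index may help:
CREATE NONCLUSTERED INDEX IX_RegularTable_SomeDate
    ON dbo.RegularTable (SomeDate);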
-- Gianluca Sartori
January 23, 2015 at 9:03 am
I concur. A plan change is a good possibility. Another is blocking. Use sp_WhoIsActive to find that. That free tool can also show you a LOT about the query as it is running, including the query plan.
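A minimal sketch, assuming sp_WhoIsActive has been installed (it is a free procedure, not built in); the second query uses only built-in DMVs:

EXEC sp_WhoIsActive
    @get_plans          = 1,  -- include the live query plan for each running session
    @find_block_leaders = 1;  -- flag sessions at the head of a blocking chain

-- Built-in alternative: list requests that are currently blocked and who is blocking them
SELECT r.session_id, r.blocking_session_id, r.wait_type, r.wait_time, t.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS t
WHERE r.blocking_session_id <> 0;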
Also, depending on the fraction of data normally deleted, I would look into DROPPING all NC indexes, doing the delete, and then simply recreating the indexes. For larger fractions of data this is WAY faster and results in much less tlog activity. Note that it could adversely affect other concurrent access, though.
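A rough sketch of that pattern, with hypothetical index, table, and column names:

-- Drop the nonclustered indexes before the large delete
DROP INDEX IX_RegularTable_SomeDate ON dbo.RegularTable;
DROP INDEX IX_RegularTable_SomeKey  ON dbo.RegularTable;

-- Do the delete with only the base table (and clustered index) in place
DELETE dbo.RegularTable
WHERE SomeDate >= '20150101' AND SomeDate < '20150108';

-- Recreate the nonclustered indexes afterwards
CREATE NONCLUSTERED INDEX IX_RegularTable_SomeDate ON dbo.RegularTable (SomeDate);
CREATE NONCLUSTERED INDEX IX_RegularTable_SomeKey  ON dbo.RegularTable (SomeKey);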
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 23, 2015 at 9:25 am
Thanks for the suggestions. This is very helpful.
Mark.
January 23, 2015 at 9:50 am
Interesting solution:
Changing the order of processing solves the problem.
Original:
1. Retrieve all data to be inserted into @table
2. Delete from regular table based on date range in @table
3. Insert @table data into regular table
Revised:
1. Determine date range to be deleted
2. Delete data from regular table
3. Retrieve data from linked server into @table
4. Insert @table data into regular table
The @table variable being populated held roughly 140,000 rows, while the regular table has about 3.8 million rows. All I can guess is that some kind of locking was occurring and causing problems.
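A sketch of the revised order (all object and column names are placeholders, and the date-range derivation shown is only one possibility); the key difference is that the delete now runs before the table variable is populated:

DECLARE @StartDate datetime, @EndDate datetime;

-- 1. Determine the date range to be deleted
SELECT @StartDate = MIN(SomeDate), @EndDate = MAX(SomeDate)
FROM LINKEDSRV.RemoteDb.dbo.SourceTable;   -- or however the range is actually derived

-- 2. Delete that range from the regular table
DELETE dbo.RegularTable
WHERE SomeDate >= @StartDate AND SomeDate <= @EndDate;

-- 3. Retrieve the data from the linked server into @table
DECLARE @Staging TABLE (SomeKey int, SomeDate datetime, SomeValue decimal(18, 2));
INSERT INTO @Staging (SomeKey, SomeDate, SomeValue)
SELECT SomeKey, SomeDate, SomeValue
FROM LINKEDSRV.RemoteDb.dbo.SourceTable
WHERE SomeDate >= @StartDate AND SomeDate <= @EndDate;

-- 4. Insert @table data into the regular table
INSERT INTO dbo.RegularTable (SomeKey, SomeDate, SomeValue)
SELECT SomeKey, SomeDate, SomeValue
FROM @Staging;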
Thanks for your help and suggestions.
Mark.