February 14, 2014 at 4:18 am
Hi
Following on from my query about tempdb allocation contention, I thought I'd follow up with more detail in case anyone has any pointers.
We've a complex overnight process: a job with 65 steps. One step, which starts at 4am, can take (usually takes!) 25 minutes, but occasionally takes 90-120 minutes. There's no consistency as to when the time extends, and nothing else is happening SQL-wise at the same time.
The step itself inserts 138 rows into a temp table, then does an insert from a select. The select itself is complex (10 inner joins, with a couple of INs and CASEs thrown in). On top of that, it does one SUM, grouped on a further 26 columns. So tempdb will be hit a lot.
The execution plan shows a subtree cost of 410, and the potential to create one index, albeit with a potential saving of only 14. The resulting dataset/insert is 6.5m rows.
I can run the select during the day - at any point - and it takes around 09:48, so I don't think I can do much saving on the select (but I'm willing to be convinced!).
Tempdb now has 8 data files, on its own disc, which is in a group of 128 x 15k discs. According to Spotlight there are I/O issues at 4-5am, but nothing I'd call that significant. And when I run the script during the day there are no issues at all.
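For anyone who wants to dig into the same numbers rather than Spotlight's view of them, the per-file stalls on tempdb can be pulled with something roughly like this (the counters are cumulative since the last restart, so a before/after snapshot around the 4am step is needed for a meaningful delta):

SELECT DB_NAME(vfs.database_id) AS db_name,
       mf.physical_name,
       vfs.num_of_reads,
       vfs.io_stall_read_ms,
       vfs.num_of_writes,
       vfs.io_stall_write_ms,
       vfs.io_stall_read_ms  / NULLIF(vfs.num_of_reads, 0)  AS avg_read_stall_ms,
       vfs.io_stall_write_ms / NULLIF(vfs.num_of_writes, 0) AS avg_write_stall_ms
FROM sys.dm_io_virtual_file_stats(DB_ID('tempdb'), NULL) AS vfs
JOIN sys.master_files AS mf
  ON mf.database_id = vfs.database_id
 AND mf.file_id     = vfs.file_id;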
Any ideas? Could it be a VM issue - a backup of the servers, say? Something else? I'm really running out of ideas.
Many thanks
pete
February 14, 2014 at 6:00 am
It sure sounds to me like the query could be tuned, but as to the long running stuff, I'd assume contention on resources. While it's running, have you collected wait statistics? Have you watched the server during this period to see if that process is blocked by other resources? Just because not much is happening doesn't mean absolutely nothing is happening, and this sounds like a resource contention issue.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
February 14, 2014 at 6:19 am
I've written a few scripts - tempdb is obviously being hit quite heavily, with a massive I/O spike across all 8 data files at that particular point. The top queries at that time - or rather beforehand - are pretty small, and shouldn't really be causing an issue. Page life expectancy is still over an hour, CPU is only 27%, and there are no blocks; the only red on my Spotlight (vainly trusting technology!) is the I/O waits on tempdb.
As I say, there doesn't appear to be anything SQL-wise hitting the system (although, surprisingly, there are a few I/O waits on other DBs that nothing official is happening on).
Looking at the Windows side of Spotlight - again, nothing; solidly green throughout the period in question.
Hence my thinking that it may well be a SAN issue rather than SQL.
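To sanity-check that, a before/after snapshot of sys.dm_os_wait_stats around the 4-5am window should show whether I/O waits (PAGEIOLATCH_*, IO_COMPLETION, WRITELOG) really dominate - roughly along these lines (the temp table is only there to hold the baseline):

-- baseline, taken just before the step starts
SELECT wait_type, waiting_tasks_count, wait_time_ms
INTO #waits_before
FROM sys.dm_os_wait_stats;

-- after the step finishes, take the delta
SELECT w.wait_type,
       w.wait_time_ms        - b.wait_time_ms        AS wait_ms_delta,
       w.waiting_tasks_count - b.waiting_tasks_count AS tasks_delta
FROM sys.dm_os_wait_stats AS w
JOIN #waits_before        AS b ON b.wait_type = w.wait_type
WHERE w.wait_time_ms - b.wait_time_ms > 0
ORDER BY wait_ms_delta DESC;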
February 14, 2014 at 6:25 am
Can you monitor execution plans (via extended events), get a plan from a fast execution and a plan from a slow execution?
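Something along these lines would do for the plan capture; query_post_execution_showplan is expensive, so keep the predicate tight and only leave the session running around the 4am window (the session, file and database names here are just placeholders):

CREATE EVENT SESSION CapturePlans ON SERVER
ADD EVENT sqlserver.query_post_execution_showplan
    (WHERE sqlserver.database_name = N'YourDatabase')      -- placeholder filter
ADD TARGET package0.event_file (SET filename = N'CapturePlans.xel');

ALTER EVENT SESSION CapturePlans ON SERVER STATE = START;
-- ...let the 4am step run, then...
ALTER EVENT SESSION CapturePlans ON SERVER STATE = STOP;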
Can you monitor/record wait stats per query during its execution (poll sys.dm_os_waiting_tasks, or use extended events) and see what the query is waiting on when it executes fast versus what it's waiting on when it executes slow?
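For the polling side, logging sys.dm_os_waiting_tasks for the job's session every few seconds while the step runs is usually enough to see what it sits on - roughly like this (dbo.WaitSamples and the session id are placeholders):

DECLARE @session_id INT = 99;   -- placeholder: the spid running the 4am step

WHILE EXISTS (SELECT 1 FROM sys.dm_exec_requests WHERE session_id = @session_id)
BEGIN
    -- dbo.WaitSamples is a logging table you create beforehand
    INSERT INTO dbo.WaitSamples (sample_time, session_id, wait_type, wait_duration_ms, resource_description)
    SELECT SYSDATETIME(), wt.session_id, wt.wait_type, wt.wait_duration_ms, wt.resource_description
    FROM sys.dm_os_waiting_tasks AS wt
    WHERE wt.session_id = @session_id;

    WAITFOR DELAY '00:00:05';
END;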
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
February 14, 2014 at 6:31 am
Good idea - I'll set up an XEvent session today and see if it can tell me something.
Of course, it's bound to be speedy tonight.
pete
February 14, 2014 at 9:46 am
Not much to go on here, but the first thing I would do is put OPTION (RECOMPILE) on that big mess of a query. If that isn't a repeatable magic bullet to keep you at the 25-minute run time, then you will likely have to break the big query down into one or more temporary-table-based (NOT table variable) intermediate result sets. I have done this over and over for clients, often picking up 3-5 orders of magnitude performance improvements PLUS consistent execution times. You may wish to do this anyway, because that 25 minutes may only need to be 3 minutes!
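The decomposition I'm describing looks roughly like this - all the table and column names below are made up purely for illustration; the point is materialising the expensive part of the join into a temp table so the optimiser has real row counts to work with:

-- 1) materialise the nasty part of the join into a temp table (not a table variable)
SELECT c.CustomerID, SUM(o.Amount) AS TotalAmount
INTO #Intermediate
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerID = o.CustomerID
GROUP BY c.CustomerID;

-- 2) the final insert now joins far fewer objects, and #Intermediate has
--    accurate statistics; OPTION (RECOMPILE) makes the plan reflect tonight's data
INSERT INTO dbo.TargetTable (CustomerID, TotalAmount, Region)
SELECT i.CustomerID, i.TotalAmount, r.Region
FROM #Intermediate AS i
JOIN dbo.Regions   AS r ON r.CustomerID = i.CustomerID
OPTION (RECOMPILE);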
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 14, 2014 at 9:52 am
Funnily enough I've been thinking along the same lines, but the joins are so complex - with the tables at the bottom of the script joining four or five others further up - that it may be tricky.
What I've thought about this afternoon is the insert part of the script. Trying to insert 6m rows into a table that has had its indexes (indices?) rebuilt in the step before - one of which had a fill factor of 0/100 - is ridiculous. The longer-serving members of staff insist they're needed, as the main table referenced in the select part of the insert is the table that gets inserted into!
I'll do some checking next week, with index hints, to see whether the performance lost on the select by removing the indexes is gained back by inserting into a heap with none.
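The test itself will probably look something like this - index, table and column names made up here. The clustered index has to stay enabled (disabling that makes the table unreadable), and the catch is that while a nonclustered index is disabled, the select side of the insert can't use it either:

-- disable the nonclustered indexes before the big insert (keeps their definitions)
ALTER INDEX IX_Target_SomeColumn ON dbo.TargetTable DISABLE;   -- repeat per nonclustered index

INSERT INTO dbo.TargetTable (Col1, Col2)
SELECT s.Col1, s.Col2
FROM dbo.SourceQuery AS s;      -- stand-in for the existing 6.5m-row select

-- rebuild afterwards, which also re-enables them
ALTER INDEX IX_Target_SomeColumn ON dbo.TargetTable REBUILD;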
February 14, 2014 at 10:48 am
peter.cox (2/14/2014)
Funnily enough I've been thinking along the same lines, but the joins are so complex - with the tables at the bottom of the script joining four or five others further up - that it may be tricky. What I've thought about this afternoon is the insert part of the script. Trying to insert 6m rows into a table that has had its indexes (indices?) rebuilt in the step before - one of which had a fill factor of 0/100 - is ridiculous. The longer-serving members of staff insist they're needed, as the main table referenced in the select part of the insert is the table that gets inserted into!
I'll do some checking next week, with index hints, to see whether the performance lost on the select by removing the indexes is gained back by inserting into a heap with none.
It's likely that, because the query is so complex, you really do need to break it down. Small anomalies in row count estimates get compounded into gross mismatches, leading to suboptimal plans and significantly longer runtimes and total effort.
I just posted to another thread on SSC.com about how rarely I have found indexes on temp tables to be more efficient for the entire process than going without them. I also question, as you do, having the indexes on the table in the first place before putting in the millions of rows of data.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
February 27, 2014 at 9:49 am
In case anyone's still interested (!), I've pretty much worked out what the issue was. After fully dissecting the code (as I say, there's a lot of it: 65 steps, with some of the SPs running to 2k lines), I worked out that one of the main tables was having 38m rows (out of 41m) deleted and then reinserted after being recalculated.
Obviously the SP that referenced the [now completely new] table had been optimized for different data. Adding the recompile - either as WITH RECOMPILE, or as OPTION (RECOMPILE) for the complicated ones - does seem to have made all the difference. It also explains why the scripts ran so fast when I took the SPs apart.
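For anyone hitting the same thing, the two forms are simply these (procedure, table and column names swapped for placeholders):

-- proc-level: the whole procedure gets a fresh plan on every execution
ALTER PROCEDURE dbo.BigOvernightStep   -- placeholder name
WITH RECOMPILE
AS
BEGIN
    SELECT 1;   -- stand-in for the real body
END;
GO

-- statement-level: only this statement is recompiled, using the row counts as they are tonight
INSERT INTO dbo.TargetTable (Col1)
SELECT t.Col1
FROM dbo.MainTable AS t
OPTION (RECOMPILE);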
So far all seems okay. We're four days into the new regime and the job that had been taking up to 9.5 hours is now back under 7. Amazing difference.
Of course, there are still savings to be made in other areas.
Thanks all, though
February 27, 2014 at 9:52 am
peter.cox (2/27/2014)
In case anyone's still interested (!), I've pretty much worked out what the issue was. After fully dissecting the code (as I say, there's a lot of it: 65 steps, with some of the SPs running to 2k lines), I worked out that one of the main tables was having 38m rows (out of 41m) deleted and then reinserted after being recalculated. Obviously the SP that referenced the [now completely new] table had been optimized for different data. Adding the recompile - either as WITH RECOMPILE, or as OPTION (RECOMPILE) for the complicated ones - does seem to have made all the difference. It also explains why the scripts ran so fast when I took the SPs apart.
So far all seems okay. We're four days into the new regime and the job that had been taking up to 9.5 hours is now back under 7. Amazing difference.
Of course, there are still savings to be made in other areas.
Thanks all, though
Congrats - good to hear it is better.
Jason...AKA CirqueDeSQLeil
_______________________________________________
I have given a name to my pain...MCM SQL Server, MVP
SQL RNNR
Posting Performance Based Questions - Gail Shaw
Learn Extended Events
February 27, 2014 at 10:00 am
peter.cox (2/27/2014)
In case anyone's still interested (!), I've pretty much worked out what the issue was. After fully dissecting the code (as I say, there's a lot of it: 65 steps, with some of the SPs running to 2k lines), I worked out that one of the main tables was having 38m rows (out of 41m) deleted and then reinserted after being recalculated. Obviously the SP that referenced the [now completely new] table had been optimized for different data. Adding the recompile - either as WITH RECOMPILE, or as OPTION (RECOMPILE) for the complicated ones - does seem to have made all the difference. It also explains why the scripts ran so fast when I took the SPs apart.
So far all seems okay. We're four days into the new regime and the job that had been taking up to 9.5 hours is now back under 7. Amazing difference.
Of course, there are still savings to be made in other areas.
Thanks all, though
For the 38-of-41M-row delete, I would drop all the indexes, do the delete, add the data, then recreate the indexes if possible. That could substantially improve overall performance, and t-log size too.
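Roughly this shape - all names below are made up; the ALTER INDEX ALL ... REBUILD at the end also brings any disabled nonclustered indexes back online:

-- disable the nonclustered indexes first (keeps their definitions)
ALTER INDEX IX_Main_SomeColumn ON dbo.MainTable DISABLE;   -- repeat per nonclustered index

DELETE FROM dbo.MainTable
WHERE RecalcFlag = 1;                       -- stand-in for the 38M-row delete

INSERT INTO dbo.MainTable (Col1, Col2)      -- reload the recalculated rows
SELECT Col1, Col2
FROM dbo.Staging_Recalc;                    -- stand-in for wherever they come from

ALTER INDEX ALL ON dbo.MainTable REBUILD;   -- rebuild everything in one pass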
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service