December 1, 2011 at 12:37 pm
I specialize in very large databases and came across some very peculiar SQL Server behavior on which I'd like your input.
The following query runs for 9 hours and 40 minutes.
set nocount on
DECLARE @Time datetime2
Set @Time = '2009-01-01'
select case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end as BinMean,
       COUNT(r.DeviceTimeStamp) AS Occurrences
FROM RawIMC r
JOIN FileSummaries f ON r.DeviceTimeStamp BETWEEN f.StartTime AND f.EndTime
WHERE r.mshtq_in_cml = 1 AND f.StartTime < @Time
GROUP BY case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end
ORDER BY BinMean
Table RawIMC contains 420 million rows, FileSummaries 14,100 rows, and FileSummaries.mshtq_in_cml = 1 returns 2,500 rows. The join returns about 70 million rows from RawIMC. DeviceTimeStamp is the clustered primary key on RawIMC. All rows in RawIMC.DeviceTimeStamp are < @Time.
The execution plan is attached.
When I add the following hint, the execution time is reduced to 1 minute 30 seconds.
OPTION (OPTIMIZE FOR (@Time = '2008-09-03 16:10:00'))
I ran both and compared the outputs: they are identical.
Ordinarily I don't use the OPTIMIZE FOR hint because I haven't seen any benefit from overriding SQL Server's choice of execution plan, perhaps because my databases' architecture has proven very efficient over the years. But why OPTIMIZE FOR would have such a huge impact under any circumstances is not clear to me.
Repeated runs on the same box range from 7+ hours to 9 hours 40 minutes without the hint, and from 33 seconds to 1 minute 30 seconds with it. If I can learn something from this, I might be able to improve the performance of my 3 billion+ row tables and multi-terabyte databases.
Any ideas?
Thanks
Nico
December 1, 2011 at 1:13 pm
One of the (many) problems in the execution plan you provided is that the optimizer is expecting to get 707 records from FileSummaries but is instead getting 28,296. Because it's expecting 707 records, it plans to do things in a certain way that doesn't work so well when there are 40 times as many records as it was expecting. Putting in the hint gives it a more realistic estimate and therefore a more realistic plan. It would be possible to be more specific if you also attached the second plan so the two can be compared and contrasted.
As a side issue, I would strongly suggest looking at some warehousing or pre-aggregation strategies so you're not reading hundreds of millions of records repeatedly. With the right architecture, your query should be able to run in far less than 30+ seconds if your hardware is any good.
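Purely as a sketch of the idea (the RawIMC_BinSummary table, its columns and the nightly-load pattern are all made up here, and for brevity this ignores the FileSummaries window filter), pre-aggregation could look something like:

-- Hypothetical summary table: one row per (day, bin) instead of one row per raw reading
CREATE TABLE dbo.RawIMC_BinSummary
(
    SummaryDate date   NOT NULL,
    BinMean     int    NOT NULL,
    Occurrences bigint NOT NULL,
    CONSTRAINT PK_RawIMC_BinSummary PRIMARY KEY CLUSTERED (SummaryDate, BinMean)
);

-- Nightly load: aggregate only yesterday's raw rows instead of rescanning 420 million every time
INSERT INTO dbo.RawIMC_BinSummary (SummaryDate, BinMean, Occurrences)
SELECT CAST(r.DeviceTimeStamp AS date),
       CASE WHEN r.mshtq < 0 THEN CAST(r.mshtq/10 AS int)*10 - 5
            ELSE CAST(r.mshtq/10 AS int)*10 + 5 END,
       COUNT(*)
FROM dbo.RawIMC r
WHERE r.DeviceTimeStamp >= DATEADD(day, -1, CAST(GETDATE() AS date))
  AND r.DeviceTimeStamp <  CAST(GETDATE() AS date)
GROUP BY CAST(r.DeviceTimeStamp AS date),
         CASE WHEN r.mshtq < 0 THEN CAST(r.mshtq/10 AS int)*10 - 5
              ELSE CAST(r.mshtq/10 AS int)*10 + 5 END;

Reporting queries then hit the small summary table instead of the raw data.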
December 1, 2011 at 1:14 pm
nico van niekerk (12/1/2011)
I ran both and compared the outputs: they are identical.
Did you compare the two final execution plans as well? That's where the keys to the puzzle will be more visible. I'd like to see the actual plan for the 'OPTIMIZE FOR' version as well; this one has the hint commented out at the tail of the statement.
To help you nail this down, are you allowed to post the schema and index definitions to these two tables as they exist currently?
One of the key things here is the Lazy Spool. It's estimating 707 rows; it's coming back with just under a trillion. Let's just say this is not going to go well, particularly for the memory allocation. Have you monitored perfmon while this is running? Do you have a tremendous volume of swap file work going on?
My first guess would be that it's generating a re-usable plan and ending up with some bad parameter sniffing, but usually a local variable in a script like this wouldn't cause the level of problems you're seeing (a local variable in a proc is the usual 'fix' for parameter sniffing). What does your usual maintenance look like, and when were statistics last updated on this table? Auto-update stats only kicks off when 20% of the table has been modified, and that's a LOT of rows on something this size. When you see gross miscalculations in estimated vs. actual rows, that's usually a hint to update your statistics with a full scan.
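If you do go after the statistics, the commands themselves are simple; something along these lines (table names taken from your post):

UPDATE STATISTICS dbo.RawIMC WITH FULLSCAN;
UPDATE STATISTICS dbo.FileSummaries WITH FULLSCAN;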
Ordinarily I don't use the OPTIMIZE FOR hint because I haven't seen any benefit from overriding SQL Server's choice of execution plan, perhaps because my databases' architecture has proven very efficient over the years. But why OPTIMIZE FOR would have such a huge impact under any circumstances is not clear to me.
Without digging deeper into your data and seeing both plans, it'll be hard to give you a particular reason in this case. I've found OPTIMIZE FOR is usually more useful in more complex queries, where there are too many moving parts for the optimizer to make all the 'best' guesses. It's not so much an override in my mind (like INNER HASH JOIN would be) as a way of telling the optimizer what its most common use case will be.
Repeated runs on the same box range from 7+ hours to 9 hours 40 minutes without the hint, and from 33 seconds to 1 minute 30 seconds with it. If I can learn something from this, I might be able to improve the performance of my 3 billion+ row tables and multi-terabyte databases.
The only part of this that's shocking to me is that a three-month difference in the date would produce such different results. It tells me the optimizer is planning for a full dump of the data from the parameter, whereas the date you're optimizing for probably restricts the dataset to a more reasonable level. However, it's hard to know for sure without knowing more about the data subsets.
The optimization level is FULL, so it's not like this timed out while determining the best internal optimization path. My personal approach would start with a statistics update on all the involved tables (with FULLSCAN) and then a direct review of the two actual execution plans after that.
From there, we might dig into the indexes directly, data distribution, etc.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
December 1, 2011 at 1:35 pm
Nico,
I would also like to see the actual query plan with the OPTIMIZE FOR not commented out. I'm with Craig on this. With such a dramatic difference between them, it's probably using a cached plan that is not optimal.
See if you get the same results with OPTION (RECOMPILE).
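Something along these lines — just your original statement with a statement-level recompile tacked on (a sketch, not a drop-in fix):

DECLARE @Time datetime2
Set @Time = '2009-01-01'

select case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end as BinMean,
       COUNT(r.DeviceTimeStamp) AS Occurrences
FROM RawIMC r
JOIN FileSummaries f ON r.DeviceTimeStamp BETWEEN f.StartTime AND f.EndTime
WHERE r.mshtq_in_cml = 1 AND f.StartTime < @Time
GROUP BY case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end
ORDER BY BinMean
OPTION (RECOMPILE)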
Todd Fifield
December 1, 2011 at 1:37 pm
Of course, I should have attached the OPTIMIZED execution plan, too. Just an oversight, but here it is. This completed in 38 seconds.
The almost 1 trillion rows that it expected is perhaps the key to the solution.
I don't have history on these tables as I was called in to optimize the hours-long run and was stunned by the impact of the HINT. There are no other indexes on RawIMC besides the clustered primary key, so the execution plan generator should have very little to choose from.
I am just waiting for a conclusion to the feedback from the forum and then I'd like to play with the statistics.
Thanks for the feedback so far.
Nico
December 1, 2011 at 1:58 pm
nico van niekerk (12/1/2011)
Of course, I should have attached the OPTIMIZED execution plan, too. Just an oversight, but here it is. This completed in 38 seconds.
Been there, no worries.
I am just waiting for a conclusion to the feedback from the forum and then I'd like to play with the statistics.
I AM THE FORUM HEAR ME... err, wait. I'm just a voice lost in the woods too. 😉
Alright, a couple of things offhand. Your estimates are still way off, in particular on the Clustered Index Seek on FileSummaries: Estimated 3.2, Actual 2,358. This compounds quickly, but we'll get back to that.
The Clustered Index Seek in the non-optimized version has a very different output volume with the equivalent predicates and seek predicates against the same index (2,358 vs. 28,296). Something's off here. This is more than just the OPTIMIZE FOR, which only modifies the estimates; the ACTUALS should not be changing.
The question then comes down to why we're getting 12 rows from one of the Compute Scalar operators and 1 row in the 'optimized' version. It makes sense as you dig into how the Nested Loops operator works, but that's the direct difference in the number of rows acting as a multiplier.
So, how the heck do the two queries come up with different numbers of rows for a constant scan/compute scalar? The Table Spool has 12 rebinds. The optimized query doesn't need the table spool because it approaches the query in a different order off the Nested Loops involving RawIMC, which in turn reduces the number of times it has to re-read the data.
So, amusing as that distraction is, the primary cause is the Clustered Index Seek vs. the Index Scan on RawIMC. The optimized version is seeking correctly on index PK__RawIMC_Devicetimestamp_<randomGUID>; the non-optimized version is going against index idx_rawimc(mshtq).
All of this is generated because of the estimated number of rows from FileSummaries. In the optimized version it's 3.3. In the non-optimized it's 707. It's making its decisions on who goes first from there.
To avoid needing the hint, you might consider a 'divide and conquer' routine: bring the necessary FileSummaries data into a local #tmp table in your proc, and then go after the RawIMC data from there for your result. You won't need the hint and you can make sure things work in a logical progression. I do it pretty regularly (along with cursor-optimization to force multi-seeks instead of a single scan) in huge DBs when it makes sense. This is one of those times.
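A rough sketch of that shape, with the predicates copied from your posted query (the #FileWindows name and its index are made up):

DECLARE @Time datetime2 = '2009-01-01';

-- Stage just the qualifying FileSummaries windows
SELECT f.StartTime, f.EndTime
INTO #FileWindows
FROM FileSummaries f
WHERE f.StartTime < @Time;

CREATE CLUSTERED INDEX IX_FileWindows ON #FileWindows (StartTime, EndTime);

-- Then go after RawIMC with real row counts (and statistics) behind the temp table
SELECT CASE WHEN r.mshtq < 0 THEN CAST(r.mshtq/10 AS int)*10 - 5
            ELSE CAST(r.mshtq/10 AS int)*10 + 5 END AS BinMean,
       COUNT(r.DeviceTimeStamp) AS Occurrences
FROM RawIMC r
JOIN #FileWindows w ON r.DeviceTimeStamp BETWEEN w.StartTime AND w.EndTime
WHERE r.mshtq_in_cml = 1
GROUP BY CASE WHEN r.mshtq < 0 THEN CAST(r.mshtq/10 AS int)*10 - 5
              ELSE CAST(r.mshtq/10 AS int)*10 + 5 END
ORDER BY BinMean;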
I'd still get your statistics updated though.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
December 1, 2011 at 6:16 pm
Cool stuff, Craig.
I'd like to delve deeper into your observations before I update the statistics and thereby destroy this particular scenario. Earlier I did pull FileSummaries into a table variable with only the rows that were required, but that didn't change anything. However, with a different perspective on the problem, I'll try that approach again.
What is certain at this point is that I didn't pay much attention to anything with as few as 2,358 rows; the thinking was that SQL Server would cache it in memory anyway rather than subject it to I/O. That it would treat those rows as a multiplier is something I missed completely.
I'd like to work on the query construct to see whether a different query would prevent SQL Server from stepping through the 2,358 rows 423,960,000 times. (2,358 x 423,960,000 = 9.99698 x 10^11.) If OPTIMIZE FOR UNKNOWN would rectify the execution plan, the query might be off the hook. It's a very interesting exercise, however.
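For reference, the variant I plan to test is simply the hint swapped out at the tail of the original statement:

-- replaces OPTION (OPTIMIZE FOR (@Time = '2008-09-03 16:10:00')) at the end of the query
OPTION (OPTIMIZE FOR (@Time UNKNOWN))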
I'll publish my findings.
Thanks.
December 1, 2011 at 6:46 pm
nico van niekerk (12/1/2011)
Cool stuff, Craig. I'd like to delve deeper into your observations before I update the statistics and thereby destroy this particular scenario. Earlier I did pull FileSummaries into a table variable with only the rows that were required, but that didn't change anything. However, with a different perspective on the problem, I'll try that approach again.
Odd, did you index the #tmp?
I'd like to work on the query construct to see whether a different query would prevent SQL Server from stepping through the 2,358 rows 423,960,000 times. (2,358 x 423,960,000 = 9.99698 x 10^11.) If OPTIMIZE FOR UNKNOWN would rectify the execution plan, the query might be off the hook. It's a very interesting exercise, however.
True, I enjoy puzzles like this, especially if you can avoid them blowing up production for a bit.
I'll publish my findings.
Cool.
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
December 2, 2011 at 5:20 am
The optimization level on the plan is FULL, so this is a good plan as far as the optimizer is concerned. That suggests that the principal problem is with your statistics, not the structure of the query (to start with). I think it's exacerbated by the calculations in the GROUP BY. I'd look at pre-aggregating.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 2, 2011 at 1:22 pm
The original query's construct, somehow, allowed SS to rely on the statistics alone and not look at the query from a new perspective. The OPTIMIZE hint steered it in the right direction. The object of this exercise was to see if a query with a different construct would steer SS to look at the query rather than the statistics.
Just some comments on the feedback: pre-aggregation wasn't an option, although it's a good strategy that I'm very fond of, because we were chasing an execution time of around 10 seconds and pre-aggregating 420 million rows would blow that budget before we even got close to the result set.
The temp table was not #tmp but @Tmp, a table definition and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table, which is why I favored it in the final solution, and because it is very narrow and only contains about 2,500 rows. My previous use of a table definition to host FileSummaries, didn't change the query's construct and still left SS in the dark as to the intent of the query.
Here is the solution that I came up with which runs in between 9 and 12 seconds without using OPTIMIZE FOR with the same statistics.
DECLARE @Time datetime2
Set @Time = '2009-01-01'
declare @filesummaries table (starttime datetime null, [endtime] datetime null, mshtq_in_cml bit not null)

insert into @filesummaries ([starttime], [endtime], mshtq_in_cml)
select starttime, endtime, mshtq_in_cml
from filesummaries
where mshtq_in_cml & 1 = 1 and StartTime < @time

select count(*) as Occurrences,
       case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end as BinMean
from rawimc r
join (select [starttime], [endtime] from @filesummaries) f
  on r.devicetimestamp between f.[starttime] and f.[endtime]
group by case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end
order by BinMean
My best guess is that the JOINs in the first query didn't give SS a clear indication of its intent and it reverted back to the statistics, while this query is less ambiguous.
Thanks to everybody who contributed to this issue. It's much appreciated.
Thanks
Nico
December 2, 2011 at 1:24 pm
Table variables, not definitions. I age quicker than originally expected. 🙂
December 2, 2011 at 3:58 pm
nico van niekerk (12/2/2011)
The original query's construct, somehow, allowed SS to rely on the statistics alone and not look at the query from a new perspective. The OPTIMIZE hint steered it in the right direction. The object of this exercise was to see if a query with a different construct would steer SS to look at the query rather than the statistics.
As said before, that hint really is the last, last, last, after-the-headbanging last resort option, and only with a gun to your head.
nico van niekerk (12/2/2011)
The temp table was not #tmp but @Tmp, a table definition and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table, which is why I favored it in the final solution, and because it is very narrow and only contains about 2,500 rows. My previous use of a table definition to host FileSummaries, didn't change the query's construct and still left SS in the dark as to the intent of the query.
So wrong it's not even funny. Do yourself a huge favor and read this amazing article.
http://www.sqlservercentral.com/articles/Temporary+Tables/66720/
All tables can spill to disk, no exceptions.
Yes, you can create indexes on a table variable: you can create multiple UNIQUE constraints. You just need to include the PK columns explicitly (they're in there anyway), and that lets you build the extra index(es).
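A quick sketch of the shape (the column list here is made up just to show the idea; the point is the inline PRIMARY KEY and UNIQUE constraints, since CREATE INDEX itself can't be run against a table variable):

DECLARE @filesummaries TABLE
(
    StartTime    datetime NOT NULL,
    EndTime      datetime NOT NULL,
    mshtq_in_cml bit      NOT NULL,
    PRIMARY KEY CLUSTERED (StartTime, EndTime),   -- becomes the clustered index
    UNIQUE (EndTime, StartTime)                   -- each extra UNIQUE constraint adds another index
);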
2,500 rows is not a trivially small table. The consequence here always lies in how you use that table. If you do simple in-flight logging, or return all the rows at the end of the proc, then there's no problem. But as soon as that table is used in a JOIN, WHERE, GROUP BY and many other cases, you're almost guaranteed to get a bad plan most of the time. Getting a good plan is more luck than anything else.
nico van niekerk (12/2/2011)
Here is the solution that I came up with which runs in between 9 and 12 seconds without using OPTIMIZE FOR with the same statistics.
DECLARE @Time datetime2
Set @Time = '2009-01-01'
declare @filesummaries table (starttime datetime null, [endtime] datetime null, mshtq_in_cml bit not null)

insert into @filesummaries ([starttime], [endtime], mshtq_in_cml)
select starttime, endtime, mshtq_in_cml
from filesummaries
where mshtq_in_cml & 1 = 1 and StartTime < @time

select count(*) as Occurrences,
       case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end as BinMean
from rawimc r
join (select [starttime], [endtime] from @filesummaries) f
  on r.devicetimestamp between f.[starttime] and f.[endtime]
group by case when r.mshtq < 0 then cast(r.mshtq/10 as int)*10-5 else cast(r.mshtq/10 as int)*10+5 end
order by BinMean
What you have here, my friend, is the ONE single exception where a table variable in a join is beneficial. Assuming you only have pairs of begin/end dates that will <almost> NEVER overlap, the BEST plan is to assume that the minimum number of rows will be returned, which is exactly what a table variable does. No amount of stats or unique constraints will tell the optimizer that you expect only 0 to 1 rows from that join.
nico van niekerk (12/2/2011)
My best guess is that the JOINs in the first query didn't give SS a clear indication of its intent and it reverted back to the statistics, while this query is less ambiguous.
Thanks to everybody who contributed to this issue. It's much appreciated.
Thanks
Nico
No, the difference is that with any other kind of table, statistics will be used and the estimates will be wrong, causing a bad plan.
I've wanted to toy with this for a while... maybe now's the time.
What I had tried in the past is a cross/outer apply with TOP 1 + ORDER BY the PK on a temp table. The estimates were correct; however, the sort came with a huge cost. Maybe you've hit a gem on the head with this one.
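Roughly this shape, from memory (names made up, and assume #windows is a temp table whose PK is StartTime):

-- Rough shape only; #windows stands in for a temp table of (StartTime, EndTime) pairs keyed on StartTime
SELECT COUNT(r.DeviceTimeStamp) AS Occurrences
FROM RawIMC r
CROSS APPLY (SELECT TOP (1) w.StartTime
             FROM #windows w
             WHERE r.DeviceTimeStamp BETWEEN w.StartTime AND w.EndTime
             ORDER BY w.StartTime) x
WHERE r.mshtq_in_cml = 1;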
Congrats!
December 4, 2011 at 7:05 am
nico van niekerk (12/2/2011)
The temp table was not #tmp but @Tmp, a table definition and one cannot create indexes on table definitions. Its residence in memory makes it faster than an indexed table....
This may not contribute to your solution, but the statements above can't remain unchallenged.
Table variables live in tempdb just like temporary tables.
http://msdn.microsoft.com/en-us/library/ms190768.aspx
Although indexes cannot be explicitly created for table variables (using CREATE INDEX), table variables can have indexes to support PRIMARY KEY and UNIQUE constraints.
http://msdn.microsoft.com/en-us/library/ms175010.aspx
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
December 4, 2011 at 7:07 am
Whups, sorry Ninja. I just read back and saw where you had already dealt with it.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
December 4, 2011 at 7:24 am
The Dixie Flatline (12/4/2011)
Whups, sorry Ninja. I just read back and saw where you had already dealt with it.
You should be ashamed. It's the first time this has happened to me on this site. Well, today on this thread, anyway. :-D
Tx for helping out.