August 9, 2014 at 2:01 pm
GilaMonster (8/9/2014)
faisalfarouqi (8/9/2014)
I would also request some suggestions around refactoring, or maybe reordering of joins, to retrieve the data more quickly.
Reordering of joins won't do a thing; the order of joins does not matter. Refactoring isn't currently indicated; the obvious problem needs to be addressed first.
I know this isn't supposed to matter, but I recently had to deal with a query where it actually did. The plan claimed the optimisation level was FULL (rather than a TIMEOUT), but I doubt it. The only difference was moving an INNER JOIN around in the FROM clause, and the query plan came out wildly different, as did the timings on the query (10 seconds vs. 5 minutes).
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
August 9, 2014 at 2:36 pm
Your move of the join changed the way the optimiser 'wandered' through the search space, resulting in it finding a different 'good enough' plan.
I say not to do it because it's in no way deterministic; you could change something else (swap two columns in the GROUP BY) and have the same effect. If the optimiser did a complete search of the plan space this would never happen, but since no one wants to spend days waiting for a plan, it can.
Remember the optimiser is NEVER looking for the best plan; it's looking for a good plan fast. In one case it found a good plan that was actually crap; in the other it found a good plan that was good. In that case, I'd be looking over the query to see if there's anything in it which is prone to causing the optimiser problems.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 9, 2014 at 2:54 pm
GilaMonster (8/9/2014)
Remember the optimiser is NEVER looking for the best plan; it's looking for a good plan fast. In one case it found a good plan that was actually crap; in the other it found a good plan that was good. In that case, I'd be looking over the query to see if there's anything in it which is prone to causing the optimiser problems.
An excellent reminder, thanks. Due to the nature of the concern, I actually took the query apart completely and rebuilt it using a temp table as a mid-point, with indexes, to control the optimizations and expectations.
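In outline, that kind of mid-point rewrite looks something like this (a minimal sketch with invented table and column names; only dbo.DATE_DIM comes from this thread):

-- Materialise the expensive intermediate result once
SELECT f.CustomerKey,
       f.DateKey,
       SUM(f.MetricValue) AS MetricTotal      -- placeholder columns
INTO   #MidPoint
FROM   dbo.BigFactTable AS f                  -- placeholder table
GROUP BY f.CustomerKey, f.DateKey;

-- Index the temp table so the remaining joins have something to seek on
CREATE CLUSTERED INDEX CIX_MidPoint ON #MidPoint (DateKey, CustomerKey);

-- Finish the query against the much smaller, pre-aggregated set
SELECT d.CalendarDate, m.CustomerKey, m.MetricTotal
FROM   #MidPoint AS m
       INNER JOIN dbo.DATE_DIM AS d ON d.DateKey = m.DateKey;

The point of indexing the temp table is that the second half of the query gets accurate statistics and a predictable access path, instead of leaving the optimiser to guess across the whole original join tree.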
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions | Forum Netiquette
For index/tuning help, follow these directions. | Tally Tables
Twitter: @AnyWayDBA
August 11, 2014 at 3:29 am
GilaMonster (8/9/2014)
Update the stats, run the query again, and post the new actual exec plan please. Currently the incorrect row estimations are messing everything up; it's not going to be possible to provide useful advice until that glaring problem is fixed and the execution plan is sensible for the rows involved.
According to the plan you posted, the query returned 280093 rows in total. That 8.3 million was the actual row count on the index seek and the number across the plan before the aggregation.
Hi Gail, as per your suggestion I've updated the stats for that particular table and I've seen tremendous improvements. The estimates are now pretty good, and the query now returns in about a minute, whereas it used to take around 21 minutes before updating the stats. I've attached the actual exec plan after updating the stats. The other problem is that I cannot frequently update stats for this DB, as it's approx. 5 TB in size and busy with transactions. Is there a way I can deal with this VLDB so I can keep the stats up to date and defragment the indexes?
Regards,
Faisal
August 11, 2014 at 4:20 am
You *need* to run frequent stats updates. There's no way around that: you can either run frequent stats updates or have terrible performance. Index defrags are not such a critical issue, but the stats updates are essential. Daily would be good. Is the server active 24x7?
Do you want the query tuned further or is the minute execution acceptable?
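On the stats side, a minimal sketch of what a nightly job could run (the sampled update and sp_updatestats are just one option; the sample rate is an assumption to be tuned to your maintenance window):

USE [CDW];

-- Sampled update on the large table whose estimates were off;
-- a FULLSCAN on a ~5 TB database is rarely realistic, so sample instead
UPDATE STATISTICS dbo.AGGR_ACTIVE_EE_SUBSCRIPTION_METRICS WITH SAMPLE 10 PERCENT;

-- Or let SQL Server refresh only the stats it considers stale, database-wide
EXEC sp_updatestats;

Either approach can be wrapped in a SQL Agent job scheduled for the quietest part of the day; a stats update doesn't block readers the way an offline index rebuild can.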
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
August 11, 2014 at 4:40 am
In your very first post you mentioned the result is used by people looking at it.
It's rather unusual to present 280k rows to a user...
So I need to ask: what are the 280k rows used for? How are those rows further processed before they're presented to the end user?
August 11, 2014 at 5:01 am
GilaMonster (8/11/2014)
You *need* to run frequent stats updates. There's no way around that: you can either run frequent stats updates or have terrible performance. Index defrags are not such a critical issue, but the stats updates are essential. Daily would be good. Is the server active 24x7?
Do you want the query tuned further or is the minute execution acceptable?
Thanks a lot, Gail. It's acceptable for now; at least I can say it's far better than a 24-minute run. But if something can be tweaked to reduce it from a minute to seconds, it will be much appreciated, and the end users, especially the CEOs and VPs, will be very happy, which could impact me in many ways.
If you can guide me on reducing it to seconds, that would be fantastic... totally awesome.
Regards,
Faisal
August 11, 2014 at 5:05 am
LutzM (8/11/2014)
In your very first post you mentioned the result is used by people looking at it.
It's rather unusual to present 280k rows to a user...
So I need to ask: what are the 280k rows used for? How are those rows further processed before they're presented to the end user?
It's not that unusual from the end-user perspective; it fulfils their reporting needs. I am not sure how they handle it on the application side, but they run some reporting tool to fetch this data. Thanks, LutzM, but I believe my issue is fixed, so now I only need to reduce the execution time from a minute or two to seconds, and if that happens it would be totally awesome.
Regards,
Faisal
August 11, 2014 at 5:22 am
Sometimes it's essential to know what the application will do with the data.
Let's just assume a scenario where they keep only 10% of the data and throw the remaining rows away.
In that case it would be much more efficient to return just those 10%.
That would mean a WHERE clause, with the filter values provided by the application, so the database knows what filter needs to be applied.
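For example (purely hypothetical, since we don't know what the application discards; the object and column names, including those on dbo.DATE_DIM, are assumptions):

DECLARE @Year  int = 2014,
        @Month int = 8;   -- values the reporting tool would supply

-- Only the rows the report actually shows get returned
SELECT m.CustomerKey,
       m.MetricTotal
FROM   dbo.SomeReportView AS m      -- placeholder for the existing query
       INNER JOIN dbo.DATE_DIM AS d ON d.DateKey = m.DateKey
WHERE  d.CalendarYear  = @Year
  AND  d.CalendarMonth = @Month;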
You might also want to update the stats for the other tables involved (especially dbo.DATE_DIM), since the estimated and actual row counts are drifting apart there, too.
Other than that, processing about 6GB of data takes some time...
Would it be an option to pre-aggregate the data for [CDW].[dbo].[AGGR_ACTIVE_EE_SUBSCRIPTION_METRICS] before joining all those other tables, maybe even in a separate staging table?
And what I just noticed: why is that huge table a HEAP? Is there any particular reason for not providing a clustered index?
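Two rough sketches of those last two suggestions (the column names are assumptions; the right keys depend on the actual table definition and the queries that hit it):

-- 1) Pre-aggregate the big fact table into a staging table on a schedule
SELECT CustomerKey,
       DateKey,
       SUM(MetricValue) AS MetricTotal              -- placeholder measure
INTO   dbo.STG_AGGR_METRICS                         -- hypothetical staging table
FROM   [CDW].[dbo].[AGGR_ACTIVE_EE_SUBSCRIPTION_METRICS]
GROUP BY CustomerKey, DateKey;

-- 2) Give the heap a clustered index so access is ordered and forwarded
--    records stop accumulating
CREATE CLUSTERED INDEX CIX_AGGR_METRICS
    ON [CDW].[dbo].[AGGR_ACTIVE_EE_SUBSCRIPTION_METRICS] (DateKey, CustomerKey);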