May 18, 2010 at 12:45 pm
I have two queries.
Both queries have the exact same FROM and WHERE clauses, the only difference between the two is the SELECT list of columns. Query 1 is pulling a bunch of columns that aren't pertinent to the report that they're for, while Query 2 has been slimmed down to only the columns which will be used in the report. Again, exact same FROM and WHERE statements.
Here's the kicker: When I run Query 1, it takes a whopping 1 second to run. However, when I run the slimmed-down Query 2, it takes 20 seconds!
The only thing I can see in the select lists between the two is that Query 1 happens to have included in it's select list both Primary Keys on one of the major tables it's querying from. So the question is, should I start including primary keys in the SELECT list on queries in order to improve performance of that query, even if the column isn't being used in the report it was created for? Or am I just making assumptions here?
Many thanks!
May 18, 2010 at 12:54 pm
I would take a look at the executiuon plan in management studio and see what is happening. I suspect that for some reason Query 2 is doing a table scan. are there any parameters in Query 1 that are not in Query 2?
Dan
If only I could snap my figures and have all the correct indexes apear and the buffer clean and.... Start day dream here.
May 18, 2010 at 1:04 pm
No parameters in either query (both are just using the same hardcoded date). The execution plan for Query 1, the quick one, shows a clustered Index seek, cost 86% while Query 2 shows the same but cost 68%. There's an additional Hash Match Aggregate in #1 that's shown as just a "sort" in #2, both have cost 0%.
(I don't have much experience reading/understanding EPs very well, as I'm sure you can tell).
I've actually narrowed down the list of SELECT items so that these two columns are the culprit. Like I said, they're both the PKs for one of the major tables being queried. If I add these two columns to the slow query, it runs in 1 second. (It has to be both, having just one still has the query running at 20 seconds.) Does this make any sense?
May 18, 2010 at 1:07 pm
Can you post the plans? Saved as .SQLPlan and Zipped together, we might be able to see something. Maybe you get a covering index for the first query.
Cheers,
J-F
May 18, 2010 at 1:09 pm
I wouldn't be surprised if you see an operator in the execution plan called Key Lookup. If it's there, you're looking at an instance of a covering index and a non-covering index. These are absolutely predicated on the columns in the query, including in the SELECT list.
But, that's a guess without seeing the execution plans. If you want to post those, I can be more specific.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 18, 2010 at 1:10 pm
Ok, here are the plans.
Hey, thanks for checking this out, by the way. I'm just a little confused by this. Glad to know someone out there might have an inkling of what may be going on.
🙂
May 18, 2010 at 1:23 pm
Are those queries on the same server? If no, then I'm guessing 1 of the server needs to have its statistics updated. You can see 1 of the plan seeks on the "Events" table 2 500 000 times, where the other only about 600. This is definitely your problem, and I think an Update statistics might help.
Therefore, I'm saying this silently because I've seen a post from Grant just there, and I just hope I don't get flamed for saying this! 😉
I'd use the following
UPDATE STATISTICS YourTable with FULLSCAN
See what this does.
Cheers,
J-F
May 18, 2010 at 1:27 pm
So...if I confirm that both queries are running on the same server, does that alter what I'd need to do next? (Because they are both running on the same server)
Same exact queries, only Query1 runs 1 second and has 2 columns which happen to be primary keys on the Events table, Query2 runs 20 seconds and does not include those 2 columns. Everything else is the same.
Is this just a strange occurrence or does it make any kind of sense?
May 18, 2010 at 1:30 pm
Are the columns SchID and EVId? Because for those plans, they were both commented out when you ran the query. Can we have the plans when those columns are not commented out?
Cheers,
J-F
May 18, 2010 at 1:33 pm
Those are indeed the columns. They were just commented out in Query2 to demonstrate that if i don't pull both columns, it takes 20 seconds. (I could've just removed them altogether, but just commented out so i could go back and forth on reviewing performance more easily).
Query 1 has all these non-essential columns, but when I pare it down to the essentials (in #2), it takes longer. If i add back in just those two columns to #2, it works great.
May 18, 2010 at 1:36 pm
So sorry, I copy pasted the same query twice in SSMS, it is my error, you are right, on the first query, the columns are in the select list. Sorry bout that,
Cheers,
J-F
May 18, 2010 at 1:39 pm
So now, we know that adding columns to the column list, the Optimizer chose a whole different approach to seek the Events table, can you update the stats on this table, and also the Booking table?
If it's a production server, it might slow things down a bit. If so, I propose you work in a testing environment.
Cheers,
J-F
May 18, 2010 at 1:47 pm
The difference I'm seeing is in where the DISTINCT operation is applied. Of those two plans, the faster plan has a HASH aggregate while the other does a completely useless sort and then does a DISTINCT sort right before the end. I think that's the key. That very complex select list can't be aggregated through the DISTINCT until the end, but the simpler list is aggregated sooner, and on a smaller set of data.
I'd guess the first question is, do you really need the DISTINCT? If so, why? DISTINCT is frequently a crutch to cover up bad data structures or a bad habit that people get into or a way to cover up missing or incomplete JOIN criteria (occasionally, you do need it). The other thing I'd do is add the Booking. SchId column to the idx_BookingData index as an INCLUDE column. that would eliminate the key lookup operation, quick & easy. That might help (but it's not a major concern with the query).
I'm pretty sure the problem comes down to the size of the SELECT list and the DISTINCT operation.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 18, 2010 at 1:51 pm
J-F Bergeron (5/18/2010)
Are those queries on the same server? If no, then I'm guessing 1 of the server needs to have its statistics updated. You can see 1 of the plan seeks on the "Events" table 2 500 000 times, where the other only about 600. This is definitely your problem, and I think an Update statistics might help.Therefore, I'm saying this silently because I've seen a post from Grant just there, and I just hope I don't get flamed for saying this! 😉
I'd use the following
UPDATE STATISTICS YourTable with FULLSCAN
See what this does.
ARE YOU CRAZY?!?!? UPDATING STATISTICS WITH FULL SCAN...
Actually, that's not a bad idea, at all. As you noticed, estimates & actuals are more than a bit off. The last hash match, for example, has an estimate of 50000+ and an actual of 58 on the good query, and it's 180000+ to 22000 on the bad one. That really could be stats.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
May 18, 2010 at 1:54 pm
Grant Fritchey (5/18/2010)
ARE YOU CRAZY?!?!? UPDATING STATISTICS WITH FULL SCAN...
Actually, that's not a bad idea, at all. As you noticed, estimates & actuals are more than a bit off. The last hash match, for example, has an estimate of 50000+ and an actual of 58 on the good query, and it's 180000+ to 22000 on the bad one. That really could be stats.
Phew, you had me for a minute there... Well, Glad I'm on the right track.
I agree the distinct burned my eyes there, but I'm curious why adding more fields to a distinct keyword could help the optimizer make his choice? I would've thought it would be even harder to choose the distinct values with more selected fields.
Cheers,
J-F
Viewing 15 posts - 1 through 15 (of 29 total)
You must be logged in to reply to this topic. Login to reply