October 18, 2016 at 7:21 am
Hi guys
Similar to a previous thread I started last week about query performance I need to figure out why 2 queries, designed to do the same thing, are both returning in or around the same time as the other. One is using an Index Scan, the other is using an Index seek so I was expecting the 2nd to be quick.
The first takes in or around 51 seconds and returns 3.5m rows
The second takes around 47 seconds.
The queries and execution plans are attached along with indexes I've created on the table. If anyone can weigh in with any suggestions as to what else I can do to improve the performance please let me know.
Note: I've already reorganized the indexes on the tables
October 18, 2016 at 8:40 am
mitzyturbo (10/18/2016)
Hi guysSimilar to a previous thread I started last week about query performance I need to figure out why 2 queries, designed to do the same thing, are both returning in or around the same time as the other. One is using an Index Scan, the other is using an Index seek so I was expecting the 2nd to be quick.
The first takes in or around 51 seconds and returns 3.5m rows
The second takes around 47 seconds.
The queries and execution plans are attached along with indexes I've created on the table. If anyone can weigh in with any suggestions as to what else I can do to improve the performance please let me know.
Note: I've already reorganized the indexes on the tables
1) I do not see query plans. Just 3 text files and the second two don't show in my browser.
2) If you are really returning 3.5M rows to the client you cannot expect quick performance. Declare variables of the proper type for your 3 fields and then SELECT the fields into those variables. A neat trick to just time server performance on large row counts since the data never leaves the server.
3) There is nothing that says a very large iterative seek is going to be faster than a scan. In fact I can easily construct seek plans that will perform WAY WORSE than scan plans.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
October 18, 2016 at 8:46 am
I attached the two exec plans (Query1 and 2) as txt files as the UI wouldn't allow me to attach XML files
If I've done that wrong let me know.
This is a sub-query that I need my main query to select off, I won't be returning all 3.5 million rows. Potentially they could select ALL records requiring the query to search against the full 3.5 million rows
October 18, 2016 at 8:52 am
Sorry Kevin, I've deleted those txt's and added the sqlplan files now
October 18, 2016 at 9:08 am
The plan that uses seeks is actually quite a bit less efficient than the one that uses a scan, for a couple reasons.
First, the seek operator that is the outer input to the nested loop join (the one on IX_EntityLogUser) gets to seek to the start of the range of rows meeting the predicate, but then scans the range.
That's cheaper than scanning the entire index, but in this case nearly half the rows in the table are being returned anyway, so it's not so dramatically cheaper as you would expect if you were thinking singleton seek vs full scan.
See http://sqlblog.com/blogs/paul_white/archive/2011/02/17/so-is-it-a-seek-or-a-scan.aspx for a brief investigation of how the seek operator can be either a singleton seek or a seek+range scan.
Second, for each of the 3.6 million rows returned by that operator, a seek is performed against the IX_LogDate index. Spinning up 3.6 million seeks is quite a bit more expensive than just scanning the table in this case.
The only reason you're seeing roughly similar durations is that the inefficient plan described above is using parallelism, while the more efficient plan with the scan is running serially.
If the inefficient plan stayed the same except for running serially, you'd see it take substantially longer than the plan using a single scan of the table.
Cheers!
October 18, 2016 at 9:19 am
That makes sense Jacob
I had the idea that the index on the logdate field would improve performance but since it's running a select against the whole table it makes little difference compared to a table scan
Realistically it's unlikely that users running the report that this query will be used in will use such parameters I just need to be able to gauge just how long they could be looking at if they go with the full select.
October 18, 2016 at 9:21 am
I can't see your indexes, can you confirm that this is one of them? If not, create it and test with both queries.
(entity, LogTxt, entityRowID) INCLUDE (LogDate, userName)
There's a row_number version you might like to try too...
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 18, 2016 at 9:25 am
Sorry Chris, forgot to attach the indexes
Happy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway
October 18, 2016 at 9:44 am
mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexesHappy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway
Sure no probs - but do try that index 😉
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 18, 2016 at 9:53 am
mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexesHappy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway
Here's the ROW_NUMBER version. I think the index I posted earlier would be good for any of these queries.
Edit: Lost the query somewhere...
SELECT entityRowID, userName, logDate
FROM (
SELECT entityRowID, userName, logDate,
rn = ROW_NUMBER() OVER(PARTITION BY entityRowID ORDER BY logDate)
FROM dbo.ActivityLog
WHERE entity = 'INBOX'
AND LogTxt = 'C'
)b
WHERE rn = 1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
October 18, 2016 at 10:12 am
ChrisM@Work (10/18/2016)
mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexesHappy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway
Here's the ROW_NUMBER version. I think the index I posted earlier would be good for any of these queries.
Where?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
October 18, 2016 at 10:19 am
Phil Parkin (10/18/2016)
ChrisM@Work (10/18/2016)
mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexesHappy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway
Here's the ROW_NUMBER version. I think the index I posted earlier would be good for any of these queries.
Where?
Thanks Phil, got interrupted twice whilst posting that!
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply