Query Performance - MIN vs Top 1

Question

Query Performance - MIN vs Top 1

mitzyturbo

Hall of Fame

Points: 3452
More actions
October 18, 2016 at 7:21 am

#318157

Hi guys
Similar to a previous thread I started last week about query performance I need to figure out why 2 queries, designed to do the same thing, are both returning in or around the same time as the other. One is using an Index Scan, the other is using an Index seek so I was expecting the 2nd to be quick.
The first takes in or around 51 seconds and returns 3.5m rows
The second takes around 47 seconds.
The queries and execution plans are attached along with indexes I've created on the table. If anyone can weigh in with any suggestions as to what else I can do to improve the performance please let me know.
Note: I've already reorganized the indexes on the tables

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

TheSQLGuru SSC Guru Points: 134017 More actions · Answer 1

mitzyturbo (10/18/2016)
Hi guys
Similar to a previous thread I started last week about query performance I need to figure out why 2 queries, designed to do the same thing, are both returning in or around the same time as the other. One is using an Index Scan, the other is using an Index seek so I was expecting the 2nd to be quick.
The first takes in or around 51 seconds and returns 3.5m rows
The second takes around 47 seconds.
The queries and execution plans are attached along with indexes I've created on the table. If anyone can weigh in with any suggestions as to what else I can do to improve the performance please let me know.
Note: I've already reorganized the indexes on the tables

1) I do not see query plans. Just 3 text files and the second two don't show in my browser.

2) If you are really returning 3.5M rows to the client you cannot expect quick performance. Declare variables of the proper type for your 3 fields and then SELECT the fields into those variables. A neat trick to just time server performance on large row counts since the data never leaves the server.

3) There is nothing that says a very large iterative seek is going to be faster than a scan. In fact I can easily construct seek plans that will perform WAY WORSE than scan plans.

Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 2

I attached the two exec plans (Query1 and 2) as txt files as the UI wouldn't allow me to attach XML files

If I've done that wrong let me know.

This is a sub-query that I need my main query to select off, I won't be returning all 3.5 million rows. Potentially they could select ALL records requiring the query to search against the full 3.5 million rows

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 3

Sorry Kevin, I've deleted those txt's and added the sqlplan files now

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 4

The plan that uses seeks is actually quite a bit less efficient than the one that uses a scan, for a couple reasons.

First, the seek operator that is the outer input to the nested loop join (the one on IX_EntityLogUser) gets to seek to the start of the range of rows meeting the predicate, but then scans the range.

That's cheaper than scanning the entire index, but in this case nearly half the rows in the table are being returned anyway, so it's not so dramatically cheaper as you would expect if you were thinking singleton seek vs full scan.

See http://sqlblog.com/blogs/paul_white/archive/2011/02/17/so-is-it-a-seek-or-a-scan.aspx for a brief investigation of how the seek operator can be either a singleton seek or a seek+range scan.

Second, for each of the 3.6 million rows returned by that operator, a seek is performed against the IX_LogDate index. Spinning up 3.6 million seeks is quite a bit more expensive than just scanning the table in this case.

The only reason you're seeing roughly similar durations is that the inefficient plan described above is using parallelism, while the more efficient plan with the scan is running serially.

If the inefficient plan stayed the same except for running serially, you'd see it take substantially longer than the plan using a single scan of the table.

Cheers!

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 5

That makes sense Jacob

I had the idea that the index on the logdate field would improve performance but since it's running a select against the whole table it makes little difference compared to a table scan

Realistically it's unlikely that users running the report that this query will be used in will use such parameters I just need to be able to gauge just how long they could be looking at if they go with the full select.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 6

I can't see your indexes, can you confirm that this is one of them? If not, create it and test with both queries.

(entity, LogTxt, entityRowID) INCLUDE (LogDate, userName)

There's a row_number version you might like to try too...

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

mitzyturbo Hall of Fame Points: 3452 More actions · Answer 7

Sorry Chris, forgot to attach the indexes

Happy enough to go with the TOP 1 in a CROSS APPLY

Thanks anyway

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 8

mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexes
Happy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway

Sure no probs - but do try that index 😉

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 9

mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexes
Happy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway

Here's the ROW_NUMBER version. I think the index I posted earlier would be good for any of these queries.

Edit: Lost the query somewhere...

SELECT entityRowID, userName, logDate

FROM (

SELECT entityRowID, userName, logDate,

rn = ROW_NUMBER() OVER(PARTITION BY entityRowID ORDER BY logDate)

FROM dbo.ActivityLog

WHERE entity = 'INBOX'

AND LogTxt = 'C'

)b

WHERE rn = 1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Phil Parkin SSC Guru Points: 246962 More actions · Answer 10

ChrisM@Work (10/18/2016)
mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexes
Happy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway
Here's the ROW_NUMBER version. I think the index I posted earlier would be good for any of these queries.

Where?

The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 11

Phil Parkin (10/18/2016)
ChrisM@Work (10/18/2016)
mitzyturbo (10/18/2016)
Sorry Chris, forgot to attach the indexes
Happy enough to go with the TOP 1 in a CROSS APPLY
Thanks anyway
Here's the ROW_NUMBER version. I think the index I posted earlier would be good for any of these queries.
Where?

Thanks Phil, got interrupted twice whilst posting that!

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden