trying to reduce the amount of logical reads

Question

trying to reduce the amount of logical reads

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 1

The Dixie Flatline (8/11/2011)
If the ID is included as part of the index, the optimizer can find MAX by matching on [field_name1] and [field_name2], then reading the last value of ID. There are pointers in the index going both forwards and backwards so this is basically one read after [field_name1] and [field_name2] are found. AND this one read is not being done at the leaf level.
If you make it an included column, then it has to scan all leaves match for [field_name1] and [field_name2] from start to finish. This could be inconsequential or it could involve many, many pages having to be read to perform the scan.
Basically, including it as an indexed column can take away a lot of the I/O for this particular query.

This. Exactly.

In the key, it's a backward range-scan that terminates after one page. In the include, it's a complete range-scan. (note I said range scan, not scan)

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 2

GilaMonster (8/12/2011)
The Dixie Flatline (8/11/2011)
If the ID is included as part of the index, the optimizer can find MAX by matching on [field_name1] and [field_name2], then reading the last value of ID. There are pointers in the index going both forwards and backwards so this is basically one read after [field_name1] and [field_name2] are found. AND this one read is not being done at the leaf level.
If you make it an included column, then it has to scan all leaves match for [field_name1] and [field_name2] from start to finish. This could be inconsequential or it could involve many, many pages having to be read to perform the scan.
Basically, including it as an indexed column can take away a lot of the I/O for this particular query.
This. Exactly.
In the key, it's a backward range-scan that terminates after one page. In the include, it's a complete range-scan. (note I said range scan, not scan)

Nod nod. Sorry Dixie, missed your answer on this thread, got distracted. Thank you both.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

The Dixie Flatline SSC Guru Points: 53354 More actions · Answer 3

Not being a problem, Evil Kraig. 😉

What made you decide to change your name?

__________________________________________________

Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

SQLHeap Hall of Fame Points: 3572 More actions · Answer 4

Ninja's_RGR'us (8/11/2011)
SQLHeap (8/11/2011)
GilaMonster (8/11/2011)
Add ID as a 3rd key column to this index:
CREATE NONCLUSTERED INDEX [N2] ON [dbo].table
(
field_name2 ASC,
field_name1 ASC
)
In other words, change it to this:
CREATE NONCLUSTERED INDEX [N2] ON [dbo].table
(
field_name2 ASC,
field_name1 ASC,
ID
)
That's what I would've guessed, but the DAT didn't recommend it so I didn't do it. I will try that tomorrow thanks!
~~DAT~~ DTA is IM<NS>HO a very dangerous tool. Especially if you don't know what you are doing. And when you know what you are doing you don't need that tool. Yes I know there's a catch 22 somewhere in there.
Please read this to rid yourself of that limitation : http://www.sqlservercentral.com/articles/books/65831/

Adding the ID to the index dropped the logical reads from 211938 to 3! Thanks to all, much appreciated!

There is an exception to every rule, except this one...

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 5

Sounds about right ;-).

Thanks for the feedback.

Anything else?

Paul White SSC Guru Points: 150467 More actions · Answer 6

GilaMonster (8/12/2011)
The Dixie Flatline (8/11/2011)
If the ID is included as part of the index, the optimizer can find MAX by matching on [field_name1] and [field_name2], then reading the last value of ID. There are pointers in the index going both forwards and backwards so this is basically one read after [field_name1] and [field_name2] are found. AND this one read is not being done at the leaf level.
If you make it an included column, then it has to scan all leaves match for [field_name1] and [field_name2] from start to finish. This could be inconsequential or it could involve many, many pages having to be read to perform the scan.
Basically, including it as an indexed column can take away a lot of the I/O for this particular query.
This. Exactly.
In the key, it's a backward range-scan that terminates after one page. In the include, it's a complete range-scan. (note I said range scan, not scan)

To be precise, it is a backward range scan that terminates after one row. The seek operation is a multi-key equality search on the first two keys only. Since the third key is ordered ascending in the index, SQL Server starts at the end of the range defined by the first two keys. Starting at the end of the range is why the scan direction is shown as BACKWARD in the plan.

The storage engine positions the seek at end of the range, and that row is returned to the query processor. The next row from the range scan is never asked for, thanks to the Top iterator in the plan. The Stream Aggregate is reduced to the role of a Compute Scalar (defining an expression that is the MAX of one row).

The transformation from Index Seek + Stream Aggregate to Index Seek + Top is performed by the optimizer, but writing the query explicitly as a TOP (1) ... ORDER BY is slightly more efficient, since there is no Stream Aggregate. If the optimizer did not perform this trick, the index seek would return all rows from the index that match on the first two keys, ordered by the third key. The stream aggregate would then examine every row to determine which has the highest third-key value. It's easy to see that this will always be the first row, making the transformation a good one.

If the majority of the queries that use this index will be looking for the highest ID value, I would define the third key in the index as DESC, allowing a FORWARD range scan. The storage engine is unable to perform a backward parallel range scan, so some queries can benefit from allowing a parallel forward scan (which requires an appropriately sorted index).

Some people find it helpful to visualize the difference between a BACKWARD and FORWARD range scan, as seeking down one 'edge' of the index range that matches the equality predicates, and then following leaf-level page pointers as necessary (previous page pointers for backward scans, next page pointers for forward scans). The key point (ha!) is to think about the starting point of the scan - if we are moving backward, the highest key is the one found first. Equally, if we are moving forward, the lowest third-level key is encountered first.

Providing both ASC and DESC indexes, the optimizer correctly prefers the DESC index. The script below illustrates most of the points made above:

USE Sandpit

CREATE TABLE dbo.Example

(

pkINTEGER IDENTITY PRIMARY KEY,

field1VARCHAR(20) NOT NULL,

field2VARCHAR(10) NOT NULL,

some_idINTEGER NOT NULL

)

INSERT dbo.Example WITH (TABLOCKX)

(field1, field2, some_id)

SELECT TOP (1234567)

'DOWNLOAD',

'XYZ',

CHECKSUM(NEWID())

FROM master.sys.all_columns AS ac

CROSS JOIN master.sys.all_columns AS ac2

CROSS JOIN master.sys.all_columns AS ac3

CREATE NONCLUSTERED INDEX i

ON dbo.Example (field1, field2, some_id)

-- BACKWARD range scan

-- MAX transformed to seek + top + max

SELECT

MAX(e.some_id)

FROM dbo.Example AS e

WHERE

e.field1 = 'DOWNLOAD'

AND e.field2 = 'XYZ'

CREATE NONCLUSTERED INDEX i2

ON dbo.Example (field1, field2, some_id DESC)

-- FORWARD range scan

-- MAX transformed to seek + top + max

SELECT

MAX(e.some_id)

FROM dbo.Example AS e

WHERE

e.field1 = 'DOWNLOAD'

AND e.field2 = 'XYZ'

-- Optimizer prefers index i2

-- FORWARD range scan

-- Seek + Top (1), no Stream Aggregate

SELECT TOP (1)

e.some_id

FROM dbo.Example AS e

WHERE

e.field1 = 'DOWNLOAD'

AND e.field2 = 'XYZ'

ORDER BY

e.some_id DESC

DROP TABLE dbo.Example

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi