OFFSET/FETCH with covering Nonclustered Index on partitioned table

Question

OFFSET/FETCH with covering Nonclustered Index on partitioned table

Matthew Darwin

SSCertifiable

Points: 5513
More actions
April 25, 2016 at 3:57 am

#313445

Morning all,
Having a look at OFFSET/FETCH and how it performs for pagination, and coming across some behaviour that I wouldn't expect. I have a partitioned table with a sequential ID (EventLogID) and a EventID column that's fairly randomly distributed. We have a nonclustered index on the EventID column, and this is where things are not running as I might expect: when I write an OFFSET/FETCH query ordered by EventID, it's performing an Index Scan and sort on the whole NCI, rather than the number of rows specified in the offset and fetch, even though that index is already sorted in that order.
Code to set everything up, with number of rows reduced to a million for the moment (the real table has some 80 million rows in, but the behaviour appears to be the same regardless of the rowcount):-
/* drop object if it already exists */
IF EXISTS (SELECT 1
FROM sys.objects AS ob
INNER JOIN sys.schemas AS sc
ON ob.schema_id = sc.schema_id
WHERE ob.name = 'EventLog'
AND sc.name = 'Demo')
DROP TABLE Demo.EventLog;
IF EXISTS (SELECT 1 FROM sys.partition_schemes AS PS
WHERE name = 'psEventLogBands')
DROP PARTITION SCHEME psEventLogBands;
IF EXISTS (SELECT 1 FROM sys.partition_functions AS PF
WHERE name = 'pfEventLogBands')
DROP PARTITION FUNCTION pfEventLogBands;
IF EXISTS (SELECT 1 FROM sys.schemas AS S
WHERE name = 'Demo')
SET NOEXEC ON;
GO
CREATE SCHEMA Demo AUTHORIZATION dbo;
GO
SET NOEXEC OFF;
GO
CREATE PARTITION FUNCTION [pfEventLogBands](int) AS RANGE LEFT FOR VALUES (0, 200000, 400000, 600000, 800000, 1000000);
CREATE PARTITION SCHEME [psEventLogBands] AS PARTITION [pfEventLogBands] ALL TO (PRIMARY);
CREATE TABLE [Demo].[EventLog](
[EventLogID] [int] IDENTITY(1,1) NOT NULL,
[EventID] [tinyint] NOT NULL
CONSTRAINT [PK_dbo:EventLog:EventID] PRIMARY KEY CLUSTERED
(
[EventLogID] ASC
)WITH (FILLFACTOR = 100) ON psEventLogBands(EventLogID)
) WITH (DATA_COMPRESSION = PAGE);
DECLARE @Nums TABLE
(Number tinyint NOT NULL)
INSERT INTO @Nums
SELECT TOP 50 ROW_NUMBER() OVER (ORDER BY (SELECT 1))
FROM sys.columns
INSERT INTO Demo.EventLog WITH (TABLOCK)
(EventID)
SELECT TOP 1000000 N.Number
FROM @Nums AS N
CROSS JOIN @Nums AS N2
CROSS JOIN @Nums AS N3
CROSS JOIN @Nums AS N4
CREATE INDEX IDX_EventID
ON demo.EventLog (EventID, EventLogID) WITH (DATA_COMPRESSION = PAGE);
Using the clustered index, we see that offset fetch is only scanning 10 rows of the table:-
SET STATISTICS IO ON
SELECT EL.EventLogID
,EL.EventID
FROM Demo.EventLog AS EL
ORDER BY EventLogID OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
(10 row(s) affected)
Table 'EventLog'. Scan count 2, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
The statistics IO results confirm this, and the query plan (attached) shows estimated and actual rows as 10.
However, when I switch the sort to use the EventID column, I end up with a parallel scan on the IDX_EventID index returning all one million rows and then sorting them, before applying the top operator to reduce down to ten rows:-
SELECT EL.EventLogID
,EL.EventID
FROM Demo.EventLog AS EL
ORDER BY EventID, EventLogID OFFSET 0 ROWS FETCH NEXT 10 ROWS ONLY;
(10 row(s) affected)
Table 'EventLog'. Scan count 8, logical reads 972, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
As you can see from the statistics IO output, all 972 pages are being read and from the query plan (attached) that the actual and estimated rows are one million, coupled with the sort operator.
This is clearly a problem arising with the introduction of partitioning into the equation; if the table is built without it, then everything works exactly as expected.
Is there any way to get this to work on a partitioned table?
Regards
Matthew
Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

Viewing 5 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 1

That's because the non-clustered index was created as partitioned (that is the default behavior when you create an index on a partitioned table).

With it partitioned on eventlogID, that means it will have to pull rows from every partition (each of which is its own B-Tree) and then sort all those rows to figure out which are the top 10 based on EventID.

The implementation in this case is a bit disappointing, because from a high-level perspective it could pull just the top 10 rows by EventID from each partition, and sort that much smaller result set. Alas, it doesn't do that.

You can get the behavior you were expecting by preventing the default partitioning behavior (just specify ON [PRIMARY] for the index creation, and then you'll get a non-partitioned index that will behave as expected).

Of course, that comes at a cost, as now you have a non-aligned index, which will prevent partition switching.

Cheers!

Matthew Darwin SSCertifiable Points: 5513 More actions · Answer 2

Jacob Wilkins (4/25/2016)
The implementation in this case is a bit disappointing, because from a high-level perspective it could pull just the top 10 rows by EventID from each partition, and sort that much smaller result set. Alas, it doesn't do that.

Shame, that's what I was hoping might be possible. Unfortunately in this case, the partition switch functionality is the driver for the table to be partitioned (it churns through several million rows a week as a horrible implementation of a log), so I wouldn't be able to remove the partition alignment from the index.

Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 3

Ah, my condolences.

There's a really old and still active connect item about this limitation.

You can at least upvote it and hope. 🙂

https://connect.microsoft.com/SQLServer/fedback/details/240968/partition-table-using-min-max-functions-and-top-n-index-selection-and-performance

I thought I remembered there being some ugly ways to work around the issue, and the workarounds listed on that connect item confirm that.

Not sure if they will be of any use to you, but worth a shot.

Cheers!

Matthew Darwin SSCertifiable Points: 5513 More actions · Answer 4

Thanks for the connect link; upvoted it, for whatever good that will do!

Fortunately in this case, I was only using the table in question as a demo for using the OFFSET FETCH and I don't actually need to have any code like that running against it!

Follow me on twitter @EvoDBACheck out my blog Natural Selection DBA[/url]