December 28, 2014 at 11:18 am
This is my table:
use tempdb
go
if object_id('Data', 'u') is not null drop table Data
go
with temp as (
select top 10000 row_number() over (order by c1.object_id) Id
from sys.columns c1 cross join sys.columns c2
)
select
Id,
((Id - 1) / 5000) + 1 as StateId,
((Id - 1) / 1000) + 1 as PalletId,
((Id - 1) / 100) + 1 as BoxId
into Data
from temp
order by Id
In reality the number of rows is a bit more than 100M. Those are the only queries that will be used on the table:
select StateId, min(Id), max(Id)
from Data
group by StateId
order by StateId, min(Id)
select StateId, PalletId, min(Id), max(Id)
from Data
group by StateId, PalletId
order by StateId, PalletId, min(Id)
select StateId, PalletId, BoxId, min(Id), max(Id)
from Data
group by StateId, PalletId, BoxId
order by StateId, PalletId, BoxId, min(Id)
What index would be best for these three queries? With best I mean the execution time, I don't care about additional space.
This is the index I currently use:
create nonclustered index Ix_Data on Data (StateId, PalletId, BoxId, Id)
The execution plan is SELECT (0%) - Stream Aggregate (10%) - Index Scan (90%). Can this be optimized (maybe to use Index Seek method)?
December 28, 2014 at 11:50 am
That is about the best index. I'd put ID as an include, as it's a bit wasted in the key.
An index seek operation requires a filter predicate. None of your queries have one, so it is not possible to search. In all cases, you're asking for all the rows in the table, so an operation which returns all the rows in the table (scan) is indicated
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 28, 2014 at 12:02 pm
Thanks!
December 28, 2014 at 12:07 pm
To seek, you need a predicate, something to seek for. A seek is defined as a navigation down the index tree to find a value. You do not have anything that can be used as a predicate. Your query requires every row of the table, therefore a scan (in this case of the index) is required.
Yes, it requires just the index, but that's exactly what you're getting, an index scan.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
December 29, 2014 at 9:29 am
You should make it a clustered index. You should also explicitly specify the appropriate FILLFACTOR and filegroup (even if it's PRIMARY, as that may not be the current default fg in the db).
I can't tell you what the best fillfactor would be without more details about the table data. If you're unsure, start with 95 (=95%) and work your way up or down from there as best suits that particular table.
CREATE CLUSTERED INDEX CL_Data ON Data ( StateId, PalletId, BoxId )
WITH ( FILLFACTOR = 95 /*assign better value if you can determine it!*/ )
ON [PRIMARY]
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
December 30, 2014 at 2:04 am
@scottpletcher - I tried it now and it seems that both clustered and nonclustered indexes are running for about the same time.
December 30, 2014 at 6:53 am
_simon_ (12/30/2014)
@ScottPletcher - I tried it now and it seems that both clustered and nonclustered indexes are running for about the same time.
I found the same, with the test data scaled up to match your real table:
IF object_id('Data', 'u') IS NOT NULL DROP TABLE Data;
CREATE TABLE Data (Id INT NOT NULL, StateId INT NOT NULL, PalletId INT NOT NULL, BoxId INT NOT NULL);
GO
WITH temp AS (
SELECT TOP 100000000
Id = CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS INT)
FROM sys.columns c1 CROSS JOIN sys.columns c2 CROSS JOIN sys.columns c3
)
INSERT INTO Data (Id, StateId, PalletId, BoxId)
SELECT
Id,
[StateId]= ((Id - 1) / 5000) + 1,
[PalletId]= ((Id - 1) / 1000) + 1,
[BoxId]= ((Id - 1) / 100) + 1
FROM temp
ORDER BY Id;
The simplest query returns 20,000 rows in about 5s:
SELECT StateId, MIN_Id = MIN(Id), MAX_Id = MAX(Id) FROM Data GROUP BY StateId ORDER BY StateId; -- (20000 row(s) affected) / 00:00:05
That's a relatively small number of discrete values for StateId and reminded me of this little gem, so I did a little testing. If you cluster the table on just Id
CREATE UNIQUE CLUSTERED INDEX ucx_Data ON Data (Id)
then you have some flexibility on the non-clustered indexes. I created a simple one, expecting it to match the query I'll show in a minute:
CREATE UNIQUE INDEX ix_StateId ON Data (StateId, Id)
Note that the query runs only marginally faster if Id is a key in this index, compared to INCLUDEing it.
Here's the query:
WITH RecursiveCTE
AS (
SELECT TOP 1 StateId, Id
FROM dbo.Data
ORDER BY StateId, Id
UNION ALL
SELECT r.StateId, r.Id
FROM (
SELECT t.StateId, t.Id,
rn = ROW_NUMBER() OVER (ORDER BY t.StateId, t.Id)
FROM dbo.Data t
JOIN RecursiveCTE ri
ON ri.StateId < t.StateId
) r
WHERE r.rn = 1
)
SELECT r.StateId, r.Id, x.Id
FROM RecursiveCTE r
CROSS APPLY (
SELECT TOP 1 Id
FROM dbo.Data t
WHERE t.StateId = r.StateId
ORDER BY Id DESC
) x
OPTION (MAXRECURSION 0);
Note that it uses a scan to determine the anchor row of the recursive CTE, the remaining rows are returned using seeks. On our little play server it runs in about 450ms, ten times faster than the original query.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 30, 2014 at 8:57 am
_simon_ (12/30/2014)
@ScottPletcher - I tried it now and it seems that both clustered and nonclustered indexes are running for about the same time.
Keep in mind that using the clustered index also means you don't have to store all the data twice, as you do with a nonclustered index over a heap or other cluster.
I should have included ID in the index, because of the queries being run. That's the best times you can reasonably expect without creating customized, covering non-clustered index(es), and I'm not sure those will really help much either.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply