Best column choice for clustered index?

Question

Post reply

Best column choice for clustered index?

Marios Philippopoulos

SSC Guru

Points: 57190
More actions
August 27, 2008 at 10:19 am

#192959

I need people's input on creating a clustered index for this query:
SELECT
MAX( IDt ) AS MaxDt
,CID
FROM
dbo.TIA
WHERE TN = 'TA'
GROUP BY CID
IDt is a DATETIME column; CID is an INT.
Table has approx. 1.5 million records.
Which is a better choice for a clustered index? IDt or CID?
__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 10 posts - 1 through 10 (of 10 total)

You must be logged in to reply to this topic. Login to reply

GSquared SSC Guru Points: 260824 More actions · Answer 1

If that query is the only thing being considered, how about both?

create clustered index CIX_Table on dbo.Table (CID, IDt)

But, as always, things like inserts, updates, deletes, and other queries, all have to be considered when creating a clustered index. All else being equal, a compound, clustered index on both columns is going to be good for that query.

Any reason it has to be clustered? You'll get about the same query performance out of a regular index with those columns.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 2

Why does it have to be clustered? If it was me, I'd suggest a nonclustered index for that, (TN, CID) INCLUDE (IDt)

What's the selectivity of TN? How many rows of the 1.5 million will the filter TN = 'TA' return?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 3

GSquared (8/27/2008)
If that query is the only thing being considered, how about both?
create clustered index CIX_Table on dbo.Table (CID, IDt)
But, as always, things like inserts, updates, deletes, and other queries, all have to be considered when creating a clustered index. All else being equal, a compound, clustered index on both columns is going to be good for that query.
Any reason it has to be clustered? You'll get about the same query performance out of a regular index with those columns.

Thanks for the reply.

I'm considering a clustered index because of the GROUP BY and MAX clauses. Clustered indexes usually perform better in queries where ranges of rows are involved. Nonclustered indexes are best in point queries.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 4

GilaMonster (8/27/2008)
Why does it have to be clustered? If it was me, I'd suggest a nonclustered index for that, (TN, CID) INCLUDE (IDt)
What's the selectivity of TN? How many rows of the 1.5 million will the filter TN = 'TA' return?

Selectivity of the TN column is very low: 27 distinct values out of a total of 1,676,301 rows.

Selectivities of the other 2 columns:

IDt: 1,437,346 unique values

CID: 389,644 unique values

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

GSquared SSC Guru Points: 260824 More actions · Answer 5

I missed the TN column in the Where clause. That would be needed too.

Because of the B-tree nature of indexes, a non-clustered index should do just fine for this.

Here's a test I did:

set nocount on;

create table #T1 (

ID int identity primary key,

TN char(2),

CID int,

IDt datetime);

create index IDX_T1_Cover on #T1 (TN, CID, IDt);

insert into #T1 (TN, CID, IDt)

select top 1000000

substring('abcdefghijklmnopqrstuvwxyz', abs(checksum(newid()))%26 + 1, 1),

checksum(newid())%1000,

dateadd(day, abs(checksum(newid()))%36525, '1/1/2000')

from dbo.numbers n1

cross join dbo.numbers n2;

create table #T2 (

ID int identity primary key nonclustered,

TN char(2),

CID int,

IDt datetime);

create clustered index CID_T2 on #T2(TN, CID, IDt);

insert into #T2 (TN, CID, IDt)

select TN, CID, IDt

from #T1;

alter index all on #t1 rebuild;

alter index all on #t2 rebuild;

set statistics time on;

SELECT

MAX( IDt ) AS MaxDt

, CID

FROM

#T1

WHERE TN = 'T'

GROUP BY CID;

print replicate('X', 100);

SELECT

MAX( IDt ) AS MaxDt

, CID

FROM

#T2

WHERE TN = 'T'

GROUP BY CID;

Of course, my temp tables don't have the same selectivity as your real tables, but it should give a reasonable approximation of how the code will work.

In my tests, the non-clustered index took between 80 and 102 milliseconds, while the clustered index took between 95 and 127.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 6

GSquared (8/27/2008)
I missed the TN column in the Where clause. That would be needed too.
Because of the B-tree nature of indexes, a non-clustered index should do just fine for this.
Here's a test I did...

Thank you for taking the time to do this. It falls in line with a DTA recommendation I just got:

CREATE NONCLUSTERED INDEX [_dta_index_TIA_10_1088500894__K9_K2_K18] ON [dbo].[TIA]
(
[TN] ASC,
[CID] ASC,
[IDt] ASC
)

The dilemma I was facing was whether the existing clustered index, defined on an IDENTITY column, which is also the PK of the table, should be removed and defined instead on one or more of the columns in the query.

This query accounts for a sizable part of the overall workload of our system. It's being called multiple times in a loop once an hour. Every hour, right on the dot, I see a spike in the Memory Pages/sec perfmon counter. I ran Profiler earlier today and narrowed it down to this query (and a larger query that it is embedded in).

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Gail Shaw SSC Guru Points: 1004514 More actions · Answer 7

Marios Philippopoulos (8/27/2008)
I'm considering a clustered index because of the GROUP BY and MAX clauses. Clustered indexes usually perform better in queries where ranges of rows are involved. Nonclustered indexes are best in point queries.

A properly constructed (covering) nonclustered index will do a range as well or better than a cluster. Sometimes better because the NC index will have fewer leaf pages and hence a partial scan affects fewer pages.

If you take the suggestion I gave, what SQL will do with that is first find the start of the rows that have TN = 'TA'. Because the group by column is the second column in the index, the index rows are ordered first by TN and then by CID, SQL can just read along the index rows, picking out the max value for IDt for each value of CID.

I will guess that the execution plan will be a simple index seek with a stream aggregate.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

GSquared SSC Guru Points: 260824 More actions · Answer 8

GilaMonster (8/27/2008)
Marios Philippopoulos (8/27/2008)
I'm considering a clustered index because of the GROUP BY and MAX clauses. Clustered indexes usually perform better in queries where ranges of rows are involved. Nonclustered indexes are best in point queries.
A properly constructed (covering) nonclustered index will do a range as well or better than a cluster. Sometimes better because the NC index will have fewer leaf pages and hence a partial scan affects fewer pages.
If you take the suggestion I gave, what SQL will do with that is first find the start of the rows that have TN = 'TA'. Because the group by column is the second column in the index, the index rows are ordered first by TN and then by CID, SQL can just read along the index rows, picking out the max value for IDt for each value of CID.
I will guess that the execution plan will be a simple index seek with a stream aggregate.

Exactly. That's what it did in my test, and it was about 20% more efficient than the clustered index version.

- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread

"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 9

Thanks both for your input, it's much appreciated.

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]