Best column choice for clustered index?

  • I need people's input on creating a clustered index for this query:

    SELECT

    MAX( IDt ) AS MaxDt

    ,CID

    FROM

    dbo.TIA

    WHERE TN = 'TA'

    GROUP BY CID

    IDt is a DATETIME column; CID is an INT.

    Table has approx. 1.5 million records.

    Which is a better choice for a clustered index? IDt or CID?

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • If that query is the only thing being considered, how about both?

    create clustered index CIX_Table on dbo.Table (CID, IDt)

    But, as always, things like inserts, updates, deletes, and other queries, all have to be considered when creating a clustered index. All else being equal, a compound, clustered index on both columns is going to be good for that query.

    Any reason it has to be clustered? You'll get about the same query performance out of a regular index with those columns.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Why does it have to be clustered? If it was me, I'd suggest a nonclustered index for that, (TN, CID) INCLUDE (IDt)

    What's the selectivity of TN? How many rows of the 1.5 million will the filter TN = 'TA' return?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GSquared (8/27/2008)


    If that query is the only thing being considered, how about both?

    create clustered index CIX_Table on dbo.Table (CID, IDt)

    But, as always, things like inserts, updates, deletes, and other queries, all have to be considered when creating a clustered index. All else being equal, a compound, clustered index on both columns is going to be good for that query.

    Any reason it has to be clustered? You'll get about the same query performance out of a regular index with those columns.

    Thanks for the reply.

    I'm considering a clustered index because of the GROUP BY and MAX clauses. Clustered indexes usually perform better in queries where ranges of rows are involved. Nonclustered indexes are best in point queries.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • GilaMonster (8/27/2008)


    Why does it have to be clustered? If it was me, I'd suggest a nonclustered index for that, (TN, CID) INCLUDE (IDt)

    What's the selectivity of TN? How many rows of the 1.5 million will the filter TN = 'TA' return?

    Selectivity of the TN column is very low: 27 distinct values out of a total of 1,676,301 rows.

    Selectivities of the other 2 columns:

    IDt: 1,437,346 unique values

    CID: 389,644 unique values

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • I missed the TN column in the Where clause. That would be needed too.

    Because of the B-tree nature of indexes, a non-clustered index should do just fine for this.

    Here's a test I did:

    set nocount on;

    create table #T1 (

    ID int identity primary key,

    TN char(2),

    CID int,

    IDt datetime);

    create index IDX_T1_Cover on #T1 (TN, CID, IDt);

    insert into #T1 (TN, CID, IDt)

    select top 1000000

    substring('abcdefghijklmnopqrstuvwxyz', abs(checksum(newid()))%26 + 1, 1),

    checksum(newid())%1000,

    dateadd(day, abs(checksum(newid()))%36525, '1/1/2000')

    from dbo.numbers n1

    cross join dbo.numbers n2;

    create table #T2 (

    ID int identity primary key nonclustered,

    TN char(2),

    CID int,

    IDt datetime);

    create clustered index CID_T2 on #T2(TN, CID, IDt);

    insert into #T2 (TN, CID, IDt)

    select TN, CID, IDt

    from #T1;

    alter index all on #t1 rebuild;

    alter index all on #t2 rebuild;

    set statistics time on;

    SELECT

    MAX( IDt ) AS MaxDt

    , CID

    FROM

    #T1

    WHERE TN = 'T'

    GROUP BY CID;

    print replicate('X', 100);

    SELECT

    MAX( IDt ) AS MaxDt

    , CID

    FROM

    #T2

    WHERE TN = 'T'

    GROUP BY CID;

    Of course, my temp tables don't have the same selectivity as your real tables, but it should give a reasonable approximation of how the code will work.

    In my tests, the non-clustered index took between 80 and 102 milliseconds, while the clustered index took between 95 and 127.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • GSquared (8/27/2008)


    I missed the TN column in the Where clause. That would be needed too.

    Because of the B-tree nature of indexes, a non-clustered index should do just fine for this.

    Here's a test I did...

    Thank you for taking the time to do this. It falls in line with a DTA recommendation I just got:

    CREATE NONCLUSTERED INDEX [_dta_index_TIA_10_1088500894__K9_K2_K18] ON [dbo].[TIA]

    (

    [TN] ASC,

    [CID] ASC,

    [IDt] ASC

    )

    The dilemma I was facing was whether the existing clustered index, defined on an IDENTITY column, which is also the PK of the table, should be removed and defined instead on one or more of the columns in the query.

    This query accounts for a sizable part of the overall workload of our system. It's being called multiple times in a loop once an hour. Every hour, right on the dot, I see a spike in the Memory Pages/sec perfmon counter. I ran Profiler earlier today and narrowed it down to this query (and a larger query that it is embedded in).

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Marios Philippopoulos (8/27/2008)


    I'm considering a clustered index because of the GROUP BY and MAX clauses. Clustered indexes usually perform better in queries where ranges of rows are involved. Nonclustered indexes are best in point queries.

    A properly constructed (covering) nonclustered index will do a range as well or better than a cluster. Sometimes better because the NC index will have fewer leaf pages and hence a partial scan affects fewer pages.

    If you take the suggestion I gave, what SQL will do with that is first find the start of the rows that have TN = 'TA'. Because the group by column is the second column in the index, the index rows are ordered first by TN and then by CID, SQL can just read along the index rows, picking out the max value for IDt for each value of CID.

    I will guess that the execution plan will be a simple index seek with a stream aggregate.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (8/27/2008)


    Marios Philippopoulos (8/27/2008)


    I'm considering a clustered index because of the GROUP BY and MAX clauses. Clustered indexes usually perform better in queries where ranges of rows are involved. Nonclustered indexes are best in point queries.

    A properly constructed (covering) nonclustered index will do a range as well or better than a cluster. Sometimes better because the NC index will have fewer leaf pages and hence a partial scan affects fewer pages.

    If you take the suggestion I gave, what SQL will do with that is first find the start of the rows that have TN = 'TA'. Because the group by column is the second column in the index, the index rows are ordered first by TN and then by CID, SQL can just read along the index rows, picking out the max value for IDt for each value of CID.

    I will guess that the execution plan will be a simple index seek with a stream aggregate.

    Exactly. That's what it did in my test, and it was about 20% more efficient than the clustered index version.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Thanks both for your input, it's much appreciated.

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply