Clustered Indexes

  • Okay, one more and I am done for the day lol. . .

    When creating a Clustered Index, I know the index defines the physical layout of the table. I have heard two different arguements as to how the index should be ordered. If the point is to get to the unique row(s) the fastest, then I imagine the first column specified is the one that should be the most unique that is queried on the most.

    So if a table has a identity column ID, Country, Account No, and County and Account must be combined to make a unique record without the ID (such as there being an account number 1000 in both Canada and Mexico) that would be queried on, should my Clustered Index be

    #1 CREATE CLUSTERED INDEX PKey ON DATA (COUNTRY, ACCOUNT)

    like the country THEN the account

    or

    #2 CREATE CLUSTERED INDEX PKey ON DATA (ACCOUNT, COUNTRY)

    meaning get all of the accounts, then the country I need

    It seems the second is more logical, but I keep bumping into developers that do it #1.

    If I say, SELECT * FROM DATA WHERE COUNTRY = 'Mexico' and ACCOUNT = 1000, #1 would hit COUNTRY first (which could be hundreds), get everything for Mexico (which could thousands), then look for 1000. If I chose #2, I would get all of the accounts numbered 1000 (which could be hundreds), then just the one for Mexico which would only be one).

    Is this right?

  • knowlton (5/21/2008)


    If I say, SELECT * FROM DATA WHERE COUNTRY = 'Mexico' and ACCOUNT = 1000, #1 would hit COUNTRY first (which could be hundreds), get everything for Mexico (which could thousands), then look for 1000. If I chose #2, I would get all of the accounts numbered 1000 (which could be hundreds), then just the one for Mexico which would only be one).

    Nope. In both cases SQL will go straight to the rows WHERE COUNTRY = 'Mexico' and ACCOUNT = 1000. It's capable of doing a multi-column seek in 1 operation, providing the index key has all the required columns in it.

    The difference comes when you have multiple different types of queries. Let's take your example.

    An index on (Country, Account) is fully seekable for queries that filter on Country and Account, or just on country. A query that filters on just account cannot seek on that index. It must scan.

    An index on (Account, Country) is the opposite. It's fine for queries that filter on Country and Account, or just on Account. A query that filters on just Country cannot seek on that index. It must scan.

    The order of the columns in the index depend on the types of queries you are running, not the selectiveness of the data.

    Does this make any sense?

    I did a long test on this in a previous thread. I'll see if I can locate it.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Many thanks for your reply. . .

    So then in addition to my Clustered of (COUNTRY, ACCOUNT), then I would need a non-clustered of (ACCOUNT, COUNTRY). . .

  • Just on account should be fine. The clustering key is always present in all nonclustered indexes as the rows 'address'. Queries that filter on both, or just on country can use the cluster, queries that filter on just account can use the NC.

    If the combination of country and account is unique, make sure to make the clustered index a unique one.

    I found the thread with testing done on this:

    http://www.sqlservercentral.com/Forums/Topic488867-65-1.aspx

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • In addition, if your "other" queries only needed the two specified columns, creating a "covering index" - which is an index that completely encompases the fields your query will need - will mean your query will not need to do a lookup on the clustered index to return its results. You can also add leaf-level data to an index to produce these same results (the INCLUDES list on an index).

  • Michael Earl (5/21/2008)


    In addition, if your "other" queries only needed the two specified columns, creating a "covering index" - which is an index that completely encompases the fields your query will need - will mean your query will not need to do a lookup on the clustered index to return its results.

    Michael ,

    what do you mean by "your query will not need to do a lookup on the clustered index ". Do you mean to say that

    apart from the clustered index on Country, Account you need

    a covering index on these 2 columns.

    "Keep Trying"

  • GilaMonster (5/21/2008)


    The difference comes when you have multiple different types of queries. Let's take your example.

    An index on (Country, Account) is fully seekable for queries that filter on Country and Account, or just on country. A query that filters on just account cannot seek on that index. It must scan.

    An index on (Account, Country) is the opposite. It's fine for queries that filter on Country and Account, or just on Account. A query that filters on just Country cannot seek on that index. It must scan.

    The order of the columns in the index depend on the types of queries you are running, not the selectiveness of the data.

    Does this make any sense?

    I did a long test on this in a previous thread. I'll see if I can locate it.

    Gail,

    Do the scans you mention above still have improved performance?

    Let me give you another example -

    Say I create an index:

    CREATE INDEX IX_example on table (bit_field1, bit_field2, unique_row_id)

    since the first two columns only have 2 possible values, theoretically the following query should perform like 4 different seeks. So my question is this index performance on this query:

    select * from table where unique_row_id = 100

    the same as running this query:

    select * from table where bit_field1 = 0 and bit_field_2 = 0 and unique_row_id = 100

    UNION ALL

    select * from table where bit_field1 = 0 and bit_field_2 = 1 and unique_row_id = 100

    UNION ALL

    select * from table where bit_field1 = 1 and bit_field_2 = 0 and unique_row_id = 100

    UNION ALL

    select * from table where bit_field1 = 1 and bit_field_2 = 1 and unique_row_id = 100

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply