Non-clustered index question

  • Hi,

    I tried to search for this but I haven't seen the question asked explicitly--and that may be because it's kind of dumb, but I'm interested in knowing the answer.

    Let's say I have a table with the following columns:

    ID, first name, last name, middle name

    If I create a non-clustered index on ID and include the rest of the columns (first name, last name, and middle name), is that effectively the exact same as if I had created a clustered index on ID?

    Thanks,

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike

    No, because if you only have a non-clustered index, your table is a heap. Also, in the non-clustered index, the key is only ID instead of all of the columns. This means that because the key is narrower, the index is easier to traverse, improving performance. But you wouldn't be able to filter on, say, ID and first name - included columns are only useful to stop you from needing to do a lookup in the clustered index or heap.

    John

  • Hi John,

    Thanks for the reply. I have a few comments:

    No, because if you only have a non-clustered index, your table is a heap.

    I know technically a table without a clustered index is considered a heap, but I believe that most people would assume that it has no order to it then, right? However, if I add only a non-clustered index with ID as the key, then the table becomes ordered, much like it would if I created a clustered index on ID. I just tested it, and it seemed to work this way, unless I'm missing something.

    Also, in the non-clustered index, the key is only ID instead of all of the columns. This means that because the key is narrower, the index is easier to traverse, improving performance.

    In my scenario, I'm including all of the other fields in the non-clustered index, so how is that not the same as creating a clustered index, which, by its very nature, contains all of the table data in addition to the key?

    But you wouldn't be able to filter on, say, ID and first name - included columns are only useful to stop you from needing to do a lookup in the clustered index or heap.

    Right, so with a clustered index, I would have all of the information I could want at the leaf level. With a non-clustered index that includes all of the non-key columns I would also have all of the information I could want at the leaf level too, no?

    This is just the way I understand it, but if I'm not thinking about it correctly, please let me know where the misunderstanding is.

    Thanks again,

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Looks like you have lots to learn about indexing, but I'm not the right person to do the teaching.

    ... I know technically a table without a clustered index is considered a heap, but I believe that most people would assume that it has no order to it then, right? However, if I add only a non-clustered index with ID as the key, then the table becomes ordered, much like it would if I created a clustered index on ID. I just tested it, and it seemed to work this way, unless I'm missing something. ...

    It may seem as is the table is ordered, but it still is not and it still lacks a clustered index.

    You're making making bad assumptions!



    Alvin Ramard
    Memphis PASS Chapter[/url]

    All my SSC forum answers come with a money back guarantee. If you didn't like the answer then I'll gladly refund what you paid for it.

    For best practices on asking questions, please read the following article: Forum Etiquette: How to post data/code on a forum to get the best help[/url]

  • Mike Scalise (9/20/2016)


    Hi,

    I tried to search for this but I haven't seen the question asked explicitly--and that may be because it's kind of dumb, but I'm interested in knowing the answer.

    Let's say I have a table with the following columns:

    ID, first name, last name, middle name

    If I create a non-clustered index on ID and include the rest of the columns (first name, last name, and middle name), is that effectively the exact same as if I had created a clustered index on ID?

    Thanks,

    Mike

    For this table, I'd want to see a clustered primary key on [ID] to prevent duplicates and facilitate efficient joins, and then non-clustered index(s) on whatever column(s) you may perform selective queries on.

    "Do not seek to follow in the footsteps of the wise. Instead, seek what they sought." - Matsuo Basho

  • Mike Scalise (9/20/2016)


    Hi John,

    Thanks for the reply. I have a few comments:

    Mike

    No, because if you only have a non-clustered index, your table is a heap.

    I know technically a table without a clustered index is considered a heap, but I believe that most people would assume that it has no order to it then, right? However, if I add only a non-clustered index with ID as the key, then the table becomes ordered, much like it would if I created a clustered index on ID. I just tested it, and it seemed to work this way, unless I'm missing something.

    No, the table doesn't become ordered - only your non-clustered index does. You now have a (unordered) heap with all the data in the table in it, along with a non-clustered index also with all the data in the table in it. On the other hand, a clustered index is the data - it's nothing more than a logical ordering of the rows by the index key.

    Also, in the non-clustered index, the key is only ID instead of all of the columns. This means that because the key is narrower, the index is easier to traverse, improving performance.

    In my scenario, I'm including all of the other fields in the non-clustered index, so how is that not the same as creating a clustered index, which, by its very nature, contains all of the table data in addition to the key?

    See above. And below - it's the same in some ways, but it's also different in the ways that I mentioned.

    But you wouldn't be able to filter on, say, ID and first name - included columns are only useful to stop you from needing to do a lookup in the clustered index or heap.

    Right, so with a clustered index, I would have all of the information I could want at the leaf level. With a non-clustered index that includes all of the non-key columns I would also have all of the information I could want at the leaf level too, no?

    Yes, it's all there, but it's just not searchable in the same way. Remember telephone directories? Imagine one that's ordered by last name, first name and address, and another that's ordered only by last name. They both have all the names and numbers in them, but if you wanted to find John Smith who lives at 9 Acacia Avenue, one of them is clearly going to be more useful than the other.

    John

    Edit - corrected typo

  • I know technically a table without a clustered index is considered a heap, but I believe that most people would assume that it has no order to it then, right? However, if I add only a non-clustered index with ID as the key, then the table becomes ordered, much like it would if I created a clustered index on ID. I just tested it, and it seemed to work this way, unless I'm missing something.

    The index is ordered, but the table (as in the physical data in the table) does not 'become ordered'.

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • A couple points are worth mentioning.

    1) From a "how it gets used in query plans" perspective, there will be little difference between a clustered index with the key on ID and a nonclustered index with the same key and all other columns added as INCLUDED columns.

    The main differences have to do with storing the data twice. Assuming we're starting with a heap and no nonclustered indexes, after defining a clustered index on ID, we will have one copy of the data, the clustered index.

    If instead we create the nonclustered on ID with every column INCLUDED, we will have two copies of the data, the heap and the nonclustered index. That overhead will get seen as space utilization and some extra penalty on writes, since now two copies of the data have to be maintained on any INSERT/UPDATE/DELETE.

    2) What exactly do you mean by "the table becomes ordered", and how are you testing it? The primary data structure for the table in the nonclustered case is the heap, which will still not have any particular logical order. If you're just running a SELECT * and observing whether the results seem ordered, then you're at the whim of a few things, not all of which have anything to do with how the data is stored (parallelism, for example, could lead to getting results out of logical order).

    Cheers!

  • John Mitchell-245523 (9/20/2016)


    Mike Scalise (9/20/2016)


    Hi John,

    Thanks for the reply. I have a few comments:

    Mike

    No, because if you only have a non-clustered index, your table is a heap.

    I know technically a table without a clustered index is considered a heap, but I believe that most people would assume that it has no order to it then, right? However, if I add only a non-clustered index with ID as the key, then the table becomes ordered, much like it would if I created a clustered index on ID. I just tested it, and it seemed to work this way, unless I'm missing something.

    No, the table doesn't become ordered - only your non-clustered index does. You now have a (unordered) heap with all the data in the table in it, along with a non-clustered index also with all the data in the table in it. On the other hand, a clustered index is the data - it's nothing more than a logical ordering of the rows by the index key.

    Also, in the non-clustered index, the key is only ID instead of all of the columns. This means that because the key is narrower, the index is easier to traverse, improving performance.

    In my scenario, I'm including all of the other fields in the non-clustered index, so how is that not the same as creating a clustered index, which, by its very nature, contains all of the table data in addition to the key?

    See above. And below - it's the same in some ways, but it's also different in the ways that I mentioned.

    But you wouldn't be able to filter on, say, ID and first name - included columns are only useful to stop you from needing to do a lookup in the clustered index or heap.

    Right, so with a clustered index, I would have all of the information I could want at the leaf level. With a non-clustered index that includes all of the non-key columns I would also have all of the information I could want at the leaf level too, no?

    Yes, it's all there, but it's just not searchable in the same way. Remember telephone directories? Imagine one that's ordered by last name, first name and address, and another that's ordered only by last name. They both have all the numbers in them, but if you wanted to find John Smith who lives at 9 Acacia Avenue, one of them is clearly going to be more useful than the other.

    John

    It seems you're suggesting that a clustered index adds all columns as keys.

    With a clustered index defined on ID, as in the OP's question, only the ID will be in the key. You don't get any extra advantage in searching on additional columns relative to a nonclustered index with the same key and all columns included.

    In both cases, clustered or nonclustered, if you filter on the index key and a column not in the index key, the query will seek on the index key and apply the non-key filter as a residual predicate.

    I'm guessing I'm misunderstanding what you're saying, though. 🙂

    Cheers!

  • Apologies - I misread. I answered as if the clustered index were on all columns instead of just on ID.

    John

  • Also remember that (except for filtered indexes) that every non-clustered index is a complete duplication of all the data in the columns in the index key and any INCLUDEs. If you have a HEAP of 4 columns and you create an NCI that includes all 4 columns (one way or the other), you have just duplicated every bit of data in the table and, depending on the data, can be quite costly when it comes to disk space, memory usage, backups, and restores.

    Clustered indexes are almost "free". The extra space used is tiny because data isn't duplicated in the Clustered Index.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Mike Scalise (9/20/2016)


    If I create a non-clustered index on ID and include the rest of the columns (first name, last name, and middle name), is that effectively the exact same as if I had created a clustered index on ID?

    No.

    While it's true that there's very little difference between a clustered index on ID, and a nonclustered index on ID include (first name, last name, and middle name) in terms of the index tree and the pages, you won't have an equivalent result.

    If you create a clustered index on ID, you have a single structure, a single copy of the data with the data pages logically (and ONLY logically) ordered by ID and a b-tree structure built on top with only ID at the higher levels.

    If you just create a nonclustered index on ID include (first name, last name, and middle name), then you have two structures, two copies of the data. The first is a heap, the second is a b-tree with ID, first name, last name, and middle name and the row's RID at the leaf level, logically ordered by ID and ID plus the row's RID at the higher levels of the index.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Mike Scalise (9/20/2016)


    However, if I add only a non-clustered index with ID as the key, then the table becomes ordered, much like it would if I created a clustered index on ID. I just tested it, and it seemed to work this way, unless I'm missing something.

    No it doesn't. The heap is still an unordered heap. The index is logically ordered by the ID. What you're missing is that your testing query likely used the nonclustered index, which is logically ordered. Force the query to use the heap ( WITH (INDEX = 0)) and you won't see results ordered by the index key.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Thank you all for the responses. This is not a real-life scenario--it was just out of my general curiosity. I understand that best practice would obviously tell us to create a clustered index before creating a NC one and INCLUDE-ing all non-key fields.

    With that said, it sounds like, from a strictly querying perspective, they are identical because, in both cases, the query would need to hit only one B-Tree (either the clustered index or the nonclustered index).

    I know there may be disk and CPU implications. To me, it's the difference between the overhead to maintain a clustered index when rows are INSERTED/UPDATED vs the overhead to maintain the copy of the data (the non-clustered index) and not the underlying heap since it doesn't have an order.

    Tom Cooper's comment in this thread seems to suggest that the overhead involved with each is nearly the same.

    https://social.msdn.microsoft.com/Forums/sqlserver/en-US/f011edf9-6026-4204-ae77-68023ddf04a1/clustered-vs-nc-index-storage-differences?forum=transactsql

    Again, I'm not looking to actually do this--just wanted to feed my curiosity and get some thoughts from the experts here.

    Mike

    Mike Scalise, PMP
    https://www.michaelscalise.com

  • Mike Scalise (9/20/2016)


    To me, it's the difference between the overhead to maintain a clustered index when rows are INSERTED/UPDATED vs the overhead to maintain the copy of the data (the non-clustered index) and not the underlying heap since it doesn't have an order.

    The nonclustered + heap option will be higher.

    The cost of maintaining the clustered index when rows are inserted, updated or deleted will be pretty much the same as the cost of maintaining the non-clustered index when rows are inserted, updated or deleted, since they're essentially the same, then you have the the cost of the insert, update or delete in the heap itself, and that's not free (plus nonclustered + heap doubles the disk space required)

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply