Indexing in SQL Server 7.0/2000

  • I want to make sure I have this right about indexing...

    For tables that have a clustered index: A non-clustered index is used to search down to the clustered index key. Then, using the clustered key, a completely new search must be performed on the clustered index to get to the data. Is this right so far?

    Now to search a clustered index, is it true that when the index search hits the leaf page (i.e. data), it will start at the logically first matching record in the chain, then it must sequentially search forward until finding an exact match. SQL Server tags non-unique cluster indexes with a "uniqueness" tag for this purpose.

    Hence, creating a clustered index with high density will be costly because non-clustered searches must sequentially search through "clusters" in the clustered index leaf pages.

    Do I have this right?

  • Hi

    Not quite (others please correct me!). The cluster index IS a b*tree index as the word "index" defines, but it also defines the storage of data in the table, hence data is stored ("clustered") around the key. The main benefit is the storage of data in the cluster, so you dont have to do another read to from the index leaf to the actual data as the leaf entry contains all the data. From what I understand, the leaf also stores a pointer to the next leaf in sequence for fast full table scans. Unlike other non-clustered indexes. As the clustered index defines the tables storage only 1 can be created against the table (so pick it carefully for maximum benefit).

    Also, dont expect all storage to be sequential.

    In the end, multiple index scans is only required if it really needs it, and creating a single cluster index on a table with no other indexes doesnt magically create another index in the back-ground.

    Cheers

    Chris


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • Hi there

    I just read a short paragraph from a performance tuning book by England. He basically says what you stated and I can see your point now. Basically, as the cluster index "is/defines" the storage of the table, a non-cluster index will then result in a series of reads on the non-cluster then even more reads down the cluster index to locate the data.

    I am not sure if this is correct though. I cant believe that this would happen. Surely the lookup from the non-clustered b*tree index at the leaf will point directly to the data and NOT the top of the clustered index (in ALL cases)..

    Can anyone enlighten me?

    Cheers

    Chris


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • The clusteered index is the key. Now in SQL 2000, the storage changed slightly for performance, but whether you search the clustered key (essentially data) or search the data, there isn't a difference. The real benefits of the clustered key are when you search for a range of data using the clustered key. If you have a non-clustered key that diverges widely from the ordering of the clustered key, there is nothing that will help that. Of course, not having a clustered key also will not help that.

    Steve Jones

    steve@dkranch.net

  • Steve - yep, no problem there, but when I search a non-clustered index and get the the leaf node, what does it point to? to start of the page in which the data is in, OR, the start of the clustered index (and scans again) ??

    Cheers

    Chris


    Chris Kempster
    www.chriskempster.com
    Author of "SQL Server Backup, Recovery & Troubleshooting"
    Author of "SQL Server 2k for the Oracle DBA"

  • Unfortunately, none of this chit-chat has either confirmed nor denied my affirmation. Perhaps I've poorly stated it, but I think Steve now understands what I'm trying to confirm. This has an impact on the best selection of a Clustered Index. My hypothesis: "High Density" makes a poor choice for a Clustered Index whenever direct access to a single data element is desired, because now non-clustered indices must sequence through the cluster chain. (I hope this makes it more clear, and not more confusing.)

  • Nonclustered indexes contain the value (for the nonclustered index) and a row locator to the data row having the data value (or rows if the data value is not unique).

    In the case of a nonclustered index built on a heap (no clustered index present), the row locator is a value built from the File ID, page number, and number of the row on the page.

    If there is a clustered index, here's the description of the row locator from Books Online:

    quote:


    If the table does have a clustered index, or the index is on an indexed view, the row locator is the clustered index key for the row. If the clustered index is not a unique index, SQL Server 2000 makes duplicate keys unique by adding an internally generated value. This value is not visible to users; it is used to make the key unique for use in nonclustered indexes. SQL Server retrieves the data row by searching the clustered index using the clustered index key stored in the leaf row of the nonclustered index.


    So the nonclustered key points to the clustered key which contains the data value. We've all got that (excuse the edit, mouse slip when flipping between screens).

    When it starts doing the scans from the intermediate tables, it's a little bit more than just go get this value, go get that. And with respect to nonclustered indexes, there is some pre-fetching going on as well. Here's the applicable link:

    http://msdn.microsoft.com/library/default.asp?url=/library/en-us/architec/8_ar_sa_7usz.asp

    K. Brian Kelley

    bkelley@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/bkelley/

    Edited by - bkelley on 01/29/2002 12:14:34 PM

    K. Brian Kelley
    @kbriankelley

  • Not sure what you mean by "density". High selectivity is a poor choice for the clustered index. In other words, a PK is a poor choice. Each item in the key is highly selective in that it returns one value. A date is often a better choice, even though these are often unique, lots of queries need items in a range.

    Steve Jones

    steve@dkranch.net

  • SHOW STATISTICS returns "... the selectivity of an index (the lower the density returned, the higher the selectivity) and provide the basis for determining whether or not an index is useful to the query optimizer"

    "Density" and "Selectivity" are terms having to do with the multiplicity of the key. The density/selectivity is a factor of the number of occurrences of the key compared to the number of entries.

    Let's take the phone book for example. It uses "last name, first name" for a key. A good density. If it used "city" for a key, entries would be "clustered" around city name. To find John Smith in Miami, all entries for Miami have to be scanned until we hit John Smith.

    Brian Kelly... the statement "Nonclustered indexes contain the value (for the nonclustered index) and a row locator to the data row having the data value (or rows if the data value is not unique). " For SQL Server 7.0/2000, my research indicates that [leaf] non-clustered indexes contain the clustered index key (and unique identifier if clustered key is non-unique). No longer (since SQL 6.5) are page and row IDs stored in the non-cluster index. This allows for fast updates and deletes of the data (i.e. clustered index) because non-cluster indexes will require fewer modifications.

  • Correct. Row locator is a generic term given that a non-clustered index could be build upon a heap or upon a clustered index. Row locator with regard to a clustered index is the clustered index key. However, for heaps, there's no clustered index key, therefore FileID, Page, and Row are required, and I did state that was in regards to heaps.

    In general, high density does cause some issues with clustered indexes because of the fact that clustered indexes must have unique keys and that means SQL Server is doing extra work. However, keep in mind that for any index, high density could result in table scans, whether the index is clustered or non-clustered. If selectivity isn't very high, SQL Server will go to a table scan.

    K. Brian Kelley

    bkelley@sqlservercentral.com

    http://www.sqlservercentral.com/columnists/bkelley/

    K. Brian Kelley
    @kbriankelley

  • If you are referring to a large number of unique values as high density. Then the answer is you always the most unique values as your cluster evn thou you non-clustered indexes will first be searched and then it will traverse the clustered index. The key to the clustered index is it sorts the table in that order and thus the leaf pages are sequential unlike non-clustered which can become fragmented thru page splits and other actions that take place internal to the database. From what I have learned in the past is that once the key for the clustered index is found in a non-clustered it starts looking in the pages this way.

    Think of the clustered leaf pages as marked like a rolladex. The egine looks at the beginning and ending of the page until it reaches the page that contains the proper range to search in. The when it hits that page(s) it searches thru the page itself to extract the data. Just like in a rolladex you would put Jones under J not A and thus SQL will file 10278 under 10100-10305 and search in between. (This is a very simplistic statement about). This is also why the highest number of uniqueness is best as you may have several pages of Jones' but only 1 10278 = Joy Jones. It actually helps to have the highest density possible and keep in mind the clustered index is the sorting of the table so overall the index takes little to no space. Hope this makes sense.

    If anyone sees a mistake here, please feel free to correct.

  • quote:


    ...A date is often a better choice, even though these are often unique, lots of queries need items in a range.

    Steve Jones

    steve@dkranch.net


    Date perhaps, but not datetime. Why not add columns "year" and "month" / "week" ? A default value week(date) can insert it (?), and a update trigger can update it.

    I don´t know, is it possible to use function in default value in create table -clause ?

    Anyway, insert-trigger can update it also (?) (does it run before or after real insert ?)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply