Index Creation Guidelines

  • Excellent Article.

    I like the thinking here ,as in all situation DBA wan't find enough time to check lot many queries in QA . Your Skill as DBA is reflected when you can trace something important without going into long and short of analysis.

     

    Thanks,

    Sameer Raval

     

     

    Kindest Regards,

    Sameer Raval [Sql Server DBA]
    Geico Insurance
    RavalSameer@hotmail.com

  • Thanks , your comment is true.

  • I already posted the comment that was here once I checked, but dissapeared in the meantime.

    Anyway, I like the article, it is pretty good beginners guide to index design.

    However, I dissagree with next point:

    • If a column in a table is not at least 80 - 85% unique values, then most likely the SQL Server Query Optimizer will not use a non-clustered index based on that column.

    For example: index on column gender with values ‘M’, ‘F’, and NULL is not a subject for the index creation. The same rule can be applied to the flag and status columns with 5-6 different values.

    Index usefulness doesn't depend on number of possible values. Instead data distribution and query usage pattern should be considered. For example:

    If military database is frequently searched for females (let's say lest then 5% soldiers are females) index on gender column may be used. Also, condition Status = "in process" in order processing system would benefit from index on Status column even if there are only a few possible statuses.

  • Can you explain a little more about not needing indexes on date columns?

    The majority of my searches use a date parameter and look at small portions of the data based on date.

    As far as i can tell the oprimizer is using the index

  • Actually not sure about the 80% value but there is a threshold in which statistically the index will not be a value. But in your example you mention M, F, null and a military database. No matter the percentage of F in the system the statistics are still based on the overall uniqueness of the data in the index. So if you have said index and it has a million rows then the statistical uniqueness is .0003 percent. Due to this, more often than not it will choose a table scan or clustered index scan no matter how many times you run it. That said there are some conditional situations where things will behave differently than the statement leads to believe.

    Ex. -- Try building an querying against this table.

    CREATE TABLE dbo.MilPer (idx int identity(1,1) not null primary key,

    sex char(1) null

    )

    go

    CREATE NONCLUSTERED INDEX IX_MilPer ON dbo.MilPer

     (

     sex

    &nbsp

    GO

    set nocount on

    --Insert 1,000,000 records all M

    declare @x int

    set @x = 0

    while @x < 1000000

    begin

     insert dbo.MilPer (sex) values ('M')

     set @x = @x + 1

    end

    go

    --50000 updated to F

    UPDATE

     M

    SET

     sex = 'F'

    FROM

     dbo.MilPer M

    INNER JOIN

     (SELECT TOP 50000 X.idx FROM dbo.MilPer X ORDER BY NewId()) Y

    ON

     M.idx = Y.idx

    go

    --20000 updated to null

    UPDATE

     M

    SET

     sex = Null

    FROM

     dbo.MilPer M

    INNER JOIN

     (SELECT TOP 20000 X.idx FROM dbo.MilPer X WHERE sex != 'F' ORDER BY NewId()) Y

    ON

     M.idx = Y.idx

    go

    --Verify counts

    select sex, count(*) from dbo.MilPer  group by sex

    GO

    -----Do your testing here.--------

    --Now change the table to this.

    ALTER TABLE dbo.MilPer ADD

     [alt] char(1) NOT NULL CONSTRAINT DF_MilPer_at DEFAULT 'A'

    GO

    -----Do your testing here.--------

    You'll find that when sex and idx were the only columns it will use the non-clustered index. This is because it also happens that the non-clustered index will work as a covering index for SELECT * in that case because the idx column is tucked inside the non-clustered index because of the way clustered indexes relate to non-clustered.

    However once you add the alt column you will find it will do a clustered index seek. This is because the non-clustered index no longer will work as a covering index and the statistic density (based on uniqueness) isn't going to pose high enough to have the system consdier it. If you instead do SELECT idx, sex you will get the non-clustered seek again. That said however neither method executes faster than the other because the data itself in the case isn't dense enough for thorough testing.

  • You can use indexes on columns like OrderStatus that have a small number of values.  Hopefully 99% of your orders are completed, but if you need to check the sub-1% of orders that have status of HOLD, ERROR, INQUEUE, PROCESSING, you want to use an index. 

    You can have an index with highly selective values and low-selectivity values.  Don't generalize, know your selectivity!

    I had a client that needed to work on orders that were status=QUEUE.  The order status index worked great even though it would have hindered a search on status=COMPLETED.  The searches that would return less than 1% of the rows used the index. 

    Bill Twomey

    Microsoft SQL DBA

  • Good article.  I have a question about the comment not to index free form text fields, though.  Does this mean not to index a name field at all (a concept I don't understand if a search is to be done on a name) or the name field should be a char type and not a varchar type?

    I also would be interested in more information on indexing datetime fields, as I do that quite often.

    John Deupree

  • I didn't agree with that part of the article, but I think that it is the way that the article is worded. It says:

    "If the search will use a datetime function in ‘where’ clause or some conversion mechanism, most likely the index will not be used by the Query Optimizer but performance will suffer."

    This, I think, is actually the point, that DATETIME columns are often misused in queries. If you have an index on a DATETIME column, and you join or query on that column using a function like DATEPART or something like that, then SQL Server cannot use the index.

    However, that is not a good reason not to index DATETIME columns. If your queries are using functions, the answer is to index the column if it is otherwise warranted, and then rewrite the offending queries so that they don't use functions.

    Personally, I have found that indexes on DATETIME columns are very useful when querying on date ranges. In fact, in databases with poor indexes where I don't have rights to change anything, I will often pull DATETIME and PK columns into a temporary table and index them, then use that in the join. On a large table, the overhead of creating the temp table and index is more than made up for by time saved in querying the main table.

  • It is related  to the dates where portion for time is not used. For example: '10/21/2005 00:00:00:000' and many records have the same date. Otherwise index is fine

  • An index on datetime which only has a calendar date might be more useful than you think.  1 day out of 365 is 0.27%.  0.27% is selective.  I expect the optimizer to use an index in this case.  If you have a few years of data, the index would be used for a weeks worth of data. 

    The only problem with indexes on datetime, is sometimes they don't get used.  Coders defeat the index by using datepart on the indexed column.  But that's a problem with wetware, not the data type, the index or the optimizer. 

     

     

  • wetware??

  • Nice work Leo, good article on indexes, but should also include a general definition of what a clustered vs non-clustered, but this may be in the book. Also, indexes will change over time since how data is acessed will change, queries change as the database and apps over the db update as business domain rules change. Bottom line is to use indexes when necesary to minimixe table scans, and remove them when no longer needed.

    Simple/general definition of cluster and non_cluster.

    cluster: like the table of contents of a book in some common grouping.

    non cluster: like the index in the back ordered in some general order.

    Cheers!

  • In most cases when you searching a name application iis searching with like criteria. And if it is '%company%' index will not be used.

  • We look for names of people (last first).  We never use the leading %, only the trailing %.  So an index is very useful.  I just wonder if using a varchar instead of a char field has an effect on performance.

    John Deupree

  • Varchar is always slower, but char data type for free form columns will creates much more problems. So, stay with varchar. 

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply