Clustered or non Clustered index??

  • Hi All,

    I need one help in deciding index to be used in my table. I have a table with around 200,000 records and there is a column 'FileName'(datatype: nvarchar(50)).As of now there are no indexes defined on this table.I use the column 'FileName' in filtering the records as well as deleting the records at later stage in my SSIS application.

    The values in the 'FileName' column are not unique i.e in the table there would be 100 records with one file name and another 100 records with another file name.

    I need to define an index for this table on 'FileName' column.I'm not sure whether to go for a Clustered index or non clustered index or not.

    Please advice which one to go for and why.

    Thanks,

    Sam

  • Hi Sam,

    Could you please post your DDL?

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Hi Welsh,

    Here is my DDL statement:

    CREATE TABLE [dbo].[SOURCE_ACTUALS](

    [Entry Unit Code] [nvarchar](10) NOT NULL,

    [OU_Code] [nvarchar](10) NULL,

    [Currency Code] [nvarchar](4) NULL,

    [Year/Month] [nvarchar](7) NOT NULL,

    [Value Type] [nvarchar](3) NOT NULL,

    [SFS line item code] [nvarchar](16) NOT NULL,

    [Direct / Allocated] [nvarchar](1) NULL,

    [Profit centre code] [nvarchar](32) NOT NULL,

    [Unit of Measure] [nvarchar](3) NULL,

    [Quantity] [numeric](17, 2) NULL,

    [Amount] [numeric](20, 2) NULL,

    [USD amount] [numeric](20, 2) NULL,

    [USD Plan amount] [numeric](20, 2) NULL,

    [BASE_GOVSTR_CODE] [nvarchar](10) NOT NULL,

    [Aggr PC] [nvarchar](32) NULL,

    [Aggr SFS LI] [nvarchar](16) NULL,

    [Run Date] [datetime] NOT NULL,

    [FileName] [nvarchar](100) NOT NULL

    )

    And here are my DML statements:

    SELECT [Entry Unit Code],OU_Code,[Currency Code],[Year/Month],[Value Type],[SFS line item code],[Direct / Allocated],[Profit centre code],

    [Unit of Measure],Quantity,Amount,[USD amount],[USD Plan amount],BASE_GOVSTR_CODE,[Aggr PC],[Aggr SFS LI],[Run Date],FileName

    FROM SOURCE_ACTUALS WITH (NOLOCK) WHERE FileName=@FILE_NAME

    DELETE FROM SOURCE_ACTUALS WHERE [FileName] = @file_name

  • You need a Non Clustered Index on FileName.

    You should have a Primary Key. What columns make the record unique?

    The clustered index should be based on a column that you query on a range.

    You should have Foreign Keys defined on this table.

    Non Clustered Indexes are often created to support Foreign Keys (Relationships) to improve performance on Joins.

    Non Clustered Indexes should be created on columns where they are highly selective, i.e. they only return a small percentage of the records.

    Is this table in Production? If not I would recommend normalizing the table for it will make your development of code a lot easier.

    Regards. 🙂

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (9/3/2011)


    The clustered index should be based on a column that you query on a range.

    I would tend not to agree with that, nonclustered indexes are better for range queries in general.

    The column for the clustered index should (as far as possible) be narrow, unique, ever-increasing and unchanging

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/3/2011)


    Welsh Corgi (9/3/2011)


    The clustered index should be based on a column that you query on a range.

    I would tend not to agree with that, nonclustered indexes are better for range queries in general.

    The column for the clustered index should (as far as possible) be narrow, unique, ever-increasing and unchanging

    Thanks for the information.

    The reason that I stated that was because all of the documentation that I have read suggest creating the Clustered Index for Range queries.

    Per the following article; A clustered index is particularly efficient on columns that are often searched for ranges of values.

    http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx

    http://msdn.microsoft.com/en-us/library/ms190639.aspx

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • I found your article "Is a clustered index best for range queries?"

    http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/

    I also found an article written by Kimberly Tripp.

    http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx

    In my Microsoft study materials they suggest that a clustered index should be used on a range query.

    If Microsoft is putting out information that is questionable then perhaps they should be notified.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

  • Welsh Corgi (9/3/2011)


    If Microsoft is putting out information that is questionable then perhaps they should be notified.

    LoL, like it's so unusual for them to do that </sarcasm>

    http://sqlskills.com/BLOGS/PAUL/post/Do-yourself-a-favor-Trust-No-One.aspx

    There's a reason I don't make claims on my blog unless I can prove them. That MS info on 'clustered indexes best for range scans' is based on a faulty premise and has been repeated for every version that I can remember.

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Grant Fritchey (9/4/2011)


    The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.

    If I'm going to make the cluster useful to queries (which I try to do, but I prefer it to organise the table first), I'll look for frequent queries that access a lot of the rows (where a lot > 1%) and a lot of columns. If 90% of the queries against a table are equality predicates on a unique column fetching 5 out of the 40 columns in the table, that's perfect for a nonclustered index. If the other 10% are a query on date that returns 10% of the table on average and need all but two columns, (and the date mostly satisfies the other useful properties for the clustered index), the cluster goes onto the date column. Not because it's a range query, but because it's returning a large portion of the table (both in terms of rows and columns)

    There's no single answer to this question, lots of different approaches that can be taken

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • GilaMonster (9/4/2011)


    Grant Fritchey (9/4/2011)


    The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.

    If I'm going to make the cluster useful to queries (which I try to do, but I prefer it to organise the table first), I'll look for frequent queries that access a lot of the rows (where a lot > 1%) and a lot of columns. If 90% of the queries against a table are equality predicates on a unique column fetching 5 out of the 40 columns in the table, that's perfect for a nonclustered index. If the other 10% are a query on date that returns 10% of the table on average and need all but two columns, (and the date mostly satisfies the other useful properties for the clustered index), the cluster goes onto the date column. Not because it's a range query, but because it's returning a large portion of the table (both in terms of rows and columns)

    There's no single answer to this question, lots of different approaches that can be taken

    Agreed. I probably should have been more clear about that. If it's not accessing a considerable portion of the columns in the table, no, it's better off as a nonclustered index with INCLUDE, absolutely.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • GilaMonster (9/4/2011)


    LoL, like it's so unusual for them to do that </sarcasm>

    http://sqlskills.com/BLOGS/PAUL/post/Do-yourself-a-favor-Trust-No-One.aspx

    This article reminds me of a very bad experience with the advise "Microsoft" provided me on a SharePoint Server. :w00t:

    I never consulted with Microsoft but I have some freebie calls via the Partner Program. In retrospect, I wish that I had not opened a call.

    I also came across some potentially bad advice from the White Papers from the Microsoft Advisory Team.

    For better, quicker answers on T-SQL questions, click on the following...
    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    For better answers on performance questions, click on the following...
    http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply