Clustered or non Clustered index??

Question

Post reply

Clustered or non Clustered index??

sam 55243

SSCrazy

Points: 2697
More actions
September 3, 2011 at 4:38 am

#256710

Hi All,
I need one help in deciding index to be used in my table. I have a table with around 200,000 records and there is a column 'FileName'(datatype: nvarchar(50)).As of now there are no indexes defined on this table.I use the column 'FileName' in filtering the records as well as deleting the records at later stage in my SSIS application.
The values in the 'FileName' column are not unique i.e in the table there would be 100 records with one file name and another 100 records with another file name.
I need to define an index for this table on 'FileName' column.I'm not sure whether to go for a Clustered index or non clustered index or not.
Please advice which one to go for and why.
Thanks,
Sam

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 1

Hi Sam,

Could you please post your DDL?

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

sam 55243 SSCrazy Points: 2697 More actions · Answer 2

Hi Welsh,

Here is my DDL statement:

CREATE TABLE [dbo].[SOURCE_ACTUALS](

[Entry Unit Code] [nvarchar](10) NOT NULL,

[OU_Code] [nvarchar](10) NULL,

[Currency Code] [nvarchar](4) NULL,

[Year/Month] [nvarchar](7) NOT NULL,

[Value Type] [nvarchar](3) NOT NULL,

[SFS line item code] [nvarchar](16) NOT NULL,

[Direct / Allocated] [nvarchar](1) NULL,

[Profit centre code] [nvarchar](32) NOT NULL,

[Unit of Measure] [nvarchar](3) NULL,

[Quantity] [numeric](17, 2) NULL,

[Amount] [numeric](20, 2) NULL,

[USD amount] [numeric](20, 2) NULL,

[USD Plan amount] [numeric](20, 2) NULL,

[BASE_GOVSTR_CODE] [nvarchar](10) NOT NULL,

[Aggr PC] [nvarchar](32) NULL,

[Aggr SFS LI] [nvarchar](16) NULL,

[Run Date] [datetime] NOT NULL,

[FileName] [nvarchar](100) NOT NULL

)

And here are my DML statements:

SELECT [Entry Unit Code],OU_Code,[Currency Code],[Year/Month],[Value Type],[SFS line item code],[Direct / Allocated],[Profit centre code],

[Unit of Measure],Quantity,Amount,[USD amount],[USD Plan amount],BASE_GOVSTR_CODE,[Aggr PC],[Aggr SFS LI],[Run Date],FileName

FROM SOURCE_ACTUALS WITH (NOLOCK) WHERE FileName=@FILE_NAME

DELETE FROM SOURCE_ACTUALS WHERE [FileName] = @file_name

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 3

You need a Non Clustered Index on FileName.

You should have a Primary Key. What columns make the record unique?

The clustered index should be based on a column that you query on a range.

You should have Foreign Keys defined on this table.

Non Clustered Indexes are often created to support Foreign Keys (Relationships) to improve performance on Joins.

Non Clustered Indexes should be created on columns where they are highly selective, i.e. they only return a small percentage of the records.

Is this table in Production? If not I would recommend normalizing the table for it will make your development of code a lot easier.

Regards. 🙂

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 4

Welsh Corgi (9/3/2011)
The clustered index should be based on a column that you query on a range.

I would tend not to agree with that, nonclustered indexes are better for range queries in general.

The column for the clustered index should (as far as possible) be narrow, unique, ever-increasing and unchanging

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 5

GilaMonster (9/3/2011)
Welsh Corgi (9/3/2011)
The clustered index should be based on a column that you query on a range.
I would tend not to agree with that, nonclustered indexes are better for range queries in general.
The column for the clustered index should (as far as possible) be narrow, unique, ever-increasing and unchanging

Thanks for the information.

The reason that I stated that was because all of the documentation that I have read suggest creating the Clustered Index for Range queries.

Per the following article; A clustered index is particularly efficient on columns that are often searched for ranges of values.

http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx

http://msdn.microsoft.com/en-us/library/ms190639.aspx

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 6

I found your article "Is a clustered index best for range queries?"

http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/

I also found an article written by Kimberly Tripp.

http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx

In my Microsoft study materials they suggest that a clustered index should be used on a range query.

If Microsoft is putting out information that is questionable then perhaps they should be notified.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 7

Welsh Corgi (9/3/2011)
If Microsoft is putting out information that is questionable then perhaps they should be notified.

LoL, like it's so unusual for them to do that </sarcasm>

http://sqlskills.com/BLOGS/PAUL/post/Do-yourself-a-favor-Trust-No-One.aspx

There's a reason I don't make claims on my blog unless I can prove them. That MS info on 'clustered indexes best for range scans' is based on a faulty premise and has been repeated for every version that I can remember.

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 8

The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 9

Grant Fritchey (9/4/2011)
The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.

If I'm going to make the cluster useful to queries (which I try to do, but I prefer it to organise the table first), I'll look for frequent queries that access a lot of the rows (where a lot > 1%) and a lot of columns. If 90% of the queries against a table are equality predicates on a unique column fetching 5 out of the 40 columns in the table, that's perfect for a nonclustered index. If the other 10% are a query on date that returns 10% of the table on average and need all but two columns, (and the date mostly satisfies the other useful properties for the clustered index), the cluster goes onto the date column. Not because it's a range query, but because it's returning a large portion of the table (both in terms of rows and columns)

There's no single answer to this question, lots of different approaches that can be taken

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

Grant Fritchey SSC Guru Points: 398886 More actions · Answer 10

GilaMonster (9/4/2011)
Grant Fritchey (9/4/2011)
The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.
If I'm going to make the cluster useful to queries (which I try to do, but I prefer it to organise the table first), I'll look for frequent queries that access a lot of the rows (where a lot > 1%) and a lot of columns. If 90% of the queries against a table are equality predicates on a unique column fetching 5 out of the 40 columns in the table, that's perfect for a nonclustered index. If the other 10% are a query on date that returns 10% of the table on average and need all but two columns, (and the date mostly satisfies the other useful properties for the clustered index), the cluster goes onto the date column. Not because it's a range query, but because it's returning a large portion of the table (both in terms of rows and columns)
There's no single answer to this question, lots of different approaches that can be taken

Agreed. I probably should have been more clear about that. If it's not accessing a considerable portion of the columns in the table, no, it's better off as a nonclustered index with INCLUDE, absolutely.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

Welsh Corgi SSC Guru Points: 116520 More actions · Answer 11

GilaMonster (9/4/2011)
LoL, like it's so unusual for them to do that </sarcasm>
http://sqlskills.com/BLOGS/PAUL/post/Do-yourself-a-favor-Trust-No-One.aspx

This article reminds me of a very bad experience with the advise "Microsoft" provided me on a SharePoint Server. :w00t:

I never consulted with Microsoft but I have some freebie calls via the Partner Program. In retrospect, I wish that I had not opened a call.

I also came across some potentially bad advice from the White Papers from the Microsoft Advisory Team.

For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/

For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/