September 3, 2011 at 4:38 am
Hi All,
I need one help in deciding index to be used in my table. I have a table with around 200,000 records and there is a column 'FileName'(datatype: nvarchar(50)).As of now there are no indexes defined on this table.I use the column 'FileName' in filtering the records as well as deleting the records at later stage in my SSIS application.
The values in the 'FileName' column are not unique i.e in the table there would be 100 records with one file name and another 100 records with another file name.
I need to define an index for this table on 'FileName' column.I'm not sure whether to go for a Clustered index or non clustered index or not.
Please advice which one to go for and why.
Thanks,
Sam
September 3, 2011 at 4:53 am
Hi Sam,
Could you please post your DDL?
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
September 3, 2011 at 5:19 am
Hi Welsh,
Here is my DDL statement:
CREATE TABLE [dbo].[SOURCE_ACTUALS](
[Entry Unit Code] [nvarchar](10) NOT NULL,
[OU_Code] [nvarchar](10) NULL,
[Currency Code] [nvarchar](4) NULL,
[Year/Month] [nvarchar](7) NOT NULL,
[Value Type] [nvarchar](3) NOT NULL,
[SFS line item code] [nvarchar](16) NOT NULL,
[Direct / Allocated] [nvarchar](1) NULL,
[Profit centre code] [nvarchar](32) NOT NULL,
[Unit of Measure] [nvarchar](3) NULL,
[Quantity] [numeric](17, 2) NULL,
[Amount] [numeric](20, 2) NULL,
[USD amount] [numeric](20, 2) NULL,
[USD Plan amount] [numeric](20, 2) NULL,
[BASE_GOVSTR_CODE] [nvarchar](10) NOT NULL,
[Aggr PC] [nvarchar](32) NULL,
[Aggr SFS LI] [nvarchar](16) NULL,
[Run Date] [datetime] NOT NULL,
[FileName] [nvarchar](100) NOT NULL
)
And here are my DML statements:
SELECT [Entry Unit Code],OU_Code,[Currency Code],[Year/Month],[Value Type],[SFS line item code],[Direct / Allocated],[Profit centre code],
[Unit of Measure],Quantity,Amount,[USD amount],[USD Plan amount],BASE_GOVSTR_CODE,[Aggr PC],[Aggr SFS LI],[Run Date],FileName
FROM SOURCE_ACTUALS WITH (NOLOCK) WHERE FileName=@FILE_NAME
DELETE FROM SOURCE_ACTUALS WHERE [FileName] = @file_name
September 3, 2011 at 6:13 am
You need a Non Clustered Index on FileName.
You should have a Primary Key. What columns make the record unique?
The clustered index should be based on a column that you query on a range.
You should have Foreign Keys defined on this table.
Non Clustered Indexes are often created to support Foreign Keys (Relationships) to improve performance on Joins.
Non Clustered Indexes should be created on columns where they are highly selective, i.e. they only return a small percentage of the records.
Is this table in Production? If not I would recommend normalizing the table for it will make your development of code a lot easier.
Regards. 🙂
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
September 3, 2011 at 9:00 am
Welsh Corgi (9/3/2011)
The clustered index should be based on a column that you query on a range.
I would tend not to agree with that, nonclustered indexes are better for range queries in general.
The column for the clustered index should (as far as possible) be narrow, unique, ever-increasing and unchanging
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 3, 2011 at 5:46 pm
GilaMonster (9/3/2011)
Welsh Corgi (9/3/2011)
The clustered index should be based on a column that you query on a range.I would tend not to agree with that, nonclustered indexes are better for range queries in general.
The column for the clustered index should (as far as possible) be narrow, unique, ever-increasing and unchanging
Thanks for the information.
The reason that I stated that was because all of the documentation that I have read suggest creating the Clustered Index for Range queries.
Per the following article; A clustered index is particularly efficient on columns that are often searched for ranges of values.
http://msdn.microsoft.com/en-us/library/aa933131(v=sql.80).aspx
http://msdn.microsoft.com/en-us/library/ms190639.aspx
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
September 3, 2011 at 7:21 pm
I found your article "Is a clustered index best for range queries?"
http://sqlinthewild.co.za/index.php/2011/02/01/is-a-clustered-index-best-for-range-queries/
I also found an article written by Kimberly Tripp.
http://www.sqlskills.com/blogs/kimberly/post/The-Clustered-Index-Debate-Continues.aspx
In my Microsoft study materials they suggest that a clustered index should be used on a range query.
If Microsoft is putting out information that is questionable then perhaps they should be notified.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
September 4, 2011 at 3:36 am
Welsh Corgi (9/3/2011)
If Microsoft is putting out information that is questionable then perhaps they should be notified.
LoL, like it's so unusual for them to do that </sarcasm>
http://sqlskills.com/BLOGS/PAUL/post/Do-yourself-a-favor-Trust-No-One.aspx
There's a reason I don't make claims on my blog unless I can prove them. That MS info on 'clustered indexes best for range scans' is based on a faulty premise and has been repeated for every version that I can remember.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 4, 2011 at 5:53 am
The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 4, 2011 at 6:08 am
Grant Fritchey (9/4/2011)
The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.
If I'm going to make the cluster useful to queries (which I try to do, but I prefer it to organise the table first), I'll look for frequent queries that access a lot of the rows (where a lot > 1%) and a lot of columns. If 90% of the queries against a table are equality predicates on a unique column fetching 5 out of the 40 columns in the table, that's perfect for a nonclustered index. If the other 10% are a query on date that returns 10% of the table on average and need all but two columns, (and the date mostly satisfies the other useful properties for the clustered index), the cluster goes onto the date column. Not because it's a range query, but because it's returning a large portion of the table (both in terms of rows and columns)
There's no single answer to this question, lots of different approaches that can be taken
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 4, 2011 at 6:24 am
GilaMonster (9/4/2011)
Grant Fritchey (9/4/2011)
The only deviation I would make on the clustered index over what Gail says is to look for the most frequently accessed path to the data. Usually, it's the primary key and most systems these days the PK is an identity which satisfies the narrow, monotonically increasing requirements. However, if, on a given table, 80-90% of the access is through a foreign key or a date or a couple of other fields (nothing crazy like a nvarchar), I'll put the cluster there. You only get one and that's where the data is stored. May as well take the most advantage of it you can.If I'm going to make the cluster useful to queries (which I try to do, but I prefer it to organise the table first), I'll look for frequent queries that access a lot of the rows (where a lot > 1%) and a lot of columns. If 90% of the queries against a table are equality predicates on a unique column fetching 5 out of the 40 columns in the table, that's perfect for a nonclustered index. If the other 10% are a query on date that returns 10% of the table on average and need all but two columns, (and the date mostly satisfies the other useful properties for the clustered index), the cluster goes onto the date column. Not because it's a range query, but because it's returning a large portion of the table (both in terms of rows and columns)
There's no single answer to this question, lots of different approaches that can be taken
Agreed. I probably should have been more clear about that. If it's not accessing a considerable portion of the columns in the table, no, it's better off as a nonclustered index with INCLUDE, absolutely.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
September 4, 2011 at 10:47 am
GilaMonster (9/4/2011)
LoL, like it's so unusual for them to do that </sarcasm>http://sqlskills.com/BLOGS/PAUL/post/Do-yourself-a-favor-Trust-No-One.aspx
This article reminds me of a very bad experience with the advise "Microsoft" provided me on a SharePoint Server. :w00t:
I never consulted with Microsoft but I have some freebie calls via the Partner Program. In retrospect, I wish that I had not opened a call.
I also came across some potentially bad advice from the White Papers from the Microsoft Advisory Team.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply