December 18, 2007 at 9:14 am
Hello,
I am having large tables that are typically 300-500 columns by 150-200mln records (~300-600GB). These are data extracts that I receive periodically and there are no updates to the data. I am typically adding a identity column to link derived data back to its source record.
MS SQL2005 recommends a clustered index followed by regular indexes approach. Adding a clustered index to such large table is difficult b/c of the disk space required to create the clustered index. I am wondering what the benefits and penalties would between the scenarios below.
Any insight would be appreciated.
Thanks!
Scenario 1
Create table
load data
create clustered index
create non-clustered indexes
Scenario2:
Create table
Create clustered index
Load data
Create non-clustered indexes
December 18, 2007 at 9:29 am
How are you loading the data, bcp, bulk insert, sql? How long do you have to complete the load? How is the data used after the load? How does the identity column allow you to link back to the source record?
I would do something close to scenario 2, create table, clustered index, and non-clustered indexes and then load the data. Since you are using an Identity Field as the clustered index having the clustered index on the table at insert should not affect load performance because Identity is sequential. Having the non-clustered indexes before the load may slow the performance as it created the index pages.
I would also do my insert in batches.
As usual you should test both and see which one is faster.
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
December 18, 2007 at 3:40 pm
Does a Clustered indenitiy really do anything for you? Have you experimented with the Cluster on a real SARG and just having the ident nonclust?
December 19, 2007 at 5:30 am
You really should have a clustered index, but since you get only one, you need to make sure you put it in the right place. The PK is clustered by default, but it doesn't have to be. How is your table accessed? What is the most common select statement run against the table? If it is a clean select from the PK, then leave the cluster there, but if some other column, a company name, a foreign key from another table, or some combination of columns is the more likely avenue for select statements, that's probably a better place for the clustered index. You need to evaluate the design as well as the query structure. The time to load the table and the disk space should be secondary considerations to getting that cluster right. Remember, if you put it in the wrong place, it not only wastes space & time, but it will hurt your performance since you'll see a lot more scans and lookups.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
December 20, 2007 at 3:09 am
you might like to scan through this analysis I did on adding clustered indexes to tables and the size/choice of clustered key
http://grumpyolddba.co.uk/sql2005/working%20with%20indexes%204.mht
the work I did was based upon tables having between 100 and 400 million rows.
[font="Comic Sans MS"]The GrumpyOldDBA[/font]
www.grumpyolddba.co.uk
http://sqlblogcasts.com/blogs/grumpyolddba/
December 20, 2007 at 7:42 am
Partitioning on the identity field (partitioning key) is a good choice for such a big table. You can put each partition on different file group and disk (or disk control) to increase I/O throughput. The primary key on this identity field will be with each partition since it is clustered index. To further improve I/O for data loading and indexing, you can put any non-clustered indexes into different file groups (I would suggest aligned partitioning rather than storage aligned).
Since this table has so many columns, you can also consider vertical partitioning.
Shawn Shi
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply