June 3, 2015 at 2:22 am
SQL Server 2008 R2
With the following example table
UploadID int IDENTITY(1,1) NOT NULL,
PeriodNo int NOT NULL,
UploadType tinyint NOT NULL,
col1 varchar(12) NULL,
col2 varchar(2) NULL,
col3 varchar(2) NULL,
col4 varchar(2) NULL,
col5 varchar(2) NULL,
col6 varchar(8) NULL,
col7 varchar(8) NULL,
col8 varchar(8) NULL,
col9 varchar(8) NULL,
col10 varchar(8) NULL,
col11 varchar(12) NULL,
col12 varchar(4) NULL,
col13 varchar(4) NULL,
col14 varchar(2) NULL,
col15 varchar(2) NULL,
col16 varchar(3) NULL,
col17 varchar(2) NULL,
col18 varchar(20) NULL,
col19 varchar(20) NULL,
col20 varchar(11) NULL
Insertions approx 1000-2000 per month
col1-col20 may or may not contain values
col1-col20 not suitable for primary key
Inserts by PeriodNo/UploadType
Deletes by PeriodNo, PeriodNo/UploadType or UploadID
Select by PeriodNo/UploadType
Rows can be deleted and reinserted many times and the data may or may not be the same
The question is what would you suggest for a primary key and indexes to reduce fragmentation and aid performance?
Far away is close at hand in the images of elsewhere.
Anon.
June 3, 2015 at 8:12 am
I could make a wild guess but my best answer, since you haven't provided enough information to do more than just guess, is "it depends".
Some of the questions that need answering to get more than a guess are:-
1) How frequent are deletes?
2) How frequent are updates? Or are there none?
3) What does Row mean in the statement "Rows can be deleted and reinserted many times and the data may or may not be the same"? The statement implies that there is some way of determining whether two inserts at different times are inserting (and reinserting) the same row, ie there is at least one natural key. So what natural keys are there? Is there only one, or more than one?
It might also be useful to know something about value ranges - for example does uploadType take on all 256 possible distinct values or just a dozen?
Tom
June 3, 2015 at 8:38 am
Since UploadID is an identity, I assume deleted rows won't be reinserted so pages will generally become empty instead of split.
Personally I wouldn't care about fragmentation on such a tiny table with tiny rows. I'd probably make it a heap table. Create some indexes to handle SELECTS on the two main columns. Defrag it once every few months if the empty pages bugs you.
Spend 80% of your time optimizing your largest and most active tables.
June 3, 2015 at 9:11 am
Thanks for the feedback Tom
1) How frequent are deletes?
Unknown as the user can run a process any number of times or not at all
2) How frequent are updates? Or are there none?
None, data is only inserted and deleted, ie delete rows for matching PeriodNo and UploadType then reinsert
3) What does Row mean in the statement "Rows can be deleted and reinserted many times and the data may or may not be the same"?
There can be any number of rows per PeriodNo/UploadType combination or none at all
PeriodNo indicates a fiscal month ie 12 per fiscal year
There are currently 4 Upload Types (eg 1,2,3,4) but this may increase in the future
This table is part of a monthly process, at the start of the process rows are deleted matching the Period being run
The user has the option of inserting data for each Upload Type,
this process deletes rows for the selected Upload Type for the Period being run
then inserts data from other tables for the selected Upload Type for the Period being run
col1 may have the same data repeated in several rows or could be null
this is the same for col2 to col20
it is most likely that the combination of col1 to col20 would be unique but cannot be guaranteed
(Note that the user will have the ability to delete individual rows or a selection of rows)
The reason for this is that if the user creates the data but finds a problem with it they would correct the originating data and recreate the data as outlined above
This can be reapeated any number of times at any frequency for any of the PeriodNo/UploadType combinations
Or the user may elect to start the whole process from the beginning
Far away is close at hand in the images of elsewhere.
Anon.
June 3, 2015 at 9:28 am
Thanks for teh extra informtion. Having seen that I think I would go for using UploadID as UNCLUSTERED primary key and have a CLUSTERED index with key (PeriodNo, UploadType). But there are people around here with far more expertise on choosing indexes than I have, so I'll ask one of them to step in and comment.
Tom
June 3, 2015 at 9:32 am
Thanks Tom 🙂
Far away is close at hand in the images of elsewhere.
Anon.
June 3, 2015 at 9:56 am
If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.
Or I might add a non-clustered Index which includes all three afore mentioned columns.
June 3, 2015 at 10:14 am
Brandie Tarvin (6/3/2015)
If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.Or I might add a non-clustered Index which includes all three afore mentioned columns.
Thanks Brandie
Because the nature of the data in the additional columns and the need to be able to delete a single row or a selection of rows I need UploadID to make rows unique
Far away is close at hand in the images of elsewhere.
Anon.
June 3, 2015 at 10:22 am
David Burrows (6/3/2015)
Brandie Tarvin (6/3/2015)
If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.Or I might add a non-clustered Index which includes all three afore mentioned columns.
Thanks Brandie
Because the nature of the data in the additional columns and the need to be able to delete a single row or a selection of rows I need UploadID to make rows unique
I know the feeling. We have the same issue in several of our databases. Sometimes we use the IDENTITY as a JOIN condition, sometimes it's just to keep things straight, especially when it comes to potentially duplicated data.
June 3, 2015 at 10:31 am
TomThomson (6/3/2015)
Thanks for teh extra informtion. Having seen that I think I would go for using UploadID as UNCLUSTERED primary key and have a CLUSTERED index with key (PeriodNo, UploadType). But there are people around here with far more expertise on choosing indexes than I have, so I'll ask one of them to step in and comment.
If the PeriodNo is ever increasing, then yeah, I agree.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 3, 2015 at 10:38 am
Jeff Moden (6/3/2015)
If the PeriodNo is ever increasing, then yeah, I agree.
Thanks Jeff
Yes PeriodNo will always be increasing
I will go with Tom's suggestion 🙂
Far away is close at hand in the images of elsewhere.
Anon.
June 3, 2015 at 8:57 pm
David Burrows (6/3/2015)
Jeff Moden (6/3/2015)
If the PeriodNo is ever increasing, then yeah, I agree.
Thanks Jeff
Yes PeriodNo will always be increasing
I will go with Tom's suggestion 🙂
Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.
--Jeff Moden
Change is inevitable... Change for the better is not.
June 4, 2015 at 1:53 am
Jeff Moden (6/3/2015)
David Burrows (6/3/2015)
Jeff Moden (6/3/2015)
If the PeriodNo is ever increasing, then yeah, I agree.
Thanks Jeff
Yes PeriodNo will always be increasing
I will go with Tom's suggestion 🙂
Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.
Thanks Jeff 🙂
Far away is close at hand in the images of elsewhere.
Anon.
June 4, 2015 at 4:47 am
Jeff Moden (6/3/2015)
David Burrows (6/3/2015)
Jeff Moden (6/3/2015)
If the PeriodNo is ever increasing, then yeah, I agree.
Thanks Jeff
Yes PeriodNo will always be increasing
I will go with Tom's suggestion 🙂
Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.
Which is what I was saying in my first post.
June 5, 2015 at 8:53 pm
Braindead comment removed. Must remember to switch brain on before applying fingers to keyboard.
Tom
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply