Primary Key / Index Question

Question

Primary Key / Index Question

David Burrows

SSC Guru

Points: 65144
More actions
June 3, 2015 at 2:22 am

#304566

SQL Server 2008 R2
With the following example table
UploadID int IDENTITY(1,1) NOT NULL,
PeriodNo int NOT NULL,
UploadType tinyint NOT NULL,
col1 varchar(12) NULL,
col2 varchar(2) NULL,
col3 varchar(2) NULL,
col4 varchar(2) NULL,
col5 varchar(2) NULL,
col6 varchar(8) NULL,
col7 varchar(8) NULL,
col8 varchar(8) NULL,
col9 varchar(8) NULL,
col10 varchar(8) NULL,
col11 varchar(12) NULL,
col12 varchar(4) NULL,
col13 varchar(4) NULL,
col14 varchar(2) NULL,
col15 varchar(2) NULL,
col16 varchar(3) NULL,
col17 varchar(2) NULL,
col18 varchar(20) NULL,
col19 varchar(20) NULL,
col20 varchar(11) NULL
Insertions approx 1000-2000 per month
col1-col20 may or may not contain values
col1-col20 not suitable for primary key
Inserts by PeriodNo/UploadType
Deletes by PeriodNo, PeriodNo/UploadType or UploadID
Select by PeriodNo/UploadType
Rows can be deleted and reinserted many times and the data may or may not be the same
The question is what would you suggest for a primary key and indexes to reduce fragmentation and aid performance?
Far away is close at hand in the images of elsewhere.
Anon.

Viewing 15 posts - 1 through 15 (of 16 total)

You must be logged in to reply to this topic. Login to reply

TomThomson SSC Guru Points: 104773 More actions · Answer 1

I could make a wild guess but my best answer, since you haven't provided enough information to do more than just guess, is "it depends".

Some of the questions that need answering to get more than a guess are:-

1) How frequent are deletes?

2) How frequent are updates? Or are there none?

3) What does Row mean in the statement "Rows can be deleted and reinserted many times and the data may or may not be the same"? The statement implies that there is some way of determining whether two inserts at different times are inserting (and reinserting) the same row, ie there is at least one natural key. So what natural keys are there? Is there only one, or more than one?

It might also be useful to know something about value ranges - for example does uploadType take on all 256 possible distinct values or just a dozen?

Tom

Bill Talada SSChampion Points: 11954 More actions · Answer 2

Since UploadID is an identity, I assume deleted rows won't be reinserted so pages will generally become empty instead of split.

Personally I wouldn't care about fragmentation on such a tiny table with tiny rows. I'd probably make it a heap table. Create some indexes to handle SELECTS on the two main columns. Defrag it once every few months if the empty pages bugs you.

Spend 80% of your time optimizing your largest and most active tables.

David Burrows SSC Guru Points: 65144 More actions · Answer 3

Thanks for the feedback Tom

1) How frequent are deletes?

Unknown as the user can run a process any number of times or not at all

2) How frequent are updates? Or are there none?

None, data is only inserted and deleted, ie delete rows for matching PeriodNo and UploadType then reinsert

3) What does Row mean in the statement "Rows can be deleted and reinserted many times and the data may or may not be the same"?

There can be any number of rows per PeriodNo/UploadType combination or none at all

PeriodNo indicates a fiscal month ie 12 per fiscal year

There are currently 4 Upload Types (eg 1,2,3,4) but this may increase in the future

This table is part of a monthly process, at the start of the process rows are deleted matching the Period being run

The user has the option of inserting data for each Upload Type,

this process deletes rows for the selected Upload Type for the Period being run

then inserts data from other tables for the selected Upload Type for the Period being run

col1 may have the same data repeated in several rows or could be null

this is the same for col2 to col20

it is most likely that the combination of col1 to col20 would be unique but cannot be guaranteed

(Note that the user will have the ability to delete individual rows or a selection of rows)

The reason for this is that if the user creates the data but finds a problem with it they would correct the originating data and recreate the data as outlined above

This can be reapeated any number of times at any frequency for any of the PeriodNo/UploadType combinations

Or the user may elect to start the whole process from the beginning

Far away is close at hand in the images of elsewhere.
Anon.

TomThomson SSC Guru Points: 104773 More actions · Answer 4

Thanks for teh extra informtion. Having seen that I think I would go for using UploadID as UNCLUSTERED primary key and have a CLUSTERED index with key (PeriodNo, UploadType). But there are people around here with far more expertise on choosing indexes than I have, so I'll ask one of them to step in and comment.

Tom

David Burrows SSC Guru Points: 65144 More actions · Answer 5

Thanks Tom 🙂

Far away is close at hand in the images of elsewhere.
Anon.

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 6

If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.

Or I might add a non-clustered Index which includes all three afore mentioned columns.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

David Burrows SSC Guru Points: 65144 More actions · Answer 7

Brandie Tarvin (6/3/2015)
If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.
Or I might add a non-clustered Index which includes all three afore mentioned columns.

Thanks Brandie

Because the nature of the data in the additional columns and the need to be able to delete a single row or a selection of rows I need UploadID to make rows unique

Far away is close at hand in the images of elsewhere.
Anon.

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 8

David Burrows (6/3/2015)
Brandie Tarvin (6/3/2015)
If UploadID is used for anything at all, I might make it part of a composite clustered index. But, as Tom suggested, make it a separate UNCLUSTERED PK first.
Or I might add a non-clustered Index which includes all three afore mentioned columns.
Thanks Brandie
Because the nature of the data in the additional columns and the need to be able to delete a single row or a selection of rows I need UploadID to make rows unique

I know the feeling. We have the same issue in several of our databases. Sometimes we use the IDENTITY as a JOIN condition, sometimes it's just to keep things straight, especially when it comes to potentially duplicated data.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 9

TomThomson (6/3/2015)
Thanks for teh extra informtion. Having seen that I think I would go for using UploadID as UNCLUSTERED primary key and have a CLUSTERED index with key (PeriodNo, UploadType). But there are people around here with far more expertise on choosing indexes than I have, so I'll ask one of them to step in and comment.

If the PeriodNo is ever increasing, then yeah, I agree.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

David Burrows SSC Guru Points: 65144 More actions · Answer 10

Jeff Moden (6/3/2015)

If the PeriodNo is ever increasing, then yeah, I agree.

Thanks Jeff

Yes PeriodNo will always be increasing

I will go with Tom's suggestion 🙂

Far away is close at hand in the images of elsewhere.
Anon.

Jeff Moden SSC Guru Points: 1004432 More actions · Answer 11

David Burrows (6/3/2015)

Jeff Moden (6/3/2015)
If the PeriodNo is ever increasing, then yeah, I agree.
Thanks Jeff
Yes PeriodNo will always be increasing
I will go with Tom's suggestion 🙂

Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

David Burrows SSC Guru Points: 65144 More actions · Answer 12

Jeff Moden (6/3/2015)

David Burrows (6/3/2015)
Jeff Moden (6/3/2015)
If the PeriodNo is ever increasing, then yeah, I agree.
Thanks Jeff
Yes PeriodNo will always be increasing
I will go with Tom's suggestion 🙂
Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.

Thanks Jeff 🙂

Far away is close at hand in the images of elsewhere.
Anon.

Brandie Tarvin SSC Guru Points: 173080 More actions · Answer 13

Jeff Moden (6/3/2015)

David Burrows (6/3/2015)
Jeff Moden (6/3/2015)
If the PeriodNo is ever increasing, then yeah, I agree.
Thanks Jeff
Yes PeriodNo will always be increasing
I will go with Tom's suggestion 🙂
Just one more addition... if the two columns that Tom suggested do NOT make for a UNIQUE clustered index, consider adding the UploadID column as a 3rd column to the clustered index to make it unique and then define it that way. SQL Server has a mighty big love for unique clustered indexes.

Which is what I was saying in my first post.

Brandie Tarvin, MCITP Database AdministratorLiveJournal Blog: http://brandietarvin.livejournal.com/[/url]On LinkedIn!, Google+, and Twitter.Freelance Writer: ShadowrunLatchkeys: Nevermore, Latchkeys: The Bootleg War, and Latchkeys: Roscoes in the Night are now available on Nook and Kindle.

TomThomson SSC Guru Points: 104773 More actions · Answer 14

Braindead comment removed. Must remember to switch brain on before applying fingers to keyboard.

Tom