September 1, 2016 at 12:29 pm
I have a large table with duplicate values in all columns. I need to add a column, populate it with unique values (INT with values 1,2,3,etc is fine) so I can create a unique index.
What's the best way to go about this? This is SQL 2008.
September 1, 2016 at 12:40 pm
Why do you want to create a column with unique values to create a unique index on it? What do you expect to achieve?
September 1, 2016 at 12:55 pm
We have a new process to run updates on it that takes forever, so it needs a primary key or at least a unique index. One column already has a non-unique index if that's any help.
September 1, 2016 at 1:29 pm
There is still something missing here but why not just add an identity column? Then make it the primary key since it doesn't seem to matter what the clustered index might be.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 1, 2016 at 1:33 pm
This smells like RBAR. That would certainly be slow.
If you post DDL for your table and indexes, along with the UPDATE process we could give better advice on what to change to improve performance.
September 1, 2016 at 1:39 pm
dan-572483 (9/1/2016)
We have a new process to run updates on it that takes forever, so it needs a primary key or at least a unique index. One column already has a non-unique index if that's any help.
Unless your process is going to filter on that new column, adding it is a complete waste.
And updates don't need a primary key or unique index. An index on whatever the update filters on will be useful, doesn't matter if it's not unique.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
September 1, 2016 at 1:40 pm
Look at the missing index stats and the index usage stats to help determine a useful clustered index. Only if no useful clustering key exists, and remember to consider something like added datetime, then you can fall back to using identity.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 1, 2016 at 2:46 pm
This was the quick & easy solution. I assumed that an identity column had to be specified before you start adding records. No, it will create them on existing records when you create the column! Thanks!
September 1, 2016 at 11:33 pm
Yes, its always a best practice to have a primary key on table to avoid the heap. It can be either simple identity column or any business key.
Also as our experts suggested, its always suggested to have unique index on filtering column.
Thanks,
Durga Prasad.
September 2, 2016 at 8:00 am
A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 2, 2016 at 8:31 am
ScottPletcher (9/2/2016)
A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.
I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicates rows across all columns it seems that an identity is about the only way to make a primary key since none of the other columns would suffice.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 2, 2016 at 9:53 am
Sean Lange (9/2/2016)
ScottPletcher (9/2/2016)
A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicates rows across all columns it seems that an identity is about the only way to make a primary key since none of the other columns would suffice.
True. But, again, you don't need a primary key necessarily, just a clustered index. And a CI doesn't have to be unique on its own (SQL forces it to be unique "under the covers", of course, but that's irrelevant to the index create itself). Often a datetime is vastly more useful for lookup than an identity, and the fact that multiple key values can occur doesn't matter.
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
September 2, 2016 at 10:06 am
ScottPletcher (9/2/2016)
Sean Lange (9/2/2016)
ScottPletcher (9/2/2016)
A clustering index is enough to avoid having a heap, you don't have to have a primary key. In particular, don't slap an identity column on the table just to use it as the "primary key" if you have something like a datetime you can cluster on more usefully and naturally for that data.I totally agree with Scott on this. An identity just so you have a primary key is not a good approach in most situations. However, since in this case the OP stated they have lots of duplicates rows across all columns it seems that an identity is about the only way to make a primary key since none of the other columns would suffice.
True. But, again, you don't need a primary key necessarily, just a clustered index. And a CI doesn't have to be unique on its own (SQL forces it to be unique "under the covers", of course, but that's irrelevant to the index create itself). Often a datetime is vastly more useful for lookup than an identity, and the fact that multiple key values can occur doesn't matter.
I of course understand that part. My point is that the OP is talking about some new process they have which does updates to this and they wanted unique values. At this point we are speculating on what may or may not be the process the OP actually needs.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
September 15, 2016 at 11:37 pm
durga.palepu (9/1/2016)
Yes, its always a best practice to have a primary key on table to avoid the heap. It can be either simple identity column or any business key.Also as our experts suggested, its always suggested to have unique index on filtering column.
Thanks,
Durga Prasad.
Ah... you said "always". Are you a betting man? 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 14 posts - 1 through 13 (of 13 total)
You must be logged in to reply to this topic. Login to reply