July 13, 2020 at 8:23 am
Hi
I have a table that has a clustered index on it
The table does not have a primary key and the clustered index has not been defined as unique
There are approximately 3B rows in the table
It's been around for years and I'm wondering why neither a PK or a unique clustered index was set against it
Is there an easy way to interogate this table (say sysindexes or ...) such that I can see whether the columns that make up the clustered index are unique? Other than something like count distinct
I'm pretty certain they should be
Thanks
- Damian
July 13, 2020 at 11:17 am
Billions of rows over years and no constraint. I'd put good money down that there are non-unique values.
Easiest way I know to validate this would be to do two queries. One, COUNT, and the other COUNT DISTINCT. Compare those two values. That won't tell you what's duplicated, but it'll let you know whether or not your suspicions are correct.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 13, 2020 at 12:20 pm
Thanks for the response Grant
Think I'll go with the COUNT and COUNT DISTINCT then.
Just wasn't sure whether there's a quicker option (bit like doing a count using sys.table and sys.partition vs count(*))
- Damian
July 13, 2020 at 1:11 pm
You can get some estimates on row counts from the statistics. There's also DMVs that will show estimates on the row counts. However, nothing will show if there's duplicate data except going and checking.
Another way to check, and I wouldn't recommend this, you could just try creating a constraint. However, with a table that big, you're likely to see some blocking & locking, so it might be a very bad choice. Just thought I'd pass it along regardless.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 13, 2020 at 3:25 pm
--if you just want to see if a dup key exists:
SELECT TOP (1) key_col1, key_col2, COUNT(*) - 1 AS duplicate_count
FROM dbo.table_name
GROUP BY key_col1, key_col2
HAVING COUNT(*) > 1
--if you want to see all dup keys, remove the "TOP (1)" clause
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
July 20, 2020 at 2:52 pm
At a former employer, we had a number of very large tables that had absolutely no constraints on them, and occasionally wanted to clean up the duplicates. There were also not really any natural keys, and those duplicate rows would often be complete duplicates, i.e., every column value was the same across multiple rows. After several conversations about how to delete duplicated rows while leaving exactly one of them in place, we opted for creating a delete statement based on this sort of select:
;with cte as
( select FIRST_COLUMN, SECOND_COLUMN,
row_number() over(partition by FIRST_COLUMN, SECOND_COLUMN order by FIRST_COLUMN, SECOND_COLUMN) instance
from MyTable
)
select * from cte where instance > 1
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply