Find whether a clustered index is actually unique

Question

Find whether a clustered index is actually unique

DamianC

SSCrazy Eights

Points: 8023
More actions
July 13, 2020 at 8:23 am

#3769336

Hi

I have a table that has a clustered index on it
The table does not have a primary key and the clustered index has not been defined as unique
There are approximately 3B rows in the table
It's been around for years and I'm wondering why neither a PK or a unique clustered index was set against it
Is there an easy way to interogate this table (say sysindexes or ...) such that I can see whether the columns that make up the clustered index are unique? Other than something like count distinct
I'm pretty certain they should be

Thanks

- Damian

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 1

Billions of rows over years and no constraint. I'd put good money down that there are non-unique values.

Easiest way I know to validate this would be to do two queries. One, COUNT, and the other COUNT DISTINCT. Compare those two values. That won't tell you what's duplicated, but it'll let you know whether or not your suspicions are correct.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

DamianC SSCrazy Eights Points: 8023 More actions · Answer 2

Thanks for the response Grant

Think I'll go with the COUNT and COUNT DISTINCT then.

Just wasn't sure whether there's a quicker option (bit like doing a count using sys.table and sys.partition vs count(*))

- Damian

Grant Fritchey SSC Guru Points: 398882 More actions · Answer 3

You can get some estimates on row counts from the statistics. There's also DMVs that will show estimates on the row counts. However, nothing will show if there's duplicate data except going and checking.

Another way to check, and I wouldn't recommend this, you could just try creating a constraint. However, with a table that big, you're likely to see some blocking & locking, so it might be a very bad choice. Just thought I'd pass it along regardless.

"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt

Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning

ScottPletcher SSC Guru Points: 101096 More actions · Answer 4

--if you just want to see if a dup key exists:
SELECT TOP (1) key_col1, key_col2, COUNT(*) - 1 AS duplicate_count
FROM dbo.table_name
GROUP BY key_col1, key_col2
HAVING COUNT(*) > 1

--if you want to see all dup keys, remove the "TOP (1)" clause

SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

leward SSC Rookie Points: 33 More actions · Answer 5

At a former employer, we had a number of very large tables that had absolutely no constraints on them, and occasionally wanted to clean up the duplicates. There were also not really any natural keys, and those duplicate rows would often be complete duplicates, i.e., every column value was the same across multiple rows. After several conversations about how to delete duplicated rows while leaving exactly one of them in place, we opted for creating a delete statement based on this sort of select:

;with cte as
( select FIRST_COLUMN, SECOND_COLUMN,
  row_number() over(partition by FIRST_COLUMN, SECOND_COLUMN order by FIRST_COLUMN, SECOND_COLUMN) instance
  from MyTable
)
select * from cte where instance > 1

This reply was modified 4 years, 4 months ago by leward.