Find whether a clustered index is actually unique

  • Hi

     

    I have a table that has a clustered index on it

    The table does not have a primary key and the clustered index has not been defined as unique

    There are approximately 3B rows in the table

    It's been around for years and I'm wondering why neither a PK or a unique clustered index was set against it

    Is there an easy way to interogate this table (say sysindexes or ...) such that I can see whether the columns that make up the clustered index are unique? Other than something like count distinct

    I'm pretty certain they should be

     

    Thanks

     

    - Damian

  • Billions of rows over years and no constraint. I'd put good money down that there are non-unique values.

    Easiest way I know to validate this would be to do two queries. One, COUNT, and the other COUNT DISTINCT. Compare those two values. That won't tell you what's duplicated, but it'll let you know whether or not your suspicions are correct.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Thanks for the response Grant

    Think I'll go with the COUNT and COUNT DISTINCT then.

    Just wasn't sure whether there's a quicker option (bit like doing a count using sys.table and sys.partition vs count(*))

     

    - Damian

  • You can get some estimates on row counts from the statistics. There's also DMVs that will show estimates on the row counts. However, nothing will show if there's duplicate data except going and checking.

    Another way to check, and I wouldn't recommend this, you could just try creating a constraint. However, with a table that big, you're likely to see some blocking & locking, so it might be a very bad choice. Just thought I'd pass it along regardless.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • --if you just want to see if a dup key exists:
    SELECT TOP (1) key_col1, key_col2, COUNT(*) - 1 AS duplicate_count
    FROM dbo.table_name
    GROUP BY key_col1, key_col2
    HAVING COUNT(*) > 1

    --if you want to see all dup keys, remove the "TOP (1)" clause

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • At a former employer, we had a number of very large tables that had absolutely no constraints on them, and occasionally wanted to clean up the duplicates.  There were also not really any natural keys, and those duplicate rows would often be complete duplicates, i.e., every column value was the same across multiple rows.  After several conversations about how to delete duplicated rows while leaving exactly one of them in place, we opted for creating a delete statement based on this sort of select:

    ;with cte as
    ( select FIRST_COLUMN, SECOND_COLUMN,
    row_number() over(partition by FIRST_COLUMN, SECOND_COLUMN order by FIRST_COLUMN, SECOND_COLUMN) instance
    from MyTable
    )
    select * from cte where instance > 1

    • This reply was modified 4 years, 5 months ago by  leward.

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply