Selecting entries with non-distinct ID

  • I'm thinking there's a better way to write this query. Any ideas?

    IF OBJECT_ID('TempDB..#persons', 'U') IS NOT NULL

    DROP TABLE #persons

    GO

    CREATE TABLE #persons

    (

    person_identity INT ,

    person_id CHAR(10) ,

    person_data VARCHAR(20)

    )

    INSERT INTO #persons

    ( person_identity ,

    person_id ,

    person_data

    )

    SELECT 1 ,

    '123' ,

    'abcd'

    UNION ALL

    SELECT 1 ,

    '123' ,

    'xyz'

    UNION ALL

    SELECT 2 ,

    '987' ,

    'xyz'

    UNION ALL

    SELECT 2 ,

    '545' ,

    'nnn'

    UNION ALL

    SELECT 3 ,

    'MTN' ,

    'nothing'

    UNION ALL

    SELECT 3 ,

    '269' ,

    'nbdfjkl'

    UNION ALL

    SELECT 3 ,

    '76489' ,

    NULL

    UNION ALL

    SELECT 4 ,

    '8B937' ,

    'hello' ;

    WITH PersonDistinct

    AS ( SELECT pd.person_identity ,

    ROW_NUMBER() OVER ( PARTITION BY pd.person_identity ORDER BY pd.person_id ) AS RowNum

    FROM ( SELECT DISTINCT

    person_identity ,

    person_id

    FROM #persons AS p WITH ( NOLOCK )

    ) AS pd

    )

    SELECT person_identity ,

    RowNum

    FROM PersonDistinct

    WHERE RowNum = 2

    ORDER BY person_identity ,

    RowNum

    Person_identity is an FK to another table existing in a 1-many relationship with this table. Each person_identity should have only one person_id value associated with it. The difference between rows is the additional data (obviously much more/different than what I have posted in the sample - no realistic way to use these for testing anything here). So I need to identify the person_identities that have more than one person_id associated with them. From the above sample, these are the results I want:

    person_identityRowNum

    22

    32

    The query I've written works, but I can't help but suspect there's a better way to write this.

    (and no, I did NOT design this schema...)

  • Pam,

    Actually, I think you wrote a pretty good query.

    Wayne
    Microsoft Certified Master: SQL Server 2008
    Author - SQL Server T-SQL Recipes


    If you can't explain to another person how the code that you're copying from the internet works, then DON'T USE IT on a production system! After all, you will be the one supporting it!
    Links:
    For better assistance in answering your questions
    Performance Problems
    Common date/time routines
    Understanding and Using APPLY Part 1 & Part 2

  • I like to write it this way:

    SELECT

    person_identity,

    COUNT(*) AS Num

    FROM

    (

    SELECT

    person_identity,

    person_id

    FROM

    #persons

    GROUP BY

    person_identity,

    person_id

    ) AS PersonDistinct

    GROUP BY

    person_identity

    HAVING

    COUNT(*) > 1;

    But it does not seem to perform any better, query plans are almost the same.

  • Thanks, Wayne. I keep wanting simpler but I don't think it's possible. Mostly wanted a sanity check.

    Maxim, I thought about going that way as well but I like playing with the new toys :). Good to know they're performing about the same. Makes sense.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply