Finding Duplicates Across Columns

  • Hi,

    I have a table that contains animal records. I need to construct a "family tree" query that returns the various generations which is fine. I then want to highlight if the same animal appears more than once in the tree i.e. in-breeding.

    Here is a scaled-down version of the table:

    CREATE TABLE [dbo].[tblHierarchy](

    [ID] [int] IDENTITY(1,1) NOT NULL,

    [ID_Father] [int] NULL,

    [ID_Mother] [int] NULL,

    CONSTRAINT [PK_tblHierarchy] PRIMARY KEY CLUSTERED

    (

    [ID] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    ) ON [PRIMARY]

    GO

    I then create my tree query:

    SELECTtblHierarchy.ID

    ,Gen1Father.ID AS ID1_Father

    ,Gen2F_Father.ID AS ID2F_Father

    ,Gen2F_Mother.ID AS ID2F_Mother

    ,Gen1Mother.ID AS ID1_Mother

    ,Gen2M_Father.ID AS ID2M_Father

    ,Gen2M_Mother.ID AS ID2M_Mother

    FROM tblHierarchy AS Gen2M_Father RIGHT OUTER JOIN

    tblHierarchy AS Gen2M_Mother RIGHT OUTER JOIN

    tblHierarchy AS Gen1Mother ON Gen2M_Mother.ID = Gen1Mother.ID_Mother ON Gen2M_Father.ID = Gen1Mother.ID_Father RIGHT OUTER JOIN

    tblHierarchy ON Gen1Mother.ID = tblHierarchy.ID_Mother LEFT OUTER JOIN

    tblHierarchy AS Gen2F_Mother RIGHT OUTER JOIN

    tblHierarchy AS Gen1Father ON Gen2F_Mother.ID = Gen1Father.ID_Mother LEFT OUTER JOIN

    tblHierarchy AS Gen2F_Father ON Gen1Father.ID_Father = Gen2F_Father.ID ON tblHierarchy.ID_Father = Gen1Father.ID

    Is there any easy way of identifying duplicates among the IDs? The idea for me would be to add an extra bit column per animal and putting in 1 of it's duplicated.

    Any ideas?

  • Quick suggestion, unpivot the rows and apply the row_number with a partition on the ID

    😎

    USE TEEST;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'dbo.tblHierarchy') IS NOT NULL DROP TABLE dbo.tblHierarchy;

    CREATE TABLE [dbo].[tblHierarchy](

    [ID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED ,

    [ID_Father] [int] NULL,

    [ID_Mother] [int] NULL

    )

    INSERT INTO dbo.tblHierarchy(ID_Father,ID_Mother)

    VALUES ( 1, 2)

    ,( 3, 3)

    ,( 4, 5)

    ;

    ;WITH BASE_DATA AS

    (

    SELECT

    TH.ID

    ,X.HID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY X.HID

    ORDER BY TH.ID

    ) AS RID

    FROM dbo.tblHierarchy TH

    CROSS APPLY

    (

    SELECT TH.ID_Father UNION ALL

    SELECT TH.ID_Mother

    ) X(HID)

    )

    SELECT

    BD.ID

    ,BD.HID

    FROM BASE_DATA BD

    WHERE BD.RID > 1;

    Output

    ID HID

    ----------- -----------

    2 3

  • Hi,

    Yes, that works thanks. I think I need to find out why though: I haven't used CROSS APPLY before.

  • dec_obrien (11/8/2016)


    Hi,

    Yes, that works thanks. I think I need to find out why though: I haven't used CROSS APPLY before.

    You are very welcome.

    😎

    Here is a brilliant article on the subject, Understanding and Using APPLY By Paul White[/url]

  • dec_obrien (11/8/2016)


    Hi,

    Yes, that works thanks. I think I need to find out why though: I haven't used CROSS APPLY before.

    It's basically the same thing as a correlated subquery in the SELECT list except that it can return more than one row for each row in the base query.

    Also, here's two really good articles on CROSS APPLY and OUTER APPLY.

    http://www.sqlservercentral.com/articles/APPLY/69953/

    http://www.sqlservercentral.com/articles/APPLY/69954/

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks: every day is a school day!

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply