Finding Duplicates Across Columns

Question

Post reply

Finding Duplicates Across Columns

dec_obrien

Default port

Points: 1486
More actions
November 8, 2016 at 3:47 am

#318774

Hi,
I have a table that contains animal records. I need to construct a "family tree" query that returns the various generations which is fine. I then want to highlight if the same animal appears more than once in the tree i.e. in-breeding.
Here is a scaled-down version of the table:
CREATE TABLE [dbo].[tblHierarchy](
[ID] [int] IDENTITY(1,1) NOT NULL,
[ID_Father] [int] NULL,
[ID_Mother] [int] NULL,
CONSTRAINT [PK_tblHierarchy] PRIMARY KEY CLUSTERED
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
I then create my tree query:
SELECTtblHierarchy.ID
,Gen1Father.ID AS ID1_Father
,Gen2F_Father.ID AS ID2F_Father
,Gen2F_Mother.ID AS ID2F_Mother
,Gen1Mother.ID AS ID1_Mother
,Gen2M_Father.ID AS ID2M_Father
,Gen2M_Mother.ID AS ID2M_Mother
FROM tblHierarchy AS Gen2M_Father RIGHT OUTER JOIN
tblHierarchy AS Gen2M_Mother RIGHT OUTER JOIN
tblHierarchy AS Gen1Mother ON Gen2M_Mother.ID = Gen1Mother.ID_Mother ON Gen2M_Father.ID = Gen1Mother.ID_Father RIGHT OUTER JOIN
tblHierarchy ON Gen1Mother.ID = tblHierarchy.ID_Mother LEFT OUTER JOIN
tblHierarchy AS Gen2F_Mother RIGHT OUTER JOIN
tblHierarchy AS Gen1Father ON Gen2F_Mother.ID = Gen1Father.ID_Mother LEFT OUTER JOIN
tblHierarchy AS Gen2F_Father ON Gen1Father.ID_Father = Gen2F_Father.ID ON tblHierarchy.ID_Father = Gen1Father.ID
Is there any easy way of identifying duplicates among the IDs? The idea for me would be to add an extra bit column per animal and putting in 1 of it's duplicated.
Any ideas?

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Eirikur Eiriksson SSC Guru Points: 182915 More actions · Answer 1

Quick suggestion, unpivot the rows and apply the row_number with a partition on the ID

😎

USE TEEST;

GO

SET NOCOUNT ON;

IF OBJECT_ID(N'dbo.tblHierarchy') IS NOT NULL DROP TABLE dbo.tblHierarchy;

CREATE TABLE [dbo].[tblHierarchy](

[ID] [int] IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED ,

[ID_Father] [int] NULL,

[ID_Mother] [int] NULL

)

INSERT INTO dbo.tblHierarchy(ID_Father,ID_Mother)

VALUES ( 1, 2)

,( 3, 3)

,( 4, 5)

;

;WITH BASE_DATA AS

(

SELECT

TH.ID

,X.HID

,ROW_NUMBER() OVER

(

PARTITION BY X.HID

ORDER BY TH.ID

) AS RID

FROM dbo.tblHierarchy TH

CROSS APPLY

(

SELECT TH.ID_Father UNION ALL

SELECT TH.ID_Mother

) X(HID)

)

SELECT

BD.ID

,BD.HID

FROM BASE_DATA BD

WHERE BD.RID > 1;

Output

ID HID

----------- -----------

2 3

dec_obrien Default port Points: 1486 More actions · Answer 2

Hi,

Yes, that works thanks. I think I need to find out why though: I haven't used CROSS APPLY before.

Eirikur Eiriksson SSC Guru Points: 182915 More actions · Answer 3

dec_obrien (11/8/2016)
Hi,
Yes, that works thanks. I think I need to find out why though: I haven't used CROSS APPLY before.

You are very welcome.

😎

Here is a brilliant article on the subject, Understanding and Using APPLY By Paul White[/url]

Jeff Moden SSC Guru Points: 1004405 More actions · Answer 4

dec_obrien (11/8/2016)
Hi,
Yes, that works thanks. I think I need to find out why though: I haven't used CROSS APPLY before.

It's basically the same thing as a correlated subquery in the SELECT list except that it can return more than one row for each row in the base query.

Also, here's two really good articles on CROSS APPLY and OUTER APPLY.

http://www.sqlservercentral.com/articles/APPLY/69953/

http://www.sqlservercentral.com/articles/APPLY/69954/

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

dec_obrien Default port Points: 1486 More actions · Answer 5

dec_obrien

Default port

Points: 1486

November 9, 2016 at 2:08 am

#1911342

Thanks: every day is a school day!