Remove Duplicates in opposite columns

Question

Post reply

Remove Duplicates in opposite columns

keith.west-982594

SSC Eights!

Points: 965
More actions
January 22, 2015 at 8:22 am

#294639

I have a table containing the following data:
LinkingIDID1 ID2
166202180659253178
166202253178180659
166334180380253179
166334253179180380
166342180380180659
166342180659180380
166582253179258643
166582258643253179
264052258642258643
264052258643258642
264502258643258663
264502258643259562
Within the LinkingID, there are duplicates in ID1 and ID2 but just in opposite columns. I have been trying to figure out a way to remove these set based. It doesn't matter which duplicate is removed. Essentially these are just endpoints and I don't care which side they are on. The solution must recognize the duplicates and not just remove based on every 2nd row. It seems that this should be very simple, but I can't figure it out. Any help is greatly appreciated.

Viewing 9 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 1

SELECT

LinkingID,

ID1,

ID2,

rn = ROW_NUMBER() OVER (PARTITION BY LinkingID, IDMAX, IDMIN ORDER BY (SELECT NULL))

FROM #Sample s

CROSS APPLY (

SELECT

IDMAX = CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,

IDMIN = CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END

) x

-- sample data:

CREATE TABLE #Sample (LinkingID INT, ID1 INT, ID2 INT)

INSERT INTO #Sample (LinkingID, ID1, ID2) VALUES

(166202, 180659, 253178),

(166202, 253178, 180659),

(166334, 180380, 253179),

(166334, 253179, 180380),

(166342, 180380, 180659),

(166342, 180659, 180380),

(166582, 253179, 258643),

(166582, 258643, 253179),

(264052, 258642, 258643),

(264052, 258643, 258642),

(264502, 258643, 258663),

(264502, 258643, 259562)

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

keith.west-982594 SSC Eights! Points: 965 More actions · Answer 2

This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 3

keith.west-982594 (1/22/2015)
This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.

You're welcome. You don't have to use APPLY:

WITH Swapper AS (

SELECT *,

IDMAX = CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,

IDMIN = CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END

FROM #Sample

)

SELECT

LinkingID,

ID1,

ID2,

rn = ROW_NUMBER() OVER (PARTITION BY LinkingID, IDMAX, IDMIN ORDER BY (SELECT NULL))

FROM Swapper;

SELECT

LinkingID,

ID1,

ID2,

rn = ROW_NUMBER() OVER (PARTITION BY LinkingID,

CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,

CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END

ORDER BY (SELECT NULL))

FROM #Sample;

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Sean Lange SSC Guru Points: 286609 More actions · Answer 4

keith.west-982594 (1/22/2015)
This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.

Take a look at the two articles in my signature from Paul White. They provide an excellent look at APPLY.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Eirikur Eiriksson SSC Guru Points: 182951 More actions · Answer 5

Eirikur Eiriksson

SSC Guru

Points: 182951

January 22, 2015 at 2:34 pm

#1772410

Removed bad solution

😎

sqldriver SSChampion Points: 10440 More actions · Answer 6

Slightly different take, using ChrisM's sample table.

SELECT s.LinkingID, MaxID

FROM #Sample AS s

CROSS APPLY (SELECT TOP 1 [MaxID]

FROM (VALUES(s.ID1),

(s.ID2) )x([MaxID])

ORDER BY MaxID DESC) y

GROUP BY s.LinkingID, y.MaxID

Jeff Moden SSC Guru Points: 1004704 More actions · Answer 7

For the long haul, I'd suggest modifying the code to always put the smaller of the two IDs in ID1, a check constraint on ID1 that enforces ID1 <= ID2, and then a unique constraint across the 3 IDs. Of course, there should be NOT NULL constraints on all 3 columns.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 8

Just for fun, using the SQL 2012 LAG function, this query matches ChrisM's first result:

SELECT LinkingID, ID1, ID2

,rn=CASE WHEN ID3=ID2 AND ID4=ID1 THEN 2 ELSE 1 END

FROM

(

SELECT LinkingID, ID1, ID2

,ID3=LAG(ID1) OVER (PARTITION BY LinkingID ORDER BY ID1)

,ID4=LAG(ID2) OVER (PARTITION BY LinkingID ORDER BY ID1)

FROM #Sample

) a;

Using his sample data of course.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St