January 22, 2015 at 8:22 am
I have a table containing the following data:
LinkingIDID1 ID2
166202180659253178
166202253178180659
166334180380253179
166334253179180380
166342180380180659
166342180659180380
166582253179258643
166582258643253179
264052258642258643
264052258643258642
264502258643258663
264502258643259562
Within the LinkingID, there are duplicates in ID1 and ID2 but just in opposite columns. I have been trying to figure out a way to remove these set based. It doesn't matter which duplicate is removed. Essentially these are just endpoints and I don't care which side they are on. The solution must recognize the duplicates and not just remove based on every 2nd row. It seems that this should be very simple, but I can't figure it out. Any help is greatly appreciated.
January 22, 2015 at 8:36 am
SELECT
LinkingID,
ID1,
ID2,
rn = ROW_NUMBER() OVER (PARTITION BY LinkingID, IDMAX, IDMIN ORDER BY (SELECT NULL))
FROM #Sample s
CROSS APPLY (
SELECT
IDMAX = CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,
IDMIN = CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END
) x
-- sample data:
CREATE TABLE #Sample (LinkingID INT, ID1 INT, ID2 INT)
INSERT INTO #Sample (LinkingID, ID1, ID2) VALUES
(166202, 180659, 253178),
(166202, 253178, 180659),
(166334, 180380, 253179),
(166334, 253179, 180380),
(166342, 180380, 180659),
(166342, 180659, 180380),
(166582, 253179, 258643),
(166582, 258643, 253179),
(264052, 258642, 258643),
(264052, 258643, 258642),
(264502, 258643, 258663),
(264502, 258643, 259562)
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 22, 2015 at 8:53 am
This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.
January 22, 2015 at 9:14 am
keith.west-982594 (1/22/2015)
This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.
You're welcome. You don't have to use APPLY:
WITH Swapper AS (
SELECT *,
IDMAX = CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,
IDMIN = CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END
FROM #Sample
)
SELECT
LinkingID,
ID1,
ID2,
rn = ROW_NUMBER() OVER (PARTITION BY LinkingID, IDMAX, IDMIN ORDER BY (SELECT NULL))
FROM Swapper;
SELECT
LinkingID,
ID1,
ID2,
rn = ROW_NUMBER() OVER (PARTITION BY LinkingID,
CASE WHEN ID1 > ID2 THEN ID1 ELSE ID2 END,
CASE WHEN ID1 > ID2 THEN ID2 ELSE ID1 END
ORDER BY (SELECT NULL))
FROM #Sample;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
January 22, 2015 at 9:27 am
keith.west-982594 (1/22/2015)
This solution worked perfectly. Thanks for your help. I definitely need to study the CROSS APPLY feature more.
Take a look at the two articles in my signature from Paul White. They provide an excellent look at APPLY.
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
January 22, 2015 at 2:34 pm
Removed bad solution
😎
January 22, 2015 at 5:49 pm
Slightly different take, using ChrisM's sample table.
SELECT s.LinkingID, MaxID
FROM #Sample AS s
CROSS APPLY (SELECT TOP 1 [MaxID]
FROM (VALUES(s.ID1),
(s.ID2) )x([MaxID])
ORDER BY MaxID DESC) y
GROUP BY s.LinkingID, y.MaxID
January 22, 2015 at 10:22 pm
For the long haul, I'd suggest modifying the code to always put the smaller of the two IDs in ID1, a check constraint on ID1 that enforces ID1 <= ID2, and then a unique constraint across the 3 IDs. Of course, there should be NOT NULL constraints on all 3 columns.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 25, 2015 at 5:01 pm
Just for fun, using the SQL 2012 LAG function, this query matches ChrisM's first result:
SELECT LinkingID, ID1, ID2
,rn=CASE WHEN ID3=ID2 AND ID4=ID1 THEN 2 ELSE 1 END
FROM
(
SELECT LinkingID, ID1, ID2
,ID3=LAG(ID1) OVER (PARTITION BY LinkingID ORDER BY ID1)
,ID4=LAG(ID2) OVER (PARTITION BY LinkingID ORDER BY ID1)
FROM #Sample
) a;
Using his sample data of course.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply