May 16, 2013 at 1:23 am
Cadavre (5/15/2013)
ChrisM@Work (5/15/2013)
dwain.c (5/15/2013)
Sorry to disappoint you guys but I wasn't able to come up with anything.My gut was telling me an rCTE wouldn't do it and the set-based loop I tried wouldn't resolve it out properly either.
Of course, I could be wrong as I'm a bit out of practice writing rCTEs! π
I'm not surprised, Dwain. It's quite tricky. The more obvious methods like MAX() OVER() won't work when you expect them to, in the recursive part.
I haven't tried, as I said I'm pretty busy, but I thought a combination of a MAX() OVER() and a recursive CTE was going to do the business. Ah well.
Thanks for taking a look Dwain, be nice to see where your efforts took you.
There are at least two ways of approaching this problem, number chains and sets. I think the methods reviewed so far have employed number chains, so I had a play with sets.
Allocate each number pair a set identity, so the first row (ordered by whatever) has set identity = 1. Wherever either of the two members of set 1 appear elsewhere in the table, change the set identity for the pair to 1. Run again: wherever any of the (now many) members of set 1 appear elsewhere in the table in a different set, change the set id to 1. Rinse and repeat.
Here's the first attempt:
--DROP TABLE #Temp;
-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM SCC
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)
---------------------------------------------------------------------------
-- three different ways of performing the subquery for the update.
-- three updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;
-- (96 row(s) affected)
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT q.SetNo, q.newset
FROM (
SELECT
d.SetNo, d.newset, rn = ROW_NUMBER() OVER(PARTITION BY d.SetNo ORDER BY d.newset)
FROM (
SELECT SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
) q
WHERE rn = 1
AND q.SetNo <> q.newset
) c ON c.SetNo = t.SetNo;
-- (66 row(s) affected)
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT newset = MIN(t2.SetNo), t1.SetNo
FROM #Temp t1
INNER JOIN #Temp t2
ON t2.Duplicate_ID = t1.Duplicate_ID
AND t2.SetNo < t1.SetNo
GROUP BY t1.SetNo
) c ON c.SetNo = t.SetNo;
-- (2 row(s) affected)
---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;
Note that the subquery code for the three updates shown is logically the same and generates the same results with the sample data - I was playing with the subqueries in an attempt to find something which might be rCTE-compatible.
Three updates have to be performed, a fourth does no work. The results are the same as Abu Dina's and it's pretty darned quick.
I haven't yet discovered a rCTE to do the same process because you can't perform aggregations on the recursive part of a rCTE, which this process requires. You can chain ordinary CTE's though, and get the whole caboodle working in a single query like this:
;WITH DataSets AS (
SELECT
SetID = ROW_NUMBER() OVER(ORDER BY Master_ID, Duplicate_ID),
Master_ID, Duplicate_ID
FROM scc
),
FirstPass AS (
SELECT SetID = ISNULL(x.NewSetID, d.SetID), d.Master_ID, d.Duplicate_ID,
RowAffected = 0 + CASE WHEN x.NewSetID IS NOT NULL THEN 1 ELSE 0 END --
FROM DataSets d
OUTER APPLY ( -- find the lowest SetID that either number belongs to
SELECT NewSetID = MIN(di.SetID)
FROM DataSets di
WHERE di.SetID < d.SetID
AND (di.Master_ID IN (d.Master_ID, d.Duplicate_ID) OR di.Duplicate_ID IN (d.Master_ID, d.Duplicate_ID))
) x
),
SecondPass AS (
SELECT SetID = ISNULL(x.NewSetID, d.SetID), d.Master_ID, d.Duplicate_ID,
RowAffected = RowAffected + CASE WHEN x.NewSetID IS NOT NULL THEN 1 ELSE 0 END
FROM FirstPass d
OUTER APPLY (
SELECT NewSetID = MIN(di.SetID)
FROM FirstPass di
WHERE di.SetID < d.SetID
AND (di.Master_ID IN (d.Master_ID, d.Duplicate_ID) OR di.Duplicate_ID IN (d.Master_ID, d.Duplicate_ID))
) x
),
ThirdPass AS (
SELECT SetID = ISNULL(x.NewSetID, d.SetID), d.Master_ID, d.Duplicate_ID,
RowAffected = RowAffected + CASE WHEN x.NewSetID IS NOT NULL THEN 1 ELSE 0 END
FROM SecondPass d
OUTER APPLY (
SELECT NewSetID = MIN(di.SetID)
FROM SecondPass di
WHERE di.SetID < d.SetID
AND (di.Master_ID IN (d.Master_ID, d.Duplicate_ID) OR di.Duplicate_ID IN (d.Master_ID, d.Duplicate_ID))
) x
),
TweakedResults AS (
SELECT
SetID, Duplicate_ID, rn = ROW_NUMBER() OVER (PARTITION BY SetID ORDER BY Duplicate_ID)
FROM (
SELECT SetID, Duplicate_ID
FROM ThirdPass
UNION -- eliminate dupes
SELECT SetID, Master_ID
FROM ThirdPass
) d
)
SELECT
retained_id = t.Duplicate_ID,
dropped_id = x.Duplicate_ID
FROM TweakedResults t
CROSS APPLY (SELECT Duplicate_ID FROM TweakedResults ti WHERE ti.SetID = t.SetID AND ti.rn>1) x
WHERE t.rn = 1
- but the performance doesn't exactly shine. Have a look at the plan and you see why - the source table is read something like 8 times π
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
May 16, 2013 at 3:03 am
ChrisM@Work (5/16/2013)
... the source table is read something like 8 times π
Only 8 times? I'd be surprised if my contribution is less than that.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 24, 2013 at 8:47 am
This is still giving me problems! :unsure:
DECLARE @pairs TABLE (dropped_id INT, retained_id INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665
gives back: (using my method, Dwain's and chriM's)
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
June 24, 2013 at 8:59 am
Abu Dina (6/24/2013)
This is still giving me problems! :unsure:
DECLARE @pairs TABLE (dropped_id INT, retained_id INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665
gives back: (using my method, Dwain's and chriM's)
The last solution I posted uses a series of updates. Choose one of them and run it four times, something like this:
DECLARE @pairs TABLE (Duplicate_ID INT, Master_ID INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665;
-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
IF object_id('TempDB..#Temp') IS NOT NULL DROP TABLE #Temp;
WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM @pairs
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)
---------------------------------------------------------------------------
-- FOUR updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1
WHILE @RowsUpdated > 0
BEGIN
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;
SET @RowsUpdated = @@rowcount
END
---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 24, 2013 at 9:33 am
ChrisM@Work (6/24/2013)
Abu Dina (6/24/2013)
This is still giving me problems! :unsure:
DECLARE @pairs TABLE (dropped_id INT, retained_id INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665
gives back: (using my method, Dwain's and chriM's)
The last solution I posted uses a series of updates. Choose one of them and run it four times, something like this:
DECLARE @pairs TABLE (Duplicate_ID INT, Master_ID INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665;
-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
IF object_id('TempDB..#Temp') IS NOT NULL DROP TABLE #Temp;
WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM @pairs
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)
---------------------------------------------------------------------------
-- FOUR updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1
WHILE @RowsUpdated > 0
BEGIN
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;
SET @RowsUpdated = @@rowcount
END
---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;
Huh... I must have screwed up the code when I added it to the sproc. I swear it wasn't working before!
Anyhoo, superb work Chris. Thanks so much! That's another drink I owe you π
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
June 25, 2013 at 2:09 am
Abu Dina (6/24/2013)
ChrisM@Work (6/24/2013)
Abu Dina (6/24/2013)
This is still giving me problems! :unsure:
DECLARE @pairs TABLE (dropped_id INT, retained_id INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665
gives back: (using my method, Dwain's and chriM's)
The last solution I posted uses a series of updates. Choose one of them and run it four times, something like this:
DECLARE @pairs TABLE (Duplicate_ID INT, Master_ID INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665;
-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
IF object_id('TempDB..#Temp') IS NOT NULL DROP TABLE #Temp;
WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM @pairs
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)
---------------------------------------------------------------------------
-- FOUR updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1
WHILE @RowsUpdated > 0
BEGIN
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;
SET @RowsUpdated = @@rowcount
END
---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;
Huh... I must have screwed up the code when I added it to the sproc. I swear it wasn't working before!
Anyhoo, superb work Chris. Thanks so much! That's another drink I owe you π
Thanks for the feedback, Abu Dina. Are there enough beers for a pi$$up yet? π
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
June 25, 2013 at 8:11 am
ChrisM@Work (6/25/2013)
Abu Dina (6/24/2013)
ChrisM@Work (6/24/2013)
Abu Dina (6/24/2013)
This is still giving me problems! :unsure:
DECLARE @pairs TABLE (dropped_id INT, retained_id INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665
gives back: (using my method, Dwain's and chriM's)
The last solution I posted uses a series of updates. Choose one of them and run it four times, something like this:
DECLARE @pairs TABLE (Duplicate_ID INT, Master_ID INT)
INSERT INTO @pairs
SELECT 16492, 23011 UNION ALL
SELECT 23011,24014 UNION ALL
SELECT 23011,24951 UNION ALL
SELECT 16492,24951 UNION ALL
SELECT 16492,29381 UNION ALL
SELECT 24951,29381 UNION ALL
SELECT 23011,29381 UNION ALL
SELECT 24014,66665 UNION ALL
SELECT 3344,66665;
-- preprocess the sample data: there isn't a dupe and a master,
-- they're simply members of a set of two.
IF object_id('TempDB..#Temp') IS NOT NULL DROP TABLE #Temp;
WITH SequencedData AS (
SELECT *, SetNo = ROW_NUMBER() OVER (ORDER BY Duplicate_ID, Master_ID)
FROM @pairs
)
SELECT SetNo, Duplicate_ID
INTO #Temp
FROM SequencedData
UNION ALL
SELECT SetNo, Master_ID
FROM SequencedData;
-- (102 row(s) affected)
---------------------------------------------------------------------------
-- FOUR updates have to be executed with the sample data to complete.
---------------------------------------------------------------------------
DECLARE @RowsUpdated INT
SET @RowsUpdated = 1
WHILE @RowsUpdated > 0
BEGIN
UPDATE t SET SetNo = c.newset
FROM #Temp t
INNER JOIN (
SELECT d.SetNo, newset = MIN(newset)
FROM (
SELECT Duplicate_ID, SetNo, newset = MIN(SetNo) OVER (PARTITION BY Duplicate_ID)
FROM #Temp
) d
GROUP BY d.SetNo
HAVING MIN(newset) <> d.SetNo
) c ON c.SetNo = t.SetNo;
SET @RowsUpdated = @@rowcount
END
---------------------------------------------------------------------------
-- Final result set
---------------------------------------------------------------------------
SELECT t.Duplicate, x.Duplicate_ID
FROM (
SELECT SetNo, Duplicate = MIN(Duplicate_ID)
FROM #Temp
GROUP BY SetNo
) t
CROSS APPLY (
SELECT DISTINCT *
FROM #Temp ti
WHERE ti.SetNo = t.SetNo
AND ti.Duplicate_ID <> t.Duplicate
) x;
Huh... I must have screwed up the code when I added it to the sproc. I swear it wasn't working before!
Anyhoo, superb work Chris. Thanks so much! That's another drink I owe you π
Thanks for the feedback, Abu Dina. Are there enough beers for a pi$$up yet? π
Haha.... plenty! I've lost count of how many time you've helped me!
---------------------------------------------------------
It takes a minimal capacity for rational thought to see that the corporate 'free press' is a structurally irrational and biased, and extremely violent, system of elite propaganda.
David Edwards - Media lens[/url]
Society has varying and conflicting interests; what is called objectivity is the disguise of one of these interests - that of neutrality. But neutrality is a fiction in an unneutral world. There are victims, there are executioners, and there are bystanders... and the 'objectivity' of the bystander calls for inaction while other heads fall.
Howard Zinn
Viewing 7 posts - 16 through 21 (of 21 total)
You must be logged in to reply to this topic. Login to reply