February 16, 2018 at 1:57 pm
I need a query that returns the IDs of consecutive duplicate rows, where a duplicate is defined by some of the columns; the rest don't matter.
I know I have to use a window function (ROW_NUMBER/LAG ... OVER) to detect consecutive rows, and to group on the columns that define the dupes, but I'm having a hard time with the grouping and with where to put it (subquery vs. main query).
The following sets up my data in a temp table:
---------------------------------------------------------------
IF OBJECT_ID('TempDB..#Temp') IS NOT NULL
DROP TABLE #Temp;
CREATE TABLE #Temp
( ColA INT
,ColB INT
,ColC CHAR(1)
,ColD CHAR(1)
,ColE DECIMAL(3,2)
)
INSERT INTO #Temp ( ColA , ColB , ColC, ColD, ColE)
SELECT 0, 1, 'A', 'A',1.1
UNION ALL SELECT 1, 1, 'A', 'A',1.2
UNION ALL SELECT 2, 1, 'A', 'A',1.3
UNION ALL SELECT 3, 1, 'D', 'X',1.4
UNION ALL SELECT 4, 1, 'A', 'A',1.5
UNION ALL SELECT 5, 1, 'A', 'A',1.6
UNION ALL SELECT 6, 1, 'A', 'A',1.7
UNION ALL SELECT 7, 1, 'A', 'A',1.8
UNION ALL SELECT 8, 2, 'A', 'B',2.1
UNION ALL SELECT 9, 2, 'A', 'B',2.2
UNION ALL SELECT 10, 2, 'D', 'X',2.3
UNION ALL SELECT 11, 2, 'A', 'B',2.4
UNION ALL SELECT 12, 2, 'A', 'B',2.5
UNION ALL SELECT 13, 2, 'A', 'B',2.6
UNION ALL SELECT 14, 2, 'A', 'B',1.1
SELECT * FROM #Temp
A dupe considers only ColB, ColC, and ColD; ColA and ColE are ignored. I need the IDs (ColA) of the duplicate rows.
ColA ColB ColC ColD ColE
0 1 A A 1.10
1 1 A A 1.20
2 1 A A 1.30
3 1 D X 1.40
4 1 A A 1.50
5 1 A A 1.60
6 1 A A 1.70
7 1 A A 1.80
8 2 A B 2.10
9 2 A B 2.20
10 2 D X 2.30
11 2 A B 2.40
12 2 A B 2.50
13 2 A B 2.60
14 2 A B 1.10
Expected output:
ColA
-----
1
2
5
6
7
9
12
13
14
Any help would be appreciated!
February 16, 2018 at 2:12 pm
Here is one way:
WITH AllVals
AS
(
SELECT
*
, PrevColB = LAG(t.ColB, 1) OVER (ORDER BY t.ColA)
, PrevColC = LAG(t.ColC, 1) OVER (ORDER BY t.ColA)
, PrevColD = LAG(t.ColD, 1) OVER (ORDER BY t.ColA)
FROM #Temp t
)
SELECT v.ColA
FROM AllVals v
WHERE
v.ColB = v.PrevColB
AND v.ColC = v.PrevColC
AND v.ColD = v.PrevColD;
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
February 16, 2018 at 2:49 pm
I know the post is in a 2012 forum, but just in case, here's another way that will work on versions pre-2012:
WITH
grouped AS
(
SELECT *, group_id=ColA-ROW_NUMBER() OVER (PARTITION BY ColB,ColC,ColD ORDER BY ColA ASC)
FROM #Temp
)
,numbered AS
(
SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY ColB,ColC,ColD,group_id ORDER BY ColA ASC)
FROM grouped
)
SELECT ColA
FROM numbered
WHERE rn>1
ORDER BY ColA ASC;
Also, just for kicks and giggles, here's another way (I wouldn't actually use this one; I'm including it only because I tried my hand at finding the most concise query that satisfies the requirements, and this is the best I ended up with):
SELECT ColA=y
FROM #Temp
CROSS APPLY
(VALUES(ColA),(ColA+1))x(y)
GROUP BY y,ColB,ColC,ColD
HAVING COUNT(*)=2;
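In case the trick isn't obvious, here's the same query annotated with how I read it (note that, like the other queries here, it assumes ColA has no gaps):

```sql
SELECT ColA = y
FROM #Temp
CROSS APPLY
    -- Each row with id n contributes two candidate ids: n and n+1.
    (VALUES (ColA), (ColA + 1)) x (y)
-- A given y therefore receives one contribution from row y itself (as ColA)
-- and one from row y-1 (as ColA+1), each tagged with that row's ColB/ColC/ColD.
GROUP BY y, ColB, ColC, ColD
-- COUNT(*) = 2 means rows y-1 and y both exist with identical
-- ColB/ColC/ColD, i.e. y duplicates its predecessor.
HAVING COUNT(*) = 2;
```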
Cheers!
February 16, 2018 at 4:07 pm
SSCoach:
Awesome, that did it. I had to convert your WITHs to use temp tables, but thank you x 100 !!
February 16, 2018 at 4:56 pm
I'm glad you got it sorted out!
I'm curious, though. What exactly was the change you had to make?
Cheers!
EDIT: Fixed some wording I didn't like.
February 17, 2018 at 6:58 am
SELECT *, group_id=ColA-ROW_NUMBER() OVER (PARTITION BY ColB,ColC,ColD ORDER BY ColA)
INTO #Temp2
FROM #Temp
February 17, 2018 at 10:40 am
Thanks!
Continuation of my curiosity: why was it that that method had to be used instead of the CTEs? Performance differences, or something else?
Cheers!
February 17, 2018 at 1:44 pm
Jacob Wilkins - Friday, February 16, 2018 2:49 PM
I know the post is in a 2012 forum, but just in case, here's another way that will work on versions pre-2012:
WITH
grouped AS
(
SELECT *, group_id=ColA-ROW_NUMBER() OVER (PARTITION BY ColB,ColC,ColD ORDER BY ColA ASC)
FROM #temp
)
,numbered AS
(
SELECT *, rn=ROW_NUMBER() OVER (PARTITION BY ColB,ColC,ColD,group_id ORDER BY ColA ASC)
FROM grouped
)
SELECT ColA
FROM numbered
WHERE rn>1
ORDER BY ColA ASC;
Also, just for kicks and giggles, here's another way (I wouldn't actually use this one; I'm including it only because I tried my hand at finding the most concise query that satisfies the requirements, and this is the best I ended up with):
SELECT ColA=y
FROM #Temp
CROSS APPLY
(VALUES(ColA),(ColA+1))x(y)
GROUP BY y,ColB,ColC,ColD
HAVING COUNT(*)=2;
Cheers!
Careful, Jacob. There's no guarantee that ColA won't have gaps in it. Although it certainly does require another calculation, you might want to use the following to counter the eventuality of ColA having a gap in it.
Grp_ID = ROW_NUMBER() OVER (ORDER BY ColA)
- ROW_NUMBER() OVER (PARTITION BY ColB,ColC,ColD ORDER BY ColA)
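Folded into the earlier two-CTE query, Jeff's suggestion would look something like this (a sketch; Grp_ID simply replaces the ColA-based group_id, since the difference of two ROW_NUMBER()s stays stable even when ColA has gaps):

```sql
WITH grouped AS
(
    SELECT *,
           -- Gap-tolerant island id: both ROW_NUMBER()s advance together
           -- within a run of identical ColB/ColC/ColD values.
           Grp_ID = ROW_NUMBER() OVER (ORDER BY ColA)
                  - ROW_NUMBER() OVER (PARTITION BY ColB, ColC, ColD ORDER BY ColA)
    FROM #Temp
),
numbered AS
(
    SELECT *, rn = ROW_NUMBER() OVER (PARTITION BY ColB, ColC, ColD, Grp_ID ORDER BY ColA)
    FROM grouped
)
SELECT ColA
FROM numbered
WHERE rn > 1        -- keep the first row of each run, report the repeats
ORDER BY ColA;
```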
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2018 at 2:33 pm
Jeff,
Well aware of that. It was not specified that an id of, say, 7 counts as consecutive with an id of 5 so long as there are no rows with an id of 6. If that's the case, then yes, a change on those lines is needed.
Then again, maybe the requirement is that 5 and 7 are not to be counted as consecutive in the scenario above.
The queries were provided to meet the specifications provided. If the specifications are different, then yes, the queries will have to be different 🙂
February 17, 2018 at 2:37 pm
Jacob Wilkins - Saturday, February 17, 2018 2:33 PM
Jeff,
Well aware of that. It was not specified that an id of, say, 7 counts as consecutive with an id of 5 so long as there are no rows with an id of 6. If that's the case, then yes, a change on those lines is needed.
Then again, maybe the requirement is that 5 and 7 are not to be counted as consecutive in the scenario above.
The queries were provided to meet the specifications provided. If the specifications are different, then yes, the queries will have to be different 🙂
Thanks. The consecutiveness is determined after ordering by ColB, then ColA; rows are consecutive duplicates when ColC through ColE are the same on adjacent rows. I hope that helps.
February 17, 2018 at 3:18 pm
Please describe why the solution I provided is unsatisfactory. It worked OK for me.
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
February 17, 2018 at 4:11 pm
Jacob Wilkins - Saturday, February 17, 2018 2:33 PM
Jeff,
Well aware of that. It was not specified that an id of, say, 7 counts as consecutive with an id of 5 so long as there are no rows with an id of 6. If that's the case, then yes, a change on those lines is needed.
Then again, maybe the requirement is that 5 and 7 are not to be counted as consecutive in the scenario above.
The queries were provided to meet the specifications provided. If the specifications are different, then yes, the queries will have to be different 🙂
Understood but where did the OP state that the sequence numbers in ColA were guaranteed to not contain any gaps in the numeric sequence? My suggestion is to always assume the worst when such things aren't specified.
--Jeff Moden
Change is inevitable... Change for the better is not.
February 17, 2018 at 4:24 pm
Jeff Moden - Saturday, February 17, 2018 4:11 PM
Jacob Wilkins - Saturday, February 17, 2018 2:33 PM
Jeff,
Well aware of that. It was not specified that an id of, say, 7 counts as consecutive with an id of 5 so long as there are no rows with an id of 6. If that's the case, then yes, a change on those lines is needed.
Then again, maybe the requirement is that 5 and 7 are not to be counted as consecutive in the scenario above.
The queries were provided to meet the specifications provided. If the specifications are different, then yes, the queries will have to be different 🙂
Understood but where did the OP state that the sequence numbers in ColA were guaranteed to not contain any gaps in the numeric sequence? My suggestion is to always assume the worst when such things aren't specified.
ColA would be Identity, but there could still be gaps (rollback on error, etc).
Let me explain the original problem:
Users could click many times, not realizing they were updating the item over and over (poor latency protection, I know). Each of those updates created a snapshot in a history table (simulated by the table in my original post). Since the users are paid by the click, you can see the problem. I need to remove the repetitive rows: rows with identical ColC, ColD, and ColE values, in sequence one after another with no changes in between, grouped by ColB. ColA is the IDENTITY primary key.
I believe the solution put forth by Jacob Wilkins satisfies the goal.
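For what it's worth, since the end goal is deleting the repeated history rows (same ColC/ColD/ColE as the immediately preceding row within a ColB group, per the clarified spec), the LAG approach can drive the delete directly. A sketch, assuming SQL Server 2012+ and #Temp standing in for the real history table:

```sql
-- Delete every row whose ColC/ColD/ColE match the immediately preceding
-- row (by ColA) within the same ColB group, keeping the first of each run.
WITH Ordered AS
(
    SELECT ColA, ColC, ColD, ColE,
           PrevColC = LAG(ColC) OVER (PARTITION BY ColB ORDER BY ColA),
           PrevColD = LAG(ColD) OVER (PARTITION BY ColB ORDER BY ColA),
           PrevColE = LAG(ColE) OVER (PARTITION BY ColB ORDER BY ColA)
    FROM #Temp
)
DELETE FROM Ordered   -- deleting through the CTE deletes from #Temp
WHERE ColC = PrevColC
  AND ColD = PrevColD
  AND ColE = PrevColE;
```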
February 17, 2018 at 4:36 pm
Nowhere. That's my point. In some common query scenarios you just want the "next" value, wherever that is, and that's when you need to be careful not to assume there are no gaps.
My reading of "consecutive" is stronger than that, i.e., to be consecutive the sequence must be unbroken.
On such a reading, it would actually be a mistake to treat 1 and 3 as consecutive just because there was no 2.
It's not that I assumed there are no gaps; it's that I read gaps as not allowing the values on either side to be "consecutive".
It's not the case that adding the row number and using that is something that works in all cases while mine fails in the "worst case".
They're completely different answers to different requirements. Clarifying whether the OP meant consecutive in the sense in which I took it, or just "next" as you took it is well worth it, so thanks for mentioning it 🙂
Cheers!
EDIT: Fixed a typo.
February 18, 2018 at 8:24 am
Understood and appreciated. But you're focusing only on what the OP asked. While that certainly sounds like the right thing to do, even the OP has made no guarantee that all of the integers in ColA will be consecutive. And it's not just the OP I'm thinking of here. Someone else might read this post with a similar problem and not understand that the key to all of this working without the extra ROW_NUMBER() is that the values of ColA must actually be consecutive (i.e. no gaps). Adding it would make your good solution virtually bulletproof, both for the OP and for anyone else with a similar problem to solve.
To be sure, I'm not bad-mouthing you or your solution... I'm just suggesting a method to make it bulletproof.
--Jeff Moden
Change is inevitable... Change for the better is not.