May 6, 2016 at 11:39 am
Trying to figure a way to update the 2nd of 2 Duplicate Records
REQUESTID OTHERSYSTEMID
951 454215
952 454215
I would like to first search for all duplicate OTHERSYSTEMID'S in the table and then append '_D' to the end of the 2nd Duplicate Record for All duplicates found.
ie. REQUESTID OTHERSYSTEMID
951 454215
952 454215_D
May 6, 2016 at 12:00 pm
have you considered using ROW_NUMBER() ?
________________________________________________________________
you can lead a user to data....but you cannot make them think
and remember....every day is a school day
May 6, 2016 at 12:03 pm
a really nice feature is you an update on a cte affects the base table, so if you generate an identifier for dupes, you update everything that is a dupe.
WITH MyCTE
AS
(
SELECT ROW_NUMBER() OVER (PARTITION BY REQUESTID,OTHERSYSTEMID ORDER BY REQUESTID,OTHERSYSTEMID) AS RW,*
FROM MyTable
)
--UPDATE MyTarget SET OTHERSYSTEMID = OTHERSYSTEMID + '_D' --what you asked
--UPDATE MyTarget SET OTHERSYSTEMID = OTHERSYSTEMID + '_D' + CONVERT(varchar,RW) --D2,D3,D4 etc
SELECT OTHERSYSTEMID , OTHERSYSTEMID + '_D' + CONVERT(varchar,RW) As NewID,* FROM MyTable AS MyTarget
WHERE RW >1 --is it ok that dupes 2 thru a million all have _D? maybe it shoudl count the dupes?
Lowell
May 8, 2016 at 8:31 pm
Depending on number of rows, and indexes etc... but I would usually do a "GROUP BY" to identify dupes together with MIN to identify original record, then joining back onto main table (ignoring the original record)
e.g.
DECLARE @Tbl TABLE ( REQUESTID int, OTHERSYSTEMID varchar(50) )
INSERT INTO @Tbl ( REQUESTID, OTHERSYSTEMID ) VALUES ( 951, '454215'), (952, '454215' ), ( 953, '454215' )
;
UPDATE Dat SET Dat.OTHERSYSTEMID = Dat.OTHERSYSTEMID + '_D'
FROM ( SELECT MIN( REQUESTID ) AS MIN_REQUESTID, OTHERSYSTEMID FROM @Tbl GROUP BY OTHERSYSTEMID HAVING COUNT(*) > 1 ) Dupes
INNER JOIN @Tbl Dat ON Dat.OTHERSYSTEMID = Dupes.OTHERSYSTEMID AND Dat.REQUESTID != Dupes.MIN_REQUESTID
SELECT * FROM @Tbl
May 9, 2016 at 8:13 am
Con Alexis (5/8/2016)
Depending on number of rows, and indexes etc... but I would usually do a "GROUP BY" to identify dupes together with MIN to identify original record, then joining back onto main table (ignoring the original record)e.g.
DECLARE @Tbl TABLE ( REQUESTID int, OTHERSYSTEMID varchar(50) )
INSERT INTO @Tbl ( REQUESTID, OTHERSYSTEMID ) VALUES ( 951, '454215'), (952, '454215' ), ( 953, '454215' )
;
UPDATE Dat SET Dat.OTHERSYSTEMID = Dat.OTHERSYSTEMID + '_D'
FROM ( SELECT MIN( REQUESTID ) AS MIN_REQUESTID, OTHERSYSTEMID FROM @Tbl GROUP BY OTHERSYSTEMID HAVING COUNT(*) > 1 ) Dupes
INNER JOIN @Tbl Dat ON Dat.OTHERSYSTEMID = Dupes.OTHERSYSTEMID AND Dat.REQUESTID != Dupes.MIN_REQUESTID
SELECT * FROM @Tbl
It is usually much more efficient to use a CTE with a ROW_NUMBER(), because you only need to read the records once instead of twice (once for the GROUP BY and once for the actual update). ROW_NUMBER() also has the advantage that it's more flexible in that you can specify any order within the partition.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 9, 2016 at 12:14 pm
update t
set t.id1=999
from t b join
(
select id1,max(id) as id from t
group by id1 ) a on a.id=b.id
select * from t
May 9, 2016 at 12:21 pm
prafuljemail (5/9/2016)
update tset t.id1=999
from t b join
(
select id1,max(id) as id from t
group by id1 ) a on a.id=b.id
select * from t
The group by is fine if you KNOW there is only one duplicate record per OtherSystemID.
if there is more than one duplicate per ID, group by logic will only fix one of them, you'd have to rerun it repetatively for N-1 times.
the row number version does not suffer from that issue.
try any of the above solutions in the situation where the data is like this:
REQUESTID OTHERSYSTEMID
951 454215
952 454215
953 454215
954 454215
955 454215
961 454216
962 454216
963 454217
964 454217
965 454217
;WITH MyCTE([REQUESTID],[OTHERSYSTEMID],[])
AS
(
SELECT '951','454215' UNION ALL
SELECT '952','454215' UNION ALL
SELECT '953','454215' UNION ALL
SELECT '954','454215' UNION ALL
SELECT '955','454215' UNION ALL
SELECT '961','454216' UNION ALL
SELECT '962','454216' UNION ALL
SELECT '963','454217' UNION ALL
SELECT '964','454217' UNION ALL
SELECT '965','454217'
)
SELECT * FROM MyCTE;
Lowell
May 9, 2016 at 2:28 pm
Lowell (5/9/2016)
prafuljemail (5/9/2016)
update tset t.id1=999
from t b join
(
select id1,max(id) as id from t
group by id1 ) a on a.id=b.id
select * from t
The group by is fine if you KNOW there is only one duplicate record per OtherSystemID.
if there is more than one duplicate per ID, group by logic will only fix one of them, you'd have to rerun it repetatively for N-1 times.
the row number version does not suffer from that issue.
try any of the above solutions in the situation where the data is like this:
REQUESTID OTHERSYSTEMID
951 454215
952 454215
953 454215
954 454215
955 454215
961 454216
962 454216
963 454217
964 454217
965 454217
;WITH MyCTE([REQUESTID],[OTHERSYSTEMID],[])
AS
(
SELECT '951','454215' UNION ALL
SELECT '952','454215' UNION ALL
SELECT '953','454215' UNION ALL
SELECT '954','454215' UNION ALL
SELECT '955','454215' UNION ALL
SELECT '961','454216' UNION ALL
SELECT '962','454216' UNION ALL
SELECT '963','454217' UNION ALL
SELECT '964','454217' UNION ALL
SELECT '965','454217'
)
SELECT * FROM MyCTE;
I disagree. The GROUP BY causes approximately twice as many reads as the ROW_NUMBER in the general case, and in this specific case, he is creating duplicates on id 999 that he is not accounting for, so he will run into problems as he finds more duplicates.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
May 9, 2016 at 11:44 pm
hey, do you get solution or not.
i have updated my query with example and sample data on winmilestone com forum database queries
May 19, 2016 at 2:50 pm
Id use rownumber.
its so flexible and powerful for when you have duplicate records
try the following.
CREATE TABLE ##X
(COL1 NVARCHAR(10),COL2 NVARCHAR(10))
INSERT INTO ##X(COL1,COL2)
SELECT
'951' AS COL1,'454215' AS COL2
UNION ALL
SELECT '952','454215' UNION ALL
SELECT '953','454215' UNION ALL
SELECT '954','454215' UNION ALL
SELECT '955','454215' UNION ALL
SELECT '961','454216' UNION ALL
SELECT '962','454216' UNION ALL
SELECT '963','454217' UNION ALL
SELECT '964','454217' UNION ALL
SELECT '965','454217'
UPDATE A
SET COL2=COL2+'_D'
FROM (
SELECT *,ROW_NUMBER () OVER (PARTITION BY COL2 ORDER BY COL2) NUM FROM ##X) AS a
WHERE NUM>1
SELECT * FROM ##X
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply