January 13, 2016 at 1:49 pm
it's helpful if you use a fixed font, so the Code=other or code=plain format to the left, when you post.
How are rows 4 and 5 dupes? Unless you mean that row 4 is a dupe of row 1.
If that's the case, then are there columns that you need to examine for potential different data? In other words, are these rows still dupes?
Header Col1 Col2 Col3 Col4 ... Coln IsDuplicate
Row 1 123 ABC XYZA 12/1/2015 ... qaz No
Row 5 123 ABCX XYZ 12/1/2015 ... qaz1 Yes
January 13, 2016 at 1:56 pm
Yes, you are correct; I meant Row 1 and Row 5 are dupes & Row 2 and Row 4 are dupes.
In your example Row 1 and Row 5 are not dupes because they have different data for other columns. In order to identify dupes there should be only one column mismatch and that also it should be additional data to original one.
January 13, 2016 at 5:01 pm
Using the harness from Phil's post:
if OBJECT_ID('tempdb..#dupes') is not null
drop table #dupes;
create table #dupes (
Id int identity (1,1) primary key clustered,
Col1 int,
Col2 Varchar(10),
Col3 Varchar(10),
Col4 Date,
Col5 Varchar(10)
)
insert #dupes ( Col1, Col2, Col3, Col4, Col5 )
values ( 123, 'ABC', 'XYZ', '20150112', 'qaz' )
, ( 345, 'XCV', 'NHT', '20150212', 'wsx' )
, ( 123, 'ABX', 'XYZ', '20150212', 'DFG' )
, ( 345, 'XCV', 'NHTXV', '20150212', 'wsx' )
, ( 123, 'ABCX', 'XYZ', '20150112', 'qaz' )
SELECT *
FROM #dupes dup
INNER JOIN #dupes nondup ON dup.Id <> nondup.Id
AND dup.Col1 = nondup.Col1
AND dup.Col2 LIKE nondup.Col2 + '%'
AND dup.Col3 LIKE nondup.Col3 + '%'
AND dup.Col4 = nondup.Col4
AND dup.Col5 LIKE nondup.Col5 + '%'
SELECT dup.*,
CASE WHEN nondup.id IS NULL THEN 'No' ELSE 'Yes' END [IsDuplicate]
FROM #dupes dup
LEFT JOIN #dupes nondup ON dup.Id <> nondup.Id
AND dup.Col1 = nondup.Col1
AND dup.Col2 LIKE nondup.Col2 + '%'
AND dup.Col3 LIKE nondup.Col3 + '%'
AND dup.Col4 = nondup.Col4
AND dup.Col5 LIKE nondup.Col5 + '%'
_____________
Code for TallyGenerator
January 13, 2016 at 8:27 pm
Thanks, this is exactly what I was looking for.
January 14, 2016 at 5:56 am
monilps (1/13/2016)
Thanks, this is exactly what I was looking for.
I disagree. Well, it may be what you are looking for, but it does not match your original requirement.
In order to flag as duplicate all the data should be same except one column and that column should have same data plus additional data.
Try adding this row to the values() statement and re-running:
, ( 123, 'ABCZ', 'XYZZ', '20150112', 'qazz' )
This row will be flagged as a duplicate, even though 3 or more of its columns differ from any one of the other rows.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
Viewing 5 posts - 16 through 19 (of 19 total)
You must be logged in to reply to this topic. Login to reply