April 8, 2014 at 10:05 am
Maybe it's just one of those days, but for some reason I can't figure out the code to get what I want.
An example of:
declare @Tmp table(ssn int, name varchar(50), number int)
INSERT INTO @Tmp VALUES(123456789, 'John Smith', 123456)
INSERT INTO @Tmp VALUES(123456789, 'John Q Smith', 234567)
INSERT INTO @Tmp VALUES(123456789, 'JohnSmith', 345689)
SELECT *
FROM @Tmp t1 join
@Tmp t2 ON t1.ssn = t2.ssn
WHERE t1.name <> t2.name
Returns rows of:
ssn name number name number
----------- --------------- ----------- --------------- -----------
123456789 John Smith 111111 John Q Smith 222222
123456789 John Smith 111111 JohnSmith 333333
123456789 John Q Smith 222222 John Smith 111111
123456789 John Q Smith 222222 JohnSmith 333333
123456789 JohnSmith 333333 John Smith 111111
123456789 JohnSmith 333333 John Q Smith 222222
For what I'm doing, I consider rows 1 & 3, 2 & 5, 4 & 6
I can't seem to figure out the code to just give me one each pair. This is just a sample table, the real table may have several name mismatches for a ssn.
TIA for your help,
Doug
April 8, 2014 at 10:59 am
For a start, you can just try to get an ordered list, and eliminate rows where its the same but just a different order ie.,
change
WHERE t1.name <> t2.name
to
WHERE t1.name > t2.name
edit:
You also could consider a more general solution, for instance, if you are trying to see if you have the same ssn with different names, you could just search for rows that have the same ssn.
SELECT ssn, COUNT(*) C
FROM @Tmp
GROUP BY ssn
HAVING COUNT(*) > 1
April 8, 2014 at 11:20 am
I'd like the number column included as it helps people research the case.
April 8, 2014 at 11:44 am
dough-378918 (4/8/2014)
I'd like the number column included as it helps people research the case.
If it were me, I'd use the generic group by and then using the gathered ssn's that were duplicated, rejoin that to the original list to select the rows I wanted to examine, I'm a pretty literal guy so I'll break the task into steps, ie., find the dupes (group by), then find the info associated with the dupes (join).
There could be more concise ways to do this but I tend toward the most obvious 🙂
SELECT ssn, name, number FROM
(
SELECT ssn selected_ssn, COUNT(*) C
FROM @Tmp
GROUP BY ssn
HAVING COUNT(*) > 1
) SELECTED
JOIN @Tmp
ON selected_ssn = ssn
April 8, 2014 at 12:10 pm
Would something like this help you?
SELECT ssn
,repeatedvalues = STUFF(( SELECT DISTINCT '; ' + name + '-' + CAST( number AS varchar(20))
FROM @Tmp x
WHERE x.ssn = t.ssn
FOR XML PATH('')), 1, 2, '')
FROM @Tmp t
GROUP BY ssn
HAVING COUNT(DISTINCT name + CAST( number AS varchar(20))) > 1
This will group all the distinct values in one row.
April 8, 2014 at 12:41 pm
And then, encrypt those SSNs and stop storing them in plain text immediately!!!
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
April 8, 2014 at 2:11 pm
Sean Lange (4/8/2014)
And then, encrypt those SSNs and stop storing them in plain text immediately!!!
Very good point, how would you do it and still allow him to find his dupes and fix them once their found? Does the data entry involve entering the encrypted value, or do we produce the list decrypted? Can we group by the encrypted value? What's his encryption choices in light of what he wants done? Where do we store the key to allow all of this to happen? If the column is encrypted with a salt, how in the heck do we dedupe on it?
April 8, 2014 at 2:21 pm
patrickmcginnis59 10839 (4/8/2014)
Sean Lange (4/8/2014)
And then, encrypt those SSNs and stop storing them in plain text immediately!!!Very good point, how would you do it and still allow him to find his dupes and fix them once their found? Does the data entry involve entering the encrypted value, or do we produce the list decrypted? Can we group by the encrypted value? What's his encryption choices in light of what he wants done? Where do we store the key to allow all of this to happen? If the column is encrypted with a salt, how in the heck do we dedupe on it?
I meant once the dedupe was complete. Get the data cleaned up and then get it encrypted. 😉
_______________________________________________________________
Need help? Help us help you.
Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.
Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.
Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/
Viewing 8 posts - 1 through 7 (of 7 total)
You must be logged in to reply to this topic. Login to reply