AutoNumber Integer to Varchar with min ID

Question

AutoNumber Integer to Varchar with min ID

DAVID.REDDEN

SSC Enthusiast

Points: 105
More actions
February 25, 2015 at 8:39 am

#294930

I have a table with fields: FirstName Last_Name Date_of_Birth and ID that could have many "flavors" to First_Name & Last_Name.
Currently the table has an ID which is an AutoNumber, Integer.
Example:
First_Name Last_Name Date_of_Birth ID
John Doe 1/1/1984 1
John J Doe 1/1/1984 2
John James Doe 1/1/1984 3
John Doe, J 1/1/1984 4
James Smith 12/1/1965 5
James K Smith 12/1/1965 6
James Smith 8/1/1973 7
I am looking/hoping to accomplish the following.
Keep all rows; but have the ID update to the minimum ID.
First_Name Last_Name Date_of_Birth ID
John Doe 1/1/1984 1
John J Doe 1/1/1984 1
John James Doe 1/1/1984 1
John Doe, J 1/1/1984 1
James Smith 12/1/1965 5
James K Smith 12/1/1965 5
James Smith 8/1/1973 7
My SQL skills drop off to figure out how to do this.
Any help would be greatly appreciated.

Viewing 12 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323755 More actions · Answer 1

its possible, but you have to have defined the matching criteria.

until you define logic for identifying the duplicates, you cannot consolidate duplicates. also, you probably need additional logic, since you might have addresses, email, phone sthat can be used to identify duplicates as well.

I've done this in a multiple phase approach:

exact matches first,

strip out whitespace and pucntuation as part of the match (so that Louis CK matches Louis C.K.)

charindex containing names

strip out + charindex.

here's a quick example of #3:

Create Table #dupes(

First_Name varchar(50),

Last_Name varchar(50),

Date_of_Birth date,

ID int

)

INSERT INTO #dupes

SELECT 'John','Doe','1/1/1984','1' UNION ALL

SELECT 'John J','Doe','1/1/1984','2' UNION ALL

SELECT 'John James','Doe','1/1/1984','3' UNION ALL

SELECT 'John','Doe, J','1/1/1984','4' UNION ALL

SELECT 'James','Smith','12/1/1965','5' UNION ALL

SELECT 'James','K Smith','12/1/1965','6' UNION ALL

SELECT 'James','Smith','8/1/1973','7'

select * from #dupes T1

inner join #dupes T2

ON T1.Date_of_Birth = T2.Date_of_Birth

AND CHARINDEX(T2.Last_Name,T1.Last_Name) >= 0

AND CHARINDEX(T2.First_Name,T1.First_Name) >= 0

WHERE T1.ID <> T2.ID

select * from #dupes T1

inner join #dupes T2

ON T1.Date_of_Birth = T2.Date_of_Birth

AND CHARINDEX(T1.Last_Name,T2.Last_Name) >= 0

AND CHARINDEX(T1.First_Name,T2.First_Name) >= 0

WHERE T1.ID <> T2.ID

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

DAVID.REDDEN SSC Enthusiast Points: 105 More actions · Answer 2

I messed up with my example.

It's more like this:

FIRST_NAME MIDDLE_INITIAL LAST_NAME DOB MEMBER_ID ID

John Doe 01/01/1984 AB123456 1

John J Doe 01/01/1984 AB123456 2

John James Doe 01/01/1984 AB123456 3

James Smith 12/01/1965 ZY987654 4

James K Smith 12/01/1965 ZY987654 5

James Smith 08/20/1973 BB754321 6

Where Member_ID is the unique key

so I would like it to end up like this:

FIRST_NAME MIDDLE_INITIAL LAST_NAME DOB MEMBER_ID ID

John Doe 01/01/1984 AB123456 1

John J Doe 01/01/1984 AB123456 1

John James Doe 01/01/1984 AB123456 1

James Smith 12/01/1965 ZY987654 4

James K Smith 12/01/1965 ZY987654 4

James Smith 08/20/1973 BB754321 6

Sean Lange SSC Guru Points: 286609 More actions · Answer 3

This is a very common issue. Unfortunately there is no magic button for this. It requires human levels of intelligence to sort this out. You can leverage sql to get some but it takes eyes on the data to get this correct.

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

Luis Cazares SSC Guru Points: 183706 More actions · Answer 4

If you have a Member_ID is your correct identifier for each person, then this should give you the correct information.

CREATE TABLE SampleData(

FIRST_NAME varchar(20),

MIDDLE_INITIAL varchar(20),

LAST_NAME varchar(20),

DOB date,

MEMBER_ID char(8),

ID int)

INSERT INTO SampleData

SELECT 'John','','Doe', '01/01/1984', 'AB123456', 1 UNION ALL

SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 3 UNION ALL

SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 4 UNION ALL

SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

SELECT 'James','','Smith', '08/20/1973', 'BB754321', 6

SELECT *, ID - ROW_NUMBER() OVER(PARTITION BY MEMBER_ID ORDER BY ID) + 1 MinID

FROM SampleData

ORDER BY MinID

GO

DROP TABLE SampleData

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sean Lange SSC Guru Points: 286609 More actions · Answer 5

I didn't look at the new sample data until after you posted Luis. I assumed this was the very common "I have data that is mangled all over the place that is easy to see with human eyes that needs to be sorted". However, looking at the new sample data it is a lot cleaner than most. Well done again sir!

_______________________________________________________________

Need help? Help us help you.

Read the article at http://www.sqlservercentral.com/articles/Best+Practices/61537/ for best practices on asking questions.

Need to split a string? Try Jeff Modens splitter http://www.sqlservercentral.com/articles/Tally+Table/72993/.

Cross Tabs and Pivots, Part 1 – Converting Rows to Columns - http://www.sqlservercentral.com/articles/T-SQL/63681/
Cross Tabs and Pivots, Part 2 - Dynamic Cross Tabs - http://www.sqlservercentral.com/articles/Crosstab/65048/
Understanding and Using APPLY (Part 1) - http://www.sqlservercentral.com/articles/APPLY/69953/
Understanding and Using APPLY (Part 2) - http://www.sqlservercentral.com/articles/APPLY/69954/

DAVID.REDDEN SSC Enthusiast Points: 105 More actions · Answer 6

DAVID.REDDEN

SSC Enthusiast

Points: 105

February 25, 2015 at 10:11 am

#1779985

Thank you Luis! That worked GREAT!!

Luis Cazares SSC Guru Points: 183706 More actions · Answer 7

Sean Lange (2/25/2015)
I didn't look at the new sample data until after you posted Luis. I assumed this was the very common "I have data that is mangled all over the place that is easy to see with human eyes that needs to be sorted". However, looking at the new sample data it is a lot cleaner than most. Well done again sir!

Thank you, I thought the same thing until the new data came along and made everything easier.

The main problem was to simulate the RANK() but using the ID order.

Luis C.
General Disclaimer:
Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

How to post data/code on a forum to get the best help: Option 1 / Option 2

Sigerson Hall of Fame Points: 3697 More actions · Answer 8

@luis

+1

A nice solution, very elegant!

Sigerson

"No pressure, no diamonds." - Thomas Carlyle

Jacob Wilkins One Orange Chip Points: 27976 More actions · Answer 9

Just to point out something for the more paranoid of us, I believe Luis' solution assumes that all rows with the same MEMBER_ID use consecutive IDs with no gaps.

If there are gaps, or rows with the same MEMBER_ID do not have consecutive IDs, then the results will not be what you expect, methinks.

For concreteness:

CREATE TABLE SampleData(

FIRST_NAME varchar(20),

MIDDLE_INITIAL varchar(20),

LAST_NAME varchar(20),

DOB date,

MEMBER_ID char(8),

ID int)

INSERT INTO SampleData

SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 1 UNION ALL

SELECT 'John','','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 3 UNION ALL

SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 4 UNION ALL

SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

SELECT 'James','','Smith', '08/20/1973', 'BB754321', 6

SELECT *, ID - ROW_NUMBER() OVER(PARTITION BY MEMBER_ID ORDER BY ID) + 1 MinID

FROM SampleData

ORDER BY MinID

GO

DROP TABLE SampleData

--The above will give different MinIDs for each James Smith with MEMBER_ID ZY987654

CREATE TABLE SampleData(

FIRST_NAME varchar(20),

MIDDLE_INITIAL varchar(20),

LAST_NAME varchar(20),

DOB date,

MEMBER_ID char(8),

ID int)

INSERT INTO SampleData

SELECT 'John','','Doe', '01/01/1984', 'AB123456', 1 UNION ALL

SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 4 UNION ALL

SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 6 UNION ALL

SELECT 'James','','Smith', '08/20/1973', 'BB754321', 7

SELECT *, ID - ROW_NUMBER() OVER(PARTITION BY MEMBER_ID ORDER BY ID) + 1 MinID

FROM SampleData

ORDER BY MinID

GO

DROP TABLE SampleData

--This one will give a different MinID for John James Doe than for John Doe and John J Doe.

Perhaps the data is so arranged that this isn't a problem (or I'm misunderstanding the scenario entirely, which is all too possible on a day with no coffee!), but I thought I'd mention this just to be safe 🙂

MMartin1 One Orange Chip Points: 27867 More actions · Answer 10

Borrowing the table and data creation

CREATE TABLE SampleData(

FIRST_NAME varchar(20),

MIDDLE_INITIAL varchar(20),

LAST_NAME varchar(20),

DOB date,

MEMBER_ID char(8),

ID int)

INSERT INTO SampleData

SELECT 'John','','Doe', '01/01/1984', 'AB123456', 1 UNION ALL

SELECT 'John','J','Doe', '01/01/1984', 'AB123456', 2 UNION ALL

SELECT 'John','James','Doe', '01/01/1984', 'AB123456', 3 UNION ALL

SELECT 'James','','Smith', '12/01/1965', 'ZY987654', 4 UNION ALL

SELECT 'James','K','Smith', '12/01/1965', 'ZY987654', 5 UNION ALL

SELECT 'James','','Smith', '08/20/1973', 'BB754321', 6

You can also do

update t

set t.id = uv.minID

FROM sampleData t Join (

SELECT year(dob) as yr, min(id) as minId

from sampleData

group by dob

) uv

ON uv.yr = year(t.dob)

----------------------------------------------------

Sergiy SSC Guru Points: 110209 More actions · Answer 11

Perhaps this reading would be useful here:

http://databases.about.com/od/specificproducts/a/normalization.htm

2nd normal form is what you need to solve the problem.

All the name variations must be removed to a separate table.

_____________
Code for TallyGenerator