Query to remove duplicates from a table ? ?

Question

Post reply

Query to remove duplicates from a table ? ?

paul welsh

SSC Veteran

Points: 204
More actions
April 28, 2005 at 10:33 am

#164186

HI,

I have a table with 20,000 names and addresses and I need to ensure that there are no duplicates on it. I have been querying previously on firstname, surname, address1 and postcode. I have been trying to think of what the syntax could be to write this query - any ideas?

thanks,

Paul

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

ramses2nd Hall of Fame Points: 3720 More actions · Answer 1

If you have a unique id in the table you could try this:

DELETE FROM namesaddresses

WHERE id NOT IN (

SELECT id FROM (

(SELECT DISTINCT id, firstname, surname, address1, postcode

FROM namesaddresses)))

Or if there is no RI on the table you could even Try something like this.

SELECT DISTINCT firstname, surname, address1, postcode

INTO ##temp

FROM namesaddresses

TRUNCATE TABLE namesaddresses

INSERT INTO namesaddresses (firstname, surname, address1,postcode)

SELECT firstname, surname, address1, postcode

FROM ##temp

Peter Wright SSCommitted Points: 1966 More actions · Answer 2

I would use the REPLACE function to remove blanks from the address and postcode:

REPLACE(address1, ' ', ''), REPLACE(postcode, ' ','')

but I can guarantee there will still be loads of duplicates due to typos, ambiguous addresses etc. These, unfortunately, are difficult to deal with in SQL. Some address dedupe programs, for example, remove vowels from words but you cannot use 'a','e','i','o','u' as a matching pattern in SQL string functions.

Maybe your best approach, time permitting, is to have several attempts at it, examining the results after each attempt and refiniing/adjusting the query accordingly. Personally, I would send the names and addresses to a commercial deduplication agency.