Query to remove duplicates from a table ? ?

  • HI,

     

    I have a table with 20,000 names and addresses and I need to ensure that there are no  duplicates on it. I have been querying previously on firstname, surname, address1 and postcode. I have been trying to think of what the syntax could be  to write this query - any ideas?

     

    thanks,

     

    Paul

  • If you have a unique id in the table you could try this:

    DELETE FROM namesaddresses

    WHERE id NOT IN (

    SELECT id FROM (

    (SELECT DISTINCT id, firstname, surname, address1, postcode

    FROM namesaddresses)))

    Or if there is no RI on the table you could even Try something like this.

    SELECT DISTINCT firstname, surname, address1, postcode

    INTO ##temp

    FROM namesaddresses

    TRUNCATE TABLE namesaddresses

    INSERT INTO namesaddresses (firstname, surname, address1,postcode)

    SELECT firstname, surname, address1, postcode

    FROM ##temp

     

  • I would use the REPLACE function to remove blanks from the address and postcode:

    REPLACE(address1, ' ', ''), REPLACE(postcode, ' ','')

    but I can guarantee there will still be loads of duplicates due to typos, ambiguous addresses etc. These, unfortunately, are difficult to deal with in SQL. Some address dedupe programs, for example, remove vowels from words but you cannot use 'a','e','i','o','u' as a matching pattern in SQL string functions.

    Maybe your best approach, time permitting, is to have several attempts at it, examining the results after each attempt and refiniing/adjusting the query accordingly. Personally, I would send the names and addresses to a commercial deduplication agency.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply