Duplicates

  • I have a table like below with 1M+ rows:
     
    ID    col1    col2    col3    ..... col12
    1     abc              NULL         abc
    2     acd     abc     acd           abc
    3     abc     NULL  acd           adc
    4     abc     NULL                  adc
     
    And I need to find out all pairs of duplicates in each row ( for each ID), excludind NULLs and empty fields.
    Win2k sp4/ SQL2k sp2
     
    Thanks in advance.
     
    Markus
  • Define duplicates and what you need to do with them.

  • Duplicates are for ID=1 - col1 and col12, for ID=2 - col1 and col3, and another pair col2 and col12, for ID=3 and 4 - no dups. I want to update one of columns with dups to empty string.

  • What I supposed to do was:

    select ID, col1,col2, ..., col12 where (col1=col2 or col1=col3 ...or col1=col12) and col1 is not NULL and col2 is not null...and col12 is not null and col1 <>'' etc., etc.

    But I think there should be a nicer way.

     

    Thanks,

    Markus

  • I would try one of these two methods, depending on the frequency of duplicates.  The first one uses 11 update statements but only updates the rows that need it.  The second one only scans the table once, but will update all rows unless you build a huge WHERE clause from all the WHEN conditions.

    update tbl set col2 = ''

    where col2 = col1

    update tbl set col3 = ''

    where col3 in (col1,col2)

    update tbl set col4 = ''

    where col4 in (col1,col2,col3)

    update tbl set col5 = ''

    where col5 in (col1,col2,col3,col4)

    update tbl set col6 = ''

    where col6 in (col1,col2,col3,col4,col5)

    update tbl set col7 = ''

    where col7 in (col1,col2,col3,col4,col5,col6)

    update tbl set col8 = ''

    where col8 in (col1,col2,col3,col4,col5,col6,col7)

    update tbl set col9 = ''

    where col9 in (col1,col2,col3,col4,col5,col6,col7,col8)

    update tbl set col10 = ''

    where col10 in (col1,col2,col3,col4,col5,col6,col7,col8,col9)

    update tbl set col11 = ''

    where col11 in (col1,col2,col3,col4,col5,col6,col7,col8,col9,col10)

    update tbl set col12 = ''

    where col12 in (col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11)

    ------------------------------------------------------------------------

    update tbl set col2 = case when col1 = col2 then '' else col2 end,

     col3 = case when col3 in (col1,col2) then '' else col3 end,

     col4 = case when col4 in (col1,col2,col3) then '' else col4 end,

     col5 = case when col5 in (col1,col2,col3,col4) then '' else col5 end,

     col6 = case when col6 in (col1,col2,col3,col4,col5) then '' else col6 end,

     col7 = case when col7 in (col1,col2,col3,col4,col5,col6) then '' else col7 end,

     col8 = case when col8 in (col1,col2,col3,col4,col5,col6,col7) then '' else col8 end,

     col9 = case when col9 in (col1,col2,col3,col4,col5,col6,col7,col8) then '' else col9 end,

     col10 = case when col10 in (col1,col2,col3,col4,col5,col6,col7,col8,col9) then '' else col10 end,

     col11 = case when col11 in (col1,col2,col3,col4,col5,col6,col7,col8,col9,col10) then '' else col11 end,

     col12 = case when col12 in (col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11) then '' else col12 end

  • You might also consider normalizing this table .

  • I'll try all ways and let you know.

    You absolutely right about normalizing this table, but it's not in my power.

  • The second query is much faster and do the job.

    Thank you very much.

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply