How to Eliminate or flag duplicate records

  • Hello Everyone,

    I am having a table with millions of rows in it and there are the columns are like

    AccountID Name Database Instance Grade Rate Special_Flag

    Now this table is not having Primary key/index on it also we have the only way to identify the unique identifier based on the combination of columns AccountID/Name/Region. Now when anybody asks the raw data without any filter and they just want the unique records from the table, I just do the distinct on these 3 ccolumns. But when I asks me to give them the report conatining all columns, the distinct wont works. Because the Special_Flag column is somewhere shows that under certain Instance the account is not special priviledge and under some Instance it is. So if any of the places if the account comes as Special_Flag Y we need to treat it as Y and ignore the rows which are having N. All in all I just wanted to make the one more flag in each of this type of table and mark those accounts as duplicate, where the combination of AccountID+Name+Database+Instance is coming twice or more. So when I rum my report I can exclude duplicates. Thanks in advance.

    Shekhar

  • I'm a little confused on your requirements. Can you post some sample data and the desired results based off of that sample data? That would help tremendously!

    John Rowan

    ======================================================
    ======================================================
    Forum Etiquette: How to post data/code on a forum to get the best help[/url] - by Jeff Moden

  • Here's a way.

    Uncomment your preferred line.

    WITH CTE AS(

    SELECT *, ROW_NUMBER() OVER( PARTITION BY AccountID, Name, Region ORDER BY Special_Flag DESC) rn

    FROM MyTable

    )

    --DELETE FROM CTE WHERE rn > 1;

    --UPDATE CTE SET Another_Special_Flag = 1 WHERE rn = 1;

    --SELECT * FROM CTE WHERE rn = 1;

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • Luis Cazares (6/22/2016)


    Here's a way.

    Uncomment your preferred line.

    WITH CTE AS(

    SELECT *, ROW_NUMBER() OVER( PARTITION BY AccountID, Name, Region ORDER BY Special_Flag DESC) rn

    FROM MyTable

    )

    --DELETE FROM CTE WHERE rn > 1;

    --UPDATE CTE SET Another_Special_Flag = 1 WHERE rn = 1;

    --SELECT * FROM CTE WHERE rn = 1;

    Of the many different ways to identify duplicate rows, I've found this one to be the most efficient.

    The real benefit comes when you uncomment Luis's first line to delete the duplicates. Then all you have to do is prevent them from getting in your table in the first place and you're home free. Of course, your requirements may dictate that you keep them all. At the very least, add Another_Special_Flag like Luis suggests and change your query for the report to only select those where it equals 1.

  • Thank you so much for your help, yes it worked for me.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply