T-SQL help needed to eliminate duplicates from table

  • In order for us to even try to help you further and to see what other problems you have (as I do not believe that the sample data and table layout resemble your real table), please post the full table DDL including indexes and constraints, plus a full description of the SSIS process, detailing each step, including the content of any SQL or stored procedure executed as part of that SSIS package that has a direct or indirect impact on the table with the issue.

    Also, if this was an issue of files being loaded in duplicate, you should know which records were already added, and that should make your life easier - just load the duplicate files into another table, go through the records that match the duplicates, and process them accordingly (a rough sketch of that idea is at the end of this post).

    And from an accountability point of view, records 1, 2 and 3 should all be kept - these are not part of the error.
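
    A rough sketch of that staging-table idea, using hypothetical names (dbo.RxClaims as the live table, dbo.RxClaims_Staging holding the re-loaded file) and assuming both share the member / amt_paid / amt_adjusted / dt columns from the sample - an outline only, not a drop-in fix:

    -- Hypothetical tables: dbo.RxClaims (live), dbo.RxClaims_Staging (the duplicate file loaded on its own)
    WITH dupes AS
    (
        SELECT t.*,
               ROW_NUMBER() OVER ( PARTITION BY t.member, t.amt_paid, t.amt_adjusted, t.dt
                                   ORDER BY (SELECT NULL) ) AS rn
        FROM dbo.RxClaims t
        WHERE EXISTS ( SELECT 1
                       FROM dbo.RxClaims_Staging s
                       WHERE s.member       = t.member
                         AND s.amt_paid     = t.amt_paid
                         AND s.amt_adjusted = t.amt_adjusted
                         AND s.dt           = t.dt )
    )
    DELETE FROM dupes
    WHERE rn > 1;   -- keep one copy of each row the duplicate file touched, drop the extras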

  • mw_sql_developer - Monday, June 11, 2018 10:51 AM

    Lynn Pettis - Monday, June 11, 2018 10:31 AM

    My problem, you were also asked to show what you have tried to solve your problem.  At this time, all I see is you asking others to do for free what you are getting paid to do.
    We are volunteers, not paid workers.  We are here to help you so that you can get better at writing the SQL you need to accomplish your work and perhaps need less help.  That could lead you to begin helping others here as well.  That is how many of us got started here, looking for help, getting the mentoring to get better, then giving back to the community that helped us get better.

    Lynn: Believe me.. if I were to show you my work, it would only be the same as what was posted by pietlinden ( or similar ). I got stuck and that is why I asked.
    The script code that I posted and one more SQL was all that I did.. Thought about it and didn't see myself progressing.. so that is why I asked....

    The only SQL you've posted is the example data; no attempted solution. We're asking to see your script because you may well be almost on the right path and it's a simple typographical error (you'd be surprised how common that is). Showing your work also helps us find a solution for you; often the SQL will hold additional details about your goal. It helps save time for us volunteers too; we don't have to write out a brand new query if we already have your script and it just needs a tweak. Also, like Lynn said, we're not paid, so if you want help, you have to "help us help you". Give us as much as you can, including your attempted solution, so that we don't end up wasting (our) time on a solution you've already tried that didn't work.

    You'll get a far better experience here if you post your sample data and show what you've tried. You might also want to actually ask a question when you want help; your posts above come across more as a request for us to solve your problem for you than as a question asking for help on a problem you're having.

    On a different note, I wouldn't worry about Celko. He has a habit of moaning about people's sample data when they use UNION, among other things. Providing sample data in a consumable format is otherwise well received by the rest of the community.

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • mw_sql_developer - Monday, June 11, 2018 10:51 AM

    Lynn Pettis - Monday, June 11, 2018 10:31 AM

    My problem, you were also asked to show what you have tried to solve your problem.  At this time, all I see is you asking others to do for free what you are getting paid to do.
    We are volunteers, not paid workers.  We are here to help you so that you can get better at writing the SQL you need to accomplish your work and perhaps need less help.  That could lead you to begin helping others here as well.  That is how many of us got started here, looking for help, getting the mentoring to get better, then giving back to the community that helped us get better.

    Lynn: Believe me.. if I were to show you my work, it would only be the same as what was posted by pietlinden ( or similar ). I got stuck and that is why I asked.
    The script code that I posted and one more SQL was all that I did.. Thought about it and didn't see myself progressing.. so that is why I asked....

    If you got stuck it is even more important to post what you have tried.  It allows us to see what you were doing and where you got stuck.  This helps us guide you to a solution instead of having one (or more) of us doing the work from scratch.
    The real point here is that we want to help you learn and become better instead of depending on the internet to just solve your problems.  In the end it isn't us that has to support your code, it is you.

  • Rather long solution .. but it works...
    If someone out there can write a condensed SQL statement that will nail this I'd be happy to buy .... ( a condensed alternative is also sketched after the conclusion at the end of this post )

    My SQL may take longer and use more IO, but it works.

    If object_id('tempdb..#t') IS NOT NULL DROP TABLE #t
    GO

    If object_id('tempdb..#copied') IS NOT NULL DROP TABLE #copied
    GO

    If object_id('tempdb..#AdjustedDistinct') IS NOT NULL DROP TABLE #AdjustedDistinct
    GO

    If object_id('tempdb..#MatchingPartner') IS NOT NULL DROP TABLE #MatchingPartner
    GO

    If object_id('tempdb..#OneFromRemainingPositiveRecords') IS NOT NULL DROP TABLE #OneFromRemainingPositiveRecords
    GO

    -----------------------------------------------------
    Create table #t( member varchar(10), amt_paid MONEY, amt_adjusted MONEY, dt VARCHAR(8) );
    GO
    -----------------------------------------------------
    INSERT INTO #t( member,amt_paid,amt_adjusted,dt)
    VALUES ( '1234', 100.00,0.00, '20180101')
    ,('1234', 0.00,-100.00, '20180101')
    ,('1234', 100.00,0.00, '20180101')
    ,('1234', 0.00,-100.00, '20180101')
    ,('1234', 100.00,0.00, '20180101')
    ,('1234', 100.00,0.00, '20180101')
    ,('1234', 100.00,0.00, '20180101')
    ,('4567', 100.00,0.00, '20180101')
    ,('4567', 0.00,-100.00, '20180101')
    ,('4567', 100.00,0.00, '20180101')
    ,('4567', 0.00,-100.00, '20180101')
    ,('4567', 100.00,0.00, '20180101')
    ,('4567', 100.00,0.00, '20180101')
    ,('4567', 100.00,0.00, '20180101');

    --Step 1: Let's give each record an IDENTITY
    --We will treat this table as our base table from now on.
    select *,
    ID= IDENTITY( INT, 1,1 )
    INTO #Copied
    FROM #t

    --Step 2: Let's get a distinct copy of any records with a negative value ( amt was adjusted )
    --These are adjustment claims
    Select * INTO #AdjustedDistinct
    FROM
    (

    Select member, amt_paid, amt_adjusted , dt, ID,
    ROW_NUMBER() OVER ( PARTITION BY member, amt_paid, amt_adjusted , dt ORDER BY member ) as RN
    FROM #Copied
    where
    amt_adjusted < 0 -- This way we start with the ones that have the negative values...

    )B
    WHERE
    B.RN =1

    --Step 3: Let's find one partner record for each of those records in the #AdjustedDistinct table.
    --So what we are doing is finding a matching partner for each of those negative records.
    Select * INTO #MatchingPartner
    FROM
    (

    Select A.*,
    ROW_NUMBER() OVER ( PARTITION BY A.member, A.amt_paid, A.amt_adjusted , A.dt ORDER BY A.member ) as RN
    FROM
    #Copied A
    INNER JOIN #AdjustedDistinct B
    ON ( A.member = B.member ) AND ( A.amt_paid = B.amt_adjusted*(-1) ) AND ( A.dt = B.dt )
    WHERE
    A.amt_paid > 0
    ) X
    WHERE
    X.RN=1 -- This way we will only get just one partner

    --Step 4
    --So now, out of the records in #Copied, there can be positive records that have duplicates.
    --At this point we need just one of the remaining positive records ( where amt_paid > 0 )
    Select * INTO #OneFromRemainingPositiveRecords
    FROM
    (
        Select C.* ,
        ROW_NUMBER() OVER ( PARTITION BY C.member, C.amt_paid, C.amt_adjusted , C.dt ORDER BY C.member ) as RN
        FROM
        #Copied C
        WHERE
        C.amt_adjusted = 0 and C.amt_paid > 0.00
        AND
        C.ID not in ( Select ID from #AdjustedDistinct )
        AND
        c.ID not in ( Select ID from #MatchingPartner )
    )X
    WHERE
    X.RN =1

    --Step 5
    --So we have all the records we need listed in 3 tables.
    --Now it is time to put all of them together...
    --And now here is the final set
    SELECT * FROM #copied
    WHERE
    ID IN ( Select ID FROM #AdjustedDistinct )
    OR
    ID IN ( Select ID FROM #MatchingPartner )
    OR
    ID IN ( Select ID FROM #OneFromRemainingPositiveRecords )

    /*
    --Another expression to construct the final set we need...
    SELECT * FROM #AdjustedDistinct
    UNION ALL
    SELECT * FROM #MatchingPartner
    UNION ALL
    SELECT * FROM #OneFromRemainingPositiveRecords
    */

    /*
    Final Conclusion:
    A claim for 1234 arrived for $100
    A claim for 1234 as an adjustment arrived for -$100 ( maybe there was something wrong in the first claim )
    ( These 2 records got duplicated again as a result of being in 2 different monthly files, never mind how they got in )
    However the member 1234 did receive his medication.
    So a claim for 1234 arrived for $100 later ( the correct one for the medication he purchased )
    Again this last one got duplicated as a result of being included in several monthly files ( maybe it got loaded in next month's file ).
    So our final set should only have 3 records for 1234.

    --Same story for 4567
    */
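
    For comparison, here is one way the same logic might be condensed into a single pass over the sample #t table. It makes the same assumptions as the temp-table version (keep one copy of every distinct member/amount/date combination, plus one extra positive "partner" row wherever a matching negative adjustment exists), so treat it as a sketch rather than a tested replacement:

    WITH ranked AS
    (
        SELECT *,
               ROW_NUMBER() OVER ( PARTITION BY member, amt_paid, amt_adjusted, dt
                                   ORDER BY (SELECT NULL) ) AS rn
        FROM #t
    )
    SELECT member, amt_paid, amt_adjusted, dt
    FROM ranked r
    WHERE r.rn = 1                    -- one copy of every distinct combination (adjustments included)
       OR ( r.rn = 2                  -- plus a second positive copy to pair off against a negative adjustment
            AND r.amt_adjusted = 0
            AND EXISTS ( SELECT 1
                         FROM ranked a
                         WHERE a.member       = r.member
                           AND a.dt           = r.dt
                           AND a.amt_adjusted < 0
                           AND a.amt_adjusted = -r.amt_paid ) );

    Against the sample data this returns three rows per member, the same as the five-step version; whether it is actually cheaper on the real table would need testing against the real indexes.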

  • mw_sql_developer - Monday, June 11, 2018 9:38 AM

    Eirikur Eiriksson - Monday, June 11, 2018 2:47 AM

    mw_sql_developer - Sunday, June 10, 2018 3:00 PM

    To make a long story short, one of our tables that has rx data has got messed up because the files get loaded more than once, and as a result we now have duplicate records. Your help is needed to clean this up.

    So to make the problem simple, I created a script to describe the issue. MEMBER 1234 has 4 records for 100.00. I only need 2 of them. Why? One of them is needed to cancel out with the record that has -100.00 in the amt_adjusted column. Then out of the 3 remaining, one is a new charge and the other 2 are simply duplicates. Same issue with the user 4567.

    So help me with the syntax of eliminating the duplicates.

    If object_id('tempdb..#t') IS NOT NULL DROP TABLE #t

    Create table #t( member varchar(10), amt_paid MONEY, amt_adjusted MONEY, dt VARCHAR(8) );

    INSERT INTO #t( member,amt_paid,amt_adjusted,dt)
    Select '1234', 100.00,0.00, '20180101'
    UNION
    Select '1234', 0.00,-100.00, '20180101'
    UNION
    Select '1234', 100.00,0.00, '20180101'
    UNION
    Select '1234', 100.00,0.00, '20180101'
    UNION
    Select '1234', 100.00,0.00, '20180101'
    UNION
    Select '4567', 100.00,0.00, '20180101'
    UNION
    Select '4567', 0.00,-100.00, '20180101'
    UNION
    Select '4567', 100.00,0.00, '20180101'
    UNION
    Select '4567', 100.00,0.00, '20180101'
    UNION
    Select '4567', 100.00,0.00, '20180101'

    Fundamental mistake in the sample data, you must use UNION ALL, otherwise the sample set will NOT have any duplicates!
    😎

    Mr Eirikur Eiriksson: I don't use UNION or UNION ALL in real life. I only used it for the purpose of creating a sample script. In real life the data is loaded via an SSIS module. The error occurs because the same transaction can be in many files ( for example, this month's data may appear again in next month's file, so it gets loaded twice ).

    But you used UNION in your sample data and that was the mistake I was pointing out!
    😎
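
    For anyone following along, the difference being pointed out is easy to demonstrate: UNION de-duplicates the combined rows before the INSERT runs, while UNION ALL keeps every row. A quick throwaway example:

    -- UNION collapses identical rows; UNION ALL keeps them
    SELECT x INTO #u  FROM ( SELECT 1 AS x UNION     SELECT 1 ) a;   -- #u  ends up with 1 row
    SELECT x INTO #ua FROM ( SELECT 1 AS x UNION ALL SELECT 1 ) b;   -- #ua ends up with 2 rows

    SELECT ( SELECT COUNT(*) FROM #u )  AS union_rows,       -- 1
           ( SELECT COUNT(*) FROM #ua ) AS union_all_rows;   -- 2

    DROP TABLE #u, #ua;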

  • mw_sql_developer - Sunday, June 10, 2018 3:00 PM

    To make a long story short, one of our tables that has rx data has got messed up because the files get loaded more than once, and as a result we now have duplicate records. Your help is needed to clean this up.

    Not sure if you have a solution that works or not.  Are you all set now?

    Also, what are you doing to prevent this type of mistake from happening again?  You are, after all, messing with someone's money. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden - Monday, June 11, 2018 7:35 PM

    mw_sql_developer - Sunday, June 10, 2018 3:00 PM

    To make a long story short, one of our tables that has rx data has got messed up because the files get loaded more than once, and as a result we now have duplicate records. Your help is needed to clean this up.

    Not sure if you have a solution that works or not.  Are you all set now?

    Also, what are you doing to prevent this type of mistake from happening again?  You are, after all, messing with someone's money. 😉

    YES I have my own solution, it was posted. 
    What do I do to prevent this in the future: Not my problem... Let the DBA deal with this.

  • mw_sql_developer - Tuesday, June 12, 2018 4:24 AM

    What do I do to prevent this in the future: Not my problem... Let the DBA deal with this.

    I'm so glad your customer's money is in safe hands...

    Thom~

    Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
    Larnu.uk

  • mw_sql_developer - Tuesday, June 12, 2018 4:24 AM

    Jeff Moden - Monday, June 11, 2018 7:35 PM

    mw_sql_developer - Sunday, June 10, 2018 3:00 PM

    To make a long story short, one of our tables that has rx data has got messed up because the files get loaded more than once, and as a result we now have duplicate records. Your help is needed to clean this up.

    Not sure if you have a solution that works or not.  Are you all set now?

    Also, what are you doing to prevent this type of mistake from happening again?  You are, after all, messing with someone's money. 😉

    YES I have my own solution, it was posted. 
    What do I do to prevent this in the future: Not my problem... Let the DBA deal with this.

    Really?  Not an attitude I would appreciate hearing from a fellow employee if I worked there.

  • Lynn Pettis - Tuesday, June 12, 2018 8:02 AM

    mw_sql_developer - Tuesday, June 12, 2018 4:24 AM

    Jeff Moden - Monday, June 11, 2018 7:35 PM

    mw_sql_developer - Sunday, June 10, 2018 3:00 PM

    To make a long story short, one of our tables that has rx data has got messed up because the files get loaded more than once, and as a result we now have duplicate records. Your help is needed to clean this up.

    Not sure if you have a solution that works or not.  Are you all set now?

    Also, what are you doing to prevent this type of mistake from happening again?  You are, after all, messing with someone's money. 😉

    YES I have my own solution, it was posted. 
    What do I do to prevent this in the future: Not my problem... Let the DBA deal with this.

    Really?  Not an attitude I would appreciate hearing from a fellow employee if I worked there.

    I have told them about the issue and they are working on correcting the root cause ( that is, they are trying to eliminate the same file getting loaded several times ). However, the same claim can be in different files, which will result in duplicates again. So for now... I have my code that does the filtering and takes care of the issue ( that will take care of my report ).  I have offered them my code and let's see what approach they will take.  Anyhow, I did my part in communicating the issue. Let those ( who have the power ) take action.

  • mw_sql_developer - Tuesday, June 12, 2018 8:13 AM

    Lynn Pettis - Tuesday, June 12, 2018 8:02 AM

    mw_sql_developer - Tuesday, June 12, 2018 4:24 AM

    Jeff Moden - Monday, June 11, 2018 7:35 PM

    mw_sql_developer - Sunday, June 10, 2018 3:00 PM

    To make a long story short, one of our tables that has rx data has got messed up because the files get loaded more than once, and as a result we now have duplicate records. Your help is needed to clean this up.

    Not sure if you have a solution that works or not.  Are you all set now?

    Also, what are you doing to prevent this type of mistake from happening again?  You are, after all, messing with someone's money. 😉

    YES I have my own solution, it was posted. 
    What do I do to prevent this in the future: Not my problem... Let the DBA deal with this.

    Really?  Not an attitude I would appreciate hearing from a fellow employee if I worked there.

    I have told them about the issue and they are working on correcting the root cause ( that is, they are trying to eliminate the same file getting loaded several times ). However, the same claim can be in different files, which will result in duplicates again. So for now... I have my code that does the filtering and takes care of the issue ( that will take care of my report ).  I have offered them my code and let's see what approach they will take.  Anyhow, I did my part in communicating the issue. Let those ( who have the power ) take action.

    That's more like what I would have expected from an SQL Developer than your previous answer. 😉

    Your code may work for a "cleanup" after such an accident but there are many rather simple methods to keep this type of thing from happening again.  If they can't suss it, post back.
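
    One common pattern (an illustrative sketch only, not necessarily the method being hinted at here) is a load-control table keyed on the file name, which the SSIS package checks before it loads anything. The table name and the @file_name value below are hypothetical:

    -- Hypothetical control table: one row per file ever loaded
    CREATE TABLE dbo.FileLoadLog
    (
        file_name  varchar(260) NOT NULL CONSTRAINT PK_FileLoadLog PRIMARY KEY,
        loaded_on  datetime     NOT NULL CONSTRAINT DF_FileLoadLog_loaded_on DEFAULT ( GETDATE() )
    );

    -- Inside the package (e.g. an Execute SQL Task); in practice @file_name comes from the package's parameter mapping
    DECLARE @file_name varchar(260) = 'rx_2018_06.txt';   -- hypothetical value for testing

    IF NOT EXISTS ( SELECT 1 FROM dbo.FileLoadLog WHERE file_name = @file_name )
    BEGIN
        INSERT INTO dbo.FileLoadLog ( file_name ) VALUES ( @file_name );
        -- ... run the data flow for this file ...
    END;

    That stops the same file from being loaded twice; the same claim arriving in two different monthly files would need a unique key on the claim's natural identifier, which is exactly why the earlier request for the real table DDL matters.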

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • mw_sql_developer - Tuesday, June 12, 2018 8:13 AM

    Lynn Pettis - Tuesday, June 12, 2018 8:02 AM

    mw_sql_developer - Tuesday, June 12, 2018 4:24 AM

    Jeff Moden - Monday, June 11, 2018 7:35 PM

    mw_sql_developer - Sunday, June 10, 2018 3:00 PM

    To make a long story short, one of our tables that has rx data has got messed up because the files get loaded more than once, and as a result we now have duplicate records. Your help is needed to clean this up.

    Not sure if you have a solution that works or not.  Are you all set now?

    Also, what are you doing to prevent this type of mistake from happening again?  You are, after all, messing with someone's money. 😉

    YES I have my own solution, it was posted. 
    What do I do to prevent this in the future: Not my problem... Let the DBA deal with this.

    Really?  Not an attitude I would appreciate hearing from a fellow employee if I worked there.

    I have told them about the issue and they are working on correcting the root cause ( that is, they are trying to eliminate the same file getting loaded several times ). However, the same claim can be in different files, which will result in duplicates again. So for now... I have my code that does the filtering and takes care of the issue ( that will take care of my report ).  I have offered them my code and let's see what approach they will take.  Anyhow, I did my part in communicating the issue. Let those ( who have the power ) take action.

    Much better response than "not my problem."  That is what I would appreciate from a fellow employee.
