Dual Duplicates redress.

  • I have created a stored procedure that pulls personal information from several tables. The output is formatted into XML. I need to manipulate two of the elements (EID and BID). I have written a subquery to return the duplicates within those two data fields. I read the base query into #temp and then load the duplicates into #temp_dups:

    INSERT INTO #temp_dups
    SELECT [RowNumber], [Party], [EID], [BID], [Dups]
    FROM (
        SELECT [RowNumber]
             , [Party]
             , [EID]
             , [BID]
             , COUNT(*) OVER (PARTITION BY [Party]) AS [Dups]
        FROM #temp
    ) AS [D]
    WHERE [D].[Dups] > 1
    ORDER BY [RowNumber];

    The data typically looks like this:

    RowNumber  Party  EID  BID  Dups
    11         12345  LLS  NP   2
    12         12345  P3V  TD   2
    33         56789  UYL  BS   2
    34         56789  SFM  NP   2
    176        54080  D6V  IY   3
    177        54080  NFC  NF   3
    178        54080  USM  NF   3

    Duplicates for a Party can exist in both EID and BID. The minimum number of Dups is 2, and the maximum so far is 6.

    But, obviously, there could be more, in EID, in BID, or in any combination thereof.

    From this data I need to:

    1. Return, for each Party, the EID and BID values as concatenated lists, without repeating any value.
    2. Separate each value with a comma. No biggy.
    3. Return only one row of data for each Party, as shown below.

    Expected output for the sample above:

    Party  EID            BID
    12345  LLS, P3V       NP, TD
    56789  UYL, SFM       NP, BS
    54080  D6V, NFC, USM  IY, NF

    I don't know what it is, but I am completely brain-farting on this one. I have never manipulated duplicate data, only deleted it.

    Any ideas on how to proceed?

    Thanks in advance.
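
    For comparison, here is a minimal sketch of the same three requirements using STRING_AGG, assuming SQL Server 2017 or later (STRING_AGG and WITHIN GROUP are not available in earlier versions). The setup part recreates #temp_dups from the sample rows above for testing only; inside the real procedure you would run just the WITH query. Repeats are collapsed first with GROUP BY, and each list is ordered by the first RowNumber in which a value appears.

    ```sql
    -- Test setup only (assumes SQL Server 2017+): rebuild #temp_dups from the sample rows
    DROP TABLE IF EXISTS #temp_dups;
    CREATE TABLE #temp_dups (
        RowNumber INT,
        Party     INT,
        EID       CHAR(3),
        BID       CHAR(2),
        Dups      TINYINT
    );
    INSERT INTO #temp_dups VALUES
        (11, 12345, 'LLS', 'NP', 2), (12, 12345, 'P3V', 'TD', 2),
        (33, 56789, 'UYL', 'BS', 2), (34, 56789, 'SFM', 'NP', 2),
        (176, 54080, 'D6V', 'IY', 3), (177, 54080, 'NFC', 'NF', 3),
        (178, 54080, 'USM', 'NF', 3);

    -- One row per Party/EID and per Party/BID, remembering where each value first appears
    WITH E AS (
        SELECT Party, EID, MIN(RowNumber) AS FirstRow
        FROM #temp_dups
        GROUP BY Party, EID
    ), B AS (
        SELECT Party, BID, MIN(RowNumber) AS FirstRow
        FROM #temp_dups
        GROUP BY Party, BID
    ), EList AS (
        -- Concatenate the de-duplicated EIDs in first-seen order
        SELECT Party, STRING_AGG(EID, ', ') WITHIN GROUP (ORDER BY FirstRow) AS EIDs
        FROM E
        GROUP BY Party
    ), BList AS (
        -- Same for the BIDs
        SELECT Party, STRING_AGG(BID, ', ') WITHIN GROUP (ORDER BY FirstRow) AS BIDs
        FROM B
        GROUP BY Party
    )
    -- One output row per Party
    SELECT EList.Party, EList.EIDs, BList.BIDs
    FROM EList
    JOIN BList ON BList.Party = EList.Party
    ORDER BY EList.Party;
    ```

    Because of that ordering, Party 56789 comes back with BIDs "BS, NP" (rows 33 and 34) rather than "NP, BS" as in the sample output; change the WITHIN GROUP ordering if a different order is wanted.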

  • Thanks for posting your issue and hopefully someone will answer soon.

    This is an automated bump to increase visibility of your question.

  • Thanks,

    This one is difficult. I have gone down many dead-end methods on my own. If I come up with it, I will post the solution. But I suspect it's going to be really ugly.

  • If you would post your data in a readily consumable format, I'm sure someone would jump on this problem and provide a correct solution in about two heartbeats. Please see the first link in my signature line below for one way to provide "Readily Consumable Data".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Looks like this works. Here's some data... this is what Jeff is talking about... readily consumable data (CREATE TABLE and INSERT scripts).

    use tempdb;
    go
    -- set up your table
    CREATE TABLE #SomeData (
     RowNum INT,
     Party INT,
     EID CHAR(3),
     BID CHAR(2),
     Dups TINYINT
    );
    GO
    -- insert data
    INSERT INTO #SomeData VALUES
    (11,12345,'LLS','NP',2)
    ,(12,12345,'P3V','TD',2)
    ,(33,56789,'UYL','BS',2)
    ,(34,56789,'SFM','NP',2)
    ,(176,54080,'D6V','IY',3)
    ,(177,54080,'NFC','NF',3)
    ,(178,54080,'USM','NF',3);

    Now folks can run that and recreate your problem... and solve it. I did it because I'm trying to get my head around STUFF() and FOR XML.

    -- concatenate the EIDs and BIDs for each Party
    SELECT Party
         , EIDs = TRIM(STUFF((
               SELECT ', ' + EID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               FOR XML PATH('')
           ), 1, 1, ''))
         , BIDs = TRIM(STUFF((
               SELECT ', ' + BID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               FOR XML PATH('')
           ), 1, 1, ''))
    FROM #SomeData sd2
    GROUP BY Party;
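
    Since STUFF() and FOR XML are the confusing part, here is a small sketch that runs the two steps separately on the three 54080 rows. The table variable @Rows is hypothetical, and ORDER BY RowNum is added because without it the concatenation order is not guaranteed.

    ```sql
    -- Hypothetical table variable holding only the three 54080 rows from the setup above
    DECLARE @Rows TABLE (RowNum INT, EID CHAR(3));
    INSERT INTO @Rows VALUES (176, 'D6V'), (177, 'NFC'), (178, 'USM');

    -- Step 1: FOR XML PATH('') glues the rows into one string; because the
    -- expression column has no name, no XML element tags are produced
    DECLARE @raw VARCHAR(100) = (
        SELECT ', ' + EID
        FROM @Rows
        ORDER BY RowNum
        FOR XML PATH('')
    );
    SELECT @raw AS Concatenated;            -- ', D6V, NFC, USM'

    -- Step 2: STUFF(string, start, length, replacement) deletes `length` characters
    -- from `start`: length 1 leaves a leading space (hence TRIM), length 2 does not
    SELECT STUFF(@raw, 1, 1, '') AS Len1,   -- ' D6V, NFC, USM'
           STUFF(@raw, 1, 2, '') AS Len2;   -- 'D6V, NFC, USM'
    ```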

    • This reply was modified 5 years, 3 months ago by  pietlinden.
  • You can add GROUP BY to Piet's code to ensure that you get distinct data for each EID and BID. Also change the third parameter of STUFF (the length) from 1 to 2, so that you don't need to TRIM the data.

    SELECT Party
         , EIDs = STUFF((
               SELECT ', ' + sd1.EID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               GROUP BY sd1.EID
               FOR XML PATH('')
           ), 1, 2, '')
         , BIDs = STUFF((
               SELECT ', ' + sd1.BID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               GROUP BY sd1.BID
               FOR XML PATH('')
           ), 1, 2, '')
    FROM #SomeData sd2
    GROUP BY Party;
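
    One caveat with plain FOR XML PATH(''): the sample codes contain no XML-special characters, but if a value ever contains &, < or >, it comes back entitized (for example & becomes &amp;). A common variant adds the TYPE directive and reads the text back with .value(). In this sketch the table variable and the 'R&D' value are invented purely to show the difference.

    ```sql
    -- Hypothetical rows: 'R&D' is made up to trigger XML escaping
    DECLARE @Rows TABLE (Party INT, EID CHAR(3));
    INSERT INTO @Rows VALUES (1, 'LLS'), (1, 'R&D'), (1, 'R&D');

    SELECT p.Party
         -- Plain FOR XML PATH(''): the '&' is returned as '&amp;'
         , EIDsPlain = STUFF((
               SELECT ', ' + r.EID
               FROM @Rows r
               WHERE r.Party = p.Party
               GROUP BY r.EID
               ORDER BY r.EID
               FOR XML PATH('')
           ), 1, 2, '')                      -- 'LLS, R&amp;D'
         -- TYPE returns an xml value; .value('.', ...) reads its text back unescaped
         , EIDs = STUFF((
               SELECT ', ' + r.EID
               FROM @Rows r
               WHERE r.Party = p.Party
               GROUP BY r.EID
               ORDER BY r.EID
               FOR XML PATH(''), TYPE
           ).value('.', 'NVARCHAR(MAX)'), 1, 2, '')   -- 'LLS, R&D'
    FROM @Rows p
    GROUP BY p.Party;
    ```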
  • Thanks. That works fine. But one issue remains: the BID value NF is duplicated for Party 54080, and there should be only one. I avoided using a cursor; instead I am using the Dups number as an index (an @max-2 value), testing through all of the values for duplicates and dropping past that.

    Actually, the requirement for the BIDs was dropped, and the EID query looks like this:

    SELECT Party
         , EIDs = LTRIM(STUFF((
               SELECT ', ' + EID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               FOR XML PATH('')
           ), 1, 1, ''))
         /* , BIDs = LTRIM(STUFF((
               SELECT ', ' + BID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               FOR XML PATH('')
           ), 1, 1, '')) */
    FROM #SomeData sd2
    GROUP BY Party;
  • jharvey6 76850 wrote:

    Thanks. That works fine. But one issue remains: the BID value NF is duplicated for Party 54080, and there should be only one. I avoided using a cursor; instead I am using the Dups number as an index (an @max-2 value), testing through all of the values for duplicates and dropping past that.

    See my previous reply. Add GROUP BY to remove the duplicates.

  • This appears to work... but my eyesight is terrible, so test it:

    SELECT Party
         , EIDs = STUFF((
               SELECT DISTINCT ', ' + EID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               FOR XML PATH('')
           ), 1, 2, '')
         , BIDs = STUFF((
               SELECT DISTINCT ', ' + BID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               FOR XML PATH('')
           ), 1, 2, '')
    FROM #SomeData sd2
    GROUP BY Party;
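
    A note on ordering: with SELECT DISTINCT, SQL Server requires ORDER BY items to appear in the select list, so the lists cannot be ordered by RowNum, and without an ORDER BY the concatenation order is not guaranteed. A minimal sketch of the GROUP BY form that keeps first-seen order is below; the setup drops and recreates #SomeData with Piet's sample rows and assumes SQL Server 2016+ for DROP TABLE IF EXISTS.

    ```sql
    -- Test setup (assumes SQL Server 2016+): Piet's sample rows
    DROP TABLE IF EXISTS #SomeData;
    CREATE TABLE #SomeData (RowNum INT, Party INT, EID CHAR(3), BID CHAR(2), Dups TINYINT);
    INSERT INTO #SomeData VALUES
        (11, 12345, 'LLS', 'NP', 2), (12, 12345, 'P3V', 'TD', 2),
        (33, 56789, 'UYL', 'BS', 2), (34, 56789, 'SFM', 'NP', 2),
        (176, 54080, 'D6V', 'IY', 3), (177, 54080, 'NFC', 'NF', 3),
        (178, 54080, 'USM', 'NF', 3);

    SELECT Party
         -- GROUP BY removes repeats; ORDER BY MIN(RowNum) keeps first-seen order
         , EIDs = STUFF((
               SELECT ', ' + sd1.EID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               GROUP BY sd1.EID
               ORDER BY MIN(sd1.RowNum)
               FOR XML PATH('')
           ), 1, 2, '')
         , BIDs = STUFF((
               SELECT ', ' + sd1.BID
               FROM #SomeData sd1
               WHERE sd1.Party = sd2.Party
               GROUP BY sd1.BID
               ORDER BY MIN(sd1.RowNum)
               FOR XML PATH('')
           ), 1, 2, '')
    FROM #SomeData sd2
    GROUP BY Party;
    ```

    For Party 54080 this yields EIDs "D6V, NFC, USM" and BIDs "IY, NF", with NF listed once.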

    • This reply was modified 5 years, 3 months ago by  pietlinden. Reason: wrong code!
