"Duplicate" records - How to coalesce?

  • Sturev (5/25/2016)


    TheSQLGuru (5/25/2016)You are very welcome. When you have 45000 hours working with SQL Server your brain will probably think like mine too. 😎

    I've been off in force.com land and haven't touched SQL in about 5 years, so needless to say the rust is showing! I'm going to send you a connection request on LinkedIn as there might be some work I can toss your way!

    Thanks again!

    Sounds great - thanks!

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • I need all expected outputs for your sample data.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/25/2016)


    I need all expected outputs for your sample data.

    NEWIDFROMIDTOIDFROMRELATIONTORELATIONNOTESSTATUS

    1CMTNAA0015DQCMTNAA0012QTMother-in-LawSon-in-LawIn, LawActive

    2AMTNAA000QGCAMTNAA000K3SClientVendorSunglassesActive

  • Come now - I know you could have figured this one out by extending what I had!! 😛

    WITH t AS (

    SELECT ROW_NUMBER() OVER (ORDER BY a1.AFFILIATIONID, a1.BACKAFFILID) AS rownum,

    a1.AFFILIATIONID, a1.BACKAFFILID,

    a1.ENTITYID AS FROMRECORDID, a2.ENTITYID AS TORECORDID,

    a1.RELATION AS FROMRELATION, a2.RELATION AS TORELATION,

    a1.NOTES,

    a1.STATUS

    FROM #a a1 INNER JOIN #a a2 ON a1.AFFILIATIONID = a2.BACKAFFILID)

    SELECT rownum / 2 AS NEWRECORDID,

    --AFFILIATIONID, BACKAFFILID, --not needed, here for validation

    FROMRECORDID, TORECORDID,

    FROMRELATION, TORELATION,

    NOTES,

    STATUS

    FROM t

    WHERE rownum %2 = 0

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (5/25/2016)


    Come now - I know you could have figured this one out by extending what I had!! 😛

    You scared the crap out of me with that second post w/DENSE_RANK, CHECKSUM, ETC... 😀

    Looks awesome, I'm going to give that a test run here in a bit! Thanks again for jumping in on this; much appreciated!

  • Sturev (5/25/2016)


    drew.allen (5/25/2016)


    Here is another approach. This has the advantage that it accounts for orphaned records.

    Drew

    Thanks Drew! In this case we don't want the orphans; I know, not ideal but it makes sense in this case.

    In that case, you would just change the LEFT OUTER JOIN to an INNER JOIN and remove the check on IS NULL.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Same as before with the added columns, note the added calculated column and a covering index which improves the performance quite a bit, roughly 5 times faster than the self joining solutions although that comparison is not just as there are no indices there..

    😎

    USE tempdb;

    GO

    SET NOCOUNT ON;

    IF OBJECT_ID(N'tempdb..#SAMPLE_DATA') IS NOT NULL DROP TABLE #SAMPLE_DATA;

    CREATE TABLE #SAMPLE_DATA

    (

    AFFILIATIONID varchar(20) NOT NULL

    ,BACKAFFILID varchar(20) NOT NULL

    ,ENTITYID varchar(20) NOT NULL

    ,RELATION varchar(64) NOT NULL

    ,NOTES varchar(128) NOT NULL

    ,STATUS varchar(32) NOT NULL

    ,GROUP_CHECKSUM AS (CONVERT(BIGINT,CHECKSUM(BACKAFFILID ),0) + CONVERT(BIGINT,CHECKSUM(AFFILIATIONID),0)) PERSISTED

    );

    INSERT INTO #SAMPLE_DATA (AFFILIATIONID , BACKAFFILID , ENTITYID ,RELATION ,NOTES ,STATUS )

    VALUES ('QMTNAA000KMD', 'QMTNAA000KME', 'CMTNAA0015DQ', 'Mother-in-Law', 'In, Law' , 'Active')

    ,('QMTNAA000KME', 'QMTNAA000KMD', 'CMTNAA0012QT', 'Son-in-Law' , 'In, Law' , 'Active')

    ,('QMTNAA000OH1', 'QMTNAA000OH2', 'AMTNAA000QGC', 'Client' , 'Sunglasses', 'Active')

    ,('QMTNAA000OH2', 'QMTNAA000OH1', 'AMTNAA000K3S', 'Vendor' , 'Sunglasses', 'Active')

    ;

    -- POC INDEX FOR THE GROUPING

    CREATE NONCLUSTERED INDEX TMP#SAMPLE_DATA_GROUP_CHECKSUM ON #SAMPLE_DATA(GROUP_CHECKSUM) INCLUDE (ENTITYID,RELATION,NOTES,STATUS);

    -- USING THE CALCULATED COLUMN FOR GROUP RANKING WILL UTILIZE THE INDEX

    ;WITH GROUPED_SET AS

    (

    SELECT

    DENSE_RANK() OVER

    (

    ORDER BY ( SD.GROUP_CHECKSUM )

    ) AS GRP_ID

    ,SD.ENTITYID

    ,SD.RELATION

    ,SD.NOTES

    ,SD.STATUS

    FROM #SAMPLE_DATA SD

    )

    ,ORDERED_GROUP_SET AS

    (

    SELECT

    GS.GRP_ID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY GS.GRP_ID

    ORDER BY (SELECT NULL)

    ) AS ORDER_OF_APPEARANCE

    ,GS.ENTITYID

    ,GS.RELATION

    ,GS.NOTES

    ,GS.STATUS

    FROM GROUPED_SET GS

    )

    SELECT

    OGS.GRP_ID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 1 THEN OGS.ENTITYID END) AS FROMRECORDID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 2 THEN OGS.ENTITYID END) AS TORECORDID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 1 THEN OGS.RELATION END) AS FROMRELATION

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 2 THEN OGS.RELATION END) AS TORELATION

    ,MAX(OGS.NOTES) AS NOTES

    ,MAX(OGS.STATUS) AS STATUS

    FROM ORDERED_GROUP_SET OGS

    GROUP BY OGS.GRP_ID;

    -- USING THE SUM OF THE CHECKSUMS ALSO WILL UTILIZE THE INDEX

    ;WITH GROUPED_SET AS

    (

    SELECT

    DENSE_RANK() OVER

    (

    ORDER BY (

    CONVERT(BIGINT,CHECKSUM(SD.BACKAFFILID ),0)

    + CONVERT(BIGINT,CHECKSUM(SD.AFFILIATIONID),0)

    )

    ) AS GRP_ID

    ,SD.ENTITYID

    ,SD.RELATION

    ,SD.NOTES

    ,SD.STATUS

    FROM #SAMPLE_DATA SD

    )

    ,ORDERED_GROUP_SET AS

    (

    SELECT

    GS.GRP_ID

    ,ROW_NUMBER() OVER

    (

    PARTITION BY GS.GRP_ID

    ORDER BY (SELECT NULL)

    ) AS ORDER_OF_APPEARANCE

    ,GS.ENTITYID

    ,GS.RELATION

    ,GS.NOTES

    ,GS.STATUS

    FROM GROUPED_SET GS

    )

    SELECT

    OGS.GRP_ID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 1 THEN OGS.ENTITYID END) AS FROMRECORDID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 2 THEN OGS.ENTITYID END) AS TORECORDID

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 1 THEN OGS.RELATION END) AS FROMRELATION

    ,MAX(CASE WHEN OGS.ORDER_OF_APPEARANCE = 2 THEN OGS.RELATION END) AS TORELATION

    ,MAX(OGS.NOTES) AS NOTES

    ,MAX(OGS.STATUS) AS STATUS

    FROM ORDERED_GROUP_SET OGS

    GROUP BY OGS.GRP_ID;

    Output (same for both queries)

    GRP_ID FROMRECORDID TORECORDID FROMRELATION TORELATION NOTES STATUS

    -------- -------------- --------------- --------------- -------------- ------------ --------

    1 AMTNAA000QGC AMTNAA000K3S Client Vendor Sunglasses Active

    2 CMTNAA0015DQ CMTNAA0012QT Mother-in-Law Son-in-Law In, Law Active

  • Sturev (5/25/2016)


    TheSQLGuru (5/25/2016)


    Come now - I know you could have figured this one out by extending what I had!! 😛

    You scared the crap out of me with that second post w/DENSE_RANK, CHECKSUM, ETC... 😀

    Looks awesome, I'm going to give that a test run here in a bit! Thanks again for jumping in on this; much appreciated!

    That other example wasn't my solution - that was Eirikur Eiriksson. He likes to show off sometimes. :hehe:

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

Viewing 8 posts - 16 through 22 (of 22 total)

You must be logged in to reply to this topic. Login to reply