Query to find which columns have different values in a pair of rows

  • I have a table which consists of pairs of records from two sources whose data differ in one or more columns:

    ..................col1.....col2......col3.....col4....

    row1.............'a'........'b'........'c'........'d'.....

    row2.............'a'........'b'........'X'........'d'.....

    row3.............'e'........'f'.........'g'........'h'.....

    row4.............'e'........'Z'........'g'........'h'.....

    ...

    ...

    In the example above the data in rows 1 and 2 differ in column 3, while rows 3 and 4 differ in column 2.

    Is there an elegant way to find out programmatically which columns have different values in each pair of rows?

    I know which pairs of rows to compare.

    The problem is not knowing how best to extract which columns in each pair have different values.

    PS.

    I populated the table using this construct:

    SELECT * FROM (SELECT * FROM table1

    EXCEPT

    SELECT * FROM table2) a

    UNION ALL

    SELECT * FROM (SELECT * FROM table2

    EXCEPT

    SELECT * FROM table1) b

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Marios, you know how hard this is without sample data and schema to work from.

    Are there identifiers that can be used to group these rows?

    Basically, what you're looking to do is distinct the set on whatever needs to be compared, then having count() > 1 on what's left to determine what needs to be reviewed.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (12/9/2011)


    Marios, you know how hard this is without sample data and schema to work from.

    Are there identifiers that can be used to group these rows?

    Basically, what you're looking to do is distinct the set on whatever needs to be compared, then having count() > 1 on what's left to determine what needs to be reviewed.

    My bad, you are right! 🙂

    Here is a script that creates a table and populates it with two records differing in a single column.

    How would we be able to programmatically extract which column is the one with the differing data?

    Column pairID identifies which rows belong to the same pair.

    Create table [dbo].[TblCompare]

    (

    [TblCompareOID] INT Identity(1,1) NOT NULL,

    [pairID] INT,

    [col1] INT,

    [col2] INT,

    [col3] INT,

    [col4] INT,

    Constraint [PK_TblCompare] Primary Key ([TblCompareOID]) ON [PRIMARY]

    )

    ON [PRIMARY]

    go

    INSERT INTO [dbo].[TblCompare]

    (

    [pairID]

    ,[col1]

    ,[col2]

    ,[col3]

    ,[col4]

    )

    VALUES (1,10,20,30,40);

    INSERT INTO [dbo].[TblCompare]

    (

    [pairID]

    ,[col1]

    ,[col2]

    ,[col3]

    ,[col4]

    )

    VALUES (1,10,20,666666,40);

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • Marios Philippopoulos (12/9/2011)


    My bad, you are right! 🙂

    LOL, no worries, we all do it occassionally. 😉

    Here is a script that creates a table and populates it with two records differing in a single column. How would we be able to programmatically extract which column is the one with the differing data?

    Painfully.

    Here's the basic method I was discussing:

    SELECT

    tc.*

    FROM

    TblCompare AS tc

    JOIN

    (SELECT

    PairID

    FROM

    (SELECT DISTINCT

    tc.*

    FROM

    tblCompare AS tc

    ) AS drv

    GROUP BY

    PairID

    HAVING

    COUNT(*) > 1

    ) AS drv2

    ONtc.PairID = drv2.PairID

    However, if you want to know which columns are the problems, you'll basically have to check each set independently, like so:

    WITH Col1Check AS

    (SELECT

    PairID,

    'Col1' AS ColName

    FROM

    (SELECT DISTINCT

    PairID,

    Col1

    FROM

    dbo.TblCompare

    ) AS drv

    GROUP BY

    PairID

    HAVING

    COUNT(*) > 1

    ),

    Col2Check AS

    (SELECT

    PairID,

    'Col2' AS ColName

    FROM

    (SELECT DISTINCT

    PairID,

    Col2

    FROM

    dbo.TblCompare

    ) AS drv

    GROUP BY

    PairID

    HAVING

    COUNT(*) > 1

    ),

    Col3Check AS

    (SELECT

    PairID,

    'Col3' AS ColName

    FROM

    (SELECT DISTINCT

    PairID,

    Col3

    FROM

    dbo.TblCompare

    ) AS drv

    GROUP BY

    PairID

    HAVING

    COUNT(*) > 1

    ),

    Col4Check AS

    (SELECT

    PairID,

    'Col4' AS ColName

    FROM

    (SELECT DISTINCT

    PairID,

    Col4

    FROM

    dbo.TblCompare

    ) AS drv

    GROUP BY

    PairID

    HAVING

    COUNT(*) > 1

    )

    SELECT PairID, ColName FROM Col1Check

    UNION ALL

    SELECT PairID, ColName FROM Col2Check

    UNION ALL

    SELECT PairID, ColName FROM Col3Check

    UNION ALL

    SELECT PairID, ColName FROM Col4Check

    Showing the details of the comparison gets problematic, primarily because you can't be sure you have mere duplication, unless there's business rules I don't know of. When you can have anything from 2 to x possible duplications, you need to adjust the ctes for lookups via RowNumber applications and the like. That will get ugly fast. You're best off just hooking the rows back in for manual review, like so:

    SELECT

    tc.*,

    un.ColName

    FROM

    TblCompare AS tc

    JOIN

    (SELECT PairID, ColName FROM Col1Check

    UNION ALL

    SELECT PairID, ColName FROM Col2Check

    UNION ALL

    SELECT PairID, ColName FROM Col3Check

    UNION ALL

    SELECT PairID, ColName FROM Col4Check

    ) AS un

    ONtc.PairID = un.PairID

    ORDER BY

    tc.PairID


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Wouldn't the UNPIVOT approach be a little easier?

    ;

    WITH cte AS

    (

    SELECT *

    FROM

    (SELECT *

    FROM tblCompare) p

    UNPIVOT

    (Vals FOR col IN

    (col1,col2,col3,col4)

    )AS unpvt

    )

    SELECT *

    FROM cte cte1

    INNER JOIN cte cte2

    ON

    cte1.pairID=cte2.pairID

    AND cte1.col=cte2.col

    AND cte1.Vals<>cte2.Vals



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • LutzM (12/9/2011)


    Wouldn't the UNPIVOT approach be a little easier?

    :blush: Yes.

    Pardon me, I need to find a mirror. I know someone who deserves a facepalm and I need to go show him... back shortly.


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • Evil Kraig F (12/9/2011)


    LutzM (12/9/2011)


    Wouldn't the UNPIVOT approach be a little easier?

    :blush: Yes.

    Pardon me, I need to find a mirror. I know someone who deserves a facepalm and I need to go show him... back shortly.

    How did you manage to change the color of your skin in your avatar to show the facepalm-result? 😛

    As a side note: Those who say "Never happened to me." may throw the next c.u.r.s.o.r. at us. 😀



    Lutz
    A pessimist is an optimist with experience.

    How to get fast answers to your question[/url]
    How to post performance related questions[/url]
    Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

  • Wow, thank you guys, this is great!

    I was thinking of UNPIVOT too, but had no clue how to do it.

    In the past I have done this comparison with a series of CASE statements, checking for equality of the pairs of columns in each CASE and assigning the outcome to NULL in case of equality and to the value of the 2nd column in the pair in case of inequality.

    The results of that I would put in a temp table and then output the final results.

    That is of course quite clumsy and painful the more columns you have.

    Thanks again!

    __________________________________________________________________________________
    SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
    Persisting SQL Server Index-Usage Statistics with MERGE[/url]
    Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

  • suppose _you_ know the matching criteria, and only want to know if rows defer ... have a look at checksum and only show rows that have non-matching checksum values for a given key (criteria).

    e.g.

    alter table add[FastTrack_CheckSum] AS (checksum([host_name],[program_name],[nt_domain],[nt_user_name],[login_name],[original_login_name],[client_net_address])) PERSISTED

    In this example the checksum column is being persisted to be able to put an index on it to ease data retrieval

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • LutzM (12/9/2011)


    Wouldn't the UNPIVOT approach be a little easier?

    ;

    WITH cte AS

    (

    SELECT *

    FROM

    (SELECT *

    FROM tblCompare) p

    UNPIVOT

    (Vals FOR col IN

    (col1,col2,col3,col4)

    )AS unpvt

    )

    SELECT *

    FROM cte cte1

    INNER JOIN cte cte2

    ON

    cte1.pairID=cte2.pairID

    AND cte1.col=cte2.col

    AND cte1.Vals<>cte2.Vals

    🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • There's the SQL Server "tablediff" utility as well; if it fits your requirements.

  • Marios Philippopoulos (12/9/2011)


    Is there an elegant way to find out programmatically which columns have different values in each pair of rows?

    1Ankit1 (12/10/2011)


    There's the SQL Server "tablediff" utility as well; if it fits your requirements.

    That brings up a good question for Marios. Do you care WHAT the values are or do you only want to know the names of the columns that have differences?

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply