Query to find which columns have different values in a pair of rows

Question

Post reply

Query to find which columns have different values in a pair of rows

Marios Philippopoulos

SSC Guru

Points: 57190
More actions
December 9, 2011 at 12:46 pm

#264721

I have a table which consists of pairs of records from two sources whose data differ in one or more columns:
..................col1.....col2......col3.....col4....
row1.............'a'........'b'........'c'........'d'.....
row2.............'a'........'b'........'X'........'d'.....
row3.............'e'........'f'.........'g'........'h'.....
row4.............'e'........'Z'........'g'........'h'.....
...
...
In the example above the data in rows 1 and 2 differ in column 3, while rows 3 and 4 differ in column 2.
Is there an elegant way to find out programmatically which columns have different values in each pair of rows?
I know which pairs of rows to compare.
The problem is not knowing how best to extract which columns in each pair have different values.
PS.
I populated the table using this construct:
SELECT * FROM (SELECT * FROM table1
EXCEPT
SELECT * FROM table2) a
UNION ALL
SELECT * FROM (SELECT * FROM table2
EXCEPT
SELECT * FROM table1) b
__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 1

Marios, you know how hard this is without sample data and schema to work from.

Are there identifiers that can be used to group these rows?

Basically, what you're looking to do is distinct the set on whatever needs to be compared, then having count() > 1 on what's left to determine what needs to be reviewed.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 2

Evil Kraig F (12/9/2011)
Marios, you know how hard this is without sample data and schema to work from.
Are there identifiers that can be used to group these rows?
Basically, what you're looking to do is distinct the set on whatever needs to be compared, then having count() > 1 on what's left to determine what needs to be reviewed.

My bad, you are right! 🙂

Here is a script that creates a table and populates it with two records differing in a single column.

How would we be able to programmatically extract which column is the one with the differing data?

Column pairID identifies which rows belong to the same pair.

Create table [dbo].[TblCompare]

(

[TblCompareOID] INT Identity(1,1) NOT NULL,

[pairID] INT,

[col1] INT,

[col2] INT,

[col3] INT,

[col4] INT,

Constraint [PK_TblCompare] Primary Key ([TblCompareOID]) ON [PRIMARY]

)

ON [PRIMARY]

go

INSERT INTO [dbo].[TblCompare]

(

[pairID]

,[col1]

,[col2]

,[col3]

,[col4]

)

VALUES (1,10,20,30,40);

INSERT INTO [dbo].[TblCompare]

(

[pairID]

,[col1]

,[col2]

,[col3]

,[col4]

)

VALUES (1,10,20,666666,40);

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 3

Marios Philippopoulos (12/9/2011)
My bad, you are right! 🙂

LOL, no worries, we all do it occassionally. 😉

Here is a script that creates a table and populates it with two records differing in a single column. How would we be able to programmatically extract which column is the one with the differing data?

Painfully.

Here's the basic method I was discussing:

SELECT

tc.*

FROM

TblCompare AS tc

JOIN

(SELECT

PairID

FROM

(SELECT DISTINCT

tc.*

FROM

tblCompare AS tc

) AS drv

GROUP BY

PairID

HAVING

COUNT(*) > 1

) AS drv2

ONtc.PairID = drv2.PairID

However, if you want to know which columns are the problems, you'll basically have to check each set independently, like so:

WITH Col1Check AS

(SELECT

PairID,

'Col1' AS ColName

FROM

(SELECT DISTINCT

PairID,

Col1

FROM

dbo.TblCompare

) AS drv

GROUP BY

PairID

HAVING

COUNT(*) > 1

),

Col2Check AS

(SELECT

PairID,

'Col2' AS ColName

FROM

(SELECT DISTINCT

PairID,

Col2

FROM

dbo.TblCompare

) AS drv

GROUP BY

PairID

HAVING

COUNT(*) > 1

),

Col3Check AS

(SELECT

PairID,

'Col3' AS ColName

FROM

(SELECT DISTINCT

PairID,

Col3

FROM

dbo.TblCompare

) AS drv

GROUP BY

PairID

HAVING

COUNT(*) > 1

),

Col4Check AS

(SELECT

PairID,

'Col4' AS ColName

FROM

(SELECT DISTINCT

PairID,

Col4

FROM

dbo.TblCompare

) AS drv

GROUP BY

PairID

HAVING

COUNT(*) > 1

)

SELECT PairID, ColName FROM Col1Check

UNION ALL

SELECT PairID, ColName FROM Col2Check

UNION ALL

SELECT PairID, ColName FROM Col3Check

UNION ALL

SELECT PairID, ColName FROM Col4Check

Showing the details of the comparison gets problematic, primarily because you can't be sure you have mere duplication, unless there's business rules I don't know of. When you can have anything from 2 to x possible duplications, you need to adjust the ctes for lookups via RowNumber applications and the like. That will get ugly fast. You're best off just hooking the rows back in for manual review, like so:

SELECT

tc.*,

un.ColName

FROM

TblCompare AS tc

JOIN

(SELECT PairID, ColName FROM Col1Check

UNION ALL

SELECT PairID, ColName FROM Col2Check

UNION ALL

SELECT PairID, ColName FROM Col3Check

UNION ALL

SELECT PairID, ColName FROM Col4Check

) AS un

ONtc.PairID = un.PairID

ORDER BY

tc.PairID

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

LutzM SSC Guru Points: 107049 More actions · Answer 4

Wouldn't the UNPIVOT approach be a little easier?

;

WITH cte AS

(

SELECT *

FROM

(SELECT *

FROM tblCompare) p

UNPIVOT

(Vals FOR col IN

(col1,col2,col3,col4)

)AS unpvt

)

SELECT *

FROM cte cte1

INNER JOIN cte cte2

ON

cte1.pairID=cte2.pairID

AND cte1.col=cte2.col

AND cte1.Vals<>cte2.Vals

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Evil Kraig F SSC Guru Points: 100851 More actions · Answer 5

LutzM (12/9/2011)
Wouldn't the UNPIVOT approach be a little easier?

:blush: Yes.

Pardon me, I need to find a mirror. I know someone who deserves a facepalm and I need to go show him... back shortly.

- Craig Farrell

Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA

LutzM SSC Guru Points: 107049 More actions · Answer 6

Evil Kraig F (12/9/2011)
LutzM (12/9/2011)
Wouldn't the UNPIVOT approach be a little easier?
:blush: Yes.
Pardon me, I need to find a mirror. I know someone who deserves a facepalm and I need to go show him... back shortly.

How did you manage to change the color of your skin in your avatar to show the facepalm-result? 😛

As a side note: Those who say "Never happened to me." may throw the next c.u.r.s.o.r. at us. 😀

Lutz
A pessimist is an optimist with experience.

How to get fast answers to your question[/url]
How to post performance related questions[/url]
Links for Tally Table [/url] , Cross Tabs [/url] and Dynamic Cross Tabs [/url], Delimited Split Function[/url]

Marios Philippopoulos SSC Guru Points: 57190 More actions · Answer 7

Wow, thank you guys, this is great!

I was thinking of UNPIVOT too, but had no clue how to do it.

In the past I have done this comparison with a series of CASE statements, checking for equality of the pairs of columns in each CASE and assigning the outcome to NULL in case of equality and to the value of the 2nd column in the pair in case of inequality.

The results of that I would put in a temp table and then output the final results.

That is of course quite clumsy and painful the more columns you have.

Thanks again!

__________________________________________________________________________________
SQL Server 2016 Columnstore Index Enhancements - System Views for Disk-Based Tables[/url]
Persisting SQL Server Index-Usage Statistics with MERGE[/url]
Turbocharge Your Database Maintenance With Service Broker: Part 2[/url]

Johan Bijnens SSC Guru Points: 135369 More actions · Answer 8

suppose _you_ know the matching criteria, and only want to know if rows defer ... have a look at checksum and only show rows that have non-matching checksum values for a given key (criteria).

e.g.

alter table add[FastTrack_CheckSum] AS (checksum([host_name],[program_name],[nt_domain],[nt_user_name],[login_name],[original_login_name],[client_net_address])) PERSISTED

In this example the checksum column is being persisted to be able to put an index on it to ease data retrieval

Johan

Learn to play, play to learn !

Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:

- How to post Performance Problems
- How to post data/code to get the best help[/url]

- How to prevent a sore throat after hours of presenting ppt

press F1 for solution, press shift+F1 for urgent solution 😀

Need a bit of Powershell? How about this

Who am I ? Sometimes this is me but most of the time this is me

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 9

LutzM (12/9/2011)
Wouldn't the UNPIVOT approach be a little easier?
;
WITH cte AS
(
SELECT *
FROM
(SELECT *
FROM tblCompare) p
UNPIVOT
(Vals FOR col IN
(col1,col2,col3,col4)
)AS unpvt
)
SELECT *
FROM cte cte1
INNER JOIN cte cte2
ON
cte1.pairID=cte2.pairID
AND cte1.col=cte2.col
AND cte1.Vals<>cte2.Vals

🙂

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

1Ankit1 Ten Centuries Points: 1253 More actions · Answer 10

There's the SQL Server "tablediff" utility as well; if it fits your requirements.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 11

Marios Philippopoulos (12/9/2011)
Is there an elegant way to find out programmatically which columns have different values in each pair of rows?

1Ankit1 (12/10/2011)
There's the SQL Server "tablediff" utility as well; if it fits your requirements.

That brings up a good question for Marios. Do you care WHAT the values are or do you only want to know the names of the columns that have differences?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)