Comparing columns in two tables for unequal values

  • Hi,

    We have a double data entry process and want to compare the entries of one record in Table A to the duplicated entries in Table B (identical to TableA). What I want to do is compare a record from each table by an uniqueID and retain only columns where the values are not equal. I want to keep these results and save them to a table (TableC). TableC does not need to be defined in the solution here.

    I could easliy do this in a VBA script or, dare I say...a CURSOR!!!!!!, but I'd like to see if there is a better way and keep this back on the server and out of the application.

    I've looked a INTERSECT and EXCEPT, but they return the both whole records.

    Thanks in advance,

  • http://msdn.microsoft.com/en-us/library/ms180026.aspx

    This should do.

  • Roughly:

    DELETE a

    OUTPUT tblc

    FROM

    tblA as a

    JOIN

    tblB as b

    ON a.col1 = b.col1 AND a.col2 = b.col2 ....


    - Craig Farrell

    Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.

    For better assistance in answering your questions[/url] | Forum Netiquette
    For index/tuning help, follow these directions.[/url] |Tally Tables[/url]

    Twitter: @AnyWayDBA

  • mapperhd (9/30/2011)


    Hi,

    We have a double data entry process and want to compare the entries of one record in Table A to the duplicated entries in Table B (identical to TableA). What I want to do is compare a record from each table by an uniqueID and retain only columns where the values are not equal. I want to keep these results and save them to a table (TableC). TableC does not need to be defined in the solution here.

    I could easliy do this in a VBA script or, dare I say...a CURSOR!!!!!!, but I'd like to see if there is a better way and keep this back on the server and out of the application.

    I've looked a INTERSECT and EXCEPT, but they return the both whole records.

    Thanks in advance,

    A small example, is this what you are looking for? Or something else - your question is not 100% clear.

    INSERT INTO tableC(empID, aName, bName, aSalary, bSalary)

    SELECT a.empID,a.empName,b.empName,a.empSalary,b.empsalary

    FROM tableA a

    INNER JOIN tableB b

    ON a.empID = b.empid

    WHERE a.empName <> b.empName or a.empSalary <> b.empSalary

    http://sqlvince.blogspot.com/[/url]

  • Thanks for the replies.

    However, I need to keep only the columns that are not equal. Let's say there are 12 columns in the two records being compared. If columns 4 and 9 in record 1 do not have equal values to the same columns in record 2 how do I keep only those two columns (and the ID) to insert into another table?

    Sorry if I wan't very clear in my first post and I hope this helps to clarify.

    Thanks again for responses.

  • mapperhd (10/2/2011)


    Thanks for the replies.

    However, I need to keep only the columns that are not equal. Let's say there are 12 columns in the two records being compared. If columns 4 and 9 in record 1 do not have equal values to the same columns in record 2 how do I keep only those two columns (and the ID) to insert into another table?

    Sorry if I wan't very clear in my first post and I hope this helps to clarify.

    Thanks again for responses.

    You could potentially end up with a lot of tables with differences that way: you would need a new table for every unique combination of columns with differences in it. Are you sure you want to have just the columns with differences in it? And not simply the entire rows that you've found to contain at least one difference? Or do you just want the names of the columns that have differences for each set of 2 rows that has at least one different column between them?

    The last one is fairly easy (I have created some DDL plus sample data for you, please have a look at the link in my footer text for more info on how and WHY to provide this yourself the next time you have a question):

    declare @table1 table (

    ID int not null primary key,

    col1 varchar(10) not null,

    col2 varchar(200) null,

    col3 int not null,

    col4 datetime null

    );

    declare @table2 table (

    ID int not null primary key,

    col1 varchar(10) not null,

    col2 varchar(200) null,

    col3 int not null,

    col4 datetime null

    );

    insert @table1(ID, col1, col2, col3, col4)

    select 1, 'test1', 'some longer value', 23, {d '1900-01-01'} union all

    select 2, 'test2', 'some longer value', 23, {d '1900-01-01'} union all

    select 3, 'test3', 'some longer value', 23, {d '1900-01-01'}

    insert @table2(ID, col1, col2, col3, col4)

    select 1, 'test1', 'some longer value', 23, {d '1900-01-01'} union all

    select 2, 'test21', 'some longer value', 23, {d '1900-01-02'} union all

    select 3, 'test4', 'another longer value', 22, {d '1900-01-01'} union all

    select 4, 'test4', 'another longer value', 22, {d '1900-01-01'}

    select diff.column_names, t1.*, t2.*

    from @table1 t1

    full outer join @table2 t2 on (t2.ID = t1.ID)

    cross apply (

    select stuff((

    select ', ' + t.name as [text()]

    from (

    select 'ID' as name where t1.ID is null or t2.ID is null

    union all select 'col1' where not t1.col1 = t2.col1

    union all select 'col2' where not ((t1.col2 is null and t2.col2 is null) or (t1.col2 = t2.col2))

    union all select 'col3' where not t1.col3 = t2.col3

    union all select 'col4' where not ((t1.col4 is null and t2.col4 is null) or (t1.col4 = t2.col4))

    ) t

    for xml path(''), type

    ).value('.','varchar(8000)'),1, 2, '') as column_names

    ) diff

    where diff.column_names is not null

    Which outputs:

    column_namesIDcol1col2col3col4IDcol1col2col3col4

    col1, col42test2some longer value231900-01-01 00:00:00.0002test21some longer value231900-01-02 00:00:00.000

    col1, col2, col33test3some longer value231900-01-01 00:00:00.0003test4another longer value221900-01-01 00:00:00.000

    IDNULLNULLNULLNULLNULL4test4another longer value221900-01-01 00:00:00.000



    Posting Data Etiquette - Jeff Moden[/url]
    Posting Performance Based Questions - Gail Shaw[/url]
    Hidden RBAR - Jeff Moden[/url]
    Cross Tabs and Pivots - Jeff Moden[/url]
    Catch-all queries - Gail Shaw[/url]


    If you don't have time to do it right, when will you have time to do it over?

  • mapperhd (10/2/2011)


    Thanks for the replies.

    However, I need to keep only the columns that are not equal.

    The problem with this is that it forces you to violate the first normal form for database design: All records must have the same number of fields.

    For example, say you want to keep the record ID, the column name, and the values from each of the tables. With one column different, you want to keep a total of four columns; with two columns different, you want to keep a total of seven columns, etc. As you can see, this violates first normal form, because the records don't have the same number of fields.

    There are three basic ways that you can resolve this situation.

    1) Remove the spec to keep only columns that are different, so that you are keeping the same number of columns for each record.

    2) Store the column difference information in a single XML field, since XML doesn't have the same constraints as a database. (The problem with this being that you will still need to revert it to first normal form in order to work with it further.)

    3) "PIVOT" the information so that you have one row for each column that is different. So, if one column is different, you would have one row for that record; with two columns different, you would need two rows, etc. (Again, you would need to unpivot it to work with it further.)

    Which approach you use depends on how you are planning to use this data. Certainly, the first approach is the simplest.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • R.P.

    Thank you, your post was exactly what I was looking for.

  • If it helps, here is a script I use that creates a script to run to find the differences between the records in the same table in two different databases (like if there is a restore of a previous backup to a test database name to check for differences in the databases).

    It creates a script, so just copy the printed output to a new query window and run it.

    /* ====================================================================================================================== */

    /*

    This script creates a script to find the differences in two tables from two databases on the same instance (like a

    test version of a database, being tested by another group of people or with another program).

    Also creates the scripts to find records in one table that are not in the other, for both tables.

    After run, just copy the script that is printed to another query window and run it (after making any necessary adjustments to the code).

    Values to SET:

    ===============

    Just set the database context (just below), Schema Name, Table Name, and Database Names for the two databases (in the "Values to change by User" section).

    */

    USE MainDatabase-- SET THIS context to the primary database for the extraction of the table definitions

    GO

    DECLARE @SN varchar(200)

    DECLARE @TN varchar(200)

    DECLARE @db1 varchar(200)

    DECLARE @db2 varchar(200)

    DECLARE @collist varchar(4000)

    DECLARE @collist2 varchar(4000)

    DECLARE @ordinalposition int

    DECLARE @schemaname varchar(200)

    DECLARE @tablename varchar(200)

    DECLARE @dataelement varchar(200)

    DECLARE @datatype varchar(200)

    DECLARE @maxlen char(10)

    DECLARE @prec char(10)

    DECLARE @scale char(10)

    DECLARE @comparevals varchar(8000)

    DECLARE @ds varchar(8000)

    DECLARE @wow varchar(400)

    DECLARE @numelements int

    /* ============================================================================================== */

    /* Values to change by User */

    SET @SN = 'dbo'--Schema Name

    SET @TN = 'SomeTableName'--Table Name

    SET @db1 = 'MainDatabase'--Database 1 (a)

    SET @db2 = 'TestDatabase'--Database 2 (b)

    /* ============================================================================================== */

    SET @collist = ''--leave blank

    SET @collist2 = ''--leave blank

    SET NOCOUNT ON

    SELECT ORDINAL_POSITION AS 'OrdinalPosition',

    TABLE_SCHEMA AS 'SchemaName',

    TABLE_NAME AS 'TableName',

    COLUMN_NAME AS 'DataElement', DATA_TYPE AS 'DataType',

    ISNULL(CONVERT(char(10),CHARACTER_MAXIMUM_LENGTH),'') AS 'MaxLen', ISNULL(CONVERT(char(10),NUMERIC_PRECISION),'') AS 'Prec', ISNULL(CONVERT(char(10),NUMERIC_SCALE),'') AS 'Scale',

    CONVERT(varchar(200),'') as ColNames,

    CONVERT(varchar(400),'') as CompareVals

    INTO #dummytable1

    FROM information_schema.columns gg

    WHERE TABLE_SCHEMA=@sn AND TABLE_NAME = @TN

    ORDER BY [SchemaName], TABLE_CATALOG, TABLE_NAME, ORDINAL_POSITION

    SELECT @collist2=@collist2 + DataElement + ',' FROM #dummytable1 ORDER BY OrdinalPosition

    SET @collist2 = SUBSTRING(@collist2,1,LEN(@collist2)-1)

    UPDATE #dummytable1

    SET ColNames = 'a.' + DataElement + ' AS a' + DataElement + ',b.' + DataElement + ' AS b' + DataElement + ','

    SELECT @collist=@collist + ColNames FROM #dummytable1 ORDER BY OrdinalPosition

    SET @collist = SUBSTRING(@collist,1,LEN(@collist)-1)

    SELECT @numelements=MAX(OrdinalPosition) FROM #dummytable1

    /* ============================================================================================== */

    DECLARE backupFiles CURSOR FOR

    SELECT OrdinalPosition,SchemaName,TableName,DataElement,DataType,MaxLen,Prec,Scale

    FROM #dummytable1

    ORDER BY OrdinalPosition

    OPEN backupFiles

    -- Loop through all the files for the database

    FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale

    WHILE @@FETCH_STATUS = 0

    BEGIN

    SET @ds = ''

    if @datatype = 'char' OR @datatype='nchar' or @datatype='varchar' or @datatype='nvarchar' or @datatype = 'varbinary'

    SET @ds = 'ISNULL(a.' + @dataelement + ','''') <> ISNULL(b.' + @dataelement + ','''')'

    if @datatype = 'datetime'

    SET @ds = 'ISNULL(a.' + @dataelement + ',CONVERT(datetime,''01/01/1800'')) <> ISNULL(b.' + @dataelement +',CONVERT(datetime,''01/01/1800''))'

    if @datatype = 'int' OR @datatype = 'smallint' OR @datatype = 'bigint' OR @datatype='tinyint'

    SET @ds = 'ISNULL(a.' + @dataelement + ',0) <> ISNULL(b.' + @dataelement + ',0)'

    if @datatype = 'decimal'

    SET @ds = 'ISNULL(a.' + @dataelement + ',0.0) <> ISNULL(b.' + @dataelement + ',0.0)'

    if @datatype = 'money'

    SET @ds = 'ISNULL(a.' + @dataelement + ',0.0) <> ISNULL(b.' + @dataelement + ',0.0)'

    if @datatype = 'bit'

    SET @ds = 'ISNULL(a.' + @dataelement + ',0) <> ISNULL(b.' + @dataelement + ',0)'

    if @datatype = 'text'

    SET @ds = 'ISNULL(CONVERT(varchar(4000),a.' + @dataelement + '),'''') <> ISNULL(CONVERT(varchar(4000),b.' + @dataelement + '),'''')'

    if @ds = '' --AND @datatype <> 'text'

    SET @ds = 'a.' + @dataelement + ' <> b.' + @dataelement

    if @ds <> ''

    BEGIN

    UPDATE #dummytable1

    SET CompareVals=@ds

    WHERE OrdinalPosition=@ordinalPosition

    END

    FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale

    END

    CLOSE backupFiles

    DEALLOCATE backupFiles

    /* ============================================================================================== */

    PRINT ' '

    PRINT ' '

    PRINT '-- Check for different values for the records that are common to both databases (a=' + @db1 + ',b=' + @db2 + ') for table ' + @SN + '.' + @TN

    SET @wow = (SELECT DataElement FROM #dummytable1 WHERE OrdinalPosition=1)

    PRINT 'SELECT ' + @collist

    PRINT 'FROM ' + @db1 + '.' + @SN + '.' + @TN + ' a JOIN ' + @db2 + '.' + @SN + '.' + @TN + ' b'

    PRINT 'ON a.' + @wow + ' = b.' + @wow + ' --change this PK if necessary'

    PRINT 'WHERE'

    DECLARE backupFiles CURSOR FOR

    SELECT OrdinalPosition,SchemaName,TableName,DataElement,DataType,MaxLen,Prec,Scale,CompareVals

    FROM #dummytable1

    ORDER BY OrdinalPosition

    OPEN backupFiles

    -- Loop through all the files for the database

    FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale,@comparevals

    WHILE @@FETCH_STATUS = 0

    BEGIN

    if @ordinalposition=@numelements

    SET @ds = @comparevals

    else

    SET @ds = @comparevals + ' OR'

    if @comparevals <> ''

    PRINT @ds

    FETCH NEXT FROM backupFiles INTO @ordinalposition,@schemaname,@tablename,@dataelement,@datatype,@maxlen,@prec,@scale,@comparevals

    END

    CLOSE backupFiles

    DEALLOCATE backupFiles

    --SELECT * FROM #dummytable1

    PRINT ''

    PRINT '--Find records in db ' + @db1 + ' that are not in db ' + @db2

    PRINT 'SELECT ' + @collist2

    PRINT 'FROM ' + @db1 + '.' + @SN + '.' + @TN

    PRINT 'WHERE ' + @wow + ' NOT IN (SELECT a.' + @wow + ' FROM ' + @db2 + '.' + @SN + '.' + @TN + ' a)'

    PRINT ''

    PRINT ''

    PRINT '--Find records in db ' + @db2 + ' that are not in db ' + @db1

    PRINT 'SELECT ' + @collist2

    PRINT 'FROM ' + @db2 + '.' + @SN + '.' + @TN

    PRINT 'WHERE ' + @wow + ' NOT IN (SELECT b.' + @wow + ' FROM ' + @db1 + '.' + @SN + '.' + @TN + ' b)'

    DROP TABLE #dummytable1

    SET NOCOUNT OFF

  • Note that the script assumes that the structure of the table in the two databases hasn't changed.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply