How do you compare two big tables in same database on the same server?

  • Now I was asked to run a compare on the data. I created a table and ran the cursor. It stored data in table1.

    I think created another table and ran my new script. It stores data in table2.

    Now I want to compare the tables to insure they are they same data.

    Challenges:

    1) There are over 12 million rows in each table

    2) There are 15 columns in each table

    Does anyone know how to compare two tables such as these? I tried the red gate products. However, they only compare tablea in database1 to tablea in database2. My two tables are in the same database on the same server.

    I am also using SQL Server 2000 sp4.

    Thanks,

    Tony

    Things will work out.  Get back up, change some parameters and recode.

  • How about something like this for a start (I'm using AdventureWorks for SQL Server 2005, but the code is valid for SQL 2000):

    -- create a copy of a table in the same database

    SELECT * INTO dbo.Department FROM HumanResources.Department AS D

    -- FULL OUTER JOIN using CHECKSUM() - should be the same if data is the same.

    SELECT

    *

    FROM

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableA_CheckSum FROM HumanResources.Department

    ) AS D FULL OUTER JOIN

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableB_CheckSum FROM dbo.Department

    ) AS D2 ON D.tableA_CheckSum = D2.tableB_CheckSum

    -- change a row to see if CHECKSUM finds the difference

    UPDATE dbo.Department

    SET GroupName = 'Test'

    WHERE

    DepartmentID = 1

    -- re-query using CHECKSUM to see the difference.

    SELECT

    *

    FROM

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableA_CheckSum FROM HumanResources.Department

    ) AS D FULL OUTER JOIN

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableB_CheckSum FROM dbo.Department

    ) AS D2 ON D.tableA_CheckSum = D2.tableB_CheckSum

  • CHECKSUM is not good enough.

    Different records may have the same CHECKSUM.

    _____________
    Code for TallyGenerator

  • Hello,

    Thanks for responding.

    I need to compare two tables that are in existence. I saw this example of using Checksum after an update statement. I watched the video also.

    But I don't see how that tells me how to compare two tables? The example seems to show if a change is made to one table.

    Again, I need to compare two existing tables.

    Thanks.

    Tony

    Things will work out.  Get back up, change some parameters and recode.

  • Does anyone know how to compare two tables such as these? I tried the red gate products. However, they only compare tablea in database1 to tablea in database2. My two tables are in the same database on the same server.

    I am also using SQL Server 2000 sp4.

    Thanks,

    Tony

    Hi,

    SQL Data Compare does let you compare differently named tables. Simply Edit the Project and go to the Remap Objects tab. This lets you unmap objects and remap differently named ones.

    Hope this helps!

    Kind regards,

    David Atkinson

    Red Gate Software

  • Sergiy (5/11/2009)


    CHECKSUM is not good enough.

    Different records may have the same CHECKSUM.

    I agree that the checksum function will produce the same number for different rows.

    However, the chance of a CHECKSUM being the same for rows with the same PK and different data is so remote that I wouldn't hesitate to use it.

    Here is a sample to compare two tables:--Setup Sampel Data

    DECLARE @TableA TABLE (ID INT PRIMARY KEY, Val1 INT, Val2 VARCHAR(50))

    DECLARE @TableB TABLE (ID INT PRIMARY KEY, Val1 INT, Val2 VARCHAR(50))

    INSERT @TableA

    SELECT 1, 1 , 'foo'

    UNION ALL SELECT 2, 9, 'bar'

    UNION ALL SELECT 3, 9, 'bar'

    UNION ALL SELECT 4, 8, 'foo'

    UNION ALL SELECT 5, 7, 'jibby'

    INSERT @TableB

    SELECT 1, 1 , 'foo1' -- Changed

    UNION ALL SELECT 2, 9, 'bar'

    UNION ALL SELECT 3, 9, 'bar'

    UNION ALL SELECT 4, 7, 'foo' -- Changed

    UNION ALL SELECT 5, 7, 'jibby'

    -- Perform comparison

    SELECT *

    FROM

    (

    SELECT ID, CHECKSUM(*) AS CheckA

    FROM @TableA

    ) AS A

    FULL OUTER JOIN

    (

    SELECT ID, CHECKSUM(*) AS CheckB

    FROM @TableB

    ) AS B

    ON A.ID = B.ID

    WHERE

    A.CheckA B.CheckB

    Your tables aren't that big, but if you run into performance issues I'd suggest:

    1. Create to temp tables with the PK and the Checksum (indexed) and you could join those together.

    2. Pull out chunks for comparison (either as a query or to temp tables). Meaning, if you have a PK that is an INT you could just look for PK values 1 - 1Million, then look for 1million - 2 million.. etc..

    Cheers!

  • Like this:

    -- Find Rows in TableA that are not in TableB

    Select * From TableA

    Except

    Select * From TableB

    -- Find Rows in TableB that are not in TableA

    Select * From TableB

    Except

    Select * From TableA

    [font="Times New Roman"]-- RBarryYoung[/font], [font="Times New Roman"] (302)375-0451[/font] blog: MovingSQL.com, Twitter: @RBarryYoung[font="Arial Black"]
    Proactive Performance Solutions, Inc.
    [/font]
    [font="Verdana"] "Performance is our middle name."[/font]

  • I need to run this for 15 columns. Folks keep posting this same checksum example with one column.

    Things will work out.  Get back up, change some parameters and recode.

  • I really appreciate the responding guys, but I don't think SQL Server 2000 can do the except.

    Thanks

    Tony

    Things will work out.  Get back up, change some parameters and recode.

  • I tried using the data compare. I unmapped the columns and then mapped the two I needed.

    However, when reviewing results, it doesn't give me any. It is trying to do a row by row comparison on the same tables. Not the two I mapped.

    I didn't get a comparison.

    Thank you for suggesting the unmapping though.

    Tony

    Things will work out.  Get back up, change some parameters and recode.

  • WebTechie38 (5/12/2009)


    I tried using the data compare. I unmapped the columns and then mapped the two I needed.

    However, when reviewing results, it doesn't give me any. It is trying to do a row by row comparison on the same tables. Not the two I mapped.

    I didn't get a comparison.

    Thank you for suggesting the unmapping though.

    Tony

    use TableDiff.exe which is exist in C:\Program files\Microsoft SQL Server\90\COM.

    for more info see SQL BOL.

  • WebTechie38 (5/12/2009)


    I need to run this for 15 columns. Folks keep posting this same checksum example with one column.

    "Select *" is not one column it is all the columns,

  • You know I've met some great people on this forum and some I truly look up to as mentors.

    Then there are times people post things and you are not sure how to respond.

    Yes, I believe select * is more than one column. Thank you.

    SELECT

    *

    FROM

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableA_CheckSum FROM HumanResources.Department

    ) AS D FULL OUTER JOIN

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableB_CheckSum FROM dbo.Department

    ) AS D2 ON D.tableA_CheckSum = D2.tableB_CheckSum

    But if I understand the above query, and granted I am still learning, we are looking at one column (DepartmentID). That my friend is one column.

    I was simply stating that I need to compare all the columns. In any event, I appreciate you taking the time to respond to my original question.

    Tony

    Things will work out.  Get back up, change some parameters and recode.

  • I included DepartmentID in the output assuming it is a unique key so you could include it in the join criteria because of the possibility, however slight, that duplicate checksums are created. It is highly unlikely that there would be a duplicate checksum for the same id if there is a change in other data. The checksum is across all columns (checksum(*)). The comparison would only be across 1 column if the code was checksum(DepartmentID).

    Please note that in the FULL OUTER JOIN the join is on the checksum columns, and if you run the entire test I provided, you see that for the row I modified you get 2 rows, 1 with null columns for tableB and 1 with null columns for tableA as the checksum values are different. Note the bolded portions of the code:

    SELECT

    *

    FROM

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableA_CheckSum FROM HumanResources.Department

    ) AS D FULL OUTER JOIN

    (

    SELECT DepartmentID, CHECKSUM(*) AS tableB_CheckSum FROM dbo.Department

    ) AS D2 ON D.tableA_CheckSum = D2.tableB_CheckSum

    Because the join is on the created Checksum columns the comparison is across all columns. If the join was on DepartmentID then the comparison would be for 1 column.

    I wish I had remembered the tablediff utility provided as it is really the tool you should use to do the comparison. Kudos to Kishore for bringing it up.

    Edit: Added code with bolding.

  • WebTechie38 (5/12/2009)


    I need to run this for 15 columns. Folks keep posting this same checksum example with one column.

    Check out BOL for CHECKSUM, and it should make sense why you would use CHECKSUM(*).

    Greg
    _________________________________________________________________________________________________
    The glass is at one half capacity: nothing more, nothing less.

Viewing 15 posts - 1 through 15 (of 23 total)

You must be logged in to reply to this topic. Login to reply