Select Latest record by comparing 2 tables.

  • Hello,

    I have 2 identical tables with almost identical data(Few field values will vary).

    Both tables will get updated regularly with time stamp.

    I want to get the latest updated row by comparing these 2 tables based on time stamp.

    Can anyone help me please?

    Thanks in advance.

  • WITH Combined AS (

    SELECT Timestampcol, <other columns>, 'Table1' AS TableName

    FROM Table1

    UNION ALL

    SELECT Timestampcol, <other columns>, 'Table2'

    FROM Table2

    )

    SELECT Timestampcol, <other columns>, TableName

    FROM Combined

    John

  • John Mitchell-245523 (2/4/2016)


    WITH Combined AS (

    SELECT Timestampcol, <other columns>, 'Table1' AS TableName

    FROM Table1

    UNION ALL

    SELECT Timestampcol, <other columns>, 'Table2'

    FROM Table2

    )

    SELECT Timestampcol, <other columns>, TableName

    FROM Combined

    John

    I think you might need to extend this a little to get the latest:

    WITH Combined AS (

    SELECT Timestampcol, <other columns>, 'Table1' AS TableName

    FROM Table1

    UNION ALL

    SELECT Timestampcol, <other columns>, 'Table2'

    FROM Table2

    ), mx as (Select Timestampcol = max(Timestampcol) from Combined)

    SELECT Timestampcol, <other columns>, TableName

    FROM Combined

    join mx on Combined.Timestampcol = mx.Timestampcol

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Oh yes, of course. I got as far as the CTE and then forgot to put the MAX in. Thanks for spotting, Phil.

    John

  • I suspect that this will perform much better than having a second CTE and a join.

    WITH Combined AS (

    SELECT Timestampcol, <other columns>, 'Table1' AS TableName

    FROM Table1

    UNION ALL

    SELECT Timestampcol, <other columns>, 'Table2'

    FROM Table2

    )

    SELECT TOP(1) Timestampcol, <other columns>, TableName

    FROM Combined

    ORDER BY Timestampcol DESC

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • Sorry i did not understand clearly.

    Can you please give me some real example with the below query?

    Thanks in advance.

  • kavithaje (2/4/2016)


    Sorry i did not understand clearly.

    Can you please give me some real example with the below query?

    Thanks in advance.

    Without knowing your table structures, the examples provided are as real as you are going to get.

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • 1st table Name: Work_Station_Admin_1

    MachineIDInstanceKeyTimeKey RevisionIDrowversion AccountSchemaClass

    167772831 3/3/2015 3:0910x000000000084B0CB A Group

    167772832 3/3/2015 3:0910x000000000084B0CC B User

    167772833 3/3/2015 3:0910x000000000084B0CD C Group

    167772834 5/15/2015 16:2210x000000000084B0CE D User

    167773001 5/15/2015 15:2910x000000000084B0CF E User

    167773002 5/15/2015 15:2910x000000000084B0D0 F User

    167773003 5/15/2015 15:2910x000000000084B0D1 G User

    167773004 5/15/2015 15:2910x000000000084B0D2 H Group

    167773005 5/15/2015 15:2910x000000000084B0D3 I Group

    2nd Table Name: Work_Station_Admin_2

    MachineIDInstanceKeyTimeKey RevisionID rowversion AccountSchemaClass

    2 1 8/9/2010 12:331 0x0000000000EA9C7AA NULL

    2 2 8/9/2010 12:331 0x0000000000EA9CACB NULL

    2 3 8/9/2010 12:331 0x0000000000EA9CCCC NULL

    2 5 8/9/2010 12:331 0x0000000000EA9D8BD NULL

    2 7 8/9/2010 12:331 0x0000000000EA9E0DE NULL

    2 9 2/15/2011 21:051 0x00000000032A1242F NULL

    2 10 2/15/2011 21:051 0x00000000032A1245G NULL

    Primary key is combination of MachineID + InstanceKey

    In which ever table MachineID and Instance ID gets updated at latest, i have to insert that row into new table.

    Hope this clarifies my issue.

  • For each combination of machine/instance?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Maybe something like,

    WITH TEMP_CTE AS(

    SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass FROM TABLE_A

    UNION ALL

    SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass FROM TABLE_B

    )

    SELECT * FROM TEMP_CTE WHERE TimeKey = (SELECT MAX(TimeKey) FROM TEMP_CTE)

    Also looking at the sample data you'll get multiple rows since there's duplicate time stamps.

  • Yes. For each combination of machine ID and instance ID, which ever gets updated at latest in both the tables, that i should insert in another new table.

    This is because both the table data are coming from 2 different database. they are planning to migrate the data from one DB to another. it will take time to migrate it completely. Till that time my work is to get the updated row out of 2 tables and put it in new table. reporting guys will be using this new table as a reference for their reporting purpose.

  • This should give you what you need:

    WITH Combined AS(

    SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass

    FROM TABLE_A

    UNION ALL

    SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass

    FROM TABLE_B

    )

    , Ordered AS (

    SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass,

    ROW_NUMBER() OVER(PARTITION BY MachineID, InstanceKey ORDER BY TimeKey DESC ) AS rn

    -- you can add other columns to the ORDER BY to create a more specific ordering

    FROM Combined

    )

    SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass

    FROM Ordered

    WHERE rn = 1

    I do have to ask whether you have considered using MERGE REPLICATION instead.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • This works perfectly fine for me. Thank you.

    But if you don't mind, can you explain me the code please?

  • Hi Allen,

    The query is returning duplicate rows. i have same MachineID and InstanceKey in both tables. only Account. Schema class and time key is different. I mean Work_Station_Admin_1 got updated with new values.

    Now it should retrieve only 1 row with latest timekey(Work_Station_Admin_1). But am getting both rows.

    How can i get only 1 row with the latest TimeKey?

    Work_Station_Admin_1

    MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion

    [1234567][1][2/10/2016 10:00][1][2][xyz][0x000000000084B0CB]

    Work_Station_Admin_2

    MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion

    [1234567][1][11/11/2015 15:14][1][1][abc][0x000000000084B0CC]

  • kavithaje (2/10/2016)


    Hi Allen,

    The query is returning duplicate rows. i have same MachineID and InstanceKey in both tables. only Account. Schema class and time key is different. I mean Work_Station_Admin_1 got updated with new values.

    Now it should retrieve only 1 row with latest timekey(Work_Station_Admin_1). But am getting both rows.

    How can i get only 1 row with the latest TimeKey?

    Work_Station_Admin_1

    MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion

    [1234567][1][2/10/2016 10:00][1][2][xyz][0x000000000084B0CB]

    Work_Station_Admin_2

    MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion

    [1234567][1][11/11/2015 15:14][1][1][abc][0x000000000084B0CC]

    Drew's query should do exactly that.

    You also asked how his query works. Here is a short explanation:

    First, in "WITH Combined AS (...)", the data from the two tables is simply concatenated. Nothing special.

    Then, in ", Ordered AS (...)", that concatenated data is extended with one extra column, named rn, and based on the ROW_NUMBER() function. The entire set of data returned from the first part is partitioned (logically divided in groups) based on MachineID and InstanceKey, so rows with the same MachineID and InstanceKey are in the same group, and rows with a difference in one or both column are in another group. Within each group, the rows are then numbered by descending TimeKey, so the most recent gets number 1, the next gets number 2, etc.

    Finally, in "SELECT ... WHERE rn = 1", the results from that second part are filtered and presented. The filter keeps only the rows with rn = 1, which based on the above means that you will get only the most recent row for any distinct MachineID/InstanceKey combination.

    To get more insight in this, you can (temporarily) replace the last part with simple selects such as SELECT * FROM Combined or SELECT * FROM Ordered - that way you can see exactly what data is returned from those two parts of the query.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply