February 4, 2016 at 7:40 am
Hello,
I have 2 identical tables with almost identical data(Few field values will vary).
Both tables will get updated regularly with time stamp.
I want to get the latest updated row by comparing these 2 tables based on time stamp.
Can anyone help me please?
Thanks in advance.
February 4, 2016 at 7:53 am
WITH Combined AS (
SELECT Timestampcol, <other columns>, 'Table1' AS TableName
FROM Table1
UNION ALL
SELECT Timestampcol, <other columns>, 'Table2'
FROM Table2
)
SELECT Timestampcol, <other columns>, TableName
FROM Combined
John
February 4, 2016 at 8:07 am
John Mitchell-245523 (2/4/2016)
WITH Combined AS (
SELECT Timestampcol, <other columns>, 'Table1' AS TableName
FROM Table1
UNION ALL
SELECT Timestampcol, <other columns>, 'Table2'
FROM Table2
)
SELECT Timestampcol, <other columns>, TableName
FROM Combined
John
I think you might need to extend this a little to get the latest:
WITH Combined AS (
SELECT Timestampcol, <other columns>, 'Table1' AS TableName
FROM Table1
UNION ALL
SELECT Timestampcol, <other columns>, 'Table2'
FROM Table2
), mx as (Select Timestampcol = max(Timestampcol) from Combined)
SELECT Timestampcol, <other columns>, TableName
FROM Combined
join mx on Combined.Timestampcol = mx.Timestampcol
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 4, 2016 at 8:12 am
Oh yes, of course. I got as far as the CTE and then forgot to put the MAX in. Thanks for spotting, Phil.
John
February 4, 2016 at 8:30 am
I suspect that this will perform much better than having a second CTE and a join.
WITH Combined AS (
SELECT Timestampcol, <other columns>, 'Table1' AS TableName
FROM Table1
UNION ALL
SELECT Timestampcol, <other columns>, 'Table2'
FROM Table2
)
SELECT TOP(1) Timestampcol, <other columns>, TableName
FROM Combined
ORDER BY Timestampcol DESC
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 4, 2016 at 8:43 am
Sorry i did not understand clearly.
Can you please give me some real example with the below query?
Thanks in advance.
February 4, 2016 at 8:47 am
kavithaje (2/4/2016)
Sorry i did not understand clearly.Can you please give me some real example with the below query?
Thanks in advance.
Without knowing your table structures, the examples provided are as real as you are going to get.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 4, 2016 at 9:07 am
1st table Name: Work_Station_Admin_1
MachineIDInstanceKeyTimeKey RevisionIDrowversion AccountSchemaClass
167772831 3/3/2015 3:0910x000000000084B0CB A Group
167772832 3/3/2015 3:0910x000000000084B0CC B User
167772833 3/3/2015 3:0910x000000000084B0CD C Group
167772834 5/15/2015 16:2210x000000000084B0CE D User
167773001 5/15/2015 15:2910x000000000084B0CF E User
167773002 5/15/2015 15:2910x000000000084B0D0 F User
167773003 5/15/2015 15:2910x000000000084B0D1 G User
167773004 5/15/2015 15:2910x000000000084B0D2 H Group
167773005 5/15/2015 15:2910x000000000084B0D3 I Group
2nd Table Name: Work_Station_Admin_2
MachineIDInstanceKeyTimeKey RevisionID rowversion AccountSchemaClass
2 1 8/9/2010 12:331 0x0000000000EA9C7AA NULL
2 2 8/9/2010 12:331 0x0000000000EA9CACB NULL
2 3 8/9/2010 12:331 0x0000000000EA9CCCC NULL
2 5 8/9/2010 12:331 0x0000000000EA9D8BD NULL
2 7 8/9/2010 12:331 0x0000000000EA9E0DE NULL
2 9 2/15/2011 21:051 0x00000000032A1242F NULL
2 10 2/15/2011 21:051 0x00000000032A1245G NULL
Primary key is combination of MachineID + InstanceKey
In which ever table MachineID and Instance ID gets updated at latest, i have to insert that row into new table.
Hope this clarifies my issue.
February 4, 2016 at 9:28 am
For each combination of machine/instance?
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
February 4, 2016 at 9:30 am
Maybe something like,
WITH TEMP_CTE AS(
SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass FROM TABLE_A
UNION ALL
SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass FROM TABLE_B
)
SELECT * FROM TEMP_CTE WHERE TimeKey = (SELECT MAX(TimeKey) FROM TEMP_CTE)
Also looking at the sample data you'll get multiple rows since there's duplicate time stamps.
February 4, 2016 at 9:44 am
Yes. For each combination of machine ID and instance ID, which ever gets updated at latest in both the tables, that i should insert in another new table.
This is because both the table data are coming from 2 different database. they are planning to migrate the data from one DB to another. it will take time to migrate it completely. Till that time my work is to get the updated row out of 2 tables and put it in new table. reporting guys will be using this new table as a reference for their reporting purpose.
February 4, 2016 at 11:24 am
This should give you what you need:
WITH Combined AS(
SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass
FROM TABLE_A
UNION ALL
SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass
FROM TABLE_B
)
, Ordered AS (
SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass,
ROW_NUMBER() OVER(PARTITION BY MachineID, InstanceKey ORDER BY TimeKey DESC ) AS rn
-- you can add other columns to the ORDER BY to create a more specific ordering
FROM Combined
)
SELECT MachineID, InstanceKey, TimeKey, RevisionID, rowversion, Account, SchemaClass
FROM Ordered
WHERE rn = 1
I do have to ask whether you have considered using MERGE REPLICATION instead.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 5, 2016 at 2:36 am
This works perfectly fine for me. Thank you.
But if you don't mind, can you explain me the code please?
February 10, 2016 at 4:56 am
Hi Allen,
The query is returning duplicate rows. i have same MachineID and InstanceKey in both tables. only Account. Schema class and time key is different. I mean Work_Station_Admin_1 got updated with new values.
Now it should retrieve only 1 row with latest timekey(Work_Station_Admin_1). But am getting both rows.
How can i get only 1 row with the latest TimeKey?
Work_Station_Admin_1
MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion
[1234567][1][2/10/2016 10:00][1][2][xyz][0x000000000084B0CB]
Work_Station_Admin_2
MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion
[1234567][1][11/11/2015 15:14][1][1][abc][0x000000000084B0CC]
February 10, 2016 at 5:37 am
kavithaje (2/10/2016)
Hi Allen,The query is returning duplicate rows. i have same MachineID and InstanceKey in both tables. only Account. Schema class and time key is different. I mean Work_Station_Admin_1 got updated with new values.
Now it should retrieve only 1 row with latest timekey(Work_Station_Admin_1). But am getting both rows.
How can i get only 1 row with the latest TimeKey?
Work_Station_Admin_1
MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion
[1234567][1][2/10/2016 10:00][1][2][xyz][0x000000000084B0CB]
Work_Station_Admin_2
MachineIDInstanceKeyTimeKeyRevisionIDAccountSchemaClassrowversion
[1234567][1][11/11/2015 15:14][1][1][abc][0x000000000084B0CC]
Drew's query should do exactly that.
You also asked how his query works. Here is a short explanation:
First, in "WITH Combined AS (...)", the data from the two tables is simply concatenated. Nothing special.
Then, in ", Ordered AS (...)", that concatenated data is extended with one extra column, named rn, and based on the ROW_NUMBER() function. The entire set of data returned from the first part is partitioned (logically divided in groups) based on MachineID and InstanceKey, so rows with the same MachineID and InstanceKey are in the same group, and rows with a difference in one or both column are in another group. Within each group, the rows are then numbered by descending TimeKey, so the most recent gets number 1, the next gets number 2, etc.
Finally, in "SELECT ... WHERE rn = 1", the results from that second part are filtered and presented. The filter keeps only the rows with rn = 1, which based on the above means that you will get only the most recent row for any distinct MachineID/InstanceKey combination.
To get more insight in this, you can (temporarily) replace the last part with simple selects such as SELECT * FROM Combined or SELECT * FROM Ordered - that way you can see exactly what data is returned from those two parts of the query.
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply