October 5, 2011 at 8:57 am
Hi guys and gals, i'm new here so please be gentle. I'm not a novice but aint an expert in T-SQL either and was looking for some advice please...
What I am trying to do is create a Stored Procedure that when called, copies records from my holding tables to my current tables.
Both the holding tables and current tables are identical.
My first thought was just to copy the holding table and replace the current table. However, there are literally millions of rows in each table and i'm feeling this could take some time.
What I wanted to do is copy just the changes (and creations) from the holding tables into the current tables. Luckily any amendment to the holding tables has a 'Created Date' field which is populated with Getdate()
This is where I get the problem.
records in the holding table can be both created or updated (but thankfully none are ever deleted)
What method is best to create a quick and simple procedure that copies all the records with an 'Created Date' greater than 'x' from the holding table and either INSERTS or UPDATES into the current table?
Each table can have 40 or 50 columns to write out for each INSERT and UPDATE, and this I guess will be time consuming to code, i was hoping there was a INSERT * or UPDATE * alternative...
Any advice much appreciated
Some examples to help explain things better
HOLD TABLE (created date, Primary Key, data item 1, data item 2)
01/01/2000, PK1, data A, data B
05/11/2011, PK2, data C, data D
05/11/2011, PK3, data E, Data F
CURRENT TABLE (created date, Primary Key, data item 1, data item 2)
01/01/2000, PK1, data A, data B
01/01/2005, PK2, data A, data B
PK2 record would need to be updated
PK3 record would need to be inserted
Thanks
October 5, 2011 at 9:22 am
A MERGE statement is what you're looking for.
You cannot use anything like *, you have to explicitly specify the column names for both the INSERT and UPDATE parts. You can easily build the column lists by querying the metadata views (INFORMATION_SCHEMA.COLUMNS or sys.columns).
The basic syntax would be something like this (untested as you haven't provided sample DDL):
MERGE HOLDTABLE AS SOURCE
USING CURRENTTABLE AS TARGET
ON ( SOURCE.myPK = TARGET.myPK )
WHEN MATCHED AND ( SOURCE.last_updated >= TARGET.Last_Updated )
THEN
UPDATE SET
col1 = SOURCE.col1 ,
col2 = SOURCE.col2 ,
col3 = SOURCE.col3,
...
WHEN NOT MATCHED
THEN INSERT ( col1, col2, col3 )
VALUES ( SOURCE.col1 ,
SOURCE.col2 ,
SOURCE.col3,
...
)
October 5, 2011 at 9:24 am
Domestos (10/5/2011)
Hi guys and gals, i'm new here so please be gentle. I'm not a novice but aint an expert in T-SQL either and was looking for some advice please...What I am trying to do is create a Stored Procedure that when called, copies records from my holding tables to my current tables.
Both the holding tables and current tables are identical.
My first thought was just to copy the holding table and replace the current table. However, there are literally millions of rows in each table and i'm feeling this could take some time.
What I wanted to do is copy just the changes (and creations) from the holding tables into the current tables. Luckily any amendment to the holding tables has a 'Created Date' field which is populated with Getdate()
This is where I get the problem.
records in the holding table can be both created or updated (but thankfully none are ever deleted)
What method is best to create a quick and simple procedure that copies all the records with an 'Created Date' greater than 'x' from the holding table and either INSERTS or UPDATES into the current table?
Each table can have 40 or 50 columns to write out for each INSERT and UPDATE, and this I guess will be time consuming to code, i was hoping there was a INSERT * or UPDATE * alternative...
Any advice much appreciated
Some examples to help explain things better
HOLD TABLE (created date, Primary Key, data item 1, data item 2)
01/01/2000, PK1, data A, data B
05/11/2011, PK2, data C, data D
05/11/2011, PK3, data E, Data F
CURRENT TABLE (created date, Primary Key, data item 1, data item 2)
01/01/2000, PK1, data A, data B
01/01/2005, PK2, data A, data B
PK2 record would need to be updated
PK3 record would need to be inserted
Thanks
The insert would not be too hard. You just need to do a left join or an exists.
Something like:
insert into current (createdDate, PrimaryKey, DataItem1, DataItem2)
select CreatedDate, primarykey, dataItem1, dataItem2
from Hold h
left join Current c on h.primarykey = c.primarykey
where c.primarykey is null
The update is harder because you would actually have to compare the columns. Perhaps you can assume that if a new record has been inserted that something has changed in which case you just do the update.
October 5, 2011 at 9:44 am
Thanks HowardW,
Could you quickly tell me what the
WHEN MATCHED AND
and
WHEN NOT MATCHED
statements are matching on? all the fields? or just a primary key?
also do I need a
[Code="sql"] AND ( SOURCE.last_updated >= TARGET.Last_Updated ) [/code]
after the NOT MATCHED statement?
October 5, 2011 at 9:52 am
Domestos (10/5/2011)
Thanks HowardW,Could you quickly tell me what the
WHEN MATCHED AND
and
WHEN NOT MATCHED
statements are matching on? all the fields? or just a primary key?
It's using the ON clause. e.g.:
ON ( SOURCE.myPK = TARGET.myPK )
also do I need a
[Code="sql"] AND ( SOURCE.last_updated >= TARGET.Last_Updated ) [/code]
after the NOT MATCHED statement?
Nope. The NOT MATCHED portion is only used for inserts, so it will insert any rows where that row does not exist in the target table - there is no TARGET.last_updated to compare against when it doesn't exist.
I'd recommend having a good look at the MERGE arcicle in BOL (I linked to it above) to make sure you understand it.
October 5, 2011 at 10:02 am
Great thx, i'll give it a go and let you know how it goes.
October 6, 2011 at 6:48 am
After some messing about, lots of strange errors and some frantic head scratching i discovered that I was on SQL Server 2005 , and MERGE doesnt work with SQL Server 2005.
Went back to the drawing board and came up with this solutions to my problem,
UPDATE [dbo].[ETL_HL7_TEMP_imp_patient_tbl]
SET [address1] = [SOURCE].address1
,[address2] = [SOURCE].address2
,[address3] = [SOURCE].address3
...
,[change_date] = [SOURCE].change_date
OUTPUT INSERTED.pasid INTO @tblPasid
FROM dbo.ETL_HL7_TEMP_imp_patient_tbl [TARGET]
INNER JOIN [dbo].[ETL_HL7_HOLD_imp_patient_tbl] [SOURCE] ON [TARGET].pasid=[SOURCE].pasid
WHERE [SOURCE].[change_date]>@update_from
;
INSERT INTO [dbo].[ETL_HL7_TEMP_imp_patient_tbl]
([address1]
,[address2]
...
,[change_date])
SELECT [address1]
,[address2]
,[address3]
...
,[change_date]
FROM [dbo].[ETL_HL7_HOLD_imp_patient_tbl]
WHERE [change_date]>@update_from AND pasid NOT IN (SELECT pasid from @tblPasid)
October 6, 2011 at 6:53 am
Yes, it's SQL Server 2008 only. In fairness, you did post your question in the SQL Server 2008 forum 🙂
Thanks for coming back and sharing your solution!
October 6, 2011 at 7:02 am
HowardW (10/6/2011)
In fairness, you did post your question in the SQL Server 2008 forum 🙂Thanks for coming back and sharing your solution!
Ohh yer - but then I thought I was on 2008 anyway, but im only on SSMS 2008 :doze:
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply