July 20, 2012 at 5:26 am
Hello,
I am Inserting data into tables in a large datawarehouse;
I want to insert only changed data into the warehouse instead of INSERTING the whole load all OVER AGAIN
I need idea on how to go about these
Many thanks
July 20, 2012 at 5:35 am
Number of options depending on what edition of SQL you have depends what you can do.
CDC (Change data capture)
HashBytes on a MD5 hash, if source & target hash dont match then insert or update.
The use of merge commands, based on the unique constraint of the row, if they dont exist then insert if they do check the other columns for any differences and update where there is.
Custom triggers which log updates inserts deletes etc which are stored in an audit table and can be played on the warehouse - pretty much like CDC but if you have Standard edition.
July 20, 2012 at 5:46 am
I think Anthony meant "custom triggers" - Winnie the Pooh's mates were not that hot when it came to database theory, customised or not 😀
There is another possibility. If your source systems maintain DateCreated/DateModified columns on the tables you are interested in, it should be relatively simple to select rows where date created/modified > [max date created/modified from warehouse] and then do a MERGE to get them into your DW.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 20, 2012 at 5:49 am
Good spot Phil. I shall ammend.
July 20, 2012 at 8:57 am
Thanks Guys, but we dont have change data capture turned on, I was thinking of stored proc that use a Not exists ..
ANY IDEAS guys
July 20, 2012 at 9:01 am
That would not capture updates to existing data.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 20, 2012 at 9:05 am
anthony.green (7/20/2012)
Good spot Phil. I shall ammend.
Amend? :hehe:
July 20, 2012 at 9:07 am
Yep, not exists would capture new rows based on a suitable identifier, but won't get updates to extant rows.
Unless that's all you need 🙂
July 20, 2012 at 9:19 am
Gazareth (7/20/2012)
anthony.green (7/20/2012)
Good spot Phil. I shall ammend.Amend? :hehe:
i assumed that was deliberate!
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
July 20, 2012 at 9:24 am
Phil Parkin (7/20/2012)
Gazareth (7/20/2012)
anthony.green (7/20/2012)
Good spot Phil. I shall ammend.Amend? :hehe:
i assumed that was deliberate!
Me too. Mostly.
Muphry's Law always good for a laugh, whether deliberate or not 🙂
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply