Extracting data based on timestamp

Question

Extracting data based on timestamp

LezL

Grasshopper

Points: 14
More actions
November 18, 2005 at 2:38 am

#166347

Hi All
First post here. We have a number of DTS tasks that run overnight to populate a SQL Server DB behind our web ordering system. The DTS pulls data from our SOP system, which is Progress on AIX, using a Merant driver.
We currently dump everything required into a staging db overwriting and transforming it, then send it to the live db.
Problem is the amount of data being transferred is getting too large, so I would like to only extract changed data. I don't really know where to begin. Some of the source tables have "last modified" fields, some don't. Where do I hold the date / time to compare against? How I make SQL and Progress see a common datetime format. What about those tables that have new keys generated?
A lot of questions there I know. Any pointers gratefully received.
Cheers,
LezL

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply

Jonathan Stokes SSCrazy Eights Points: 9861 More actions · Answer 1

If you cannot guarantee a last updated stamp is correct, then the best policy is to presume none of the data in the two tables exists.

Therefore, you will need to run scripts to check what has changed between the two. This can be put into 3 areas.

Insert new records (easily accommodated by searching for sourceid not in the destinationid)

Delete old records (by searching for destinationid not in soriceid)

Changed records (also known as slowly changing dimensions - Tgere are hundreds of theories out there as to the best way to accomodate. Search the net and you will find them all. The best method depends on many variables such as the amount of data, indexes, constraints, triggers etc)

The most simplest (though not always most effective) way is to use an update statement that compares row by row each column and updates destinationcolumns where different.

It may be that truncating the table and inserting from new works quicker. If you do go with the route of checking row by row, ensures that you still run a full cylce population at least once a month.

------------------------------
The Users are always right - when I'm not wrong!

LezL Grasshopper Points: 14 More actions · Answer 2

Thanks for the info. One thing, though - do I need to do these checks between source and staging and staging and live or would I just send all of the staging data to live?

Cheers,

LezL

Jonathan Stokes SSCrazy Eights Points: 9861 More actions · Answer 3

The only comparison should be between stage and destination.

Stage should really be used for collecting all available data before decisions are made in regards to populating targets.

IE - source - stage(copy or view of source) - target

------------------------------
The Users are always right - when I'm not wrong!