What are the alternatives to CDC for capturing and identifying the delta for a warehouse extraction?

  • Hi,

    I'm putting together a spec for a warehouse that extracts data from 12+ sites. I've told them that CDC is the way to go and that they'll need SQL Server 2008 Enterprise to do this. They won't support R2.

    They say that this will cost them 600K and to look at other options.

    I know it can be accomplished (albeit inefficiently) with shadow tables and triggers, but I would like to know what other options there may be.

    Note: I need to capture ALL change, including that between loads, so a LastModifiedDateTime field added to each table won't cut it.

    Thanks in advance.

  • Hi Dave,

    Bret Flippen wrote a good article on this here[/url]. I implemented his suggestions recently at a client and, during an exchange on Twitter, Brett directed me to a cool custom component on CodePlex that generates hashes in a data flow. It's called the SSIS Multiple Hash[/url] and works well.

    Hope this helps,

    Andy

    Andy Leonard, Chief Data Engineer, Enterprise Data & Analytics

  • Thanks Andy. I'll check it out.

  • Andy Leonard (2/25/2011)


    Bret Flippen wrote a good article on this here[/url].

    Thanks for the good reference!

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • A question to the OP:

    Consider the following scenario ...

    Package runs at 1pm.

    1) Change made to record at 1.05pm.

    2) Another change made to same record at 1.10pm.

    Package runs again at 1.15pm.

    Your requirement is to pick up 'all changes', so I guess that means both (1) and (2), even if (2) overwrites (1) - is that correct? Otherwise, Last Modified would work OK wouldn't it?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Hi Phil,

    Yes. LastModifiedDateTime would work if you don't need to capture all change and don't allow deletions (which is a different story).

    If you allow deletions and don't just mark them as deleted (i.e. IsActive = false), then CDC or triggers can keep track of the deletions for you.

    I'm just curious about what others have done in similar situations.

    Cheers,

    Dave

  • davepc (2/28/2011)


    Hi Phil,

    Yes. LastModifiedDateTime would work if you don't need to capture all change and don't allow deletions (which is a different story).

    If you allow deletions and don't just mark them as deleted (i.e. IsActive = false), then CDC or triggers can keep track of the deletions for you.

    I'm just curious about what others have done in similar situations.

    Cheers,

    Dave

    What is your exact definition of "all changes"?

    If a row is updated twice, you want the two seperate updates?

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi Koen,

    Yes. I want the two updates.

    They will both be entered into the warehouse and have appropriate EffectiveFrom and EffectiveTo datetime fields. The latest one has a null EffectiveTo value.

    They will also get a surrogate key that will be used as a foreign key for related data that falls in the same datetime range.

    I know a warehouse isn't an auditing tool, but I want to capture the context of reference data at the time it was used. It's important for this particular warehouse as it's medical based.

    Cheers,

    Dave

  • With the built-in utilities from SQL Server Standard edition, you're stuck with the triggers as you said before, since change_tracking doesn't take all updates into account.

    CDC does...

    If you can modify the system that inputs the data, you can add OUTPUT clauses everywhere, but as soon as one forgets them in a new query, it breaks your system.

    So I guess triggers are your only option (unless I'm missing something). Or convince them to go for CDC 🙂

    Need an answer? No, you need a question
    My blog at https://sqlkover.com.
    MCSE Business Intelligence - Microsoft Data Platform MVP

  • Hi Andy,

    It is really helpful if you share that link where i can get the solution that Brett provided.

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply