August 7, 2012 at 5:19 pm
Hi all,
I've been researching the different options that we have in building our data warehouse. I am intrigued by CDC and the new CDC objects in SSIS, but from what I understand, once you use a CDC Splitter, the transactional consistency is gone, and so there is a chance that data integrity issues arise in the data warehouse.
Is there a way to use the new CDC objects in SSIS 2012 to build a data warehouse and maintain data integrity? How would I go about doing that? Some top-level ideas would be great!
Thanks in advance.
August 9, 2012 at 12:38 am
shahgols (8/7/2012)
Hi all,I've been researching the different options that we have in building our data warehouse. I am intrigued by CDC and the new CDC objects in SSIS, but from what I understand, once you use a CDC Splitter, the transactional consistency is gone, and so there is a chance that data integrity issues arise in the data warehouse.
Is there a way to use the new CDC objects in SSIS 2012 to build a data warehouse and maintain data integrity? How would I go about doing that? Some top-level ideas would be great!
Thanks in advance.
The CDC splitter just seperates the different types of actions: inserts versus updates versus deletes. In itself, this doesn't violate transactional consistency IF you are using the net values processing mode.
For example, an insert of a row followed by an update is passed through by CDC as a single insert with the most recent values. In this case, transactional consistency is maintained for a single row. There is no garantuee that if for example row A is updated and row B is inserted, they will still be in the same order.
If you don't use net processing mode and you fetch all the individual changes, it is discouraged to use the CDC splitter.
This blog post by Matt Masson explains the different processing modes:
Processing Modes for the CDC Source[/url]
Direct quote:
Note: You will (typically) not want to use the CDC Splitter when processing all changes, as sending your change rows down multiple paths will make it difficult to apply them in the correct order.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
August 9, 2012 at 8:59 am
Thank you for your help, I appreciate it. Based on your response, I think that using CDC with net changes is the right fit for us. I have a question about something that you said, I am not sure how this affects data integrity or something else, can you please explain? Thank you.
"There is no garantuee that if for example row A is updated and row B is inserted, they will still be in the same order."
August 10, 2012 at 12:19 am
Let's say you have two net changes: row A is updated and row B is inserted, in that order.
You retrieve those changes with the CDC source and you use the CDC splitter to send them to their appropriate destinations.
Row B directly goes to the destination using an OLE DB Destination using Fast Load and is thus immediately inserted in the destination table.
Row A goes to a staging table - also with OLE DB Destination - and after the dataflow an Execute SQL Task performs an update on the destination table using the staging table.
This means row A is updated after row B is inserted, which is the reverse order of what happened at the source.
If you use this method to do your updates - which is the most efficient way - all your updates are done after all the inserts. Normally this isn't an issue, but I was just pointing it out. You never know 🙂
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
August 10, 2012 at 10:26 am
Understood, thanks for that, makes sense. I also didn't know what the best way of doing an update would be.
I opened another thread about CDC net changes behaving "strange", at least for me. Would love to know what your thoughts are on this. Below is the link to the thread, thanks.
http://www.sqlservercentral.com/Forums/Topic1342735-2799-1.aspx
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply