October 5, 2010 at 7:13 pm
The boss wants a reconciliation import to be part of the ETL process. First, it should check whether the source table's record count matches the destination table's record count. If not, the task should extract rows from the source and destination in 15-day increments, based on the MaxDate column in each table. The SSIS package should then check each chunk of records and reconcile the differences until the destination matches the source. I guess it should loop like this:
         Source       Destination
Loop 1:  x            y
Loop 2:  x-15 days    x-15 days
And so on. If there is a row in the destination that was not in the source, then the row should be DELETED from the destination. If the row in the destination is different from the equivalent row in the source, the destination row should be UPDATED. If the row in the source does not have an equivalent in the destination, the row should be INSERTED into the destination.
What is the most efficient way to pull this off? And does it make sense to loop through the tables and extract chunks of data, given that the tables are large?
October 5, 2010 at 7:33 pm
I don't know what "large" means for you, and this will make a difference.
It also makes a difference which version of SQL Server you're on: 2005, or 2008/R2?
If you're on one of the 2008 versions, I'd recommend using the MERGE statement to do all of this.
And if you're on 2005, I'd actually recommend using DELETE, UPDATE and INSERT statements to do what you need.
Yep, I'd skip SSIS completely with this.
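A minimal MERGE sketch of what I mean — the table names (dbo.SourceTable, dbo.DestinationTable), the key column, and the column list are assumptions you'd replace with your own schema:

```sql
-- Sketch only: table/column names are placeholders for your actual schema.
MERGE dbo.DestinationTable AS dst
USING dbo.SourceTable AS src
    ON dst.Inventory_Item_Id = src.Inventory_Item_Id
WHEN MATCHED AND dst.Last_Updated_Date <> src.Last_Updated_Date THEN
    -- Destination row differs from source: UPDATE it
    UPDATE SET dst.Creation_Date     = src.Creation_Date,
               dst.Last_Updated_Date = src.Last_Updated_Date
WHEN NOT MATCHED BY TARGET THEN
    -- Row exists in source but not destination: INSERT it
    INSERT (Inventory_Item_Id, Creation_Date, Last_Updated_Date)
    VALUES (src.Inventory_Item_Id, src.Creation_Date, src.Last_Updated_Date)
WHEN NOT MATCHED BY SOURCE THEN
    -- Row exists in destination but not source: DELETE it
    DELETE;
```

One statement covers all three of your cases (DELETE, UPDATE, INSERT), which is exactly why I'd skip the SSIS plumbing here.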
Wayne
Microsoft Certified Master: SQL Server 2008
Author - SQL Server T-SQL Recipes
October 5, 2010 at 10:32 pm
imani_technology (10/5/2010)
And so on. If there is a row in the destination that was not in the source, then the row should be DELETED from the destination. If the row in the destination is different from the equivalent row in the source, the destination row should be UPDATED. If the row in the source does not have an equivalent in the destination, the row should be INSERTED into the destination.
WayneS (10/5/2010)
If you're on one of the 2008 versions, I'd recommend using the MERGE statement to do all of this.
And if you're on 2005, I'd actually recommend using DELETE, UPDATE and INSERT statements to do what you need.
I have a few questions:
Why not implement the SCD transformation for the above dataset?
For the DELETE part, a LEFT JOIN would suffice.
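What I mean by the LEFT JOIN approach for deletes, as a rough sketch (table and key names assumed):

```sql
-- Delete destination rows that have no matching source row.
-- The LEFT JOIN finds them: unmatched rows come back with NULLs on the source side.
DELETE dst
FROM dbo.DestinationTable AS dst
LEFT JOIN dbo.SourceTable AS src
    ON src.Inventory_Item_Id = dst.Inventory_Item_Id
WHERE src.Inventory_Item_Id IS NULL;
```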
Raunak J
October 5, 2010 at 10:37 pm
Also, these days everybody is talking a lot about Change Data Capture... which is also worth your time for this issue.
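Getting started with CDC is just two system procedure calls — this is a sketch, assuming your source table is dbo.SourceTable:

```sql
-- Enable CDC for the database, then for the table you want tracked.
EXEC sys.sp_cdc_enable_db;

EXEC sys.sp_cdc_enable_table
    @source_schema = N'dbo',
    @source_name   = N'SourceTable',
    @role_name     = NULL;  -- NULL = no gating role required to read changes
```

After that, SQL Server maintains a change table you can query for inserts, updates, and deletes since the last load, instead of diffing the whole million rows.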
Raunak J
October 5, 2010 at 10:37 pm
We are using SQL Server 2008 R2. From what I understand, we are going to deal with about a million rows.
October 5, 2010 at 10:38 pm
imani_technology (10/5/2010)
We are using SQL Server 2008 R2. From what I understand, we are going to deal with about a million rows.
This really shouldn't blow your brains out when you're dealing with data warehousing. :-)
Raunak J
October 5, 2010 at 10:50 pm
I agree, but my boss thinks the performance will be better if I break the data up into 15-day chunks. Do you think that would be necessary? It would be nice if we could skip the looping.
October 5, 2010 at 10:53 pm
It is always better to load only the changed data into the warehouse. Yes, your "BOSS" stands correct! :-)
Raunak J
October 5, 2010 at 11:01 pm
Yes, but he wants me to compare and change the data in 15-day chunks. Would putting that into a loop necessarily boost performance?
October 5, 2010 at 11:06 pm
You must understand that implementing MDM or CDC is far superior to batch loading... CDC actually tries to eliminate the batch window (15 days in your case).
Give it some time and study the CDC methodology. :-)
Raunak J
October 6, 2010 at 11:13 am
The boss would like us to move toward an MDM scenario. Where can I learn more about that in a Microsoft context?
October 6, 2010 at 4:13 pm
Thanks for the insight! I have done some research on CDC, which is leading me to more questions. Is it okay for me to start a new thread regarding Oracle, SSIS, and CDC?
October 6, 2010 at 9:32 pm
Absolutely. But remember not to cross-post. :-)
Raunak J
October 7, 2010 at 12:17 pm
How about this?
IF  source.Inventory_Item_Id  = destination.Inventory_Item_Id
AND source.Creation_Date      = destination.Creation_Date
AND source.Last_Updated_Date <> destination.Last_Updated_Date
THEN start the "reconciliation" process.
Can I use a Lookup to do this? Will this properly prevent too many source rows from being processed?
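In case it helps frame the question: in plain T-SQL, that filter would look something like the sketch below (table names assumed), which is essentially what a Lookup would be doing row by row:

```sql
-- Only pull source rows whose matching destination row has a stale
-- Last_Updated_Date; unmatched and already-current rows are skipped.
SELECT src.*
FROM dbo.SourceTable AS src
WHERE EXISTS (
    SELECT 1
    FROM dbo.DestinationTable AS dst
    WHERE dst.Inventory_Item_Id  = src.Inventory_Item_Id
      AND dst.Creation_Date      = src.Creation_Date
      AND dst.Last_Updated_Date <> src.Last_Updated_Date
);
```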
October 7, 2010 at 12:19 pm
Also, is the looping process that the boss wants really necessary? Is breaking up the data into 15-day chunks going to improve performance or slow it down? After all, we're only talking about a million rows or so.
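For concreteness, here is roughly what the boss's 15-day windowing would look like if we did implement it — a sketch only, with table and column names assumed:

```sql
-- Walk backward from the newest MaxDate in 15-day windows.
DECLARE @WindowEnd   datetime = (SELECT MAX(MaxDate) FROM dbo.SourceTable);
DECLARE @Oldest      datetime = (SELECT MIN(MaxDate) FROM dbo.SourceTable);
DECLARE @WindowStart datetime = DATEADD(DAY, -15, @WindowEnd);

WHILE @WindowEnd >= @Oldest
BEGIN
    -- Reconcile only the rows whose MaxDate falls inside this window,
    -- e.g. a MERGE with both sides filtered by:
    --   MaxDate > @WindowStart AND MaxDate <= @WindowEnd

    SET @WindowEnd   = @WindowStart;
    SET @WindowStart = DATEADD(DAY, -15, @WindowEnd);
END;
```

Whether the loop helps or hurts at a million rows is exactly my question.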