November 21, 2016 at 9:15 am
I am mirroring about (23) tables into another DW. I have built my first incremental loads package, for the first table, and it successfully checks for new rows, updated rows and deletes missing rows. I was very excited. 🙂
But now I have moved on to the next table and it has (44) columns!!! Which means that I have to check for changes across 2 million rows of 44 columns! I know that this is going to be a big strain on the server resources and will probably take some time, but even worse, I have to build the most ridiculously long expression in my Conditional Split task to check every single source row's values against every single destination row's values (e.g. column1 != LkUp_column1 || column2 != LkUp_column2 ... column44 != LkUp_column44).
There has to be a better way to do this. I read about using a hash. Is this relatively easy to implement, or do I really have to write the War-and-Peace expression in the Conditional Split?
November 21, 2016 at 9:21 am
Jerid421 (11/21/2016)
I am mirroring about (23) tables into another DW. I have built my first incremental loads package, for the first table, and it successfully checks for new rows, updated rows and deletes missing rows. I was very excited. 🙂 But now I have moved on to the next table and it has (44) columns!!! Which means that I have to check for changes across 2 million rows of 44 columns! I know that this is going to be a big strain on the server resources and will probably take some time, but even worse, I have to build the most ridiculously long expression in my Conditional Split task to check every single source row's values against every single destination row's values (e.g. column1 != LkUp_column1 || column2 != LkUp_column2 ... column44 != LkUp_column44).
There has to be a better way to do this. I read about using a hash. Is this relatively easy to implement, or do I really have to write the War-and-Peace expression in the Conditional Split?
A better way to do this is to add and maintain a 'date modified' column in your source data. Use this to drive INSERTs and UPDATEs (some additional work would be needed to handle deletes).
Or you could consider implementing Change Tracking.
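Roughly, enabling it looks something like the following (the database, table and key names are invented for illustration, and it does require the right permissions on the source):

-- Enable change tracking on the source database (retention window is adjustable)
ALTER DATABASE SourceDW
SET CHANGE_TRACKING = ON
(CHANGE_RETENTION = 7 DAYS, AUTO_CLEANUP = ON);

-- Enable it on each table you want to mirror (the table needs a primary key)
ALTER TABLE dbo.BigSourceTable
ENABLE CHANGE_TRACKING;

-- In the incremental load, pull only the rows changed since the last synced version
DECLARE @LastSyncVersion bigint = 0; -- persist this value between package runs

SELECT ct.SYS_CHANGE_OPERATION,      -- I, U or D
       ct.KeyColumn,
       s.*
FROM CHANGETABLE(CHANGES dbo.BigSourceTable, @LastSyncVersion) AS ct
LEFT JOIN dbo.BigSourceTable AS s
       ON s.KeyColumn = ct.KeyColumn; -- deleted rows come back with no matching source row

SELECT CHANGE_TRACKING_CURRENT_VERSION(); -- store as @LastSyncVersion for the next run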
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 21, 2016 at 4:40 pm
Why not just use EXCEPT?
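Something along these lines (the staging and warehouse table names are just placeholders, and both SELECTs need the same column list in the same order):

-- New or changed rows: in the staged source extract but not identical to any warehouse row
SELECT * FROM stg.BigTable
EXCEPT
SELECT * FROM dw.BigTable;

-- Rows deleted at the source: in the warehouse copy but no longer in the staged extract
SELECT * FROM dw.BigTable
EXCEPT
SELECT * FROM stg.BigTable;

A nice side effect is that EXCEPT treats two NULLs as equal, so you avoid the ISNULL wrappers a hand-written 44-column comparison usually ends up needing.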
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2016 at 5:16 pm
Jeff Moden (11/21/2016)
Why not just use EXCEPT?
EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 21, 2016 at 5:47 pm
Phil Parkin (11/21/2016)
Jeff Moden (11/21/2016)
Why not just use EXCEPT?
EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.
Now you have me curious... Why not recommended? What's wrong with it compared to some other method?
--Jeff Moden
Change is inevitable... Change for the better is not.
November 21, 2016 at 8:03 pm
I would consider adding a date modified column, but the whole point of copying these tables is that I don't have admin rights over them... read-only. So I am pulling them into my own DW so that I can integrate them into my own data model with some other tables.
I downloaded a third-party Check Sum Task today and used it in my package to detect changes. I think it might have worked. I just need the source data to "change" tonight so that I can run the package and see the results.
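For reference, the same row-hash idea can be done in the source query itself rather than with a third-party component; a rough sketch, with invented table and column names (CONCAT and the SHA2_256 algorithm need SQL Server 2012 or later, and before SQL Server 2016 the HASHBYTES input is capped at 8,000 bytes):

-- Compute one hash per row so the Lookup/Conditional Split compares a single
-- RowHash column instead of all 44 columns
SELECT KeyColumn,
       HASHBYTES('SHA2_256',
                 CONCAT(Col1, '|', Col2, '|', Col3, '|', /* ...and so on... */ Col44)
                ) AS RowHash
FROM dbo.SourceTable;
-- Persist RowHash in the destination; a row needs an UPDATE only when the incoming
-- hash differs from the stored one (CONCAT turns NULLs into empty strings, so the
-- separators help avoid most collision cases)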
I've never heard of EXCEPT. I'll have to look into it tomorrow.
November 21, 2016 at 8:20 pm
Jeff Moden (11/21/2016)
Phil Parkin (11/21/2016)
Jeff Moden (11/21/2016)
Why not just use EXCEPT?
EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.
Now you have me curious... Why not recommended? What's wrong with it compared to some other method?
It's the amount of data, not the comparison method, which concerns me. And presumably it's only going to get worse over time.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 21, 2016 at 8:23 pm
If you are prepared to accept the occasional false positive, the checksum method is not so bad.
The EXCEPT method is, however, more robust.
The absence of evidence is not evidence of absence
- Martin Rees
The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
- Phil Parkin
November 22, 2016 at 2:36 pm
Phil Parkin (11/21/2016)
Jeff Moden (11/21/2016)
Phil Parkin (11/21/2016)
Jeff Moden (11/21/2016)
Why not just use EXCEPT?
EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.
Now you have me curious... Why not recommended? What's wrong with it compared to some other method?
It's the amount of data, not the comparison method, which concerns me. And presumably it's only going to get worse over time.
Ah. Understood. Time for me to do some experiments with larger tables in this area. The largest table I ever did this with was something like 40 or 50 columns wide and only a million or so rows. Thanks for bringing it up, Phil.
--Jeff Moden
Change is inevitable... Change for the better is not.
November 23, 2016 at 10:58 am
Jeff Moden (11/22/2016)
Phil Parkin (11/21/2016)
Jeff Moden (11/21/2016)
Phil Parkin (11/21/2016)
Jeff Moden (11/21/2016)
Why not just use EXCEPT?
EXCEPT on a 2-million-row resultset is tidier than a column-by-column comparison, but still not to be recommended, IMO.
Now you have me curious... Why not recommended? What's wrong with it compared to some other method?
It's the amount of data, not the comparison method, which concerns me. And presumably it's only going to get worse over time.
Ah. Understood. Time for me to do some experiments with larger tables in this area. The largest table I ever did this with was something like 40 or 50 columns wide and only a million or so rows. Thanks for bringing it up, Phil.
I'd be interested in seeing your results, Jeff. And thank you, Phil, for the link to Change Tracking. Since we are still on an older version of SQL Server here, I had not come across it.
----------------------------------------------------