May 6, 2011 at 9:45 am
I have an ssis job that does a comparison of a staging table and a production table. It inserts any new rows into the production table and updates any rows that have been changed since the last run. Right now I am using Fingerprint Assignment in ssis to identify the new and missing. However, we are not happy with the performance of this. These tables can contain up to several million rows in some cases. Is there a more efficient way in TS-SQL or ssis to accomplish this?
May 6, 2011 at 12:51 pm
You could use CROSS APPLY with HASHBYTES to do a comparison of the rows. As long as you have some sort of unique id that would help you identify where to make the update. We do something like this with our data warehouse in which we track changes to an employee's record. Their employee id does not change, so we can use that.
Basically you use CROSS APPLY to generate an XML string of the data, and then convert it to a HASHBYTE for comparison.
SELECT
employee_id,
fname,
lname,
....,
HASHBYTES('SHA1',h.x) as hashValue
FROM
stgTable
CROSS APPLY(
SELECT
fname,
lname,
...
FOR XML RAW) as h(x)
This will give you a varbinary representation of the data. Do the same for your production table, and then compare the 2 joining on employee_id and checking where stgTable.hashValue != prodTable.hashValue.
Those instances where the hashValues are not the same are the records you need to update, and if you don't find the employee_id in prodTable, then those are the ones you insert.
This may be an overly simple example, but you should be able to apply it to your situation. We use this method to manage changes of over 50,000 active employees daily. The comparison, update and insert is complete in less than 60 seconds.
May 6, 2011 at 1:10 pm
andy-973092 (5/6/2011)
Right now I am using Fingerprint Assignment in ssis to identify the new and missing. However, we are not happy with the performance of this.
Is this a third party tool or a technique I'm just not familiar with the name of?
Never stop learning, even if it hurts. Ego bruises are practically mandatory as you learn unless you've never risked enough to make a mistake.
For better assistance in answering your questions[/url] | Forum Netiquette
For index/tuning help, follow these directions.[/url] |Tally Tables[/url]
Twitter: @AnyWayDBA
May 6, 2011 at 2:20 pm
Is your staging table flushed and reloaded every time with only changed and new rows?
Jack Corbett
Consultant - Straight Path Solutions
Check out these links on how to get faster and more accurate answers:
Forum Etiquette: How to post data/code on a forum to get the best help
Need an Answer? Actually, No ... You Need a Question
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply