April 23, 2010 at 7:15 am
i'm agree for the SCD wizard, but i thought u wanted something different.
i have a real pleasure to read all that post on this subject
I didn't use the codeplex one, and the merge option as well, and actually for me, dimension should not contains millions rows
but i accept the fact that sometime, we don't have choices.
:hehe:
April 23, 2010 at 7:20 am
Brandon Carl Goodman (4/23/2010)
Very nice. I have had to implement the check sum before. What I did was build a checksum value over the values of columns (in this case all of them) and performed and insert, update, delete based upon that value. I like your setup in your last post. However, I believe that we may be getting slightly off base of helping a fellow with his issue.:-) Pleasure discussing strategy with you.
It's my pleasure as well
Off topic, off topic... The tile is SSIS: Insert/Update soruce to destination.
Exactly what we're discussing
How did you implement the checksum, I've never actually done it before.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
April 23, 2010 at 7:35 am
Here is a link for Phil Brammer article on using the checksum component for ssis.
http://www.ssistalk.com/2007/03/09/ssis-using-a-checksum-to-determine-if-a-row-has-changed/
When I implemented the checksum, I did all of it from T-SQL. Allow me to explain -
1. I have a base and a staging table identical in schema. Data from my import is loaded into my staging table. On the staging table I have added an additional column (checksum value) that is a computed column contain a checksum over all of the columns of the staging table, with the exception of a few columns. As data is loaded into staging table, checksum value is computed on the fly.
2. Once data has been loaded into the staging table, I perform a comparison between the staging table and base table.
a. Delete - delete from base table where business key not exist in staging table.
b. Update - create a checksum value over the columns of base table and compare that checksum value and business key against the checksum value and the business key of the staging table. If not equal update, else ignore.
c. Insert - where business key not exist in base table.
If you like I could create a mockup of the syntax for you if that would make more sense. I am more of a visual person my self so sometimes this helps. A picture is worth a thousand words.
April 23, 2010 at 7:44 am
Thanks for the link, very informative. However, at my current project I'm not allowed to install custom components, so yeah
Correct me if I'm wrong, but to me it seems that the set-up that you outlined just makes a perfect copy of the staging table in the base table. (rows that are not in the staging table are deleted, rows that are not in the base table are inserted, changed rows are updated. Hence, you end up with the staging table).
Isn't it simpler to just truncate the base table, perform a load of the staging table and re-build indexes?
(or did I miss something. That's possible, it's friday :-))
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
April 23, 2010 at 7:48 am
Business requirements state that I am not allowed to truncate the base table as it needs to available at all times. I will work on a mockup for you and post it.
April 23, 2010 at 12:10 pm
I have not forgotten about mock-up. Having a busy Friday.
April 23, 2010 at 3:37 pm
CREATE TABLE [Database].[dbo].[Table_Stage]
([AccountNumber]char(16),
[Name]varchar(26),
[Address]varchar(26),
[Checksum_Col]AS Checksum([AccountNumber], [Name], [Address]))
CREATE [Database].[dbo].[Table_Base]
([AccountNumber]char(16),
[Name]varchar(26),
[Address]varchar(26))
/*----------------------------------------------------------------------------------------------------------------------------
Delete where not exists.
----------------------------------------------------------------------------------------------------------------------------*/
DELETE
FROM [Database].[dbo].[Table_Base]
WHERE [AccountNumber] NOT IN(SELECT [AccountNumber] FROM [Database].[dbo].[Table_Stage]) --delete based upon business key
/*----------------------------------------------------------------------------------------------------------------------------
Update records that have a differing checksum value based upon business key (AccountNumber.
----------------------------------------------------------------------------------------------------------------------------*/
UPDATE [Database].[dbo].[Table_Base]
SET [AccountNumber]=stage.[AccountNumber],
[Name]=stage.[Name],
[Address]=stage.[Address]
FROM [Database].[dbo].[Table_Base] base
INNER JOIN [Database].[dbo].[Table_Stage] stage
ON base.[AccountNumber] = stage.[AccountNumber]
WHERE CHECKSUM(base.[AccountNumber],
base.[Name],
base.[Address]) <> stage.[Checksum_Col]
/*----------------------------------------------------------------------------------------------------------------------------
Insert where not exists.
----------------------------------------------------------------------------------------------------------------------------*/
INSERT INTO [Database].[dbo].[Table_Base]
([AccountNumber],
[Name],
[Address])
SELECT [AccountNumber],
[Name],
[Address]
FROM [Database].[dbo].[Table_Stage]
WHERE [AccountNumber] NOT IN (SELECT [AccountNumber] FROM [Database].[dbo].[Table_Base])
As promised a mock-up. Enjoy the weekend.
April 27, 2010 at 3:18 am
Thanks!
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
July 2, 2012 at 11:42 am
Seems I am not understanding something. If the idea is to use ETL tools like ssis to speedup development and to avoid hard coding of store procedures why do we need to go and type the sql for a update / insert / merge inside a ssis package to be able to do a simple task like a upsert / merge statement, It seems to me to defy the purpose.
It would be faster just to type the same procedure and compile as a store procedure. Probably save you time as well.
July 2, 2012 at 12:56 pm
The code that is listed is actually a compiled stored procedure that is called from the SSIS package. The code is merely for demonstration purposes.
This post is also two years old.
July 3, 2012 at 10:47 am
Thanks, I am using Visual studio 2010 to develop the ssis. Doing a oracle to oracle elt job. However I have not been able to find a component that can handle a update into a table seamlessly, without writing some code. If that is the case I can just as well use PL/SQL for the ETL. Do you have any suggestions?
July 3, 2012 at 2:02 pm
cbrown-779669 (7/3/2012)
Thanks, I am using Visual studio 2010 to develop the ssis. Doing a oracle to oracle elt job. However I have not been able to find a component that can handle a update into a table seamlessly, without writing some code. If that is the case I can just as well use PL/SQL for the ETL. Do you have any suggestions?
You're right, SSIS doesn't have a component to do updates efficiently.
The best option is to write updates to a staging table and do the update with a good old SQL UPDATE statement.
Need an answer? No, you need a question
My blog at https://sqlkover.com.
MCSE Business Intelligence - Microsoft Data Platform MVP
March 27, 2019 at 6:33 am
This was removed by the editor as SPAM
Viewing 13 posts - 16 through 27 (of 27 total)
You must be logged in to reply to this topic. Login to reply
This website stores cookies on your computer.
These cookies are used to improve your website experience and provide more personalized services to you, both on this website and through other media.
To find out more about the cookies we use, see our Privacy Policy