April 7, 2012 at 11:19 pm
When you load the Staging table for an Incremental Load is it not common practice to do a comparison with what is in the corresponding Data Warehouse Table to populate the Staging Table?
Also is it not common practice to mark the record in Staging with an INSERT, UPDATE or DELETE Flag so that you know what to do with the record?
Any ideas or suggestions would be greatly appreciated.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
April 7, 2012 at 11:51 pm
#1 - Yes it is common practice to compare data in staging with the data warehouse.
#2 - Because of #1 I don't care if rows in staging are marked as INSERT or UPDATE. Since we are comparing staging to DW, the ETL process determines what type of action will take place in the DW. For example, a row inserted into a source database may or may not be an insert into your DW, depending on your data model. On the other hand, deletes need to be tracked which I would normally deal with separately in a staging table that stores the business key of the data that was deleted. This way the ETL can take the appropriate action in the DW.
LinkedIn - http://www.linkedin.com/in/carlosbossy
Blog - http://www.carlosbossy.com
Follow me - @carlosbossy
April 8, 2012 at 12:01 am
Carlos Bossy (4/7/2012)
#2 - Because of #1 I don't care if rows in staging are marked as INSERT or UPDATE. Since we are comparing staging to DW, the ETL process determines what type of action will take place in the DW. For example, a row inserted into a source database may or may not be an insert into your DW, depending on your data model. On the other hand, deletes need to be tracked which I would normally deal with separately in a staging table that stores the business key of the data that was deleted. This way the ETL can take the appropriate action in the DW.
Thanks for your reply!
I need to log how many records were Inserted, Updated or Deleted along with $ Amounts for various columns during the incermental load.
How do I do that using your method?
Thanks!
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
April 8, 2012 at 12:03 am
Carlos Bossy (4/7/2012)
#1 - Yes it is common practice to compare data in staging with the data warehouse.
Don't you perform the initial load to staging. Load the DW Tables, and Clear Staging before you load the Staging with an Incremental Load?
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
July 20, 2012 at 5:59 am
Depends on your ETL process to load from staging to target, but you should be able to capture INSERT/UPDATE/DELETE records from the ETL process. I used the MERGE statement and DELETE statements to handle all my ETL which allowed me to use the OUTPUT clause and capture all effected records in a temp table. I used the data in the temp table to log the count of records that were INSERTS/UPDATES/DELETES.
July 20, 2012 at 1:52 pm
richykong (7/20/2012)
Depends on your ETL process to load from staging to target, but you should be able to capture INSERT/UPDATE/DELETE records from the ETL process. I used the MERGE statement and DELETE statements to handle all my ETL which allowed me to use the OUTPUT clause and capture all effected records in a temp table. I used the data in the temp table to log the count of records that were INSERTS/UPDATES/DELETES.
Could you please provide of the Sample of the OUTPUT Clause and Inserts into the Temp Table for INSERTS & UPDATES?
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
July 20, 2012 at 3:15 pm
Hope this helps.
CREATE TABLEdbo.TargetTable
([SurrogateKey]INTIDENTITY(1,1)NOT NULL PRIMARY KEY
,[JoinKey]INTNOTNULL
,[Detail1]VARCHAR(128)NOT NULL
,[Detail2]VARCHAR(128)NOT NULL)
CREATE TABLEdbo.StageTable
([JoinKey]INTNOTNULL
,[Detail1]VARCHAR(128)NOT NULL
,[Detail2]VARCHAR(128)NOT NULL)
INSERT INTO dbo.TargetTable
(JoinKey,[Detail1], [Detail2])
SELECT1,'Test 1', 'blahblahblah'
UNION ALL SELECT 2,'Test 2', 'blahblahblah'
UNION ALL SELECT 3,'Test 3', 'blahblahblah'
UNION ALL SELECT 4,'Test 4', 'blahblahblah'
INSERT INTO dbo.StageTable
(JoinKey,[Detail1], [Detail2])
SELECT1,'Test A', 'blahblahblah'
UNION ALL SELECT 2,'Test B', 'blahblahblah'
UNION ALL SELECT 5,'Test C', 'blahblahblah'
UNION ALL SELECT 6,'Test D', 'blahblahblah'
--SELECT the table before the MERGE
SELECT * FROM dbo.TargetTable
SELECT * FROM dbo.StageTable
DECLARE@TempASTABLE
(ActionVARCHAR(20)
,JoinKeyINT
,Detail1VARCHAR(128)
,Detail2VARCHAR(128))
MERGEdbo.TargetTableASTarget
USINGdbo.StageTableASSource
ONTarget.JoinKey=Source.JoinKey
WHEN MATCHED
AND(Target.[Detail1]<>Source.[Detail1]
ORTarget.[Detail2]<>Source.[Detail2])
THEN
UPDATE
SET[Detail1]=Source.[Detail1]
,[Detail2]=Source.[Detail2]
WHEN NOT MATCHED
THEN
INSERT
(JoinKey
,Detail1
,Detail2)
VALUES(Source.JoinKey
,Source.Detail1
,Source.Detail2)
--- You can reference 'Source', 'Target', 'Inserted', or 'Deleted' to output from.
OUTPUT$action
,Inserted.JoinKey
,Inserted.Detail1
,Inserted.Detail2
INTO @Temp;
--SELECT The results of the merge action.
SELECT *
FROM @Temp
SELECT * FROM dbo.TargetTable
SELECT * FROM dbo.StageTable
DROP TABLE dbo.TargetTable
DROP TABLE dbo.StageTable
July 21, 2012 at 1:49 pm
richykong (7/20/2012)
...I used the MERGE statement and DELETE statements to handle all my ETL...
I do agree on the MERGE statement but I do not on the use of DELETE - by definition we do not want to delete any data from the core Data Warehouse tables.
The fact that a client is gone as got deleted from the OLTP source system does not means it has to be deleted in the Data Warehouse. Data Warehouse has to provide the means to see what happened on the past then Data Warehouse has to keep a record for the now extinct client (probably including a active-until date column) and also has to keep all the transactions associated with it.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.July 21, 2012 at 2:38 pm
Thanks for the responses.
I totally agree and understand that you do not want to delete records but flag them as inactive.
As far as the OUTPUT Clause could you please go into more detail and provide more specific examples?
Thanks.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
July 21, 2012 at 3:29 pm
Welsh Corgi (7/21/2012)
I totally agree and understand that you do not want to delete records but flag them as inactive.
In general the inactive flag is not enough because it doesn't tell the whole storey... inactive since when? was it active two years ago? two month ago? two hours ago?
Stuff that can go inactive should include a starting and an ending timestamp to let the system know when was active.
This is important because business sooner or later would ask for some business metrics comparing this quarter with the same quarter last year and most probably the quantity of some-stuff (like customers) active during the compared periods would be part of the equation.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.July 21, 2012 at 3:59 pm
Whan I said flag them as inactive I dod not mean to store an "Inactive Flag", I mean to indicate that they were inactive as a particular Date, etc.
For better, quicker answers on T-SQL questions, click on the following...
http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better answers on performance questions, click on the following...
http://www.sqlservercentral.com/articles/SQLServerCentral/66909/
July 21, 2012 at 4:56 pm
Welsh Corgi (7/21/2012)
Whan I said flag them as inactive I dod not mean to store an "Inactive Flag", I mean to indicate that they were inactive as a particular Date, etc.
Shame on me :crying: I'm sorry I didn't realize it was you asking the question, man - I have to pay attention.
_____________________________________
Pablo (Paul) Berzukov
Author of Understanding Database Administration available at Amazon and other bookstores.
Disclaimer: Advice is provided to the best of my knowledge but no implicit or explicit warranties are provided. Since the advisor explicitly encourages testing any and all suggestions on a test non-production environment advisor should not held liable or responsible for any actions taken based on the given advice.July 21, 2012 at 5:10 pm
Carlos Bossy (4/7/2012)
#1 - Yes it is common practice to compare data in staging with the data warehouse.#2 - Because of #1 I don't care if rows in staging are marked as INSERT or UPDATE. Since we are comparing staging to DW, the ETL process determines what type of action will take place in the DW. For example, a row inserted into a source database may or may not be an insert into your DW, depending on your data model. On the other hand, deletes need to be tracked which I would normally deal with separately in a staging table that stores the business key of the data that was deleted. This way the ETL can take the appropriate action in the DW.
What "ETL process"? I believe the OP is trying to write one.
--Jeff Moden
Change is inevitable... Change for the better is not.
July 21, 2012 at 5:13 pm
PaulB-TheOneAndOnly (7/21/2012)
Welsh Corgi (7/21/2012)
I totally agree and understand that you do not want to delete records but flag them as inactive.In general the inactive flag is not enough because it doesn't tell the whole storey... inactive since when? was it active two years ago? two month ago? two hours ago?
Stuff that can go inactive should include a starting and an ending timestamp to let the system know when was active.
This is important because business sooner or later would ask for some business metrics comparing this quarter with the same quarter last year and most probably the quantity of some-stuff (like customers) active during the compared periods would be part of the equation.
+1000
--Jeff Moden
Change is inevitable... Change for the better is not.
July 21, 2012 at 5:17 pm
PaulB-TheOneAndOnly (7/21/2012)
Welsh Corgi (7/21/2012)
Whan I said flag them as inactive I dod not mean to store an "Inactive Flag", I mean to indicate that they were inactive as a particular Date, etc.Shame on me :crying: I'm sorry I didn't realize it was you asking the question, man - I have to pay attention.
If it makes you feel any better, I saw "flag them as inactive" and thought your clarification was very worthwhile in case someone else thought of something like an "IsActive" flag.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 1 through 15 (of 16 total)
You must be logged in to reply to this topic. Login to reply