What is the best design pattern for CDC and SCD2 processing

Question

Post reply

What is the best design pattern for CDC and SCD2 processing

Scott In Sydney

Hall of Fame

Points: 3331
More actions
November 16, 2017 at 10:14 pm

#324574

tl;dr
1) Should I use MERGE, or not???  https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/. Aaron's a pretty switched on guy; I tend to believe what he says!!!
2) I'd like a best practice design pattern for T-SQL only, for Change Data Capture and SCD2.
3) Regardless, I'll be developing in SSDT. So, should I use SSDT for this, or T-SQL? Does one perform better than the other, or easier to develop and maintain? For SSDT I'm planning on following this pattern:  https://datasavvy.me/2015/12/20/demystifying-the-type-2-slowly-changing-dimension-with-biml/
4) Is there a best practice for CDC: checksum, binary_checksum, hashbytes (slow?), EXISTS(), etc, etc?
Details:
Most of the examples I've found for SCD2 processing has been using MERGE. But then I saw Aaron's blog post in #1.
I've been working on design patterns for both CDC and SCD2. To keep this post small, I've uploaded my current design pattern prototypes here: https://github.com/scottbass/SQLServer. If you're willing to look and provide advice that would be fantastic.
For the below code, is there a way to improve performance? I note this post:  https://blogs.msdn.microsoft.com/miah/2008/02/17/sql-if-exists-update-else-insert/. But I couldn't wrap my head around how to fold this into my prototypes.
I hope this is enough detail, ping me if you need more info...
/* Run once
IF OBJECT_ID('tempdb.dbo.#source') IS NOT NULL DROP TABLE #source; IF OBJECT_ID('tempdb.dbo.#target') IS NOT NULL DROP TABLE #target;
-- Conditions: -- 1) Source data is not SCD2 -- 2) Target data is not SCD2 -- 3) Use NOT EXISTS and INTERSECT for Change Data Capture -- 4) Physically delete target rows not in source -- 5) Both source and target have an identity column for SK (they don't have to match) (or else join on the NK) -- 6) Use a join to create a XREF table of the source and target SK based on the NK
CREATE TABLE #source ( SK    INT IDENTITY(1,1) , FirstName VARCHAR(20) , LastName  VARCHAR(30) , EmailAddress VARCHAR(50) )
CREATE TABLE #target ( SK    INT IDENTITY(1,1) , FirstName VARCHAR(20) , LastName  VARCHAR(30) , EmailAddress VARCHAR(50) , Status   CHAR(1) )
INSERT INTO #source (FirstName,LastName,EmailAddress) VALUES ('John','Doe','john.doe@foo.com') ,('Mary','Jones','mary.jones@bar.com') ,('Joe','Bloggs','joe@bloggs.com')
*/
-- Create code as a SP for easy reuse DROP PROCEDURE deleteme GO CREATE PROCEDURE deleteme AS BEGIN
-- Use a full outer join to create a XREF table IF OBJECT_ID('tempdb.dbo.#xref') IS NOT NULL DROP TABLE #xref;
;WITH ACTION (ACTION, SRC_SK, TGT_SK, CHANGED) AS ( SELECT CASE    -- List the WHEN clauses in order of probability    WHEN (tgt.SK IS NULL and src.SK IS NOT NULL) THEN 'INSERT'    WHEN (tgt.SK IS NOT NULL and src.SK IS NOT NULL) THEN 'UPDATE'    WHEN (tgt.SK IS NOT NULL and src.SK IS NULL) THEN 'DELETE'    ELSE '' -- Will never happen but CASE likes an ELSE clause    END AS ACTION   ,src.SK AS SRC_SK   ,tgt.SK AS TGT_SK   ,CASE    WHEN NOT EXISTS (     SELECT       src.FirstName       ,src.LastName       ,src.EmailAddress
    INTERSECT
    SELECT       tgt.FirstName       ,tgt.LastName       ,tgt.EmailAddress     ) THEN 1 ELSE 0    END AS CHANGED FROM #source src FULL OUTER JOIN #target tgt ON src.FirstName = tgt.FirstName AND src.LastName = tgt.LastName ) SELECT * INTO #xref FROM ACTION a WHERE a.ACTION IN ('INSERT','DELETE') OR (a.ACTION = 'UPDATE' AND a.CHANGED = 1) ;
-- Use the XREF table to issue separate INSERT/UPDATE/DELETE blocks BEGIN TRANSACTION
-- INSERT INSERT INTO #target (FirstName,LastName,EmailAddress,Status) SELECT src.FirstName   ,src.LastName   ,src.EmailAddress   ,'I' FROM #source src INNER JOIN #xref x ON src.SK = x.SRC_SK WHERE x.ACTION = 'INSERT';
-- UPDATE UPDATE tgt SET FirstName = src.FirstName   ,LastName = src.LastName   ,EmailAddress = src.EmailAddress   ,Status   = 'U' FROM #target tgt INNER JOIN #xref x ON tgt.SK = x.TGT_SK INNER JOIN #source src ON src.SK = x.SRC_SK WHERE x.ACTION = 'UPDATE'
-- DELETE DELETE tgt FROM #target tgt INNER JOIN #xref x ON tgt.SK = x.TGT_SK WHERE x.ACTION = 'DELETE';
COMMIT;
SELECT * FROM #source; SELECT * FROM #target; SELECT * FROM #xref;
END
-- Load #1: All new rows TRUNCATE TABLE #target; EXEC deleteme;
-- Load #2: No change (run the SP again) -- There should be no change to target EXEC deleteme;
-- Load #3: -- Record #1 unchanged -- Record #2 changed (married, new email address) -- Record #3 deleted -- Record #4 added UPDATE #source SET EmailAddress='mary.smith@blah.com' WHERE SK=2; DELETE FROM #source WHERE SK=3; INSERT INTO #source (FirstName,LastName,EmailAddress) VALUES ('Billy','Bob','william.robert@gmail.com')
EXEC deleteme;

Viewing 2 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply

Scott In Sydney Hall of Fame Points: 3331 More actions · Answer 1

Ok, perhaps my previous post was 1) too broad, and 2) too long.

I'll simplify:

1) Should I use MERGE, or are there (still) bugs? Please frame your answer in light of https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/.

2) If you have any design patterns you use for SCD2 processing, please describe your approach, and provide code if possible and willing.