Query help for special condition

  • Hi sql gurus,

    I am currently doing some R&D on MERGE statement and as you know it's little complex one, I am trying to get one solution for which I am not able to think how to implement it.

    I have to maintain a SCD type 2 dimension. My source contain data with start date and end date for say an employee and we have an employee id as identifier. For some reasons (plz don't ask as it's legacy system), our target table has a separate integer identifier along with employee id. Target table also contains start date, end date and is current flag.

    Now while i was trying to implement merge, problem is how to assign new identifier value for new record (it should be max value +1 , and this column is of course not identity column).

    Please help me with some example. I am scratching my head.

  • You know the drill. Can you provide some sample consumable DDL, data and desired results, please?

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • Off course. Here it goes.

    Source table contains data like this:

    EmpId attr1 attr2 attr3 attr4 effStartDate effEndDate

    E1001 abc xyz knop sts '2015-06-01 00:00:00.000' '9999-12-31 00:00:00.000'

    E1002 vah sje jshu kjd '2015-01-01 00:00:00.000' '2016-12-31 00:00:00.000'

    E1002 vah sst jshu kjd '2017-01-01 00:00:00.000' '9999-12-31 00:00:00.000'

     

    And target dataset (SCD type 2 table) is like:

    EmpAppId EmpId attr1 attr2 attr3 attr4 effStartDate effEndDate IsActive IsCurrent

    1000001 E1001 abc xyz knop sts '2015-06-01 00:00:00.000' '9999-12-31 00:00:00.000' 'Y' 'Y'

    1000002 E1002 vah sje jshu kjd '2015-01-01 00:00:00.000' '2016-12-31 00:00:00.000' 'Y' 'N'

    1000002 E1002 vah sst jshu kjd '2017-01-01 00:00:00.000' '9999-12-31 00:00:00.000' 'Y' 'Y'

     

    So, while next SCD update, we have to take care of following:

    1. From source only take records in to consideration with efdEndDate = '9999-12-31' for SCD update .

    2. From target only take records with IsCurrent ='Y' for SCD update

    3. Based on just EmpId, if for a record in source its not present in target, then add new one with max(EmpAppId)+1

    4. Based on just EmpId,  if for a record in target its not present in source, then mark that record as inactive by updating isActive flag = 'N'.

    5. If for an EmpId in source, a record is present in Target then check for effStartDate. If its not same then mark that record in target as isCurrent = 'N', effEndDate = (effStartDate of new record in source). Also make new entry of that source record in target with same EmpAppId as allocated for other records in target.

    Hope I am able to clarify my problem statement in simple way. If any point not clear, please let me know.

  • sqlenthu 89358 wrote:

    Now while i was trying to implement merge, problem is how to assign new identifier value for new record (it should be max value +1 , and this column is of course not identity column).

    Please help me with some example. I am scratching my head.

    This is exactly what I'm talking about.  The lofty theorizers should come up with this code.

    Could the EmpAppId column in the target table be dropped?  From the way it seemingly appears here the primary key of the target table could be EmpId, effStartDate, EffEndDate.  Is it so?  The EmpAppId is not unique

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Ah, my apologies. I'd assumed that, with >2,700 points, you would understand exactly what I meant.

    What we need here is code which other developers can run in SSMS.

    So

    1)

    DROP TABLE IF EXISTS #Source

    CREATE TABLE #Source(EmpId VARCHAR(x) NOT NULL, etc)

    Same for #Target

    2)

    INSERT #Source

    VALUES ()

    Same for #Target

    3)

    What should #Target look like after the query has run.

    Ideally represented as

    SELECT *

    FROM

    VALUES (),()

    The absence of evidence is not evidence of absence
    - Martin Rees
    The absence of consumable DDL, sample data and desired results is, however, evidence of the absence of my response
    - Phil Parkin

  • This is a beginning and it might get the OP past the id assignment issue.

    drop table if exists #src_data;
    go
    create table #src_data(
    EmpId varchar(12),
    attr1 char(3),
    attr2 char(3),
    attr3 char(4),
    attr4 char(3),
    effStartDate datetime,
    effEndDate datetime);

    drop table if exists #tgt_data;
    create table #tgt_data(
    EmpAppId int,
    EmpId varchar(12),
    attr1 char(3),
    attr2 char(3),
    attr3 char(4),
    attr4 char(3),
    effStartDate datetime,
    effEndDate datetime,
    IsActive char(1),
    IsCurrent char(1));

    insert #src_data(EmpId, attr1, attr2, attr3, attr4, effStartDate, effEndDate) values
    ('E1001', 'abc', 'xyz', 'knop', 'sts', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000'),
    ('E1002', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000'),
    ('E1002', 'vah', 'sst', 'jshu', 'kjd', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000'),
    /* new records */
    ('E9998', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '9999-12-31 00:00:00.000'),
    ('E9999', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '9999-12-31 00:00:00.000');

    insert #tgt_data(EmpAppId, EmpId, attr1, attr2, attr3, attr4, effStartDate, effEndDate, IsActive, IsCurrent) values
    (1000001, 'E1001', 'abc', 'xyz', 'knop', 'sts', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y'),
    (1000002, 'E1002', 'vah', 'sje', 'jshu', 'kjd', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000', 'Y', 'N'),
    (1000002, 'E1002', 'vah', 'sst', 'jshu', 'kjd', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y'),
    /* IsActive set to 'No' */
    (9000001, 'E9002', 'vah', 'sst', 'jshu', 'kjd', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y');

    set nocount on;
    set xact_abort on;

    begin transaction
    begin try
    --3)
    with
    max_key_cte(max_id) as (
    select max(EmpAppId) max_id from #tgt_data)
    insert #tgt_data(EmpAppId, EmpId, attr1, attr2, attr3, attr4, effStartDate, effEndDate, IsActive, IsCurrent)
    select
    mkc.max_id+row_number() over (order by (select null)),
    sd.*,
    'Y',
    'Y'
    from
    max_key_cte mkc
    cross join
    #src_data sd
    where
    sd.effEndDate='9999-12-31'
    and not exists(select 1 from #tgt_data td where sd.EmpId=td.EmpId);

    --4)
    update td
    set
    IsActive='N'
    from
    #tgt_data td
    where not exists
    (select 1 from #src_data sd where td.EmpId=sd.EmpId);

    --5) ???

    commit transaction;
    end try
    begin catch
    print ('No bueno');
    rollback transaction;
    end catch

    select * from #tgt_data;

     

    Aus dem Paradies, das Cantor uns geschaffen, soll uns niemand vertreiben können

  • Here is the script to have data at initial stage. Here table Target acts as SCD type 2 table

    IF OBJECT_ID('tempdb.dbo.#Source', 'U') IS NOT NULL
    DROP TABLE dbo.#Source;

    CREATE TABLE dbo.#Source
    (EmpId varchar(10), attr1 varchar(10), attr2 varchar(10), attr3 varchar(10), effStartDate datetime, effEndDate datetime)

    insert into dbo.#Source
    select 'E1001', 'Robert', 'Kreigik', 'WA', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
    select 'E1002', 'Samuel', 'Vaas', 'WA', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000' union all
    select 'E1002', 'Samuel', 'D Vaas', 'WA', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
    select 'E1003', 'Ned', 'Menderick', 'NY', '2016-09-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
    select 'E1004', 'Damien', 'Martin', 'NY', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000'

    IF OBJECT_ID('tempdb.dbo.#Target', 'U') IS NOT NULL
    DROP TABLE dbo.#Target;

    CREATE TABLE dbo.#Target
    (EmpAppId int, EmpId varchar(10), attr1 varchar(10), attr2 varchar(10), attr3 varchar(10), effStartDate datetime, effEndDate datetime,
    IsActive char(1), IsCurrent char(1))

    Insert into dbo.#Target
    select 1000001, 'E1001', 'Robert', 'Kreigik', 'WA', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y' UNION ALL
    select 1000002, 'E1002', 'Samuel', 'Vaas', 'WA', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000', 'Y', 'N' UNION ALL
    select 1000002, 'E1002', 'Samuel', 'D Vaas', 'WA', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y' UNION ALL
    select 1000003, 'E1003', 'Ned', 'Menderick', 'NY', '2016-09-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y' UNION ALL
    select 1000004, 'E1004', 'Damien', 'Martin', 'NY', '2017-01-01 00:00:00.000', '9999-12-31 00:00:00.000', 'Y', 'Y'

    Once you ran the above scripts, you will get output like this:

    Capture

    Now, after second run we get the new feed in source again (as truncate and load) and it looks like this:

    truncate table dbo.#Source

    insert into dbo.#Source
    select 'E1001', 'Robert', 'Kreigik', 'WA', '2015-06-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
    select 'E1002', 'Samuel', 'Vaas', 'WA', '2015-01-01 00:00:00.000', '2016-12-31 00:00:00.000' union all
    select 'E1002', 'Samuel', 'D Vaas', 'WA', '2017-01-01 00:00:00.000', '2017-05-31 00:00:00.000' union all
    select 'E1002', 'Samuel', 'D Vaas', 'NY', '2017-06-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
    select 'E1003', 'Ned', 'Menderick', 'NY', '2016-09-01 00:00:00.000', '9999-12-31 00:00:00.000' union all
    select 'E1005', 'Ruth', 'Gardener', 'OH', '2017-06-01 00:00:00.000', '9999-12-31 00:00:00.000'

    The output looks like this:

    Capture2

    Now i need a MERGE statement only so that i can update Target table in a way that:

    1. E1002 get modified based on attr3, changing the effEndDate and isCurrent in existing Target record and Inserting new record in Target table based on new Source record.
    2. E1004 should be marked as IsCurrent = 'N' changing the effEndDate as the record does not exist in Source anymore.
    3. E1005 should be inserted as new record in Target with EmpAppId  1000005 (which is max(EmpAppId of Source)+1)

    Hope it clarifies the requirement. I want to implement it as Merge only because actually these are big tables and i dont want to scan them again and again and also mainly want to learn Merge with this example.

  • Can anyone please help on this please ?

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply