Grouping based on chronological periods

  • Hello,

    I have a table Customers that contains the current value and another table Customers_History contains the history values.

    The customers have more that 10 fields

    AS Bellow a sample of data :

    CREATE TABLE #CUSTOMERS(

    ID int primary key identity,

    First_Name varchar(255),

    Last_Name varchar(255),

    Nick_name varchar(255),

    Email varchar(255),

    Date_Bithday date,

    Gender varchar(1),

    Is_Married int,

    Has_kids int,

    Customer_Type varchar(50),

    Account_Status varchar(50)

    )

    insert into #CUSTOMERS values('xxxxxxx','yyyyyyyy','zzzzzzzz','y.x@gmail.com','1986-06-01','M',1,1,'premium','actif')

    CREATE TABLE #CUSTOMERS_HISTORY(

    CUSTOMER_ID int,

    ColumnName varchar(50),

    PreviousValue varchar(50),

    CurrentValue varchar(50),

    DateChange datetime

    )

    insert into #CUSTOMERS_HISTORY Values(1,'Nick_name',NULL,'zzzzzzzz','2023-01-05 17:04:03.000')

    insert into #CUSTOMERS_HISTORY Values(1,'Email',NULL,'yyyyy.xxxx@gmail.com','2023-01-06 17:04:03.000')

    insert into #CUSTOMERS_HISTORY Values(1,'Email','yyyyy.xxxx@gmail.com','y.x@gmail.com','2023-01-06 18:04:03.000')

    insert into #CUSTOMERS_HISTORY Values(1,'Is_Married','0','1','2023-01-07 11:04:03.000')

    insert into #CUSTOMERS_HISTORY Values(1,'Has_kids','0','1','2023-01-07 11:04:03.000')

    insert into #CUSTOMERS_HISTORY Values(1,'Account_Status','actif','non actif','2023-01-08 11:04:03.000')

    insert into #CUSTOMERS_HISTORY Values(1,'Account_Status','non actif','suspend','2023-01-08 11:04:33.000')

    insert into #CUSTOMERS_HISTORY Values(1,'Account_Status','suspend','actif','2023-01-08 11:05:05.000')

    I want grouping the data based on chronological periods from Historical table (Like a Slowly changing dimension Type 2)

    The logic is taking the ColumnName Values from CUSTOMERS_HISTORY based on min DateChange that row will be the first row of my table and for each updated getting the next row

    Desired Data :

    Capture_

    Thanks for help !

  • This looks like a homework or assignment or an interview question, so you will forgive me for not just spitting out a full TSQL query for this. What is always nice to have is a "what have you tried" script. You MAY be really close to solving this, so knowing that would be helpful. But lacking that, this is my thought process on solving this which could be a completely different direction you were going in.

    I would start by breaking it up into multiple steps and solve each problem individually.

    Step 1 join on the ID and look at the data you have. From there you have all the data in a format you just need to do some parsing and logic on it. An example, for the Nick_name, do something similar to:

    CASE WHEN ColumnName = 'Nick_name' 
    THEN CurrentValue
    ELSE LAG(Nick_name) OVER (PARTION BY ID ORDER BY DateChange)
    END AS [Nick_name]

    NOTE - offhand, I don't remember the LAG syntax and the above MAY have a typo in it, but basically you want to look at the previous row and I believe that by default LAG is looking at the previous row unless otherwise specified.

    Step 2 repeat step 1 for all columns that could be changing in the history table

    Step 3 is a bit more tricky as it is a matter of getting that first row. But for that, you know the DateChange appears to be hard coded, so that field is easy and the other fields you are likely going to do something along the lines of:

    (SELECT PreviousValue 
    FROM #CUSTOMERS_HISTORY
    WHERE DateChange = (
    SELECT MIN(DateChange)
    FROM #CUSTOMER_HISTORY
    WHERE ColumnName = 'Nick_name'
    )
    )

    Step 4 UNION (PROBABLY want to use UNION ALL as that should get better performance and more accurate data) the results from step 2 and and step 3 and order it by whatever column makes sense to you (likely ID and DateChange, but maybe you care more about First_Name and DateChange... I'm not sure).

    Step 5 IF this is going to be used multiple times, then making a VIEW of the query is the next step I'd take. If the query is going to be used multiple times BUT is going to ALWAYS be filtered, then I'd use a stored procedure and likely add an OPTIMIZE FOR UNKNONW query hint.

    I would also recommend that you do a dump of some sample data to Excel and build up your own expected output and once you know what it should look like, compare it with what you got.

    Something to note though - your sample data doesn't match the output. The sample data says that initially the user is married and has kids, but then row 1 indicates they do not.

    Final note - the above approach is likely not the most efficient approach, but in my opinion it is one of the more readable approaches and easiest to explain what you are doing and how you are doing it. If performance is the goal, then I'd have more questions such as is realtime data important or is hour old data sufficient (for example) and how often does the data change?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Hi @bmg002

    Thanks for replying .

    I did a query but it doesn't look like efficient and need to add some code to complete it .

    Like i said the idea is following the history

    DECLARE @Histo AS DATE
    SET @Histo ='1900-12-31'

    ;with Customers as(
    SELECT *,GETDATE() AS [CreatedDate] FROM #CUSTOMERS
    )

    ,histo as (
    SELECT
    *
    ,ROW_NUMBER() OVER(partition by CUSTOMER_ID,ColumnName order by DateChange ASC) AS RN
    FROM #CUSTOMERS_HISTORY
    )

    ,histo_min_pivot as(
    SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
    FROM
    (
    select
    CUSTOMER_ID,
    DateChange,
    PreviousValue,
    ColumnName
    from histo WHERE RN=1
    ) x
    pivot
    (
    max(PreviousValue)
    for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
    ) p
    )

    ,histo_get_one_row as(
    SELECT
    CUSTOMER_ID
    ,MAX(Account_Status) AS Account_Status
    ,MAX(Email) AS Email
    ,MAX(Has_kids) AS Has_kids
    ,MAX(Is_Married) AS Is_Married
    ,MAX(Nick_name) AS Nick_name
    FROM histo_min_pivot
    GROUP BY CUSTOMER_ID
    )
    ,first_row_for_history as(
    select

    s.ID
    ,s.First_Name
    ,s.Last_Name
    ,h.Nick_name
    ,h.Email
    ,s.Date_Bithday
    ,s.Gender
    ,h.Is_Married
    ,h.Has_kids
    ,s.Customer_Type
    ,h.Account_Status
    ,@Histo DateChange
    from histo_get_one_row h
    inner join Customers s
    on h.CUSTOMER_ID=s.ID
    )
    ,histo_pivot_between_first_last_row as(
    SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
    FROM
    (
    select
    CUSTOMER_ID,
    DateChange,
    CurrentValue,
    ColumnName
    from histo WHERE RN=1
    ) x
    pivot
    (
    max(CurrentValue)
    for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
    ) p
    )
    ,between_rows as(
    select
    s.ID
    ,s.First_Name
    ,s.Last_Name
    ,h.Nick_name
    ,h.Email
    ,s.Date_Bithday
    ,s.Gender
    ,h.Is_Married
    ,h.Has_kids
    ,s.Customer_Type
    ,h.Account_Status
    ,h.DateChange
    from histo_pivot_between_first_last_row h
    inner join Customers s
    on h.CUSTOMER_ID=s.ID
    )
    select * from first_row_for_history
    union
    select * from between_rows
    order by DateChange

    • This reply was modified 10 months, 2 weeks ago by  MrRobot.
  • There was some problems with your CTE's for the between_rows data due to the filter on RN=1. That shouldn't be applied on the between_rows set of data (happens higher up in the CTE) as it then strips out multiple changes on a single column.

    BUT I believe this query builds on what you had originally with 1 minor change in the CTE's (I removed 1 of the RN=1 rows) and then built out the final SELECT:

    CREATE TABLE #CUSTOMERS
    (
    IDINT PRIMARY KEY IDENTITY,
    First_NameVARCHAR(255),
    Last_NameVARCHAR(255),
    Nick_nameVARCHAR(255),
    EmailVARCHAR(255),
    Date_BithdayDATE,
    GenderVARCHAR(1),
    Is_MarriedINT,
    Has_kidsINT,
    Customer_TypeVARCHAR(50),
    Account_StatusVARCHAR(50)
    );

    INSERT INTO #CUSTOMERS
    VALUES
    (
    'xxxxxxx',
    'yyyyyyyy',
    'zzzzzzzz',
    'y.x@gmail.com',
    '1986-06-01',
    'M',
    1,
    1,
    'premium',
    'actif'
    );

    CREATE TABLE #CUSTOMERS_HISTORY
    (
    CUSTOMER_IDINT,
    ColumnNameVARCHAR(50),
    PreviousValueVARCHAR(50),
    CurrentValueVARCHAR(50),
    DateChangeDATETIME
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Nick_name',
    NULL,
    'zzzzzzzz',
    '2023-01-05 17:04:03.000'
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Email',
    NULL,
    'yyyyy.xxxx@gmail.com',
    '2023-01-06 17:04:03.000'
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Email',
    'yyyyy.xxxx@gmail.com',
    'y.x@gmail.com',
    '2023-01-06 18:04:03.000'
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Is_Married',
    '0',
    '1',
    '2023-01-07 11:04:03.000'
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Has_kids',
    '0',
    '1',
    '2023-01-07 11:04:03.000'
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Account_Status',
    'actif',
    'non actif',
    '2023-01-08 11:04:03.000'
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Account_Status',
    'non actif',
    'suspend',
    '2023-01-08 11:04:33.000'
    );

    INSERT INTO #CUSTOMERS_HISTORY
    VALUES
    (
    1,
    'Account_Status',
    'suspend',
    'actif',
    '2023-01-08 11:05:05.000'
    );

    DECLARE @Histo AS DATE;
    SET @Histo = '1900-12-31';
    WITH Customers
    AS
    (
    SELECT
    *,
    GETDATE()AS [CreatedDate]
    FROM#CUSTOMERS
    ),
    histo
    AS
    (
    SELECT
    *,
    ROW_NUMBER() OVER ( PARTITION BY
    CUSTOMER_ID,
    ColumnName
    ORDER BY DateChange ASC
    ) AS RN
    FROM#CUSTOMERS_HISTORY
    ),
    histo_min_pivot
    AS
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    FROM
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    PreviousValue,
    ColumnName
    FROMhisto
    WHERERN = 1
    ) x
    PIVOT
    (
    MAX(PreviousValue)
    FOR ColumnName IN
    (
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    )
    ) p
    ),
    histo_get_one_row
    AS
    (
    SELECT
    CUSTOMER_ID,
    MAX(Account_Status) AS Account_Status,
    MAX(Email)AS Email,
    MAX(Has_kids)AS Has_kids,
    MAX(Is_Married)AS Is_Married,
    MAX(Nick_name)AS Nick_name
    FROMhisto_min_pivot
    GROUP BY CUSTOMER_ID
    ),
    first_row_for_history
    AS
    (
    SELECT
    s.ID,
    s.First_Name,
    s.Last_Name,
    h.Nick_name,
    h.Email,
    s.Date_Bithday,
    s.Gender,
    h.Is_Married,
    h.Has_kids,
    s.Customer_Type,
    h.Account_Status,
    @HistoDateChange
    FROMhisto_get_one_rowh
    INNER JOINCustomerss
    ONh.CUSTOMER_ID= s.ID
    ),
    histo_pivot_between_first_last_row
    AS
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    FROM
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    CurrentValue,
    ColumnName
    FROMhisto
    ) x
    PIVOT
    (
    MAX(CurrentValue)
    FOR ColumnName IN
    (
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    )
    ) p
    ),
    between_rows
    AS
    (
    SELECT
    s.ID,
    s.First_Name,
    s.Last_Name,
    h.Nick_name,
    h.Email,
    s.Date_Bithday,
    s.Gender,
    h.Is_Married,
    h.Has_kids,
    s.Customer_Type,
    h.Account_Status,
    h.DateChange
    FROMhisto_pivot_between_first_last_rowh
    INNER JOINCustomerss
    ONh.CUSTOMER_ID= s.ID
    )
    SELECT
    first_row_for_history.ID,
    first_row_for_history.First_Name,
    first_row_for_history.Last_Name,
    first_row_for_history.Nick_name,
    first_row_for_history.Email,
    first_row_for_history.Date_Bithday,
    first_row_for_history.Gender,
    first_row_for_history.Is_Married,
    first_row_for_history.Has_kids,
    first_row_for_history.Customer_Type,
    first_row_for_history.Account_Status,
    first_row_for_history.DateChange
    FROMfirst_row_for_history
    UNION
    SELECT
    between_rows.ID,
    between_rows.First_Name,
    between_rows.Last_Name,
    CASE
    WHEN between_rows.Nick_name IS NOT NULL THEN
    between_rows.Nick_name
    WHEN
    (
    SELECTTOP 1
    br2.Nick_name
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Nick_name IS NOT NULL
    ORDER BY br2.DateChange DESC
    ) IS NOT NULL THEN
    (
    SELECTTOP 1
    br2.Nick_name
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Nick_name IS NOT NULL
    ORDER BY br2.DateChange DESC
    )
    ELSE
    first_row_for_history.Nick_name
    ENDAS Nick_name,
    CASE
    WHEN between_rows.Email IS NOT NULL THEN
    between_rows.Email
    WHEN
    (
    SELECTTOP 1
    br2.Email
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Email IS NOT NULL
    ORDER BY br2.DateChange DESC
    ) IS NOT NULL THEN
    (
    SELECTTOP 1
    br2.Email
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Email IS NOT NULL
    ORDER BY br2.DateChange DESC
    )
    ELSE
    first_row_for_history.Email
    ENDAS email,
    between_rows.Date_Bithday,
    CASE
    WHEN between_rows.Gender IS NOT NULL THEN
    between_rows.Gender
    WHEN
    (
    SELECTTOP 1
    br2.Gender
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Gender IS NOT NULL
    ORDER BY br2.DateChange DESC
    ) IS NOT NULL THEN
    (
    SELECTTOP 1
    br2.Gender
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Gender IS NOT NULL
    ORDER BY br2.DateChange DESC
    )
    ELSE
    first_row_for_history.Gender
    ENDAS Gender,
    CASE
    WHEN between_rows.Is_Married IS NOT NULL THEN
    between_rows.Is_Married
    WHEN
    (
    SELECTTOP 1
    br2.Is_Married
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Is_Married IS NOT NULL
    ORDER BY br2.DateChange DESC
    ) IS NOT NULL THEN
    (
    SELECTTOP 1
    br2.Is_Married
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Is_Married IS NOT NULL
    ORDER BY br2.DateChange DESC
    )
    ELSE
    first_row_for_history.Is_Married
    ENDAS Is_Married,
    CASE
    WHEN between_rows.Has_kids IS NOT NULL THEN
    between_rows.Has_kids
    WHEN
    (
    SELECTTOP 1
    br2.Has_kids
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Has_kids IS NOT NULL
    ORDER BY br2.DateChange DESC
    ) IS NOT NULL THEN
    (
    SELECTTOP 1
    br2.Has_kids
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Has_kids IS NOT NULL
    ORDER BY br2.DateChange DESC
    )
    ELSE
    first_row_for_history.Has_kids
    ENDAS Has_kids,
    CASE
    WHEN between_rows.Customer_Type IS NOT NULL THEN
    between_rows.Customer_Type
    WHEN
    (
    SELECTTOP 1
    br2.Customer_Type
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Customer_Type IS NOT NULL
    ORDER BY br2.DateChange DESC
    ) IS NOT NULL THEN
    (
    SELECTTOP 1
    br2.Customer_Type
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Customer_Type IS NOT NULL
    ORDER BY br2.DateChange DESC
    )
    ELSE
    first_row_for_history.Customer_Type
    ENDAS Customer_Type,
    CASE
    WHEN between_rows.Account_Status IS NOT NULL THEN
    between_rows.Account_Status
    WHEN
    (
    SELECTTOP 1
    br2.Account_Status
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Account_Status IS NOT NULL
    ORDER BY br2.DateChange DESC
    ) IS NOT NULL THEN
    (
    SELECTTOP 1
    br2.Account_Status
    FROMbetween_rows br2
    WHEREbr2.DateChange< between_rows.DateChange
    AND br2.Account_Status IS NOT NULL
    ORDER BY br2.DateChange DESC
    )
    ELSE
    first_row_for_history.Account_Status
    ENDAS Account_Status,
    between_rows.DateChange
    FROMbetween_rows
    JOINfirst_row_for_history
    ONfirst_row_for_history.ID= between_rows.ID
    ORDER BY DateChange;

    DROP TABLE #CUSTOMERS;
    DROP TABLE #CUSTOMERS_HISTORY;

    Looks a bit crazy, but basically IF a field that is changing (I used Nick_name, Email, Gender, Is_Married, Has_kids, Customer_Type, Account_Status, but you could expand this out to any column that makes sense... name could change I imagine too, so may want to include those), it first checks is this row the row it changed on. If so, put the new value. Otherwise, we look to find the most recent value it was. If both of these are blank, then we use the value from the first row.

    Now there are some problems with the above query that you can test out yourself pretty easily. Try deleting a value from #Customers_History that has a value in the #Customers table such as Nick_name (row 1 of #Customers_history) - you will notice that #Customers still has a Nick_name assigned, but the result set has it NULL the whole way down. This is not expected (to me), but may be expected to you.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Hi @bmg002,

    Thank you for your time.

    Yes i am agree with your comment , also it is not expected to me in that case i need to take the value from customers Table .

    Secondly , i have more than 10 columns to track in history , is there away to do it dynamically ?   or i need to do the test 10 times ?

    Thanks 🙂

  • This was removed by the editor as SPAM

  • For handling it dynamically, I imagine it would be possible, but dynamic SQL is always risky. Since this would be a "write once and never again" (hopefully), my opinion is that 10 columns is not that many to track with semi-copy pasted code.

    If you do decide to go with a dynamic SQL route, the query is going to get a LOT more messy as you have to have all of your CTE's in the dynamic SQL for the columns and it'll probably get messy very quickly and be a lot more work to debug and diagnose issues. PLUS you'd want to make sure nobody made a column that could result in SQL injection and if they did, you'd need to be careful how you handle it.

    My opinion, I would create this as a view or stored procedure (depending on how you plan to use it) and the query will be a write-once query and then just need to do updates if more columns are added.

    Are you needing help fixing the bug or are you on top of that?

    Also, are you needing anything optimized or does it run "good enough"? Not sure if I see any obvious optimizations, but I didn't really look too hard either. I was just taking your query and making it give the results you expected.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • Hi @bmg002,

    Unfortunately i didn't fixed it yet i need some help on that too

    I launched the query and it takes about 1h28 min , i think it's a lot ..

     

  • That is a long run time... so it sounds like it needs some optimizing.

    How much data are you working with and where are you presenting the data? If it is millions of rows coming back to the screen in SSMS (for example), the bottleneck is likely presenting it on screen. If it is closer to 1000 rows then the problem is probably on the data retrieval and calculation side.

    Also, have you reviewed the execution plan? If not, that's where I'd start to see where the bottleneck is.

    It would be good to know where the problems reside and correct the problem area. If the slowness is due to presenting it to screen, then the only way to make it faster is to reduce the result set. If the slowness is due to the calculations, then reducing calculations and CTE's may help.

    Was it just as slow before my proposed changes or did those make it drastically worse?

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • @bmg002,

    Well i have 1 millions on customers and 14900 on customers history.

    The first query that i share was speedy , i did a second query and better and looks like optimized.

    Unfortunately can see the execution plan because i don't have access only DBA users can do it

    AS Bellow the new code :

    DECLARE @Histo AS DATE
    SET @Histo ='1900-12-31'

    ;with Customers as(
    SELECT *,GETDATE() AS [CreatedDate] FROM #CUSTOMERS
    )
    ,histo as (
    SELECT
    *
    ,ROW_NUMBER() OVER(partition by CUSTOMER_ID,ColumnName order by DateChange ASC) AS RN
    FROM #CUSTOMERS_HISTORY
    )

    ,histo_min_pivot as(
    SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
    FROM
    (
    select
    CUSTOMER_ID,
    DateChange,
    PreviousValue,
    ColumnName
    from histo WHERE RN=1
    ) x
    pivot
    (
    max(PreviousValue)
    for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
    ) p
    )

    ,histo_get_one_row as(
    SELECT
    CUSTOMER_ID
    ,MAX(Account_Status) AS Account_Status
    ,MAX(Email) AS Email
    ,MAX(Has_kids) AS Has_kids
    ,MAX(Is_Married) AS Is_Married
    ,MAX(Nick_name) AS Nick_name
    FROM histo_min_pivot
    GROUP BY CUSTOMER_ID
    )
    ,first_row_for_history as(
    select

    s.ID
    ,s.First_Name
    ,s.Last_Name
    ,h.Nick_name
    ,h.Email
    ,s.Date_Bithday
    ,s.Gender
    ,h.Is_Married
    ,h.Has_kids
    ,s.Customer_Type
    ,h.Account_Status
    ,@Histo DateChange
    from histo_get_one_row h
    inner join Customers s
    on h.CUSTOMER_ID=s.ID
    )
    ,histo_pivot_between_first_last_row as(
    SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
    FROM
    (
    select
    CUSTOMER_ID,
    DateChange,
    CurrentValue,
    ColumnName
    from histo WHERE RN=1
    ) x
    pivot
    (
    max(CurrentValue)
    for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
    ) p
    )
    ,between_rows as(
    select
    s.ID
    ,s.First_Name
    ,s.Last_Name
    ,h.Nick_name
    ,h.Email
    ,s.Date_Bithday
    ,s.Gender
    ,h.Is_Married
    ,h.Has_kids
    ,s.Customer_Type
    ,h.Account_Status
    ,h.DateChange
    from histo_pivot_between_first_last_row h
    inner join Customers s
    on h.CUSTOMER_ID=s.ID
    )
    ,all_rows as(
    select * from first_row_for_history
    union
    select * from between_rows
    )
    ,end_q as(
    select
    t.ID
    ,t.First_Name
    ,t.Last_Name
    ,t2.Nick_name
    ,t3.Email
    ,t.Date_Bithday
    ,t.Gender
    ,t4.Is_Married
    ,t5.Has_kids
    ,t.Customer_Type
    ,t6.Account_Status
    ,t.DateChange
    from all_rows t
    outer apply(
    select top(1) t2.* from all_rows t2
    where (t.ID = t2.ID and t2.DateChange<=t.DateChange and t2.Nick_name is not null )
    order by t2.DateChange desc
    ) t2
    outer apply(
    select top(1) t3.* from all_rows t3
    where (t.ID = t3.ID and t3.DateChange<=t.DateChange and t3.Email is not null )
    order by t3.DateChange desc
    ) t3
    outer apply(
    select top(1) t4.* from all_rows t4
    where (t.ID = t4.ID and t4.DateChange<=t.DateChange and t4.Is_Married is not null )
    order by t4.DateChange desc
    ) t4
    outer apply(
    select top(1) t5.* from all_rows t5
    where (t.ID = t5.ID and t5.DateChange<=t.DateChange and t5.Has_kids is not null )
    order by t5.DateChange desc
    ) t5

    outer apply(
    select top(1) t6.* from all_rows t6
    where (t.ID = t6.ID and t6.DateChange<=t.DateChange and t6.Account_Status is not null )
    order by t6.DateChange desc
    ) t6

    )
    select * from end_q
    order by DateChange
  • I am glad you got it optimized. One thing that MAY also help with your optimization is to change your UNION to a UNION ALL. That will remove an implicit "DISTINCT" which is a slower operation.

    One word of warning though - if you ARE putting that into a view or stored procedure, I would be VERY hesitant to use "SELECT *". The reason being if anyone changes the table structure and adds a column (for example), the query will likely fail. It is much safer to only select the columns you need.

    Also, you will often get a performance boost by SQL using better indexes if you only select the rows you need. What I mean is in "end_q", all of the t# outer applied tables are doing a SELECT * BUT you are only looking at a single column. You will probably get a performance boost by only selecting the column you care about. There are other SELECT *'s that you may want to clean up too. Best practice is to NOT use SELECT * in production level code.

    I was also doing a quick test of your query and your new query has an issue in that it is missing the "suspend" account status in the result set and ends with the "non actif" status instead of "actif". Is that expected? If not, in the CTE "histo_pivot_between_first_last_row" you need to take out the "WHERE RN=1".

    Lastly, I did do a review of the actual execution plan on my system (with the temp tables) and your query (with my adjustment for the RN=1 line to ensure the same result set and UNION ALL instead of UNION, but I left the SELECT *'s in there) is faster than mine! Unfortunately, with my testing anyways, if a customer has no history, they do not show up in the result set it seems. So if I add a second customer to the customers table with no history, they do not show up in the results. I haven't had a chance to fix/review that bit yet, but will review it shortly.

    EDIT - to get the results of Customers with no history, change your INNER JOINs to RIGHT JOINs.

    Now all that is left is to get the fields IF they were set at creation time instead of due to a change. What I mean is if you add a customer with all fields populated, currently all fields that are changeable come back as NULL instead of the initial value.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

  • How is this (performance wise and for what you need):

    DECLARE @Histo AS DATE;
    SET @Histo = '1900-12-31';


    ;WITH Customers
    AS
    (
    SELECT
    *,
    GETDATE() AS [CreatedDate]
    FROM #CUSTOMERS
    ),
    histo
    AS
    (
    SELECT
    *,
    ROW_NUMBER() OVER ( PARTITION BY
    CUSTOMER_ID,
    ColumnName
    ORDER BY DateChange ASC
    ) AS RN
    FROM #CUSTOMERS_HISTORY
    ),
    histo_min_pivot
    AS
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    FROM
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    PreviousValue,
    ColumnName
    FROM histo
    WHERE RN = 1
    ) x
    PIVOT
    (
    MAX(PreviousValue)
    FOR ColumnName IN
    (
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    )
    ) p
    UNION
    SELECT
    Customers.ID,
    @Histo,
    Customers.Account_Status,
    Customers.Email,
    Customers.Has_kids,
    Customers.Is_Married,
    Customers.Nick_name
    FROM Customers
    WHERE Customers.ID NOT IN
    (
    SELECT
    customer_ID
    FROM histo
    )
    ),
    histo_get_one_row
    AS
    (
    SELECT
    CUSTOMER_ID,
    MAX(Account_Status) AS Account_Status,
    MAX(Email) AS Email,
    MAX(Has_kids) AS Has_kids,
    MAX(Is_Married) AS Is_Married,
    MAX(Nick_name) AS Nick_name
    FROM histo_min_pivot
    GROUP BY CUSTOMER_ID
    ),
    first_row_for_history
    AS
    (
    SELECT
    s.ID,
    s.First_Name,
    s.Last_Name,
    h.Nick_name,
    h.Email,
    s.Date_Bithday,
    s.Gender,
    h.Is_Married,
    h.Has_kids,
    s.Customer_Type,
    h.Account_Status,
    @Histo DateChange
    FROM histo_get_one_row h
    RIGHT JOIN Customers s
    ON h.CUSTOMER_ID = s.ID
    ),
    histo_pivot_between_first_last_row
    AS
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    FROM
    (
    SELECT
    CUSTOMER_ID,
    DateChange,
    CurrentValue,
    ColumnName
    FROM histo --WHERE RN=1
    ) x
    PIVOT
    (
    MAX(CurrentValue)
    FOR ColumnName IN
    (
    Account_Status,
    Email,
    Has_kids,
    Is_Married,
    Nick_name
    )
    ) p
    ),
    between_rows
    AS
    (
    SELECT
    s.ID,
    s.First_Name,
    s.Last_Name,
    h.Nick_name,
    h.Email,
    s.Date_Bithday,
    s.Gender,
    h.Is_Married,
    h.Has_kids,
    s.Customer_Type,
    h.Account_Status,
    h.DateChange
    FROM histo_pivot_between_first_last_row h
    RIGHT JOIN Customers s
    ON h.CUSTOMER_ID = s.ID
    ),
    all_rows
    AS
    (
    SELECT
    *
    FROM first_row_for_history
    UNION ALL
    SELECT
    *
    FROM between_rows
    ),
    end_q
    AS
    (
    SELECT
    t.ID,
    t.First_Name,
    t.Last_Name,
    t2.Nick_name,
    t3.Email,
    t.Date_Bithday,
    t.Gender,
    t4.Is_Married,
    t5.Has_kids,
    t.Customer_Type,
    t6.Account_Status,
    t.DateChange
    FROM all_rows t
    OUTER APPLY
    (
    SELECT TOP ( 1 )
    t2.Nick_name
    FROM all_rows t2
    WHERE (
    t.ID = t2.ID
    AND t2.DateChange <= t.DateChange
    AND t2.Nick_name IS NOT NULL
    )
    ORDER BY t2.DateChange DESC
    ) t2
    OUTER APPLY
    (
    SELECT TOP ( 1 )
    t3.*
    FROM all_rows t3
    WHERE (
    t.ID = t3.ID
    AND t3.DateChange <= t.DateChange
    AND t3.Email IS NOT NULL
    )
    ORDER BY t3.DateChange DESC
    ) t3
    OUTER APPLY
    (
    SELECT TOP ( 1 )
    t4.*
    FROM all_rows t4
    WHERE (
    t.ID = t4.ID
    AND t4.DateChange <= t.DateChange
    AND t4.Is_Married IS NOT NULL
    )
    ORDER BY t4.DateChange DESC
    ) t4
    OUTER APPLY
    (
    SELECT TOP ( 1 )
    t5.*
    FROM all_rows t5
    WHERE (
    t.ID = t5.ID
    AND t5.DateChange <= t.DateChange
    AND t5.Has_kids IS NOT NULL
    )
    ORDER BY t5.DateChange DESC
    ) t5
    OUTER APPLY
    (
    SELECT TOP ( 1 )
    t6.*
    FROM all_rows t6
    WHERE (
    t.ID = t6.ID
    AND t6.DateChange <= t.DateChange
    AND t6.Account_Status IS NOT NULL
    )
    ORDER BY t6.DateChange DESC
    ) t6
    )
    SELECT
    *
    FROM end_q
    WHERE DateChange IS NOT NULL
    ORDER BY DateChange;

    Similar to what you had but with the following changes:

    1- UNION changed to UNION ALL

    2- SELECT *'s in the OUTER APPLY's changed to select proper columns

    3- removed one of the "WHERE RN=1" so that you get the full result set for the change history

    4- if a customer has NO change history, then the current active data is pulled in

    5- RIGHT JOIN instead of INNER JOIN

    KNOWN BUG - if a customer HAS  a change on only some of the fields, not all fields (for example, the Nick_name is the only change) then the results set is incorrect for the customer. ALL changeable Fields must be changed for the result set to be accurate. I am working on a fix for that.

    • This reply was modified 10 months, 2 weeks ago by  Mr. Brian Gale. Reason: converted tabs to spaces because the forum seems to hate tabs in code
    • This reply was modified 10 months, 2 weeks ago by  Mr. Brian Gale.

    The above is all just my opinion on what you should do. 
    As with all advice you find on a random internet forum - you shouldn't blindly follow it.  Always test on a test server to see if there is negative side effects before making changes to live!
    I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.

Viewing 12 posts - 1 through 11 (of 11 total)

You must be logged in to reply to this topic. Login to reply