January 8, 2024 at 6:40 pm
Hello,
I have a table Customers that contains the current value and another table Customers_History contains the history values.
The customers have more that 10 fields
AS Bellow a sample of data :
CREATE TABLE #CUSTOMERS(
ID int primary key identity,
First_Name varchar(255),
Last_Name varchar(255),
Nick_name varchar(255),
Email varchar(255),
Date_Bithday date,
Gender varchar(1),
Is_Married int,
Has_kids int,
Customer_Type varchar(50),
Account_Status varchar(50)
)
insert into #CUSTOMERS values('xxxxxxx','yyyyyyyy','zzzzzzzz','y.x@gmail.com','1986-06-01','M',1,1,'premium','actif')
CREATE TABLE #CUSTOMERS_HISTORY(
CUSTOMER_ID int,
ColumnName varchar(50),
PreviousValue varchar(50),
CurrentValue varchar(50),
DateChange datetime
)
insert into #CUSTOMERS_HISTORY Values(1,'Nick_name',NULL,'zzzzzzzz','2023-01-05 17:04:03.000')
insert into #CUSTOMERS_HISTORY Values(1,'Email',NULL,'yyyyy.xxxx@gmail.com','2023-01-06 17:04:03.000')
insert into #CUSTOMERS_HISTORY Values(1,'Email','yyyyy.xxxx@gmail.com','y.x@gmail.com','2023-01-06 18:04:03.000')
insert into #CUSTOMERS_HISTORY Values(1,'Is_Married','0','1','2023-01-07 11:04:03.000')
insert into #CUSTOMERS_HISTORY Values(1,'Has_kids','0','1','2023-01-07 11:04:03.000')
insert into #CUSTOMERS_HISTORY Values(1,'Account_Status','actif','non actif','2023-01-08 11:04:03.000')
insert into #CUSTOMERS_HISTORY Values(1,'Account_Status','non actif','suspend','2023-01-08 11:04:33.000')
insert into #CUSTOMERS_HISTORY Values(1,'Account_Status','suspend','actif','2023-01-08 11:05:05.000')
I want grouping the data based on chronological periods from Historical table (Like a Slowly changing dimension Type 2)
The logic is taking the ColumnName Values from CUSTOMERS_HISTORY based on min DateChange that row will be the first row of my table and for each updated getting the next row
Desired Data :
Thanks for help !
January 8, 2024 at 8:12 pm
This looks like a homework or assignment or an interview question, so you will forgive me for not just spitting out a full TSQL query for this. What is always nice to have is a "what have you tried" script. You MAY be really close to solving this, so knowing that would be helpful. But lacking that, this is my thought process on solving this which could be a completely different direction you were going in.
I would start by breaking it up into multiple steps and solve each problem individually.
Step 1 join on the ID and look at the data you have. From there you have all the data in a format you just need to do some parsing and logic on it. An example, for the Nick_name, do something similar to:
CASE WHEN ColumnName = 'Nick_name'
THEN CurrentValue
ELSE LAG(Nick_name) OVER (PARTION BY ID ORDER BY DateChange)
END AS [Nick_name]
NOTE - offhand, I don't remember the LAG syntax and the above MAY have a typo in it, but basically you want to look at the previous row and I believe that by default LAG is looking at the previous row unless otherwise specified.
Step 2 repeat step 1 for all columns that could be changing in the history table
Step 3 is a bit more tricky as it is a matter of getting that first row. But for that, you know the DateChange appears to be hard coded, so that field is easy and the other fields you are likely going to do something along the lines of:
(SELECT PreviousValue
FROM #CUSTOMERS_HISTORY
WHERE DateChange = (
SELECT MIN(DateChange)
FROM #CUSTOMER_HISTORY
WHERE ColumnName = 'Nick_name'
)
)
Step 4 UNION (PROBABLY want to use UNION ALL as that should get better performance and more accurate data) the results from step 2 and and step 3 and order it by whatever column makes sense to you (likely ID and DateChange, but maybe you care more about First_Name and DateChange... I'm not sure).
Step 5 IF this is going to be used multiple times, then making a VIEW of the query is the next step I'd take. If the query is going to be used multiple times BUT is going to ALWAYS be filtered, then I'd use a stored procedure and likely add an OPTIMIZE FOR UNKNONW query hint.
I would also recommend that you do a dump of some sample data to Excel and build up your own expected output and once you know what it should look like, compare it with what you got.
Something to note though - your sample data doesn't match the output. The sample data says that initially the user is married and has kids, but then row 1 indicates they do not.
Final note - the above approach is likely not the most efficient approach, but in my opinion it is one of the more readable approaches and easiest to explain what you are doing and how you are doing it. If performance is the goal, then I'd have more questions such as is realtime data important or is hour old data sufficient (for example) and how often does the data change?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 8, 2024 at 8:33 pm
Hi @bmg002
Thanks for replying .
I did a query but it doesn't look like efficient and need to add some code to complete it .
Like i said the idea is following the history
DECLARE @Histo AS DATE
SET @Histo ='1900-12-31'
;with Customers as(
SELECT *,GETDATE() AS [CreatedDate] FROM #CUSTOMERS
)
,histo as (
SELECT
*
,ROW_NUMBER() OVER(partition by CUSTOMER_ID,ColumnName order by DateChange ASC) AS RN
FROM #CUSTOMERS_HISTORY
)
,histo_min_pivot as(
SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
FROM
(
select
CUSTOMER_ID,
DateChange,
PreviousValue,
ColumnName
from histo WHERE RN=1
) x
pivot
(
max(PreviousValue)
for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
) p
)
,histo_get_one_row as(
SELECT
CUSTOMER_ID
,MAX(Account_Status) AS Account_Status
,MAX(Email) AS Email
,MAX(Has_kids) AS Has_kids
,MAX(Is_Married) AS Is_Married
,MAX(Nick_name) AS Nick_name
FROM histo_min_pivot
GROUP BY CUSTOMER_ID
)
,first_row_for_history as(
select
s.ID
,s.First_Name
,s.Last_Name
,h.Nick_name
,h.Email
,s.Date_Bithday
,s.Gender
,h.Is_Married
,h.Has_kids
,s.Customer_Type
,h.Account_Status
,@Histo DateChange
from histo_get_one_row h
inner join Customers s
on h.CUSTOMER_ID=s.ID
)
,histo_pivot_between_first_last_row as(
SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
FROM
(
select
CUSTOMER_ID,
DateChange,
CurrentValue,
ColumnName
from histo WHERE RN=1
) x
pivot
(
max(CurrentValue)
for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
) p
)
,between_rows as(
select
s.ID
,s.First_Name
,s.Last_Name
,h.Nick_name
,h.Email
,s.Date_Bithday
,s.Gender
,h.Is_Married
,h.Has_kids
,s.Customer_Type
,h.Account_Status
,h.DateChange
from histo_pivot_between_first_last_row h
inner join Customers s
on h.CUSTOMER_ID=s.ID
)
select * from first_row_for_history
union
select * from between_rows
order by DateChange
January 8, 2024 at 10:06 pm
There was some problems with your CTE's for the between_rows data due to the filter on RN=1. That shouldn't be applied on the between_rows set of data (happens higher up in the CTE) as it then strips out multiple changes on a single column.
BUT I believe this query builds on what you had originally with 1 minor change in the CTE's (I removed 1 of the RN=1 rows) and then built out the final SELECT:
CREATE TABLE #CUSTOMERS
(
IDINT PRIMARY KEY IDENTITY,
First_NameVARCHAR(255),
Last_NameVARCHAR(255),
Nick_nameVARCHAR(255),
EmailVARCHAR(255),
Date_BithdayDATE,
GenderVARCHAR(1),
Is_MarriedINT,
Has_kidsINT,
Customer_TypeVARCHAR(50),
Account_StatusVARCHAR(50)
);
INSERT INTO #CUSTOMERS
VALUES
(
'xxxxxxx',
'yyyyyyyy',
'zzzzzzzz',
'y.x@gmail.com',
'1986-06-01',
'M',
1,
1,
'premium',
'actif'
);
CREATE TABLE #CUSTOMERS_HISTORY
(
CUSTOMER_IDINT,
ColumnNameVARCHAR(50),
PreviousValueVARCHAR(50),
CurrentValueVARCHAR(50),
DateChangeDATETIME
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Nick_name',
NULL,
'zzzzzzzz',
'2023-01-05 17:04:03.000'
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Email',
NULL,
'yyyyy.xxxx@gmail.com',
'2023-01-06 17:04:03.000'
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Email',
'yyyyy.xxxx@gmail.com',
'y.x@gmail.com',
'2023-01-06 18:04:03.000'
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Is_Married',
'0',
'1',
'2023-01-07 11:04:03.000'
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Has_kids',
'0',
'1',
'2023-01-07 11:04:03.000'
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Account_Status',
'actif',
'non actif',
'2023-01-08 11:04:03.000'
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Account_Status',
'non actif',
'suspend',
'2023-01-08 11:04:33.000'
);
INSERT INTO #CUSTOMERS_HISTORY
VALUES
(
1,
'Account_Status',
'suspend',
'actif',
'2023-01-08 11:05:05.000'
);
DECLARE @Histo AS DATE;
SET @Histo = '1900-12-31';
WITH Customers
AS
(
SELECT
*,
GETDATE()AS [CreatedDate]
FROM#CUSTOMERS
),
histo
AS
(
SELECT
*,
ROW_NUMBER() OVER ( PARTITION BY
CUSTOMER_ID,
ColumnName
ORDER BY DateChange ASC
) AS RN
FROM#CUSTOMERS_HISTORY
),
histo_min_pivot
AS
(
SELECT
CUSTOMER_ID,
DateChange,
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
FROM
(
SELECT
CUSTOMER_ID,
DateChange,
PreviousValue,
ColumnName
FROMhisto
WHERERN = 1
) x
PIVOT
(
MAX(PreviousValue)
FOR ColumnName IN
(
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
)
) p
),
histo_get_one_row
AS
(
SELECT
CUSTOMER_ID,
MAX(Account_Status) AS Account_Status,
MAX(Email)AS Email,
MAX(Has_kids)AS Has_kids,
MAX(Is_Married)AS Is_Married,
MAX(Nick_name)AS Nick_name
FROMhisto_min_pivot
GROUP BY CUSTOMER_ID
),
first_row_for_history
AS
(
SELECT
s.ID,
s.First_Name,
s.Last_Name,
h.Nick_name,
h.Email,
s.Date_Bithday,
s.Gender,
h.Is_Married,
h.Has_kids,
s.Customer_Type,
h.Account_Status,
@HistoDateChange
FROMhisto_get_one_rowh
INNER JOINCustomerss
ONh.CUSTOMER_ID= s.ID
),
histo_pivot_between_first_last_row
AS
(
SELECT
CUSTOMER_ID,
DateChange,
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
FROM
(
SELECT
CUSTOMER_ID,
DateChange,
CurrentValue,
ColumnName
FROMhisto
) x
PIVOT
(
MAX(CurrentValue)
FOR ColumnName IN
(
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
)
) p
),
between_rows
AS
(
SELECT
s.ID,
s.First_Name,
s.Last_Name,
h.Nick_name,
h.Email,
s.Date_Bithday,
s.Gender,
h.Is_Married,
h.Has_kids,
s.Customer_Type,
h.Account_Status,
h.DateChange
FROMhisto_pivot_between_first_last_rowh
INNER JOINCustomerss
ONh.CUSTOMER_ID= s.ID
)
SELECT
first_row_for_history.ID,
first_row_for_history.First_Name,
first_row_for_history.Last_Name,
first_row_for_history.Nick_name,
first_row_for_history.Email,
first_row_for_history.Date_Bithday,
first_row_for_history.Gender,
first_row_for_history.Is_Married,
first_row_for_history.Has_kids,
first_row_for_history.Customer_Type,
first_row_for_history.Account_Status,
first_row_for_history.DateChange
FROMfirst_row_for_history
UNION
SELECT
between_rows.ID,
between_rows.First_Name,
between_rows.Last_Name,
CASE
WHEN between_rows.Nick_name IS NOT NULL THEN
between_rows.Nick_name
WHEN
(
SELECTTOP 1
br2.Nick_name
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Nick_name IS NOT NULL
ORDER BY br2.DateChange DESC
) IS NOT NULL THEN
(
SELECTTOP 1
br2.Nick_name
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Nick_name IS NOT NULL
ORDER BY br2.DateChange DESC
)
ELSE
first_row_for_history.Nick_name
ENDAS Nick_name,
CASE
WHEN between_rows.Email IS NOT NULL THEN
between_rows.Email
WHEN
(
SELECTTOP 1
br2.Email
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Email IS NOT NULL
ORDER BY br2.DateChange DESC
) IS NOT NULL THEN
(
SELECTTOP 1
br2.Email
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Email IS NOT NULL
ORDER BY br2.DateChange DESC
)
ELSE
first_row_for_history.Email
ENDAS email,
between_rows.Date_Bithday,
CASE
WHEN between_rows.Gender IS NOT NULL THEN
between_rows.Gender
WHEN
(
SELECTTOP 1
br2.Gender
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Gender IS NOT NULL
ORDER BY br2.DateChange DESC
) IS NOT NULL THEN
(
SELECTTOP 1
br2.Gender
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Gender IS NOT NULL
ORDER BY br2.DateChange DESC
)
ELSE
first_row_for_history.Gender
ENDAS Gender,
CASE
WHEN between_rows.Is_Married IS NOT NULL THEN
between_rows.Is_Married
WHEN
(
SELECTTOP 1
br2.Is_Married
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Is_Married IS NOT NULL
ORDER BY br2.DateChange DESC
) IS NOT NULL THEN
(
SELECTTOP 1
br2.Is_Married
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Is_Married IS NOT NULL
ORDER BY br2.DateChange DESC
)
ELSE
first_row_for_history.Is_Married
ENDAS Is_Married,
CASE
WHEN between_rows.Has_kids IS NOT NULL THEN
between_rows.Has_kids
WHEN
(
SELECTTOP 1
br2.Has_kids
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Has_kids IS NOT NULL
ORDER BY br2.DateChange DESC
) IS NOT NULL THEN
(
SELECTTOP 1
br2.Has_kids
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Has_kids IS NOT NULL
ORDER BY br2.DateChange DESC
)
ELSE
first_row_for_history.Has_kids
ENDAS Has_kids,
CASE
WHEN between_rows.Customer_Type IS NOT NULL THEN
between_rows.Customer_Type
WHEN
(
SELECTTOP 1
br2.Customer_Type
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Customer_Type IS NOT NULL
ORDER BY br2.DateChange DESC
) IS NOT NULL THEN
(
SELECTTOP 1
br2.Customer_Type
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Customer_Type IS NOT NULL
ORDER BY br2.DateChange DESC
)
ELSE
first_row_for_history.Customer_Type
ENDAS Customer_Type,
CASE
WHEN between_rows.Account_Status IS NOT NULL THEN
between_rows.Account_Status
WHEN
(
SELECTTOP 1
br2.Account_Status
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Account_Status IS NOT NULL
ORDER BY br2.DateChange DESC
) IS NOT NULL THEN
(
SELECTTOP 1
br2.Account_Status
FROMbetween_rows br2
WHEREbr2.DateChange< between_rows.DateChange
AND br2.Account_Status IS NOT NULL
ORDER BY br2.DateChange DESC
)
ELSE
first_row_for_history.Account_Status
ENDAS Account_Status,
between_rows.DateChange
FROMbetween_rows
JOINfirst_row_for_history
ONfirst_row_for_history.ID= between_rows.ID
ORDER BY DateChange;
DROP TABLE #CUSTOMERS;
DROP TABLE #CUSTOMERS_HISTORY;
Looks a bit crazy, but basically IF a field that is changing (I used Nick_name, Email, Gender, Is_Married, Has_kids, Customer_Type, Account_Status, but you could expand this out to any column that makes sense... name could change I imagine too, so may want to include those), it first checks is this row the row it changed on. If so, put the new value. Otherwise, we look to find the most recent value it was. If both of these are blank, then we use the value from the first row.
Now there are some problems with the above query that you can test out yourself pretty easily. Try deleting a value from #Customers_History that has a value in the #Customers table such as Nick_name (row 1 of #Customers_history) - you will notice that #Customers still has a Nick_name assigned, but the result set has it NULL the whole way down. This is not expected (to me), but may be expected to you.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 8, 2024 at 10:37 pm
Hi @bmg002,
Thank you for your time.
Yes i am agree with your comment , also it is not expected to me in that case i need to take the value from customers Table .
Secondly , i have more than 10 columns to track in history , is there away to do it dynamically ? or i need to do the test 10 times ?
Thanks 🙂
January 9, 2024 at 7:36 am
This was removed by the editor as SPAM
January 9, 2024 at 7:04 pm
For handling it dynamically, I imagine it would be possible, but dynamic SQL is always risky. Since this would be a "write once and never again" (hopefully), my opinion is that 10 columns is not that many to track with semi-copy pasted code.
If you do decide to go with a dynamic SQL route, the query is going to get a LOT more messy as you have to have all of your CTE's in the dynamic SQL for the columns and it'll probably get messy very quickly and be a lot more work to debug and diagnose issues. PLUS you'd want to make sure nobody made a column that could result in SQL injection and if they did, you'd need to be careful how you handle it.
My opinion, I would create this as a view or stored procedure (depending on how you plan to use it) and the query will be a write-once query and then just need to do updates if more columns are added.
Are you needing help fixing the bug or are you on top of that?
Also, are you needing anything optimized or does it run "good enough"? Not sure if I see any obvious optimizations, but I didn't really look too hard either. I was just taking your query and making it give the results you expected.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 9, 2024 at 8:21 pm
That is a long run time... so it sounds like it needs some optimizing.
How much data are you working with and where are you presenting the data? If it is millions of rows coming back to the screen in SSMS (for example), the bottleneck is likely presenting it on screen. If it is closer to 1000 rows then the problem is probably on the data retrieval and calculation side.
Also, have you reviewed the execution plan? If not, that's where I'd start to see where the bottleneck is.
It would be good to know where the problems reside and correct the problem area. If the slowness is due to presenting it to screen, then the only way to make it faster is to reduce the result set. If the slowness is due to the calculations, then reducing calculations and CTE's may help.
Was it just as slow before my proposed changes or did those make it drastically worse?
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 9, 2024 at 8:41 pm
Well i have 1 millions on customers and 14900 on customers history.
The first query that i share was speedy , i did a second query and better and looks like optimized.
Unfortunately can see the execution plan because i don't have access only DBA users can do it
AS Bellow the new code :
DECLARE @Histo AS DATE
SET @Histo ='1900-12-31'
;with Customers as(
SELECT *,GETDATE() AS [CreatedDate] FROM #CUSTOMERS
)
,histo as (
SELECT
*
,ROW_NUMBER() OVER(partition by CUSTOMER_ID,ColumnName order by DateChange ASC) AS RN
FROM #CUSTOMERS_HISTORY
)
,histo_min_pivot as(
SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
FROM
(
select
CUSTOMER_ID,
DateChange,
PreviousValue,
ColumnName
from histo WHERE RN=1
) x
pivot
(
max(PreviousValue)
for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
) p
)
,histo_get_one_row as(
SELECT
CUSTOMER_ID
,MAX(Account_Status) AS Account_Status
,MAX(Email) AS Email
,MAX(Has_kids) AS Has_kids
,MAX(Is_Married) AS Is_Married
,MAX(Nick_name) AS Nick_name
FROM histo_min_pivot
GROUP BY CUSTOMER_ID
)
,first_row_for_history as(
select
s.ID
,s.First_Name
,s.Last_Name
,h.Nick_name
,h.Email
,s.Date_Bithday
,s.Gender
,h.Is_Married
,h.Has_kids
,s.Customer_Type
,h.Account_Status
,@Histo DateChange
from histo_get_one_row h
inner join Customers s
on h.CUSTOMER_ID=s.ID
)
,histo_pivot_between_first_last_row as(
SELECT CUSTOMER_ID,DateChange ,Account_Status,Email,Has_kids,Is_Married,Nick_name
FROM
(
select
CUSTOMER_ID,
DateChange,
CurrentValue,
ColumnName
from histo WHERE RN=1
) x
pivot
(
max(CurrentValue)
for ColumnName in (Account_Status,Email,Has_kids,Is_Married,Nick_name)
) p
)
,between_rows as(
select
s.ID
,s.First_Name
,s.Last_Name
,h.Nick_name
,h.Email
,s.Date_Bithday
,s.Gender
,h.Is_Married
,h.Has_kids
,s.Customer_Type
,h.Account_Status
,h.DateChange
from histo_pivot_between_first_last_row h
inner join Customers s
on h.CUSTOMER_ID=s.ID
)
,all_rows as(
select * from first_row_for_history
union
select * from between_rows
)
,end_q as(
select
t.ID
,t.First_Name
,t.Last_Name
,t2.Nick_name
,t3.Email
,t.Date_Bithday
,t.Gender
,t4.Is_Married
,t5.Has_kids
,t.Customer_Type
,t6.Account_Status
,t.DateChange
from all_rows t
outer apply(
select top(1) t2.* from all_rows t2
where (t.ID = t2.ID and t2.DateChange<=t.DateChange and t2.Nick_name is not null )
order by t2.DateChange desc
) t2
outer apply(
select top(1) t3.* from all_rows t3
where (t.ID = t3.ID and t3.DateChange<=t.DateChange and t3.Email is not null )
order by t3.DateChange desc
) t3
outer apply(
select top(1) t4.* from all_rows t4
where (t.ID = t4.ID and t4.DateChange<=t.DateChange and t4.Is_Married is not null )
order by t4.DateChange desc
) t4
outer apply(
select top(1) t5.* from all_rows t5
where (t.ID = t5.ID and t5.DateChange<=t.DateChange and t5.Has_kids is not null )
order by t5.DateChange desc
) t5
outer apply(
select top(1) t6.* from all_rows t6
where (t.ID = t6.ID and t6.DateChange<=t.DateChange and t6.Account_Status is not null )
order by t6.DateChange desc
) t6
)
select * from end_q
order by DateChange
January 9, 2024 at 9:14 pm
I am glad you got it optimized. One thing that MAY also help with your optimization is to change your UNION to a UNION ALL. That will remove an implicit "DISTINCT" which is a slower operation.
One word of warning though - if you ARE putting that into a view or stored procedure, I would be VERY hesitant to use "SELECT *". The reason being if anyone changes the table structure and adds a column (for example), the query will likely fail. It is much safer to only select the columns you need.
Also, you will often get a performance boost by SQL using better indexes if you only select the rows you need. What I mean is in "end_q", all of the t# outer applied tables are doing a SELECT * BUT you are only looking at a single column. You will probably get a performance boost by only selecting the column you care about. There are other SELECT *'s that you may want to clean up too. Best practice is to NOT use SELECT * in production level code.
I was also doing a quick test of your query and your new query has an issue in that it is missing the "suspend" account status in the result set and ends with the "non actif" status instead of "actif". Is that expected? If not, in the CTE "histo_pivot_between_first_last_row" you need to take out the "WHERE RN=1".
Lastly, I did do a review of the actual execution plan on my system (with the temp tables) and your query (with my adjustment for the RN=1 line to ensure the same result set and UNION ALL instead of UNION, but I left the SELECT *'s in there) is faster than mine! Unfortunately, with my testing anyways, if a customer has no history, they do not show up in the result set it seems. So if I add a second customer to the customers table with no history, they do not show up in the results. I haven't had a chance to fix/review that bit yet, but will review it shortly.
EDIT - to get the results of Customers with no history, change your INNER JOINs to RIGHT JOINs.
Now all that is left is to get the fields IF they were set at creation time instead of due to a change. What I mean is if you add a customer with all fields populated, currently all fields that are changeable come back as NULL instead of the initial value.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
January 9, 2024 at 9:42 pm
How is this (performance wise and for what you need):
DECLARE @Histo AS DATE;
SET @Histo = '1900-12-31';
;WITH Customers
AS
(
SELECT
*,
GETDATE() AS [CreatedDate]
FROM #CUSTOMERS
),
histo
AS
(
SELECT
*,
ROW_NUMBER() OVER ( PARTITION BY
CUSTOMER_ID,
ColumnName
ORDER BY DateChange ASC
) AS RN
FROM #CUSTOMERS_HISTORY
),
histo_min_pivot
AS
(
SELECT
CUSTOMER_ID,
DateChange,
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
FROM
(
SELECT
CUSTOMER_ID,
DateChange,
PreviousValue,
ColumnName
FROM histo
WHERE RN = 1
) x
PIVOT
(
MAX(PreviousValue)
FOR ColumnName IN
(
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
)
) p
UNION
SELECT
Customers.ID,
@Histo,
Customers.Account_Status,
Customers.Email,
Customers.Has_kids,
Customers.Is_Married,
Customers.Nick_name
FROM Customers
WHERE Customers.ID NOT IN
(
SELECT
customer_ID
FROM histo
)
),
histo_get_one_row
AS
(
SELECT
CUSTOMER_ID,
MAX(Account_Status) AS Account_Status,
MAX(Email) AS Email,
MAX(Has_kids) AS Has_kids,
MAX(Is_Married) AS Is_Married,
MAX(Nick_name) AS Nick_name
FROM histo_min_pivot
GROUP BY CUSTOMER_ID
),
first_row_for_history
AS
(
SELECT
s.ID,
s.First_Name,
s.Last_Name,
h.Nick_name,
h.Email,
s.Date_Bithday,
s.Gender,
h.Is_Married,
h.Has_kids,
s.Customer_Type,
h.Account_Status,
@Histo DateChange
FROM histo_get_one_row h
RIGHT JOIN Customers s
ON h.CUSTOMER_ID = s.ID
),
histo_pivot_between_first_last_row
AS
(
SELECT
CUSTOMER_ID,
DateChange,
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
FROM
(
SELECT
CUSTOMER_ID,
DateChange,
CurrentValue,
ColumnName
FROM histo --WHERE RN=1
) x
PIVOT
(
MAX(CurrentValue)
FOR ColumnName IN
(
Account_Status,
Email,
Has_kids,
Is_Married,
Nick_name
)
) p
),
between_rows
AS
(
SELECT
s.ID,
s.First_Name,
s.Last_Name,
h.Nick_name,
h.Email,
s.Date_Bithday,
s.Gender,
h.Is_Married,
h.Has_kids,
s.Customer_Type,
h.Account_Status,
h.DateChange
FROM histo_pivot_between_first_last_row h
RIGHT JOIN Customers s
ON h.CUSTOMER_ID = s.ID
),
all_rows
AS
(
SELECT
*
FROM first_row_for_history
UNION ALL
SELECT
*
FROM between_rows
),
end_q
AS
(
SELECT
t.ID,
t.First_Name,
t.Last_Name,
t2.Nick_name,
t3.Email,
t.Date_Bithday,
t.Gender,
t4.Is_Married,
t5.Has_kids,
t.Customer_Type,
t6.Account_Status,
t.DateChange
FROM all_rows t
OUTER APPLY
(
SELECT TOP ( 1 )
t2.Nick_name
FROM all_rows t2
WHERE (
t.ID = t2.ID
AND t2.DateChange <= t.DateChange
AND t2.Nick_name IS NOT NULL
)
ORDER BY t2.DateChange DESC
) t2
OUTER APPLY
(
SELECT TOP ( 1 )
t3.*
FROM all_rows t3
WHERE (
t.ID = t3.ID
AND t3.DateChange <= t.DateChange
AND t3.Email IS NOT NULL
)
ORDER BY t3.DateChange DESC
) t3
OUTER APPLY
(
SELECT TOP ( 1 )
t4.*
FROM all_rows t4
WHERE (
t.ID = t4.ID
AND t4.DateChange <= t.DateChange
AND t4.Is_Married IS NOT NULL
)
ORDER BY t4.DateChange DESC
) t4
OUTER APPLY
(
SELECT TOP ( 1 )
t5.*
FROM all_rows t5
WHERE (
t.ID = t5.ID
AND t5.DateChange <= t.DateChange
AND t5.Has_kids IS NOT NULL
)
ORDER BY t5.DateChange DESC
) t5
OUTER APPLY
(
SELECT TOP ( 1 )
t6.*
FROM all_rows t6
WHERE (
t.ID = t6.ID
AND t6.DateChange <= t.DateChange
AND t6.Account_Status IS NOT NULL
)
ORDER BY t6.DateChange DESC
) t6
)
SELECT
*
FROM end_q
WHERE DateChange IS NOT NULL
ORDER BY DateChange;
Similar to what you had but with the following changes:
1- UNION changed to UNION ALL
2- SELECT *'s in the OUTER APPLY's changed to select proper columns
3- removed one of the "WHERE RN=1" so that you get the full result set for the change history
4- if a customer has NO change history, then the current active data is pulled in
5- RIGHT JOIN instead of INNER JOIN
KNOWN BUG - if a customer HAS a change on only some of the fields, not all fields (for example, the Nick_name is the only change) then the results set is incorrect for the customer. ALL changeable Fields must be changed for the result set to be accurate. I am working on a fix for that.
The above is all just my opinion on what you should do.
As with all advice you find on a random internet forum - you shouldn't blindly follow it. Always test on a test server to see if there is negative side effects before making changes to live!
I recommend you NEVER run "random code" you found online on any system you care about UNLESS you understand and can verify the code OR you don't care if the code trashes your system.
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply