January 29, 2019 at 7:03 am
Hi All,
I am trying to work out the number of days for each (CODE) from the Code (REQUEST DATE) till next Code appear and update the ( No.Days) .it was working fine using LAG/LEAD function but requirement changed so if same codes coming repeatedly then we need number of days from first appearance i.e 'Urgent Needs' appear first on (14/08/2018) until (24/01/2019) then calculate all days and if recent record still active then count days up to today's date instead of 24/01/2019. Hope you would be able to help.
Regards
Nadeem
January 29, 2019 at 8:48 am
Have you looked at first_Value? You can partition this and get the first lead date for a code item.
January 29, 2019 at 9:10 am
Steve Jones - SSC Editor - Tuesday, January 29, 2019 8:48 AMHave you looked at first_Value? You can partition this and get the first lead date for a code item.
Thanks Steve; I will check it out . Regards
January 29, 2019 at 9:17 am
They say that a picture is worth a thousand words, but not in the case of data. Actual code is worth a thousand pictures. Please provide sample data and expected results as outlined in the first link in my signature. The goal is to allow someone to copy the code and paste it into SSMS and hit "Run" and see your sample data.
DO Post your code directly in your reply. DO NOT include it as an attached file. People are hesitant to open any file that may contain malware no matter how remote the possibility.
DO use temp tables or table variables. This makes it much easier to clean up.
DO use table value constructors. The instructions in my signature were written before table value constructors were implemented, so the instructions use individual INSERT statements, but it's much cleaner to use table value constructors. DO NOT use a SELECT statement to populate the tables. Unless you want to open up your tables to the entire Internet and open yourself up to a data breach, we won't be able to execute SELECT statements that read your tables.
DO provide expected results. DO make sure that the results are consistent with the sample data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 29, 2019 at 9:20 am
nadeem161 - Tuesday, January 29, 2019 9:10 AMSteve Jones - SSC Editor - Tuesday, January 29, 2019 8:48 AMHave you looked at first_Value? You can partition this and get the first lead date for a code item.Thanks Steve; I will check it out . Regards
FIRST_VALUE will work if you can ignore gaps. I suspect that you will need to account for gaps. If so, we would still need sample data.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
January 29, 2019 at 9:28 am
Thanks much Drew, Valid points and It looks FIRST_VALUE will do the trick with some alteration.
Regards
Nadeem
January 29, 2019 at 7:24 pm
To emphasize the problem that Drew brings up...
First_Value can certainly used to solve the given problem but it won't correctly solve the problem that will occur if there's a gap in the dates for any given group of the "Code" column. You should probably count on that problem happening. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
January 31, 2019 at 8:34 am
Hi ,
First of all thanks for your help.
I have tried to do a code snippet. So to elaborate the issue again I want to work out the NODAYS based on CODE and REQUEST_DATE . For each time if a specific CODE logged then I want to count NODAYS based start REQUEST_DATE of that occurrence. Below is dummy code example and worked out NODAYS for illustration.
--===== If the output test table already exists, drop it
IF OBJECT_ID('TempDB..#LUCOUTPUT','U') IS NOT NULL
DROP TABLE #LUCOUTPUT
--===== Create the test table with required columns
CREATE TABLE #LUCOUTPUT
(
ROW_NUM INT,
ID INT,
REQUEST_DATE DATETIME,
CODE varchar(50),
NODAYS int
)
INSERT INTO #LUCOUTPUT
(ROW_NUM,ID, REQUEST_DATE, CODE,NODAYS)
SELECT '1','1','Oct 17 2007 12:00AM','Urgent','0' UNION ALL
SELECT '2','1','Oct 18 2007 12:00AM','Urgent','1' UNION ALL
SELECT '3','1','Oct 19 2007 12:00AM','Delayed','0' UNION ALL
SELECT '4','1','Oct 20 2007 12:00AM','Delayed','1' UNION ALL
SELECT '5','1','Oct 21 2007 12:00AM','Delayed','2' UNION ALL
SELECT '6','1','Oct 25 2007 12:00AM','Delayed','6' UNION ALL
SELECT '7','1','Oct 26 2007 12:00AM','Requested','0' UNION ALL
SELECT '8','1','Oct 27 2007 12:00AM','Requested','1' UNION ALL
SELECT '9','1','Oct 28 2007 12:00AM','Delayed','0' UNION ALL
SELECT '10','1','Oct 30 2007 12:00AM','Urgent','2'
--===== Select output data from output table
Select ROW_NUM,ID, REQUEST_DATE, CODE,NODAYS from #LUCOUTPUT
January 31, 2019 at 12:14 pm
I took a stab at this one. I needed sample data, so I generated it using the following code. Because RAND is used with a predictable seed, the values will come out the same every time, meaning the sample data should be predictable.CREATE -- DROP -- TRUNCATE
TABLE #SampleData
(
Row_Num INTEGER NOT NULL,
Request_Date DATE NOT NULL,
Code VARCHAR(50) NOT NULL
);
INSERT INTO #SampleData (Row_Num, Request_Date, Code)
SELECT N,
DATEADD(DAY, N, '20190101'),
CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY name) AS N
FROM sys.objects
) AS TempTally
WHERE N <= 20;
SELECT *
FROM #SampleData
ORDER BY Row_Num
Row_Num Request_Date Code
----------- ------------ --------------------------------------------------
1 2019-01-02 Awaiting
2 2019-01-03 Urgent Needs
3 2019-01-04 Urgent Needs
4 2019-01-05 Awaiting
5 2019-01-06 Urgent Needs
6 2019-01-07 Urgent Needs
7 2019-01-08 Awaiting
8 2019-01-09 Awaiting
9 2019-01-10 Urgent Needs
10 2019-01-11 Awaiting
11 2019-01-12 Awaiting
12 2019-01-13 Urgent Needs
13 2019-01-14 Awaiting
14 2019-01-15 Awaiting
15 2019-01-16 Urgent Needs
16 2019-01-17 Urgent Needs
17 2019-01-18 Urgent Needs
18 2019-01-19 Urgent Needs
19 2019-01-20 Urgent Needs
20 2019-01-21 Awaiting
(20 rows affected)
This is my attempt.WITH BaseData
AS (
SELECT Row_Num, Request_Date, Code,
CASE WHEN LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN 'N' ELSE 'Y' END AS First_In_Block
FROM #SampleData
)
SELECT Row_Num,
Request_Date,
Code,
CASE WHEN LEAD(Request_Date, 1) OVER (ORDER BY Request_Date) IS NULL
AND LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN First_Date ELSE NULL END AS [No. Days]
FROM (
SELECT b.Row_Num,
b.Request_Date,
b.Code,
MAX(x.Request_Date) AS First_Date
FROM BaseData b
LEFT JOIN BaseData x
ON x.Request_Date < b.Request_Date
AND x.Code = b.Code
AND x.First_In_Block = 'N'
WHERE b.Row_Num <= 19 /* TESTING: Change selection to change which rows included */
GROUP BY b.Row_Num,
b.Request_Date,
b.Code
) q
ORDER BY Row_Num,
Request_Date;
In my sample data, row 20 is "Awaiting", so I limited the data to rows 1-19 so that "Urgent Needs" has the final five rows. When this happens, "2019-01-16" shows in the "No. Days" column. When all 20 rows are included, the final rows is "Awaiting" but the prior row is "Urgent Needs", so no value is shown in the "No. Days" column. Hopefully I have understood the issue correctly. If not, please clarify what is needed, and I'll try again.
February 1, 2019 at 2:55 am
fahey.jonathan - Thursday, January 31, 2019 12:14 PMI took a stab at this one. I needed sample data, so I generated it using the following code. Because RAND is used with a predictable seed, the values will come out the same every time, meaning the sample data should be predictable.CREATE -- DROP -- TRUNCATE
TABLE #SampleData
(
Row_Num INTEGER NOT NULL,
Request_Date DATE NOT NULL,
Code VARCHAR(50) NOT NULL
);INSERT INTO #SampleData (Row_Num, Request_Date, Code)
SELECT N,
DATEADD(DAY, N, '20190101'),
CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY name) AS N
FROM sys.objects
) AS TempTally
WHERE N <= 20;SELECT *
FROM #SampleData
ORDER BY Row_NumRow_Num Request_Date Code
----------- ------------ --------------------------------------------------
1 2019-01-02 Awaiting
2 2019-01-03 Urgent Needs
3 2019-01-04 Urgent Needs
4 2019-01-05 Awaiting
5 2019-01-06 Urgent Needs
6 2019-01-07 Urgent Needs
7 2019-01-08 Awaiting
8 2019-01-09 Awaiting
9 2019-01-10 Urgent Needs
10 2019-01-11 Awaiting
11 2019-01-12 Awaiting
12 2019-01-13 Urgent Needs
13 2019-01-14 Awaiting
14 2019-01-15 Awaiting
15 2019-01-16 Urgent Needs
16 2019-01-17 Urgent Needs
17 2019-01-18 Urgent Needs
18 2019-01-19 Urgent Needs
19 2019-01-20 Urgent Needs
20 2019-01-21 Awaiting(20 rows affected)
This is my attempt.
WITH BaseData
AS (
SELECT Row_Num, Request_Date, Code,
CASE WHEN LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN 'N' ELSE 'Y' END AS First_In_Block
FROM #SampleData
)
SELECT Row_Num,
Request_Date,
Code,
CASE WHEN LEAD(Request_Date, 1) OVER (ORDER BY Request_Date) IS NULL
AND LAG(Code, 1) OVER (ORDER BY Request_Date) = Code THEN First_Date ELSE NULL END AS [No. Days]
FROM (
SELECT b.Row_Num,
b.Request_Date,
b.Code,
MAX(x.Request_Date) AS First_Date
FROM BaseData b
LEFT JOIN BaseData x
ON x.Request_Date < b.Request_Date
AND x.Code = b.Code
AND x.First_In_Block = 'N'
WHERE b.Row_Num <= 19 /* TESTING: Change selection to change which rows included */
GROUP BY b.Row_Num,
b.Request_Date,
b.Code
) q
ORDER BY Row_Num,
Request_Date;
In my sample data, row 20 is "Awaiting", so I limited the data to rows 1-19 so that "Urgent Needs" has the final five rows. When this happens, "2019-01-16" shows in the "No. Days" column. When all 20 rows are included, the final rows is "Awaiting" but the prior row is "Urgent Needs", so no value is shown in the "No. Days" column. Hopefully I have understood the issue correctly. If not, please clarify what is needed, and I'll try again.
Hi Thanks for your reply; I am looking to workout Number of days each time a code logged . If a code logged more than once in contiguous rows then I want to calculate days based on the first date its logged and last date its logged before the new code start. So from above example of your expected result will be like below
Row_Num Request_Date Code No. Days
----------- ------------ -------------------------------------------------- ----------
1 2019-01-02 Awaiting 0
2 2019-01-03 Urgent Needs 0
3 2019-01-04 Urgent Needs 1
4 2019-01-05 Awaiting 0
5 2019-01-06 Urgent Needs 0
6 2019-01-07 Urgent Needs 1
7 2019-01-08 Awaiting 0
8 2019-01-09 Awaiting 1
9 2019-01-10 Urgent Needs 0
10 2019-01-11 Awaiting 0
11 2019-01-12 Awaiting 1
12 2019-01-13 Urgent Needs 0
13 2019-01-14 Awaiting 0
14 2019-01-15 Awaiting 1
15 2019-01-16 Urgent Needs 0
16 2019-01-17 Urgent Needs 1
17 2019-01-18 Urgent Needs 2
18 2019-01-19 Urgent Needs 3
19 2019-01-20 Urgent Needs 4
20 2019-01-21 Awaiting 0
Thanks much
February 1, 2019 at 7:42 am
OK, attempt #2. The CTE tells which rows are first and last in their block, the "Blocks" query puts those values into ranges, which can then be joined to the main data. Because we now know the start date of each block, we can calculate the number of days difference between the block start date and the "current" date. The ISNULL is needed on the LEAD and LAG statements to ensure that the first row returns IsFirst = 1 and the last row returns IsLast = 1.WITH BaseData
AS (
SELECT Request_Date,
Code,
CASE WHEN ISNULL(LAG(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsFirst,
CASE WHEN ISNULL(LEAD(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsLast
FROM #SampleData
)
SELECT s.Row_Num,
s.Request_Date,
s.Code,
DATEDIFF(DAY, b.Block_Start_Date, s.Request_Date) AS [No. Days]
/* TESTING ONLY */ --, b.Block_Start_Date, b.Block_End_Date
FROM #SampleData s
JOIN (
SELECT f.Request_Date AS Block_Start_Date,
f.Code,
MIN(l.Request_Date) AS Block_End_Date
FROM (
SELECT Code, Request_Date
FROM BaseData
WHERE IsFirst <> 0
) f
JOIN (
SELECT Code, Request_Date
FROM BaseData
WHERE IsLast <> 0
) l
ON l.Code = f.Code
AND l.Request_Date >= f.Request_Date
GROUP BY f.Request_Date,
f.Code
) b /* Blocks */
ON s.Request_Date BETWEEN b.Block_Start_Date AND b.Block_End_Date
ORDER BY s.Row_Num;
February 1, 2019 at 8:57 am
fahey.jonathan - Friday, February 1, 2019 7:42 AMOK, attempt #2. The CTE tells which rows are first and last in their block, the "Blocks" query puts those values into ranges, which can then be joined to the main data. Because we now know the start date of each block, we can calculate the number of days difference between the block start date and the "current" date. The ISNULL is needed on the LEAD and LAG statements to ensure that the first row returns IsFirst = 1 and the last row returns IsLast = 1.WITH BaseData
AS (
SELECT Request_Date,
Code,
CASE WHEN ISNULL(LAG(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsFirst,
CASE WHEN ISNULL(LEAD(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsLast
FROM #SampleData
)
SELECT s.Row_Num,
s.Request_Date,
s.Code,
DATEDIFF(DAY, b.Block_Start_Date, s.Request_Date) AS [No. Days]
/* TESTING ONLY */ --, b.Block_Start_Date, b.Block_End_Date
FROM #SampleData s
JOIN (
SELECT f.Request_Date AS Block_Start_Date,
f.Code,
MIN(l.Request_Date) AS Block_End_Date
FROM (
SELECT Code, Request_Date
FROM BaseData
WHERE IsFirst <> 0
) f
JOIN (
SELECT Code, Request_Date
FROM BaseData
WHERE IsLast <> 0
) l
ON l.Code = f.Code
AND l.Request_Date >= f.Request_Date
GROUP BY f.Request_Date,
f.Code
) b /* Blocks */
ON s.Request_Date BETWEEN b.Block_Start_Date AND b.Block_End_Date
ORDER BY s.Row_Num;
Excellent. That works like a charm. Thank much
February 1, 2019 at 9:32 am
fahey.jonathan - Friday, February 1, 2019 7:42 AMOK, attempt #2. The CTE tells which rows are first and last in their block, the "Blocks" query puts those values into ranges, which can then be joined to the main data. Because we now know the start date of each block, we can calculate the number of days difference between the block start date and the "current" date. The ISNULL is needed on the LEAD and LAG statements to ensure that the first row returns IsFirst = 1 and the last row returns IsLast = 1.WITH BaseData
AS (
SELECT Request_Date,
Code,
CASE WHEN ISNULL(LAG(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsFirst,
CASE WHEN ISNULL(LEAD(Code, 1) OVER (ORDER BY Request_Date), 'Missing') <> Code THEN 1 ELSE 0 END AS IsLast
FROM #SampleData
)
SELECT s.Row_Num,
s.Request_Date,
s.Code,
DATEDIFF(DAY, b.Block_Start_Date, s.Request_Date) AS [No. Days]
/* TESTING ONLY */ --, b.Block_Start_Date, b.Block_End_Date
FROM #SampleData s
JOIN (
SELECT f.Request_Date AS Block_Start_Date,
f.Code,
MIN(l.Request_Date) AS Block_End_Date
FROM (
SELECT Code, Request_Date
FROM BaseData
WHERE IsFirst <> 0
) f
JOIN (
SELECT Code, Request_Date
FROM BaseData
WHERE IsLast <> 0
) l
ON l.Code = f.Code
AND l.Request_Date >= f.Request_Date
GROUP BY f.Request_Date,
f.Code
) b /* Blocks */
ON s.Request_Date BETWEEN b.Block_Start_Date AND b.Block_End_Date
ORDER BY s.Row_Num;
First, you do not need the ISNULL if you use the third (optional) parameter for LEAD/LAG, e.g., LAG(Code, 1, 'Missing') OVER(....).
Second, there is a simpler approach. This only requires one scan of the table as opposed to the three that yours requires.
WITH Code_Groups AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Request_Date) - ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Request_Date) AS code_grp
FROM #SampleData
)
SELECT Row_Num, Request_Date, Code, DATEDIFF(DAY, MIN(Request_Date) OVER(PARTITION BY Code, code_grp), Request_Date) AS Num_Days
FROM Code_Groups
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
February 1, 2019 at 12:05 pm
drew.allen - Friday, February 1, 2019 9:32 AMFirst, you do not need the ISNULL if you use the third (optional) parameter for LEAD/LAG, e.g., LAG(Code, 1, 'Missing') OVER(....).Second, there is a simpler approach. This only requires one scan of the table as opposed to the three that yours requires.
WITH Code_Groups AS
(
SELECT *, ROW_NUMBER() OVER(ORDER BY Request_Date) - ROW_NUMBER() OVER(PARTITION BY Code ORDER BY Request_Date) AS code_grp
FROM #SampleData
)
SELECT Row_Num, Request_Date, Code, DATEDIFF(DAY, MIN(Request_Date) OVER(PARTITION BY Code, code_grp), Request_Date) AS Num_Days
FROM Code_GroupsDrew
See, this is why I like the forums. I learn new things all the time. Thanks for the lesson.
February 2, 2019 at 4:48 am
HI
ANOTHER WAY OF RESOLVING THIS
USING RECURSIVE CTE
-- GENERALLY RECURSIVE CTE IS NOT GOOD FOR PERFORMANCE
IF IT HELPS GREAT
🙂
🙂
DROP TABLE #SampleData
go
CREATE TABLE #SampleData
(
Row_Num INTEGER NOT NULL,
Request_Date DATE NOT NULL,
Code VARCHAR(50) NOT NULL
);
INSERT INTO #SampleData (Row_Num, Request_Date, Code)
SELECT N,
DATEADD(DAY, N, '20190101'),
CASE WHEN RIGHT(CAST(RAND(N) AS VARCHAR(50)), 1) LIKE '[13579]' THEN 'Urgent Needs' ELSE 'Awaiting' END
FROM (
SELECT ROW_NUMBER() OVER (ORDER BY name) AS N
FROM sys.objects
) AS TempTally
WHERE N <= 20;
SELECT *
FROM #SampleData
ORDER BY Row_Num
MY RECURSIVE CTE SOLUTION ; WITH RECURSIVE_CTE AS
(
SELECT *,0 AS GRP FROM #SampleData WHERE Row_Num = 1
UNION ALL
SELECT A.*, CASE WHEN A.Code = B.Code THEN B.GRP + 1 WHEN A.Code <> B.Code THEN 0 END
FROM #SampleData A JOIN RECURSIVE_CTE B ON A.Row_Num = B.Row_Num + 1
)
SELECT * FROM RECURSIVE_CTE
GO
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply