September 27, 2018 at 9:26 pm
I have calculate how man ids are in a certain status for the first day of each month for the past 36 months. Its hard to explain so I tried to create some sample data and detailed explanation below of what the expected results are. I was able to get the highest by using the row_number over trick. However, I can't figure out how to run that for the first day of each month. The closest I've gotten is attempting some window functions, but I'm still at a loss. Any help or direction is much appreciated!
For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.
On 2018-09-01 we have...
id 1: highest chgstat on/before 2018-09-01 is 20 (went into this chgstat on 2018-08-25 and didn't come out until 2018-09-10). This one counts.
id 2: went into chgstat=20 on 2018-07-26 but went into chgstat=30 on 2018-08-29. Since id 2 highest chgstat on or before 2018-09-01 wasn't chgstat=20, this does NOT count.
id 3: highest chgstat on/before 2018-09-01 is 30, so this does not count.
id 4: highest chgstat on/before 2018-09-01 is 30 (on 2018-08-29). This one does not count.
id 5: highest chgstat on/before 2018-09-01 is 10. The id did go into chgstat=20, but it was on 2018-09-03. Therefore, this one does not count.
ON 2018-08-01 we have...
id 1: All chgstats occur after 2018-08-01. Does not count
id 2: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
id 3: Did go into chgstat=20 before 2018-08-01. However, the highest on/before 2018-08-01 is 30. Does not count
id 4: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
id 5: Highest chgstat on/before 2018-08-01 is 10. Does not count.
For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.
CREATE TABLE stathist (
id INT,
chgstat INT,
chgdate DATE,
)
INSERT INTO stathist (id, chgstat, chgdate) VALUES
(1, 30, '2018-09-10'),
(1, 20, '2018-08-25'),
(1, 10, '2018-08-23'),
(2, 30, '2018-08-29'),
(2, 20, '2018-07-26'),
(2, 10, '2018-07-25'),
(3, 30, '2018-07-29'),
(3, 20, '2018-07-18'),
(3, 10, '2018-07-15'),
(4, 30, '2018-08-29'),
(4, 20, '2018-07-09'),
(4, 10, '2018-07-05'),
(5, 30, '2018-09-20'),
(5, 20, '2018-09-03'),
(5, 10, '2018-07-05')
October 3, 2018 at 5:49 am
Here are some options, getting to the final result in 3 stages to make it easier to understand. The third one should meet your requirements.
You'll need to test it carefully as there is only a little test data.
-- List of Ids & Dates for each StartOfMonth (so you can check the base data):
WITH Months AS
(SELECT 1 as Value, DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth
UNION ALL
SELECT value + 1 AS value, DATEADD(month, -1, StartOfMonth) AS StartOfMonth
FROM Months
WHERE Months.value <= 35)
SELECT M.StartOfMonth, S20.Id, S20.chgdate, S30.Id, S30.chgdate
FROM Months M
LEFT JOIN (SELECT * FROM stathist WHERE chgstat=20) S20 ON M.StartOfMonth >= S20.chgdate
LEFT JOIN (SELECT * FROM stathist WHERE chgstat=30) S30 ON S20.id = S30.id
WHERE M.StartOfMonth <= S30.chgdate;
-- Finished query with counts:
WITH Months AS
(SELECT 1 as Value, DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth
UNION ALL
SELECT value + 1 AS value, DATEADD(month, -1, StartOfMonth) AS StartOfMonth
FROM Months
WHERE Months.value <= 35)
SELECT M.StartOfMonth, COUNT(S20.id) as IdCount
FROM Months M
LEFT JOIN (SELECT * FROM stathist WHERE chgstat=20) S20 ON M.StartOfMonth >= S20.chgdate
LEFT JOIN (SELECT * FROM stathist WHERE chgstat=30) S30 ON S20.id = S30.id
WHERE M.StartOfMonth <= S30.chgdate
GROUP BY StartOfMonth
ORDER BY StartOfMonth DESC;
-- Finished query with counts, listing all 36 months even if 0 count:
WITH Months AS
(SELECT 1 as Value, DATEADD(month, DATEDIFF(month, 0, GETDATE()), 0) AS StartOfMonth
UNION ALL
SELECT value + 1 AS value, DATEADD(month, -1, StartOfMonth) AS StartOfMonth
FROM Months
WHERE Months.value <= 35)
, Data AS
(
SELECT M.StartOfMonth, COUNT(S20.id) as IdCount
FROM Months M
LEFT JOIN (SELECT * FROM stathist WHERE chgstat=20) S20 ON M.StartOfMonth >= S20.chgdate
LEFT JOIN (SELECT * FROM stathist WHERE chgstat=30) S30 ON S20.id = S30.id
WHERE M.StartOfMonth <= S30.chgdate
GROUP BY StartOfMonth
)
SELECT M.StartOfMonth, ISNULL(IdCount, 0) AS IdCount
FROM Months M
LEFT JOIN Data D ON D.StartOfMonth = m.StartOfMonth
ORDER BY M.StartOfMonth DESC;
October 3, 2018 at 8:37 am
Ah... be careful, Laurie. You're using rCTEs (Recursive CTEs) that produce an incremental count and they're slower than a well written While Loop and use 9 times the number of reads. Even seemingly low row counts suffer a lot and you can't predict how many times such code will be hit in the future. Worse yet, someone in a pinch for something similar may borrow the rCTE methods from your code and end up in deep Kimchi for both performance and resource usage.
Please see the following article where you'll find 3 other methods that smoke rCTEs in all areas.
Hidden RBAR: Counting with Recursive CTE's
--Jeff Moden
Change is inevitable... Change for the better is not.
October 3, 2018 at 8:41 am
DataAnalyst011 - Thursday, September 27, 2018 9:26 PMI have calculate how man ids are in a certain status for the first day of each month for the past 36 months. Its hard to explain so I tried to create some sample data and detailed explanation below of what the expected results are. I was able to get the highest by using the row_number over trick. However, I can't figure out how to run that for the first day of each month. The closest I've gotten is attempting some window functions, but I'm still at a loss. Any help or direction is much appreciated!
For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.On 2018-09-01 we have...
id 1: highest chgstat on/before 2018-09-01 is 20 (went into this chgstat on 2018-08-25 and didn't come out until 2018-09-10). This one counts.
id 2: went into chgstat=20 on 2018-07-26 but went into chgstat=30 on 2018-08-29. Since id 2 highest chgstat on or before 2018-09-01 wasn't chgstat=20, this does NOT count.
id 3: highest chgstat on/before 2018-09-01 is 30, so this does not count.
id 4: highest chgstat on/before 2018-09-01 is 30 (on 2018-08-29). This one does not count.
id 5: highest chgstat on/before 2018-09-01 is 10. The id did go into chgstat=20, but it was on 2018-09-03. Therefore, this one does not count.ON 2018-08-01 we have...
id 1: All chgstats occur after 2018-08-01. Does not count
id 2: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
id 3: Did go into chgstat=20 before 2018-08-01. However, the highest on/before 2018-08-01 is 30. Does not count
id 4: Highest change status on/before 2018-08-01 is 20. Therefore, this one counts.
id 5: Highest chgstat on/before 2018-08-01 is 10. Does not count.
For the first day of each month for the past n month, find the ids chgstat=20 with their latest chgdate before or equal to the first day of that month.CREATE TABLE stathist (
id INT,
chgstat INT,
chgdate DATE,
)INSERT INTO stathist (id, chgstat, chgdate) VALUES
(1, 30, '2018-09-10'),
(1, 20, '2018-08-25'),
(1, 10, '2018-08-23'),
(2, 30, '2018-08-29'),
(2, 20, '2018-07-26'),
(2, 10, '2018-07-25'),
(3, 30, '2018-07-29'),
(3, 20, '2018-07-18'),
(3, 10, '2018-07-15'),
(4, 30, '2018-08-29'),
(4, 20, '2018-07-09'),
(4, 10, '2018-07-05'),
(5, 30, '2018-09-20'),
(5, 20, '2018-09-03'),
(5, 10, '2018-07-05')
Please post what the return for that data would look like according to your rules. Thanks.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 4 posts - 1 through 3 (of 3 total)
You must be logged in to reply to this topic. Login to reply