June 18, 2015 at 1:03 pm
I'm looking at a table with the following fields and I can't grasp how to design something to retrieve what I need:
ID, Stat1, Stat2, Stat3, Stat4, Stat5, Date.
There are multiple rows with the same ID because the stats are rerun every 3 - 6 months. Sometimes when the stats are rerun though, not all of them will be entered so you will have NULLs in those stat fields...
What I'm trying to figure out how to do is run a view or a CTE query that returns only the MOST RECENT stat for each one of the stat fields on one line with the ID.
How do I return distinct ID, most recent stat1, most recent stat2, most recent stat3, most recent stat4, and most recent stat5?
June 18, 2015 at 1:28 pm
Try this:
WITH YT1 AS (
SELECT YT.ID, YT.[Date], YT.Stat1
FROM YourTable AS YT
WHERE YT.Stat1 IS NOT NULL
AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)
),
YT2 AS (
SELECT YT.ID, YT.[Date], YT.Stat2
FROM YourTable AS YT
WHERE YT.Stat2 IS NOT NULL
AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)
),
YT3 AS (
SELECT YT.ID, YT.[Date], YT.Stat3
FROM YourTable AS YT
WHERE YT.Stat3 IS NOT NULL
AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)
),
YT4 AS (
SELECT YT.ID, YT.[Date], YT.Stat4
FROM YourTable AS YT
WHERE YT.Stat4 IS NOT NULL
AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)
),
YT5 AS (
SELECT YT.ID, YT.[Date], YT.Stat5
FROM YourTable AS YT
WHERE YT.Stat5 IS NOT NULL
AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)
)
SELECT COALESCE(YT1.ID, YT2.ID, YT3.ID, YT4.ID, YT5.ID) AS ID, YT1.Stat1, YT2.Stat2, YT3.Stat3, YT4.Stat4, YT4.Stat5
FROM YT1
FULL OUTER JOIN YT2
ON YT1.ID = YT2.ID
FULL OUTER JOIN YT3
ON YT1.ID = YT3.ID
FULL OUTER JOIN YT4
ON YT1.ID = YT4.ID
FULL OUTER JOIN YT5
ON YT1.ID = YT5.ID
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
June 18, 2015 at 2:35 pm
Thank you Steve - I was having a hell of a time trying to figure out the join that didn't have a join in the CTE. Are there any resources (books, or tutorials) you would recommend I read for these types of problems? I'm a bit of a reluctant DBA and these types of reports are going to be commonplace I'm afraid. Most of my experience is in the Analytics and BA roles, with some years of trying to figure out SQL on my own.
I've been elevated to the production DBA role, and I need to get up to speed quickly...
June 18, 2015 at 2:50 pm
Travis Dean (6/18/2015)
Thank you Steve - I was having a hell of a time trying to figure out the join that didn't have a join in the CTE. Are there any resources (books, or tutorials) you would recommend I read for these types of problems? I'm a bit of a reluctant DBA and these types of reports are going to be commonplace I'm afraid. Most of my experience is in the Analytics and BA roles, with some years of trying to figure out SQL on my own.I've been elevated to the production DBA role, and I need to get up to speed quickly...
Books on how to figure out what kind of problem you have, to my knowledge, really just don't exist. The key to getting a functional piece of SQL is to be able to break down your problem into digestable pieces that you can then link together using the tools in your head. Thus you need to know what's possible, such as how to use FULL OUTER JOIN to bring ALL the records together when some may be NULL, or how to use LEFT OUTER JOIN to exclude records, and other JOIN techniques. It almost always helps to use CTEs instead of subqueries so that you can more easily read your code and in many cases actually give the optimizer a better clue on how to handle your query. it also helps you break down the problem into manageable pieces. Knowing that it's best to do all your INNER JOINs before any LEFT OUTER joins helps keep your queries from doing strange things. Try to think in terms of what you do to a column instead of what you do to a row.
However, there are a number of great books to read, such as pretty much anything written by Itzhik Ben-Gan. He's a superb writer and makes things very clear. Glad I could provide a solution. Don't be afraid to get help in this forum. It's been a fantastic resource for me.
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
June 18, 2015 at 2:56 pm
Without test data, it's hard to say, but I believe that this query will give you the same results, while being MUCH, MUCH simpler. I'm guessing it will also be much more efficient.
SELECT TOP (1) YT.ID
,LAST_VALUE(yt.Stat1) OVER(ORDER BY CASE WHEN yt.Stat1 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat1
,LAST_VALUE(yt.Stat2) OVER(ORDER BY CASE WHEN yt.Stat2 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat2
,LAST_VALUE(yt.Stat3) OVER(ORDER BY CASE WHEN yt.Stat3 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat3
,LAST_VALUE(yt.Stat4) OVER(ORDER BY CASE WHEN yt.Stat4 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat4
,LAST_VALUE(yt.Stat5) OVER(ORDER BY CASE WHEN yt.Stat5 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat5
,yt.[Date]
FROM YourTable AS YT
ORDER BY yt.[Date] DESC
Edited to add TOP and ORDER BY clauses and the date field.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
June 18, 2015 at 3:03 pm
I created minimum sample data, which I've included after the main code. I assumed you have a table that has a list of IDs -- if not, just uncomment the code to do a DISTINCT list from the stats table itself. Hopefully the clustered index is on ( ID, Date ), as that is logical anyway and will speed processing for this query.
SELECT ti.ID, ts_1.Stat1, ts_2.Stat2, ts_3.Stat3, ts_4.Stat4, ts_5.Stat5
FROM table_with_IDs ti
/* --if a separate table with the IDs doesn't exist, uncomment the next SELECT:
( SELECT DISTINCT ID FROM table_with_stats ) AS ti
*/
OUTER APPLY (
SELECT TOP (1) *
FROM table_with_stats ts1
WHERE
ts1.ID = ti.ID AND
ts1.Stat1 IS NOT NULL
ORDER BY ts1.Date DESC
) AS ts_1
OUTER APPLY (
SELECT TOP (1) *
FROM table_with_stats ts2
WHERE
ts2.ID = ti.ID AND
ts2.Stat2 IS NOT NULL
ORDER BY ts2.Date DESC
) AS ts_2
OUTER APPLY (
SELECT TOP (1) *
FROM table_with_stats ts3
WHERE
ts3.ID = ti.ID AND
ts3.Stat3 IS NOT NULL
ORDER BY ts3.Date DESC
) AS ts_3
OUTER APPLY (
SELECT TOP (1) *
FROM table_with_stats ts4
WHERE
ts4.ID = ti.ID AND
ts4.Stat4 IS NOT NULL
ORDER BY ts4.Date DESC
) AS ts_4
OUTER APPLY (
SELECT TOP (1) *
FROM table_with_stats ts5
WHERE
ts5.ID = ti.ID AND
ts5.Stat5 IS NOT NULL
ORDER BY ts5.Date DESC
) AS ts_5
/*
CREATE TABLE table_with_IDs (
ID int PRIMARY KEY
)
INSERT INTO table_with_IDs VALUES(1)
INSERT INTO table_with_IDs VALUES(2)
CREATE TABLE table_with_stats (
ID int NOT NULL,
Date datetime NOT NULL,
Stat1 decimal(9, 4) NULL,
Stat2 decimal(9, 4) NULL,
Stat3 decimal(9, 4) NULL,
Stat4 decimal(9, 4) NULL,
Stat5 decimal(9, 4) NULL
)
CREATE UNIQUE CLUSTERED INDEX table_with_stats__CL ON table_with_stats ( ID, Date ) WITH ( FILLFACTOR = 99 )
TRUNCATE TABLE table_with_stats
--the last values are always odd, so no even values should be listed.
INSERT INTO table_with_stats VALUES(1, '20150601', 2, 2, 6, 7, 10)
INSERT INTO table_with_stats VALUES(1, '20150603', 1, 4, NULL, NULL, NULL)
INSERT INTO table_with_stats VALUES(1, '20150607', NULL, 10, NULL, NULL, NULL)
INSERT INTO table_with_stats VALUES(1, '20150611', NULL, 11, 15, NULL, NULL)
INSERT INTO table_with_stats VALUES(1, '20150615', NULL, NULL, 13, NULL, 17)
*/
SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".
June 18, 2015 at 3:40 pm
Yeah, Steve - when I'm looking at something like this I definitely know where the cutoff point is in my head. I can translate the requirement only so far and then I don't have anything in my toolbox to push it further. I've been doing things like this for years, but if I haven't come across the situation before, I get stuck. Google doesn't really help either if you don't know the correct keywords you need to solve the requirement.! I have many many folders with SQL scripts I've saved over the years, but there will always be new situations and requests that arise.
June 18, 2015 at 3:41 pm
Thanks J,
I'll try this out tomorrow and look at the usage.
June 18, 2015 at 4:06 pm
I may have simply misunderstood the problem (or my test data set may not be indicative of the actual data set in question) but I didn't get the expected results from Steve's solution...
Here's the test data I threw together...
IF OBJECT_ID('tempdb..#temp') IS NOT NULL
DROP TABLE #temp;
SELECT
ID, Stat1, Stat2, Stat3, Stat4, Stat5, [Date]
INTO #temp
FROM (VALUES
(1, 123, NULL, NULL, 433, 554, '2015-01-01'),
(2, NULL, 258, NULL, 364, 895, '2015-01-01'),
(3, 101, 572, 335, 485, 535, '2015-01-01'),
(4, 125, 235, 385, 435, 522, '2015-01-01'),
(5, 100, 200, NULL, 400, 500, '2015-01-01'),
(1, 155, 277, NULL, 475, 557, '2015-02-01'),
(2, NULL, 222, 434, 444, NULL, '2015-02-01'),
(3, 154, NULL, 314, 445, 575, '2015-02-01'),
(4, 137, 225, 336, 474, 557, '2015-02-01'),
(5, 145, 225, 315, 400, NULL, '2015-02-01'),
(1, 124, NULL, 333, 444, 555, '2015-03-01'),
(2, 168, 242, NULL, 442, NULL, '2015-03-01'),
(3, 112, NULL, 368, 496, 536, '2015-03-01'),
(4, NULL, NULL, 335, 414, NULL, '2015-03-01'),
(5, 111, 225, NULL, 436, NULL, '2015-03-01'),
(1, 175, NULL, NULL, 425, 502, '2015-04-01'),
(2, NULL, 202, NULL, 435, NULL, '2015-04-01'),
(3, 133, NULL, 386, NULL, 524, '2015-04-01'),
(4, NULL, NULL, 314, 441, NULL, '2015-04-01'),
(5, 361, 244, NULL, 364, NULL, '2015-04-01')
) x (ID, Stat1, Stat2, Stat3, Stat4, Stat5, [Date]);
SELECT * FROM #temp;
Here is the output I got from Steve's code...
IDStat1Stat2Stat3Stat4Stat5
1175NULLNULL425502
3133NULL386NULL524
5361244NULL364NULL
2NULL202NULLNULLNULL
4NULLNULL314NULLNULL
4NULLNULLNULL441NULL
2NULLNULLNULL435NULL
The following solution does seem to produce the expected results. (at least what I understand the desired results to be)
SELECT
id.ID,
MAX(CASE WHEN st.StatType = 1 THEN z.StatValue END) AS Stat1,
MAX(CASE WHEN st.StatType = 2 THEN z.StatValue END) AS Stat2,
MAX(CASE WHEN st.StatType = 3 THEN z.StatValue END) AS Stat3,
MAX(CASE WHEN st.StatType = 4 THEN z.StatValue END) AS Stat4,
MAX(CASE WHEN st.StatType = 5 THEN z.StatValue END) AS Stat5
FROM
(SELECT t.ID FROM #temp t GROUP BY t.ID) id
CROSS JOIN (SELECT x.StatType FROM (VALUES (1), (2), (3), (4), (5)) x (StatType)) st
CROSS APPLY (
SELECT TOP 1
x.StatValue,
x.[Date]
FROM
#temp t
CROSS APPLY (
VALUES
(t.ID, 1, t.Stat1, [Date]), (t.ID, 2, t.Stat2, [Date]), (t.ID, 3, t.Stat3, [Date]), (t.ID, 4, t.Stat4, [Date]), (t.ID, 5, t.Stat5, [Date])
) x (ID, StatType, StatValue, [Date])
WHERE
x.StatValue IS NOT NULL
AND id.ID = x.ID
AND st.StatType = x.StatType
ORDER BY
x.[Date] DESC
) z
GROUP BY
id.ID
There is the output from the above query...
IDStat1Stat2Stat3Stat4Stat5
1175277333425502
2168202434435895
3133572386496524
4137225314441557
5361244315364500
HTH,
Jason
June 19, 2015 at 7:25 am
This gives me duplicates now that I look a little bit harder.
June 19, 2015 at 7:25 am
My SQL 2014 box doesn't like this at all.
June 19, 2015 at 7:27 am
This too is giving me duplicates... I don't need every line from the left table - only distinct IDs.
June 19, 2015 at 7:48 am
If neither script is working for you, you may need to post some representative test data... Otherwise we're just stabbing around in the dark.
June 19, 2015 at 7:50 am
Travis Dean (6/19/2015)
This too is giving me duplicates... I don't need every line from the left table - only distinct IDs.
If you check Post #1695927, you'll see that the final output of the script I post does produce a distinct list of IDs. And it was created and tested using SQL Server 2014 Dev Edition.
June 19, 2015 at 10:13 am
I realized late yesterday that my query needed some updates. With the sample data that was posted, I was able to update the script and get something that worked.
;
WITH your_table_ordered AS (
SELECT yt.id
,LAST_VALUE(yt.Stat1) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat1 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat1
,LAST_VALUE(yt.Stat2) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat2 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat2
,LAST_VALUE(yt.Stat3) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat3 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat3
,LAST_VALUE(yt.Stat4) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat4 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat4
,LAST_VALUE(yt.Stat5) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat5 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat5
,yt.[Date]
,ROW_NUMBER() OVER(PARTITION BY yt.ID ORDER BY yt.[Date] DESC) AS rn
FROM your_table AS yt
)
SELECT id
,Stat1
,Stat2
,Stat3
,Stat4
,Stat5
,[Date]
FROM your_table_ordered
WHERE rn = 1
My initial code had problems, because LAST_VALUE defaults to ROWS UNBOUNDED PRECEDING AND CURRENT ROW, so I needed to explicitly set it to include all rows. I was also sorting DESC, so it was taking the "first" value by date instead of the last value. I also added a PARTITION BY clause on the ID. Finally, I added a ROW_NUMBER also partitioned on the ID, so that I could pull out one row per ID.
Drew
J. Drew Allen
Business Intelligence Analyst
Philadelphia, PA
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply