CTE (?) or query help

  • I'm looking at a table with the following fields and I can't grasp how to design something to retrieve what I need:

    ID, Stat1, Stat2, Stat3, Stat4, Stat5, Date.

    There are multiple rows with the same ID because the stats are rerun every 3 - 6 months. Sometimes when the stats are rerun though, not all of them will be entered so you will have NULLs in those stat fields...

    What I'm trying to figure out how to do is run a view or a CTE query that returns only the MOST RECENT stat for each one of the stat fields on one line with the ID.

    How do I return distinct ID, most recent stat1, most recent stat2, most recent stat3, most recent stat4, and most recent stat5?

  • Try this:

    WITH YT1 AS (

    SELECT YT.ID, YT.[Date], YT.Stat1

    FROM YourTable AS YT

    WHERE YT.Stat1 IS NOT NULL

    AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)

    ),

    YT2 AS (

    SELECT YT.ID, YT.[Date], YT.Stat2

    FROM YourTable AS YT

    WHERE YT.Stat2 IS NOT NULL

    AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)

    ),

    YT3 AS (

    SELECT YT.ID, YT.[Date], YT.Stat3

    FROM YourTable AS YT

    WHERE YT.Stat3 IS NOT NULL

    AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)

    ),

    YT4 AS (

    SELECT YT.ID, YT.[Date], YT.Stat4

    FROM YourTable AS YT

    WHERE YT.Stat4 IS NOT NULL

    AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)

    ),

    YT5 AS (

    SELECT YT.ID, YT.[Date], YT.Stat5

    FROM YourTable AS YT

    WHERE YT.Stat5 IS NOT NULL

    AND YT.[Date] = (SELECT MAX(Y.[Date]) FROM YourTable AS Y WHERE Y.ID = YT.ID)

    )

    SELECT COALESCE(YT1.ID, YT2.ID, YT3.ID, YT4.ID, YT5.ID) AS ID, YT1.Stat1, YT2.Stat2, YT3.Stat3, YT4.Stat4, YT4.Stat5

    FROM YT1

    FULL OUTER JOIN YT2

    ON YT1.ID = YT2.ID

    FULL OUTER JOIN YT3

    ON YT1.ID = YT3.ID

    FULL OUTER JOIN YT4

    ON YT1.ID = YT4.ID

    FULL OUTER JOIN YT5

    ON YT1.ID = YT5.ID

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thank you Steve - I was having a hell of a time trying to figure out the join that didn't have a join in the CTE. Are there any resources (books, or tutorials) you would recommend I read for these types of problems? I'm a bit of a reluctant DBA and these types of reports are going to be commonplace I'm afraid. Most of my experience is in the Analytics and BA roles, with some years of trying to figure out SQL on my own.

    I've been elevated to the production DBA role, and I need to get up to speed quickly...

  • Travis Dean (6/18/2015)


    Thank you Steve - I was having a hell of a time trying to figure out the join that didn't have a join in the CTE. Are there any resources (books, or tutorials) you would recommend I read for these types of problems? I'm a bit of a reluctant DBA and these types of reports are going to be commonplace I'm afraid. Most of my experience is in the Analytics and BA roles, with some years of trying to figure out SQL on my own.

    I've been elevated to the production DBA role, and I need to get up to speed quickly...

    Books on how to figure out what kind of problem you have, to my knowledge, really just don't exist. The key to getting a functional piece of SQL is to be able to break down your problem into digestable pieces that you can then link together using the tools in your head. Thus you need to know what's possible, such as how to use FULL OUTER JOIN to bring ALL the records together when some may be NULL, or how to use LEFT OUTER JOIN to exclude records, and other JOIN techniques. It almost always helps to use CTEs instead of subqueries so that you can more easily read your code and in many cases actually give the optimizer a better clue on how to handle your query. it also helps you break down the problem into manageable pieces. Knowing that it's best to do all your INNER JOINs before any LEFT OUTER joins helps keep your queries from doing strange things. Try to think in terms of what you do to a column instead of what you do to a row.

    However, there are a number of great books to read, such as pretty much anything written by Itzhik Ben-Gan. He's a superb writer and makes things very clear. Glad I could provide a solution. Don't be afraid to get help in this forum. It's been a fantastic resource for me.

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Without test data, it's hard to say, but I believe that this query will give you the same results, while being MUCH, MUCH simpler. I'm guessing it will also be much more efficient.

    SELECT TOP (1) YT.ID

    ,LAST_VALUE(yt.Stat1) OVER(ORDER BY CASE WHEN yt.Stat1 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat1

    ,LAST_VALUE(yt.Stat2) OVER(ORDER BY CASE WHEN yt.Stat2 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat2

    ,LAST_VALUE(yt.Stat3) OVER(ORDER BY CASE WHEN yt.Stat3 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat3

    ,LAST_VALUE(yt.Stat4) OVER(ORDER BY CASE WHEN yt.Stat4 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat4

    ,LAST_VALUE(yt.Stat5) OVER(ORDER BY CASE WHEN yt.Stat5 IS NULL THEN 0 ELSE yt.[Date] END DESC) AS Stat5

    ,yt.[Date]

    FROM YourTable AS YT

    ORDER BY yt.[Date] DESC

    Edited to add TOP and ORDER BY clauses and the date field.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

  • I created minimum sample data, which I've included after the main code. I assumed you have a table that has a list of IDs -- if not, just uncomment the code to do a DISTINCT list from the stats table itself. Hopefully the clustered index is on ( ID, Date ), as that is logical anyway and will speed processing for this query.

    SELECT ti.ID, ts_1.Stat1, ts_2.Stat2, ts_3.Stat3, ts_4.Stat4, ts_5.Stat5

    FROM table_with_IDs ti

    /* --if a separate table with the IDs doesn't exist, uncomment the next SELECT:

    ( SELECT DISTINCT ID FROM table_with_stats ) AS ti

    */

    OUTER APPLY (

    SELECT TOP (1) *

    FROM table_with_stats ts1

    WHERE

    ts1.ID = ti.ID AND

    ts1.Stat1 IS NOT NULL

    ORDER BY ts1.Date DESC

    ) AS ts_1

    OUTER APPLY (

    SELECT TOP (1) *

    FROM table_with_stats ts2

    WHERE

    ts2.ID = ti.ID AND

    ts2.Stat2 IS NOT NULL

    ORDER BY ts2.Date DESC

    ) AS ts_2

    OUTER APPLY (

    SELECT TOP (1) *

    FROM table_with_stats ts3

    WHERE

    ts3.ID = ti.ID AND

    ts3.Stat3 IS NOT NULL

    ORDER BY ts3.Date DESC

    ) AS ts_3

    OUTER APPLY (

    SELECT TOP (1) *

    FROM table_with_stats ts4

    WHERE

    ts4.ID = ti.ID AND

    ts4.Stat4 IS NOT NULL

    ORDER BY ts4.Date DESC

    ) AS ts_4

    OUTER APPLY (

    SELECT TOP (1) *

    FROM table_with_stats ts5

    WHERE

    ts5.ID = ti.ID AND

    ts5.Stat5 IS NOT NULL

    ORDER BY ts5.Date DESC

    ) AS ts_5

    /*

    CREATE TABLE table_with_IDs (

    ID int PRIMARY KEY

    )

    INSERT INTO table_with_IDs VALUES(1)

    INSERT INTO table_with_IDs VALUES(2)

    CREATE TABLE table_with_stats (

    ID int NOT NULL,

    Date datetime NOT NULL,

    Stat1 decimal(9, 4) NULL,

    Stat2 decimal(9, 4) NULL,

    Stat3 decimal(9, 4) NULL,

    Stat4 decimal(9, 4) NULL,

    Stat5 decimal(9, 4) NULL

    )

    CREATE UNIQUE CLUSTERED INDEX table_with_stats__CL ON table_with_stats ( ID, Date ) WITH ( FILLFACTOR = 99 )

    TRUNCATE TABLE table_with_stats

    --the last values are always odd, so no even values should be listed.

    INSERT INTO table_with_stats VALUES(1, '20150601', 2, 2, 6, 7, 10)

    INSERT INTO table_with_stats VALUES(1, '20150603', 1, 4, NULL, NULL, NULL)

    INSERT INTO table_with_stats VALUES(1, '20150607', NULL, 10, NULL, NULL, NULL)

    INSERT INTO table_with_stats VALUES(1, '20150611', NULL, 11, 15, NULL, NULL)

    INSERT INTO table_with_stats VALUES(1, '20150615', NULL, NULL, 13, NULL, 17)

    */

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Yeah, Steve - when I'm looking at something like this I definitely know where the cutoff point is in my head. I can translate the requirement only so far and then I don't have anything in my toolbox to push it further. I've been doing things like this for years, but if I haven't come across the situation before, I get stuck. Google doesn't really help either if you don't know the correct keywords you need to solve the requirement.! I have many many folders with SQL scripts I've saved over the years, but there will always be new situations and requests that arise.

  • Thanks J,

    I'll try this out tomorrow and look at the usage.

  • I may have simply misunderstood the problem (or my test data set may not be indicative of the actual data set in question) but I didn't get the expected results from Steve's solution...

    Here's the test data I threw together...

    IF OBJECT_ID('tempdb..#temp') IS NOT NULL

    DROP TABLE #temp;

    SELECT

    ID, Stat1, Stat2, Stat3, Stat4, Stat5, [Date]

    INTO #temp

    FROM (VALUES

    (1, 123, NULL, NULL, 433, 554, '2015-01-01'),

    (2, NULL, 258, NULL, 364, 895, '2015-01-01'),

    (3, 101, 572, 335, 485, 535, '2015-01-01'),

    (4, 125, 235, 385, 435, 522, '2015-01-01'),

    (5, 100, 200, NULL, 400, 500, '2015-01-01'),

    (1, 155, 277, NULL, 475, 557, '2015-02-01'),

    (2, NULL, 222, 434, 444, NULL, '2015-02-01'),

    (3, 154, NULL, 314, 445, 575, '2015-02-01'),

    (4, 137, 225, 336, 474, 557, '2015-02-01'),

    (5, 145, 225, 315, 400, NULL, '2015-02-01'),

    (1, 124, NULL, 333, 444, 555, '2015-03-01'),

    (2, 168, 242, NULL, 442, NULL, '2015-03-01'),

    (3, 112, NULL, 368, 496, 536, '2015-03-01'),

    (4, NULL, NULL, 335, 414, NULL, '2015-03-01'),

    (5, 111, 225, NULL, 436, NULL, '2015-03-01'),

    (1, 175, NULL, NULL, 425, 502, '2015-04-01'),

    (2, NULL, 202, NULL, 435, NULL, '2015-04-01'),

    (3, 133, NULL, 386, NULL, 524, '2015-04-01'),

    (4, NULL, NULL, 314, 441, NULL, '2015-04-01'),

    (5, 361, 244, NULL, 364, NULL, '2015-04-01')

    ) x (ID, Stat1, Stat2, Stat3, Stat4, Stat5, [Date]);

    SELECT * FROM #temp;

    Here is the output I got from Steve's code...

    IDStat1Stat2Stat3Stat4Stat5

    1175NULLNULL425502

    3133NULL386NULL524

    5361244NULL364NULL

    2NULL202NULLNULLNULL

    4NULLNULL314NULLNULL

    4NULLNULLNULL441NULL

    2NULLNULLNULL435NULL

    The following solution does seem to produce the expected results. (at least what I understand the desired results to be)

    SELECT

    id.ID,

    MAX(CASE WHEN st.StatType = 1 THEN z.StatValue END) AS Stat1,

    MAX(CASE WHEN st.StatType = 2 THEN z.StatValue END) AS Stat2,

    MAX(CASE WHEN st.StatType = 3 THEN z.StatValue END) AS Stat3,

    MAX(CASE WHEN st.StatType = 4 THEN z.StatValue END) AS Stat4,

    MAX(CASE WHEN st.StatType = 5 THEN z.StatValue END) AS Stat5

    FROM

    (SELECT t.ID FROM #temp t GROUP BY t.ID) id

    CROSS JOIN (SELECT x.StatType FROM (VALUES (1), (2), (3), (4), (5)) x (StatType)) st

    CROSS APPLY (

    SELECT TOP 1

    x.StatValue,

    x.[Date]

    FROM

    #temp t

    CROSS APPLY (

    VALUES

    (t.ID, 1, t.Stat1, [Date]), (t.ID, 2, t.Stat2, [Date]), (t.ID, 3, t.Stat3, [Date]), (t.ID, 4, t.Stat4, [Date]), (t.ID, 5, t.Stat5, [Date])

    ) x (ID, StatType, StatValue, [Date])

    WHERE

    x.StatValue IS NOT NULL

    AND id.ID = x.ID

    AND st.StatType = x.StatType

    ORDER BY

    x.[Date] DESC

    ) z

    GROUP BY

    id.ID

    There is the output from the above query...

    IDStat1Stat2Stat3Stat4Stat5

    1175277333425502

    2168202434435895

    3133572386496524

    4137225314441557

    5361244315364500

    HTH,

    Jason

  • This gives me duplicates now that I look a little bit harder.

  • My SQL 2014 box doesn't like this at all.

  • This too is giving me duplicates... I don't need every line from the left table - only distinct IDs.

  • If neither script is working for you, you may need to post some representative test data... Otherwise we're just stabbing around in the dark.

  • Travis Dean (6/19/2015)


    This too is giving me duplicates... I don't need every line from the left table - only distinct IDs.

    If you check Post #1695927, you'll see that the final output of the script I post does produce a distinct list of IDs. And it was created and tested using SQL Server 2014 Dev Edition.

  • I realized late yesterday that my query needed some updates. With the sample data that was posted, I was able to update the script and get something that worked.

    ;

    WITH your_table_ordered AS (

    SELECT yt.id

    ,LAST_VALUE(yt.Stat1) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat1 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat1

    ,LAST_VALUE(yt.Stat2) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat2 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat2

    ,LAST_VALUE(yt.Stat3) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat3 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat3

    ,LAST_VALUE(yt.Stat4) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat4 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat4

    ,LAST_VALUE(yt.Stat5) OVER(PARTITION BY ID ORDER BY CASE WHEN yt.Stat5 IS NULL THEN CAST('1900-01-01' AS DATE) ELSE yt.[Date] END ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS Stat5

    ,yt.[Date]

    ,ROW_NUMBER() OVER(PARTITION BY yt.ID ORDER BY yt.[Date] DESC) AS rn

    FROM your_table AS yt

    )

    SELECT id

    ,Stat1

    ,Stat2

    ,Stat3

    ,Stat4

    ,Stat5

    ,[Date]

    FROM your_table_ordered

    WHERE rn = 1

    My initial code had problems, because LAST_VALUE defaults to ROWS UNBOUNDED PRECEDING AND CURRENT ROW, so I needed to explicitly set it to include all rows. I was also sorting DESC, so it was taking the "first" value by date instead of the last value. I also added a PARTITION BY clause on the ID. Finally, I added a ROW_NUMBER also partitioned on the ID, so that I could pull out one row per ID.

    Drew

    J. Drew Allen
    Business Intelligence Analyst
    Philadelphia, PA

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply