Trouble filling missing months of data with zero values

  • Hello,

    Hopefully someone can help a novice with the following problem i have. We have as system that records monthly reported figures and for several accounts, quite often the sites don't report their figures. So for a 12 month period there can often be gaps.

    I would like to show all months in a result set and just show a zero '0' if they haven't reported. I am sure that this should be simple but google is not helping me and i am going round in circles. I have tried to use a cross join but can't get close and i have also tried creating a CTE of months (01--12) and left joining to the source data and replacing the nulls. Both ways i have failed.

    Please can you help.

    I have provided some very basic DDL, test data (not the actual source data) to highlight the problem and also a screenshot of the result set that i desire (partial).

    CREATE TABLE TestData
    (
    [Year]INT
    ,[Month]CHAR (2)
    ,[Site]CHAR (5)
    ,[Account]CHAR (4)
    ,[Value]INT
    )

    INSERT INTO TestData ([Year],[Month],[Site],[Account],[Value])
    Values('18','06','Site1','Acc1','75')
    ,('18','12','Site1','Acc1','150')
    ,('18','01','Site1','Acc2','25')
    ,('18','05','Site1','Acc2','50')
    ,('18','01','Site2','Acc1','65')
    ,('18','03','Site2','Acc1','10')
    ,('18','08','Site2','Acc1','70')
    ,('18','03','Site2','Acc2','35')
    ,('18','04','Site2','Acc2','15')

    sqlss1

    This is my desired result for this test data (Partial):

    sqlss2l

    Thanks in advance,

     

    • This topic was modified 3 years, 4 months ago by  scienceof8.
  • Is your data really broken out by year and month?  With the month value set as char(2)?

    Will you only ever be querying for a single year - or will it be cross year?

     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • No the source data is actually in financial periods that i split out into 2 columns. For example 1801, 1802, 1803, etc.

    First 2 digits are the financial year and the second 2 digits are the financial month (not calendar month). I quite often split them.

    The table will only have one years worth of data in it at source, as each financial year will have its own table. So it will only ever be querying for a single year.

  • DECLARE @start_year int
    DECLARE @start_month char(2)
    DECLARE @month_count smallint
    SET @start_year = 18
    SET @start_month = '01'
    SET @month_count = 12

    ;WITH
    cte_tally10 AS (
    SELECT * FROM (VALUES(0),(0),(0),(0),(0),(0),(0),(0),(0),(0)) AS numbers(number)
    ),
    cte_tally1000 AS (
    SELECT 0 AS number UNION ALL
    SELECT ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS number
    FROM cte_tally10 c1 CROSS JOIN cte_tally10 c2 CROSS JOIN cte_tally10 c3
    ),
    cte_dates AS (
    SELECT DATEADD(MONTH, t.number, CAST((@start_year + 2000 ) * 10000 + @start_month * 100 + 01 AS varchar(8))) AS date
    FROM cte_tally1000 t
    WHERE t.number BETWEEN 0 AND @month_count - 1
    )
    SELECT d2.Year, d2.Month, CD.Site, CD.Account, ISNULL(TD.Value, 0) AS Value
    FROM cte_dates d
    CROSS APPLY (
    SELECT YEAR(d.date) % 100 AS year, RIGHT('0' + CAST(MONTH(date) AS varchar(2)), 2) AS month
    ) AS d2
    CROSS JOIN (
    SELECT DISTINCT Site, Account
    FROM TestData
    ) AS CD /*ControlData*/
    LEFT OUTER JOIN TestData TD ON TD.year = d2.year AND TD.month = d2.month AND TD.Site = CD.site AND TD.Account = CD.Account
    ORDER BY year, month, Site, Account

    SQL DBA,SQL Server MVP(07, 08, 09) "It's a dog-eat-dog world, and I'm wearing Milk-Bone underwear." "Norm", on "Cheers". Also from "Cheers", from "Carla": "You need to know 3 things about Tortelli men: Tortelli men draw women like flies; Tortelli men treat women like flies; Tortelli men's brains are in their flies".

  • Thanks Scott,

    That is certainly the result set that i am after. Is there are way to achieve this without using Tally's? i am a novice and struggle with understanding code around tallys. I will dive deeper to understand, but right now that looks complicated and i like to understand the code. Thank you for your help.

  • ;WITH ReportMatrix AS (
    SELECT d.[Year], x.[Month], d.[Site], d.[Account]
    FROM (
    SELECT [Year], [Site], [Account]
    FROM #TestData
    GROUP BY [Year], [Site], [Account]
    ) d
    CROSS APPLY(VALUES ('01'),('02'),('03'),('04'),('05'),('06'),('07'),('08'),('09'),('10'),('11'),('12')) x ([Month])
    )

    SELECT m.[Year], m.[Month], m.[Site], m.[Account], t.[Value]
    FROM ReportMatrix m
    LEFT JOIN #TestData t
    ON t.[Year] = m.[Year]
    AND t.[Month] = m.[Month]
    AND t.[Site] = m.[Site]
    AND t.[Account] = m.[Account]
    ORDER BY m.[Year], m.[Site], m.[Account], m.[Month]
    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Here is an alternate version

       With accountData
    As (
    Select Distinct
    td.[Year]
    , t.[Month]
    , td.Site
    , td.Account
    From #testData td
    Cross Apply (Values ('01'), ('02'), ('03'), ('04'), ('05'), ('06')
    , ('07'), ('08'), ('09'), ('10'), ('11'), ('12')) As t(Month)
    )
    Select ad.[Year]
    , ad.[Month]
    , ad.Site
    , ad.Account
    , [Value] = coalesce(td.[Value], 0)
    From accountData ad
    Left Join #testData td On td.[Year] = ad.[Year]
    And td.[Month] = ad.[Month]
    And td.Site = ad.Site
    And td.Account = ad.Account
    Order By
    ad.[Year]
    , ad.Site
    , ad.Account
    , ad.[Month];


     

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Thank you, this is very clear and exactly what i was looking for.

  • Jeffrey Williams wrote:

    Here is an alternate version

       With accountData
    As (
    Select Distinct
    td.[Year]
    , t.[Month]
    , td.Site
    , td.Account
    From #testData td
    Cross Apply (Values ('01'), ('02'), ('03'), ('04'), ('05'), ('06')
    , ('07'), ('08'), ('09'), ('10'), ('11'), ('12')) As t(Month)
    )
    Select ad.[Year]
    , ad.[Month]
    , ad.Site
    , ad.Account
    , [Value] = coalesce(td.[Value], 0)
    From accountData ad
    Left Join #testData td On td.[Year] = ad.[Year]
    And td.[Month] = ad.[Month]
    And td.Site = ad.Site
    And td.Account = ad.Account
    Order By
    ad.[Year]
    , ad.Site
    , ad.Account
    , ad.[Month];


    You might want to take a look at the execution plan for that.  With just the 9 rows of data the OP posted, it does 48 table scans and has an internal row count of 432 rows.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    You might want to take a look at the execution plan for that.  With just the 9 rows of data the OP posted, it does 48 table scans and has an internal row count of 432 rows.

    And how is that different from what Chris posted?  And - if that is an issue what is your solution?

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

  • Disregard my previous from this post if you end up seeing it in your email.  I did some bad math.  The 432 rows are completely necessary.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeffrey Williams wrote:

    Jeff Moden wrote:

    You might want to take a look at the execution plan for that.  With just the 9 rows of data the OP posted, it does 48 table scans and has an internal row count of 432 rows.

    And how is that different from what Chris posted?  And - if that is an issue what is your solution?

    Nah... my bad, Jeffrey.  I was hoping I was fast enough with my "takeback" but wasn't.  I did some bad math.  You're fine and my apologies for the incorrect call out.

     

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden wrote:

    Nah... my bad, Jeffrey.  I was hoping I was fast enough with my "takeback" but wasn't.  I did some bad math.  You're fine and my apologies for the incorrect call out. 

    No worries - using distinct vs group by does have differences and was wondering if I was missing something.  Looking at the differences - it isn't clear which would be the better option, so I would recommend testing both against a large set of data.  I just don't have time right now to perform that test.

    Jeffrey Williams
    “We are all faced with a series of great opportunities brilliantly disguised as impossible situations.”

    ― Charles R. Swindoll

    How to post questions to get better answers faster
    Managing Transaction Logs

Viewing 13 posts - 1 through 12 (of 12 total)

You must be logged in to reply to this topic. Login to reply