Hourly Group By

  • There's just one problem in all of that... try the following code and see the potential error... especially in SQL Server 2000.

    SELECT DISTINCT Number

    FROM master..spt_values WHERE Number >= 0 and Number <= 32000

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Whoops, I mistakenly thought that spt_values was accurate up to 32K instead of 2048. I should have double-checked that one. The error will manifest as a problem if you try to do hourly reporting where there's more than 5.6 years between @StartDate and @EndDate.

    As mentioned, use of a dedicated number table is preferable, and won't limit you to just 2048 values/5.6 years. I'll jump up and edit my code. Thanks for the spotting, Jeff 🙂

  • setlan1983 ,

    First, you owe me an explanation. You wrote that you want it done in a single query... would you mind telling me why? There's a reason I need to know...

    Second, you're kind of new to the forum. Take a look at how to present data to get better solutions faster in the article at the following URL, please...

    http://www.sqlservercentral.com/articles/Best+Practices/61537/

    Ok... on to the business at hand. Here's an SQL Server 2005 solution. First, let's build some test data... a lot of it... details are in the comments...

    --===== Create and populate a 1,000,000 row test table.

    -- Note that none of this is a part of the solution. It's just test data.

    -- Column "SalesDt" has a range of >=01/01/2000 and <01/01/2010 non-unique date/times

    -- Column "SalesAmt has a range of 0.0000 to 99.9999 non-unique numbers

    -- Column "OperatorID" has a range of 1 to 100 non-unique numbers

    -- Jeff Moden

    SELECT TOP 1000000

    SalesDt = CAST(RAND(CHECKSUM(NEWID()))*3653.0+36524.0 AS DATETIME),

    SalesAmt = CAST(ABS(CHECKSUM(NEWID()))%10000 /100.0 AS FLOAT),

    OperatorID = ABS(CHECKSUM(NEWID()))%100+1

    INTO #Sales

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    GO

    --===== Create an index for performance. Doesn't have to be clustered but

    -- seemed like a good candidate.

    CREATE CLUSTERED INDEX IX_Sales_SalesDt_OperatorID

    ON #Sales (SalesDt,OperatorID)

    --===== Delete any date/times =6:00PM just to demo the outer join

    DELETE #Sales

    WHERE DATEPART(hh,SalesDt) = 18

    GO

    Now, let's build a nice parameterized stored procedure to solve your problem "all in a single query"...

    --=============================================================================

    --===== Create a stored procedure to do the summaries by given dates

    --=============================================================================

    CREATE PROCEDURE dbo.HourlySales

    --===== Declare the procedure parameters

    @pStartDate DATETIME,

    @pEndDate DATETIME

    AS

    WITH

    cteDates AS

    (--==== Create all hours within the date range

    -- The Cross Join and ROW_NUMBER acts like a Tally table

    SELECT TOP (DATEDIFF(hh,@pStartDate,@pEndDate+1))

    DATEADD(hh, ROW_NUMBER() OVER (ORDER BY GETDATE())-1, @pStartDate) AS DateLoHour,

    DATEADD(hh, ROW_NUMBER() OVER (ORDER BY GETDATE()) , @pStartDate) AS DateHiHour

    FROM Master.sys.All_Columns ac1

    CROSS JOIN Master.sys.All_Columns ac2

    )

    ,

    cteSales AS

    (--==== Get only the data in the date range so we don't have to join on so much

    SELECT SalesDt, SalesAmt

    FROM #Sales s

    WHERE s.SalesDt >= @pStartDate

    AND s.SalesDt = d.DateLoHour

    AND s.SalesDt = d.DateLoHour

    AND s.SalesDt < d.DateHiHour

    GROUP BY d.DateLoHour

    ORDER BY d.DateLoHour

    ;

    GO

    Now, let's summarize a week of sales by date and hour.

    EXEC dbo.HourlySales '20090823', '20090829'

    I strongly recommend that you save the date formatting to be done in the GUI (ie. Presentation Layer).

    I also recommend that you read the following article on how to build a "Tally" or "Numbers" table and how it works. Mr. McLeod was spot on when he spoke of a "Numbers" table. The first CTE uses a Cross Join and ROW_NUMBER to act as a Tally table.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jim McLeod (8/30/2009)


    Whoops, I mistakenly thought that spt_values was accurate up to 32K instead of 2048. I should have double-checked that one. The error will manifest as a problem if you try to do hourly reporting where there's more than 5.6 years between @StartDate and @EndDate.

    As mentioned, use of a dedicated number table is preferable, and won't limit you to just 2048 values/5.6 years. I'll jump up and edit my code. Thanks for the spotting, Jeff 🙂

    You certainly had the right idea. Just so you know, though... Since you were striving for a 2k solution, be advised that spt_Values only has 0-255 contiguous numbers where the Type = "P" in SQL Server 2000... that's only about 8.4 months if each count is in days. 😉

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff Moden (8/30/2009)


    You certainly had the right idea. Just so you know, though... Since you were striving for a 2k solution, be advised that spt_Values only has 0-255 contiguous numbers where the Type = "P" in SQL Server 2000... that's only about 8.4 months if each count is in days. 😉

    Double-whoops. There's three morals of this story.

    Number #1, when testing with multiple instances on one machine, be careful when you accidentally connect to your 2005 instance when you think you're under 2000 while checking the values of spt_values.

    Number #2, don't use spt_values. Use your own Numbers table, or generate a new one each time. It will cause fewer problems!

    Number #3, Nothing gets by Jeff Moden 😉 I especially like your use of Master.sys.All_Columns for test data generation.

  • BWAA-HAAA!!! I had to laugh, Jim. I've violated #1 and #2 in the past which is the only reason why I caught it. 😉 Well done and thanks for the feedback. 🙂

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 6 posts - 16 through 20 (of 20 total)

You must be logged in to reply to this topic. Login to reply