Introduction
A common request that many of us have had is our manager comes and wants a simple report on total sales for a day summarized by each hour. If you do a simple SUM and GROUP BY hour, you will only show the hours that have data. But your manager wants to see a 0 for the hours with no data. Instead of seeing this:
He wants to see this:
Creating the Test Data
In order to show how to do this I will need some test data. I will create a #SalesTestData table and populate it with 1,000,000 records with dates between Jan 1, 2007 and Dec 31, 2011. Then I will delete all the records that have a time in the 8 or 17 hours. I am also deleting data from 3 days of each month. This is being done so when we run the demo code, there will be some hours that have no data. This way I can demonstrate showing a 0 when there is no data.
The code that I use to populate the test data was taken from scripts written by Jeff Moden.
IF (SELECT OBJECT_ID('tempdb..#SalesTestData') ) IS NOT NULL DROP TABLE #SalesTestData /*Create the table and populate it with random dates from Jan 1, 2007 to Dec 31, 2011*/SELECT TOP 1000000 CAST(RAND(CHECKSUM(NEWID())) * 1826.0 + 39081.0 AS DATETIME) AS SaleTime ,ABS(CHECKSUM(NEWID())) % 50000 + 1 AS SaleQty INTO dbo.#SalesTestData FROM master.sys.all_columns AS ac CROSS JOIN master.sys.all_columns AS ac1 /*Creating a clustered index for performance*/CREATE CLUSTERED INDEX IXC_#SalesTestData_SaleTime ON #SalesTestData (SaleTime) /*Delete all data where the hour is 8 or 17 and where the day is the 9th, 16th or 23rd. This is to show that 0's will be returned for the missing data.*/DELETE #SalesTestData WHERE DATEPART(HOUR,SaleTime) IN (8,17) DELETE #SalesTestData WHERE DATEPART(day,SaleTime) IN (9,16,23) GO
Summing the Data by Hour
Now that we have the test data I will sum the data for one day by hour using a table variable. A table variable is like a temp table, but you have to declare it first and it will only exist during the current session. The reason for the table variable is so I can create a basic table with all the hours of the day. This table will just be populated for one day with a start hour and a next hour. I will then left join the #SalesTestDate table to this table. I use a left join so when I use the SUM and GROUP BY hour I will get a 0 for those hours without a sale.
DECLARE @Date DATETIME = '08/13/2009' ; /*Create a table variable and populate it with StartHour for one day*/DECLARE @hour TABLE (StartHour DATETIME ,NextHour DATETIME ) ; INSERT INTO @hour (StartHour ,NextHour ) SELECT DATEADD(hh,Number,@Date) AS StartHour ,DATEADD(hh,Number + 1,@Date) AS NextHour FROM Master.dbo.spt_Values WHERE Type = 'P' AND Number BETWEEN 0 AND 23 ; SELECT h.StartHour ,COALESCE(SUM(SalesTestData.SaleQty),0) AS Qty FROM @hour AS h LEFT OUTER JOIN #SalesTestData AS SalesTestData ON SalesTestData.SaleTime >= h.StartHour AND SalesTestData.SaleTime < NextHour AND SalesTestData.SaleTime >= @Date AND SalesTestData.SaleTime < DATEADD(dd,1,@Date) GROUP BY h.StartHour GO
Summarizing by Month with a Table Variable
Now I will sum the data by day for one month. I am running for April 2010, but you can change this to run for any month that there is data for. In this example I am creating a table variable and populating it with the days of the month. I then join this table to the #SalesTestData table and do my summary. I am using a left join on the #SalesTestData table. This way I will still get a 0 record for the days that there is no data. If I did an inner join then I would not get a record returned for those days.
/*To get the Sales count by day without a calendar table*/DECLARE @DataMonth INT ,@StartDate DATETIME ,@NextDate DATETIME ,@LastDay INT ; /*Calculating the start date based on the month */SET @DataMonth = 201004 ; SET @StartDate = CONVERT(DATETIME,LEFT(@DataMonth,4) + '-' + RIGHT(@DataMonth,2) + '-01') ; SET @NextDate = DATEADD(MONTH,1,@StartDate) ; SET @LastDay = DATEPART(d,DATEADD(d,-1,@NextDate)) -1 ; DECLARE @date TABLE (StartDate DATETIME ,NextDate DATETIME ) ; INSERT INTO @date (StartDate ,NextDate ) SELECT DATEADD(d,Number,@StartDate) AS StartDate ,DATEADD(d,Number + 1,@StartDate) AS NextDate FROM Master.dbo.spt_Values WHERE Type = 'P' AND Number BETWEEN 0 AND @LastDay ; SELECT h.StartDate AS [Date] ,COALESCE(SUM(SalesTestData.SaleQty),0) AS Qty FROM @date AS h LEFT OUTER JOIN #SalesTestData AS SalesTestData ON SalesTestData.SaleTime >= h.StartDate AND SalesTestData.SaleTime < h.NextDate and SalesTestData.SaleTime >= @StartDate AND SalesTestData.SaleTime < @NextDate GROUP BY h.StartDate ; GO
Creating a Calendar Table
Next let's create a calendar table and populate it. I'm populating the calendar table for the years 2007 to 2011, this way it will match the dates in the #SaleTestData table. It is a simple calendar table with a start date and next date for each day, and a MonthId. If I was going to create calendar table for my production system it would probably be a little more complicated, but for my demonstration this is all I need. I will then use this table to do a sum with it instead of the table variable.
DECLARE @StartDate DATETIME ; /*Date to start populating the calendar table. Populates for 5 years*/SELECT @StartDate = '01/01/2007' ; IF (SELECT OBJECT_ID('tempdb..#CalendarTest') ) IS NOT NULL DROP TABLE #CalendarTest SELECT CAST (CONVERT(CHAR(8),DATEADD(d,v.Number,@StartDate),112) AS INT) AS CalendarId ,DATEADD(d,v.Number,@StartDate) AS StartDate ,DATEADD(d,v.Number,@StartDate)+1 AS NextDate ,DATEPART(dd,DATEADD(d,v.Number,@StartDate)) AS DateOfMonth ,CONVERT(INT,REPLACE(CONVERT(CHAR(7),DATEADD(d,v.Number,@StartDate),102),'.','')) AS MonthId INTO #CalendarTest FROM Master.dbo.spt_Values v WHERE Type = 'P' AND v.Number BETWEEN 0 AND 1825 ; --Creating a clustered index for performance CREATE CLUSTERED INDEX IXC_#CalendarTest_CalendarId ON #CalendarTest (CalendarId) GO
Summarizing by Month with a Calendar Table
In this example I am using a calendar table instead of the table variable. Again I am using a left join on the calendar table. This code is a little cleaner than using the table variable plus if you have a calendar table it can be reused in other code, while the table variable is only good for the script that it was written in.
/*To get the Sales count by day with a Calendar table*/DECLARE @DataMonth INT ,@StartDate DATETIME ,@NextDate DATETIME ; SET @DataMonth = 201004 ; SELECT @StartDate = MIN(StartDate) ,@NextDate = MAX(NextDate) FROM #CalendarTest WHERE MonthId = @DataMonth ; SELECT CalendarTest.StartDate AS [Date] ,COALESCE(SUM(SalesTestData.SaleQty),0) AS Qty FROM #CalendarTest AS CalendarTest LEFT OUTER JOIN #SalesTestData AS SalesTestData ON DATEPART(DAY,SalesTestData.SaleTime) = CalendarTest.DateOfMonth AND SalesTestData.SaleTime >= @StartDate AND SalesTestData.SaleTime < @NextDate WHERE CalendarTest.MonthId = @DataMonth GROUP BY CalendarTest.StartDate ORDER BY CalendarTest.StartDate ; GO
I know there are other ways to accomplish this. This is just one example of how to do it.