Adding empty rows for items which do not exist

  • I have a simple query like this:

    SELECT

    Count(DISTINCT(ProcessID)) As ProcessCount,

    Substring(Cast(LogDate as nvarchar), 1, 11) As ReportDate

    FROM LogEntry

    WHERE LogDate BETWEEN 'Nov 01 2008' AND 'Dec 01 2008'

    GROUP BY Substring(Cast(LogDate as nvarchar), 1, 11)

    Now this works fine, it gives me the number of requests for each date. However for graphing this I need to ensure that there is an entry for every date. At the moment, if there are zero requests for the 5th Nov, no row will be returned for it in the resultset. I need a row to come back showing zero in the ProcessCount column.

    Any ideas?

    Thanks everyone

  • Hi Chris

    Do you have a tally (numbers) table? If so, try this:

    [font="Courier New"]DECLARE @StartDate DATETIME, @EndDate DATETIME

    SET @StartDate = 'Nov 01 2008'

    SET @EndDate = 'Dec 01 2008'

    SELECT DATEADD(DAY, n.number-1, @StartDate) AS CalcLogDate, l.LogDate

    FROM Numbers n

    LEFT JOIN #LogEntry l ON l.LogDate = DATEADD(DAY, n.number-1, @StartDate)

    WHERE DATEADD(DAY, n.number-1, @StartDate) < @EndDate

    [/font]

    If not, read this:

    http://www.sqlservercentral.com/articles/TSQL/62867/

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Hi Chris, thanks for the help!

    My table looks like this (simplified):

    CREATE TABLE [dbo].[LogEntry](

    [ID] [bigint] IDENTITY(1,1) NOT NULL,

    [LogDate] [smalldatetime] NOT NULL,

    [Message] [text] NOT NULL,

    [ProcessID] [nvarchar](50) NULL

    )

    Some test data:

    SET IDENTITY_INSERT LogEntry ON

    INSERT INTO LogEntry (ID, LogDate, [Message], ProcessID)

    SELECT '669', '2008-11-19 10:21:00', 'LogMessage', 'a3d655dd-9470-4de4-944e-a862edece434' UNION ALL

    SELECT '670', '2008-11-19 10:21:00', 'LogMessage', 'a3d655dd-9470-4de4-944e-a862edece434' UNION ALL

    SELECT '671', '2008-11-19 10:21:01', 'LogMessage', 'a3d655dd-9470-4de4-944e-a862edece434' UNION ALL

    SELECT '672', '2008-11-19 10:21:01', 'LogMessage', 'a3d655dd-9470-4de4-944e-a862edece434' UNION ALL

    SELECT '673', '2008-11-19 10:21:01', 'LogMessage', 'a3d655dd-9470-4de4-944e-a862edece434' UNION ALL

    SELECT '674', '2008-11-19 11:15:00', 'LogMessage', 'd7f997ae-a8d3-43f2-aab6-d99e11d54889' UNION ALL

    SELECT '675', '2008-11-19 11:15:00', 'LogMessage', 'd7f997ae-a8d3-43f2-aab6-d99e11d54889' UNION ALL

    SELECT '675', '2008-11-19 11:15:01', 'LogMessage', 'd7f997ae-a8d3-43f2-aab6-d99e11d54889' UNION ALL

    SELECT '675', '2008-11-19 11:15:01', 'LogMessage', 'd7f997ae-a8d3-43f2-aab6-d99e11d54889'

    SET IDENTITY_INSERT LogEntry ON

    I have a new tally table called Tally with and column called N

    Heres my newly edited query:

    DECLARE @StartDate DATETIME, @EndDate DATETIME

    SET @StartDate = 'Nov 01 2008'

    SET @EndDate = 'Dec 01 2008'

    SELECT

    DATEADD(DAY, n.N-1, @StartDate) AS CalcLogDate,

    Count(DISTINCT(ProcessID)) As ProcessCount

    FROM Tally n

    LEFT JOIN LogEntry l ON l.LogDate = DATEADD(DAY, n.N-1, @StartDate)

    WHERE DATEADD(DAY, n.N-1, @StartDate) < @EndDate

    GROUP BY DATEADD(DAY, n.N-1, @StartDate)

    I get a record for each day, but zero in the ProcessCount column for every record, where I should have 2 in the record for the 19th.

    Any ideas?

    Thanks

  • Hi Chris

    It's because there's a time component in LogDate. Using a range takes care of this:

    [font="Courier New"]SELECT DATEADD(DAY, n.number-1, @StartDate) AS CalcLogDate,

            COUNT(DISTINCT(ProcessID)) AS ProcessCount

    FROM Numbers n

    LEFT JOIN #LogEntry l ON l.LogDate > DATEADD(DAY, n.number-1, @StartDate) AND l.LogDate < DATEADD(DAY, n.number, @StartDate)

    WHERE DATEADD(DAY, n.number-1, @StartDate) < @EndDate

    GROUP BY DATEADD(DAY, n.number-1, @StartDate)

    ORDER BY DATEADD(DAY, n.number-1, @StartDate)

    [/font]

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Thanks very much, thats sopt on!

    For info, a collegue of mine came up with this alternative solution:

    declare @days as int

    declare @startDate as datetime

    declare @endDate as datetime

    set @startDate = '2008-11-01'

    set @endDate = '2008-12-01'

    set @days = datediff(dd, @startDate, @endDate)

    --print @days

    create table #tempVals

    (

    logdate varchar(11),

    numProcessID int

    )

    declare @i as int

    set @i = 0

    --temp table

    while(@i < @days)

    begin

    insert into #tempVals values

    (substring(cast(dateadd(dd, @i, @startDate) as varchar), 1, 11), 0);

    set @i = @i + 1;

    end

    SELECT

    count(distinct l.ProcessID) numProcesses,

    Substring(Cast(t.LogDate as nvarchar), 1, 11)

    FROM LogEntry l right outer join #tempVals t on

    Substring(Cast(l.LogDate as nvarchar), 1, 11) = t.logdate

    GROUP BY Substring(Cast(t.LogDate as nvarchar), 1, 11)

    drop table #tempVals;

    I'm going to run them both side-by-side for a few days until I have more entries in the LogEntry table and I can get a meaningfull timing out, although I'm pretty sure your version will be faster.

    Thanks very much for the help

  • Your colleagues version is exactly what I would have done pre tally-tables.

    Incidentally, this...

    Substring(Cast(l.LogDate as nvarchar), 1, 11) = t.logdate

    could be a performance killer, SQL Server won't know if a logdate value in l will match a value in t, or not, without first converting it.

    Here's how to set up the calendar table using a tally table:

    SELECT substring(cast(dateadd(dd, number, @startDate) as varchar), 1, 11) AS logdate,

    CAST(0 AS INT) AS numProcessID

    INTO #tempVals

    FROM Numbers n

    WHERE number <= @days

    Cheers

    ChrisM

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply