Blog Post

Including Missing Values in Data Ranges Without a Lookup Table

,

Including Missing Values in Data Ranges Without a Lookup Table

Including Missing Data

Including Missing Data

One of the ways that I learn new things is by helping others with their issues or questions. In my career, this has served me well by exposing me to a vast array of experiences that I may not have otherwise seen. I try to be active on various internal SQL discussion lists at work, and an interesting question came in today. My gut reaction was to go with the “stock answer” for this kind of question, but somehow it seemed like there should be a better way.

The question was about how to include missing sets of data from a range of results (i.e., include missing data with a value of null). The stock answer is to create a lookup table and do a LEFT JOIN to the data so that that values not represented in the data set gets returned with null values. In this particular scenario, the possible data values would not all be known ahead of time and could be different values each time you check it. That makes it difficult to pre-populate a lookup table. Fortunately, the increment of the values in the range is known (5 seconds), so it turns out that it is very simple to generate the lookup values dynamically using a recursive CTE.

The Email Exchange

Initial Email

Subject: Adding extra rows of data to a query result


Hello all. First I want to send a huge thank you to all on this alias. I have asked many questions recently while trying to get a reporting project completed for Visual Studio Load Testing. The feedback has been invaluable and I have made great strides on this effort.

So here’s my current predicament. I am querying a table that contains perfmon counters, collected throughout the run. However, for various different reasons it is not uncommon for there to be short intervals throughout the run where values are not collected for various counters. The DB does not add an entry for these, so when I get the results, I will have an incomplete result set (based on graphing the results). Example below.

Is there a way to add something to the query (or a view or whatever) that will add extra rows of data with the missing timestamps and null data for the counter values?

I get this from the DB:

2:13:08 AM0.150998
2:13:13 AM1.476516
2:13:18 AM0.094423
2:13:28 AM0.724203
2:13:33 AM0.175526

I want this in the final set:

2:13:08 AM0.150998
2:13:13 AM1.476516
2:13:18 AM0.094423
2:13:23 AM
2:13:28 AM0.724203
2:13:33 AM0.175526
My Initial Reply

The easiest way is to create a lookup table with all of the possible time values and then perform a Left Join to include entries from the lookup table that do not have a corresponding value in the data table. In the example you gave, how do you know that the missing time is 2:13:23 AM and not, for example, 2:13:22 AM?

I ask because if that time is in the table for some counters and not others, then you wouldn’t need a lookup table, you could simply use a CTE or derived table to query the distinct list of times and then left join that to the data table.

Next Reply

I “know” the time is 2:13:23 because the intervals are based on a sampling rate entered into the test run (in this case 5 seconds). I thought about doing the distinct time list option, but the only hiccup with that is there may be one or more intervals where NO counters got collected, which leaves me right back where I was before.

I will play with the lookup table idea. Thx.

My Final Solution

I played with the recursive CTE idea for a bit and quickly had a solution. I start by grabbing the MIN and MAX values from the data table into variables to define the overall range. Then I simply recurse the range in 5 second intervals until I reach the end of the range. Creating the range of values turned out to be simpler than I expected.

-- Create dummy table for testing and populate with smaple data
If OBJECT_ID('dbo.PerfMonData') Is Not Null
  Begin
    Drop Table dbo.PerfMonData;
  End
Go
Create Table dbo.PerfMonData (
    DataID int identity(1, 1) not null primary key,
    TrackTime time(0) not null,
    TrackValue decimal(9, 6))
Go
Insert Into dbo.PerfMonData (TrackTime, TrackValue)
Values ('2:13:08 AM', 0.150998),
    ('2:13:13 AM', 1.476516),
    ('2:13:18 AM', 0.094423),
    ('2:13:28 AM', 0.724203),
    ('2:13:33 AM', 0.175526)
Go
-- Query the data
Declare @MaxTT time(0),
    @MinTT time(0);
-- Get the MIN and MAX values for track time
Select @MaxTT = MAX(TrackTime),
    @MinTT = MIN(TrackTime)
From dbo.PerfMonData;
-- Build list of all times between MIN and MAX
-- values in 5 second intervals
With TrackTime (TrackTime)
As (Select @MinTT As TrackTime
    Union All
    Select DateAdd(ss, 5, TrackTime) As TrackTime
    From TrackTime
    Where TrackTime < @MaxTT
    )
Select TT.TrackTime, PMD.TrackValue
From TrackTime TT
Left Join dbo.PerfMonData PMD On TT.TrackTime = PMD.TrackTime;

Final Results

TrackTimeTrackValue
02:13:08 0.150998
02:13:13 1.476516
02:13:18 0.094423
02:13:23 NULL
02:13:28 0.724203
02:13:33 0.175526

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating