Need to fill the Gaps with previous value

  • ChrisM@Work (10/8/2012)


    Jeff Moden (10/8/2012)


    dwain.c (10/7/2012)

    Jeff - I don't think this solution works with your test harness.

    I did say that I needed to make a change to get it to work and described the change.

    dwain.c (10/8/2012)


    ...

    You'll notice it's exactly the same as Nagaram's query - except for the number generator.

    I can't find anything wrong with it?

    My mistake! I didn't take into account that the row ordering of the OP's query was different. The correct number of rows is returned by his, yours, and my Quirky Update...

    Jeff - Dwain was a bit cranky when he wrote that, his comb-over took off in the wind on the way to work 😀

    BWAAHAHAHA! You're assuming I have enough to comb over!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • try this also

    CREATE TABLE #SAMPLETABLE
    (
    DATECOL DATETIME,
    WEIGHTS float
    )

    INSERT INTO #SAMPLETABLE
    SELECT '08/09/2012',8.2 UNION ALL
    SELECT '08/10/2012',9.4 UNION ALL
    SELECT '08/14/2012',10 UNION ALL
    SELECT '08/15/2012',9.6 UNION ALL
    SELECT '08/16/2012',9.3 UNION ALL
    SELECT '08/19/2012',9.7

    declare @min datetime, @max datetime, @WEIGHTS varchar(100)
    select @min = min(DATECOL), @max = max(DATECOL) from #SAMPLETABLE

    while @min <> @max
    begin
        select @WEIGHTS = WEIGHTS from #SAMPLETABLE where DATECOL = @min
        if not exists (select 1 from #SAMPLETABLE where DATECOL = @min)
            insert into #SAMPLETABLE select @min, @WEIGHTS
        set @min = @min + 1
    end

    select * from #SAMPLETABLE order by DATECOL

    drop table #SAMPLETABLE
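For comparison, here is one possible set-based sketch of the same date fill. It assumes the #SAMPLETABLE created in the post above and uses sys.all_columns only as a convenient row source for generating day numbers; any tally table or numbers function would do the same job.

    WITH Days AS
    (   -- one row per calendar day between the first and last sample dates
        SELECT TOP (SELECT DATEDIFF(DAY, MIN(DATECOL), MAX(DATECOL)) + 1 FROM #SAMPLETABLE)
               n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) - 1
        FROM sys.all_columns   -- used here purely as a handy row source
    ),
    AllDates AS
    (
        SELECT DATECOL = DATEADD(DAY, d.n, (SELECT MIN(DATECOL) FROM #SAMPLETABLE))
        FROM Days d
    )
    SELECT ad.DATECOL, w.WEIGHTS
    FROM AllDates ad
    CROSS APPLY
    (   -- most recent weight recorded on or before each calendar day
        SELECT TOP 1 s.WEIGHTS
        FROM #SAMPLETABLE s
        WHERE s.DATECOL <= ad.DATECOL
        ORDER BY s.DATECOL DESC
    ) w
    ORDER BY ad.DATECOL;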


  • Hopefully there are still some people that get updates regarding this thread. I have a follow-up question, as I recently encountered a similar situation to the original problem. I have the same root problem as this thread's original topic, but in my case the requested date range generally contains tens if not hundreds of thousands of rows to hold the vast number of readings.

    As possibly suspected, this slows down the solution's script considerably. Any thoughts on how to speed things up a bit?

    Thanks a bunch,

    -Leif

  • Leifton (9/24/2013)


    Hopefully there are still some people that get updates regarding this thread. I have a follow-up question, as I recently encountered a similar situation to the original problem. I have the same root problem as this thread's original topic, but in my case the requested date range generally contains tens if not hundreds of thousands of rows to hold the vast number of readings.

    As possibly suspected, this slows down the solution's script considerably. Any thoughts on how to speed things up a bit?

    Thanks a bunch,

    -Leif

    There are many proposed solutions on this thread. Which one did you use? Also, if you could provide the basic relevant parts of your table and 10 or so rows of test data, that would be a big help. PLEASE see the first "Helpful Link" in my signature line below for the right way to do that to get the quickest help.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • cooljagadeesh (10/8/2012)


    CREATE TABLE #SAMPLETABLE
    (
    DATECOL DATETIME,
    WEIGHTS float
    )

    INSERT INTO #SAMPLETABLE
    SELECT '08/09/2012',8.2 UNION ALL
    SELECT '08/10/2012',9.4 UNION ALL
    SELECT '08/14/2012',10 UNION ALL
    SELECT '08/15/2012',9.6 UNION ALL
    SELECT '08/16/2012',9.3 UNION ALL
    SELECT '08/19/2012',9.7

    declare @min datetime, @max datetime, @WEIGHTS varchar(100)
    select @min = min(DATECOL), @max = max(DATECOL) from #SAMPLETABLE

    while @min <> @max
    begin
        select @WEIGHTS = WEIGHTS from #SAMPLETABLE where DATECOL = @min
        if not exists (select 1 from #SAMPLETABLE where DATECOL = @min)
            insert into #SAMPLETABLE select @min, @WEIGHTS
        set @min = @min + 1
    end

    select * from #SAMPLETABLE order by DATECOL

    drop table #SAMPLETABLE

    Sorry I missed this before. You should performance test this against some of the other solutions. You'll find that it doesn't do so well in most cases.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
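A sketch of one way to build a larger test set for that kind of comparison. The 100,000-day span, the starting date, and the "every third day missing" rule are arbitrary choices used only to generate gaps; they are not something from the original posts.

    -- Build a larger #SAMPLETABLE so the proposed solutions can be compared under load.
    IF OBJECT_ID('tempdb..#SAMPLETABLE') IS NOT NULL DROP TABLE #SAMPLETABLE;

    CREATE TABLE #SAMPLETABLE (DATECOL DATETIME, WEIGHTS float);

    INSERT INTO #SAMPLETABLE (DATECOL, WEIGHTS)
    SELECT DATEADD(DAY, d.n, '2010-01-01'),
           ABS(CHECKSUM(NEWID())) % 100 / 10.0      -- pseudo-random weight, 0.0 to 9.9
    FROM (
          SELECT TOP (100000)                       -- arbitrary test size
                 n = ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
          FROM sys.all_columns ac1
          CROSS JOIN sys.all_columns ac2            -- handy row source, nothing more
         ) d
    WHERE d.n % 3 <> 0;                             -- leave every third day missing

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;
    -- ... run the solution being compared here ...
    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;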

  • Thanks for the speedy response, Jeff,

    My apologies for my response being late. Here is some further expounding on my problem...

    I have a table that is structured as follows:

    Create Table dbo.SensorHistory
    (
    [time] DATETIME,
    [SensorID] INT,
    [Value] Float
    )

    I have a percent-change system set up so the timestamp-value pair for a particular sensor is only saved if it has changed relative to its previous value by so many percent. As a result, there are times where some sensors have history and others do not. In order to align this data later, we need to fill in some gaps upon retrieval (we fill in gaps with the previously known value before a change). A sample of data contained in the historical table is as follows:

    **ACTUAL**
    '2013-09-24 08:20:00', 01, 1000
    '2013-09-24 08:21:00', 01, 1003
    '2013-09-24 08:23:00', 01, NULL
    '2013-09-24 08:24:00', 01, 1009
    '2013-09-24 08:25:00', 01, NULL
    '2013-09-24 08:27:00', 01, NULL

    **DESIRED**
    '2013-09-24 08:20:00', 01, 1000
    '2013-09-24 08:21:00', 01, 1003
    '2013-09-24 08:23:00', 01, 1003
    '2013-09-24 08:24:00', 01, 1009
    '2013-09-24 08:25:00', 01, 1009
    '2013-09-24 08:27:00', 01, 1009

    Thanks,

    -Leif

  • Leifton - For your case I might just use a correlated sub-query, due to its simplicity and the fact that it's widely understood.

    Create Table #SensorHistory
    (
    [time] DATETIME,
    [SensorID] INT,
    [Value] Float
    );

    INSERT INTO #SensorHistory
    SELECT '2013-09-24 08:20:00', 01, 1000
    UNION ALL SELECT '2013-09-24 08:21:00', 01, 1003
    UNION ALL SELECT '2013-09-24 08:23:00', 01, NULL
    UNION ALL SELECT '2013-09-24 08:24:00', 01, 1009
    UNION ALL SELECT '2013-09-24 08:25:00', 01, NULL
    UNION ALL SELECT '2013-09-24 08:27:00', 01, NULL;

    SELECT [time], [SensorID]
          ,[value] = ISNULL(value,
           (
            SELECT TOP 1 [value]
            FROM #SensorHistory b
            WHERE a.[SensorID] = b.[SensorID] AND b.[time] < a.[time] AND
                  b.value IS NOT NULL
            ORDER BY b.[time] DESC
           ))
    FROM #SensorHistory a
    GO
    DROP TABLE #SensorHistory;

    That doesn't mean it would be the fastest. Other solutions proposed could also be adapted, and I suggest you try a few to see which one works best for your data.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
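Along those lines, on SQL Server 2012 or later the same fill can also be written with window functions instead of a correlated sub-query. This is only a sketch, assuming the #SensorHistory table and data from the script above (run it without the final DROP); the derived grp column simply numbers each run of rows that starts with a non-NULL reading.

    SELECT s.[time],
           s.[SensorID],
           [Value] = MAX(s.[Value]) OVER (PARTITION BY s.[SensorID], s.grp)
    FROM (
          SELECT [time], [SensorID], [Value],
                 -- COUNT() ignores NULLs, so this running count only ticks up on rows
                 -- that have a reading; the NULL rows that follow share that group number
                 grp = COUNT([Value]) OVER (PARTITION BY [SensorID]
                                            ORDER BY [time]
                                            ROWS UNBOUNDED PRECEDING)
          FROM #SensorHistory
         ) s
    ORDER BY s.[time];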

  • dwain.c (9/24/2013)


    Leifton - For your case I might just use a correlated sub-query, due to its simplicity and the fact that it's widely understood.

    Create Table #SensorHistory
    (
    [time] DATETIME,
    [SensorID] INT,
    [Value] Float
    );

    INSERT INTO #SensorHistory
    SELECT '2013-09-24 08:20:00', 01, 1000
    UNION ALL SELECT '2013-09-24 08:21:00', 01, 1003
    UNION ALL SELECT '2013-09-24 08:23:00', 01, NULL
    UNION ALL SELECT '2013-09-24 08:24:00', 01, 1009
    UNION ALL SELECT '2013-09-24 08:25:00', 01, NULL
    UNION ALL SELECT '2013-09-24 08:27:00', 01, NULL;

    SELECT [time], [SensorID]
          ,[value] = ISNULL(value,
           (
            SELECT TOP 1 [value]
            FROM #SensorHistory b
            WHERE a.[SensorID] = b.[SensorID] AND b.[time] < a.[time] AND
                  b.value IS NOT NULL
            ORDER BY b.[time] DESC
           ))
    FROM #SensorHistory a
    GO
    DROP TABLE #SensorHistory;

    That doesn't mean it would be the fastest. Other solutions proposed could also be adapted, and I suggest you try a few to see which one works best for your data.

    That's good code for this task, especially since I'd expect the number of NULLs in the [Value] column to be comparatively low.

    The only thing I'd change is the following line...

    ,[value]=ISNULL(value,

    ... to ...

    ,[value]=COALESCE(value,

    ...because ISNULL doesn't provide for the necessary "Short Circuit". The Correlated Sub Query (CSQ for short) is executed (as an Index Seek if one is available and as a Table Scan if one is not) n-1 times regardless of the number of NULLs. Changing it to COALESCE will cause it to "Short Circuit" if [Value] is not NULL just as if you had used a CASE statement. It's about the only thing that I don't like about ISNULL compared to COALESCE.

    Guess I'll have to write a "Hidden RBAR" article on the subject.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Jeff - I see now how you manage to keep your post count so high. By working overtime on holidays looking for old threads! 😛

    Interesting point about ISNULL vs. COALESCE that I did not know.

    Where in the world did you come up with such a thing?

    Waiting to see that article published.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.

  • dwain.c (12/1/2013)


    Where in the world did you come up with such a thing?

    Heh... after spending hours on posts like this one wondering why the hell the code won't short circuit like I want it to. 😛 A CASE statement worked a treat but I very much like the shorter code. I'd run into this problem in real life before but forgot about it until I spent an hour trying to figure out why your code wasn't short circuiting and then I kicked myself for forgetting. I don't care for COALESCE because of the datatype problems it can cause, the fact that it's a touch slower than ISNULL, and requires at least one cup of coffee for me to type correctly. 😀

    To be honest, they could actually change COALESCE so that it no longer short circuits the way it does now (I believe that behavior is as undocumented as the non-short-circuit problem with ISNULL) and we'd never know, except for all the performance problems that would suddenly rear their ugly head. It would make the code much more bullet-proof if we used CASE like in the following...

    --===== The CASE method DOES provide "Short Circuiting".
         -- The CSQ is only executed when needed.
         -- It's a bit messier than just using COALESCE, though
         -- it may be more obvious to read/troubleshoot.
    SELECT hi.[Time]
          ,hi.[SensorID]
          ,[Value] =
           CASE
               WHEN [Value] IS NOT NULL THEN [Value]
               ELSE
                   ( --=== The Correlated Sub Query (CSQ)
                    SELECT TOP 1 lo.[Value]
                    FROM #SensorHistory lo
                    WHERE lo.[SensorID] = hi.[SensorID]
                      AND lo.[time] < hi.[time]
                      AND lo.[Value] IS NOT NULL
                    ORDER BY lo.[time] DESC
                   )
           END
    FROM #SensorHistory hi
    ;

    You'll be able to tell that I spent some time on this again from the following code (still kicking myself for forgetting in the first place). These are the final "demo" code snippets that I saved for posterity. They'll figure prominently in the article to come.

    --===== Create the test table.
         -- This is not a part of the solution.
    CREATE TABLE #SensorHistory
    (
    [Time] DATETIME,
    [SensorID] INT,
    [Value] Float
    )
    ;
    --===== Populate the test table with some test data
         -- This is not a part of the solution.
    INSERT INTO #SensorHistory
           ([Time], [SensorID], [Value])
    SELECT '2013-09-24 08:20:00', 01, 1000 UNION ALL
    SELECT '2013-09-24 08:21:00', 01, 1003 UNION ALL
    SELECT '2013-09-24 08:23:00', 01, NULL UNION ALL
    SELECT '2013-09-24 08:24:00', 01, 1009 UNION ALL
    SELECT '2013-09-24 08:25:00', 01, NULL UNION ALL
    SELECT '2013-09-24 08:27:00', 01, NULL
    ;
    --===== Create the expected Clustered Index or at least one that
         -- will enhance the performance of code without causing
         -- Page-Splits during inserts.
         -- If this index is missing, then this IS a part of the solution.
    CREATE UNIQUE CLUSTERED INDEX IXC_SensorHistory_Time_SensorID
        ON #SensorHistory ([Time],[SensorID])
    ;
    --===== The ISNULL method DOESN'T provide "Short Circuiting".
         -- The CSQ is executed n-1 times, which can be very expensive.
    SELECT hi.[Time]
          ,hi.[SensorID]
          ,[Value] =
           ISNULL([Value],
           ( --=== The Correlated Sub Query (CSQ)
            SELECT TOP 1 lo.[Value]
            FROM #SensorHistory lo
            WHERE lo.[SensorID] = hi.[SensorID]
              AND lo.[time] < hi.[time]
              AND lo.[Value] IS NOT NULL
            ORDER BY lo.[time] DESC
           ))
    FROM #SensorHistory hi
    ;
    GO
    --===== The COALESCE method DOES provide "Short Circuiting".
         -- The CSQ is only executed when needed.
    SELECT hi.[Time]
          ,hi.[SensorID]
          ,[Value] =
           COALESCE([Value],
           ( --=== The Correlated Sub Query (CSQ)
            SELECT TOP 1 lo.[Value]
            FROM #SensorHistory lo
            WHERE lo.[SensorID] = hi.[SensorID]
              AND lo.[time] < hi.[time]
              AND lo.[Value] IS NOT NULL
            ORDER BY lo.[time] DESC
           ))
    FROM #SensorHistory hi
    ;
    GO
    --===== The CASE method DOES provide "Short Circuiting".
         -- The CSQ is only executed when needed.
         -- It's a bit messier than just using COALESCE, though
         -- it may be more obvious to read/troubleshoot.
    SELECT hi.[Time]
          ,hi.[SensorID]
          ,[Value] =
           CASE
               WHEN [Value] IS NOT NULL THEN [Value]
               ELSE
                   ( --=== The Correlated Sub Query (CSQ)
                    SELECT TOP 1 lo.[Value]
                    FROM #SensorHistory lo
                    WHERE lo.[SensorID] = hi.[SensorID]
                      AND lo.[time] < hi.[time]
                      AND lo.[Value] IS NOT NULL
                    ORDER BY lo.[time] DESC
                   )
           END
    FROM #SensorHistory hi
    ;
    GO
    --===== The CROSS APPLY method DOESN'T provide "Short Circuiting"
         -- because everything in the FROM clause is executed before anything else.
         -- This is why a CSQ is sometimes much better than a CROSS APPLY.
         -- To get it to "Short Circuit", you'd have to add a CASE or a similar
         -- COALESCE to the CROSS APPLY.
    SELECT hi.[Time]
          ,hi.[SensorID]
          ,[Value] = COALESCE(hi.[Value],ca.[Value])
    FROM #SensorHistory hi
    CROSS APPLY
    (
     SELECT TOP 1 lo.[Value]
     FROM #SensorHistory lo
     WHERE lo.[SensorID] = hi.[SensorID]
       AND lo.[time] < hi.[time]
       AND lo.[Value] IS NOT NULL
     ORDER BY lo.[time] DESC
    ) ca ([Value])
    ;
    GO

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Your answer on short-circuiting the CSQ is something I've been wondering about for quite some time, without really knowing how to go about testing for it.

    So thanks!


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh!

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?
    Since random numbers are too important to be left to chance, let's generate some!
    Learn to understand recursive CTEs by example.
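One simple way to test for it is to compare logical reads for the two forms with SET STATISTICS IO. This is only a sketch, assuming the #SensorHistory demo table from Jeff's snippets above is still loaded; per the discussion above, the ISNULL form is expected to report the extra reads from the CSQ running on every row, while the COALESCE form should only pay for the CSQ on the NULL rows.

    SET STATISTICS IO ON;

    -- ISNULL form: the CSQ is expected to run for every row.
    SELECT hi.[Time], hi.[SensorID],
           [Value] = ISNULL(hi.[Value],
                     (SELECT TOP 1 lo.[Value]
                      FROM #SensorHistory lo
                      WHERE lo.[SensorID] = hi.[SensorID]
                        AND lo.[time] < hi.[time]
                        AND lo.[Value] IS NOT NULL
                      ORDER BY lo.[time] DESC))
    FROM #SensorHistory hi;

    -- COALESCE form: the CSQ is expected to run only where [Value] is NULL.
    SELECT hi.[Time], hi.[SensorID],
           [Value] = COALESCE(hi.[Value],
                     (SELECT TOP 1 lo.[Value]
                      FROM #SensorHistory lo
                      WHERE lo.[SensorID] = hi.[SensorID]
                        AND lo.[time] < hi.[time]
                        AND lo.[Value] IS NOT NULL
                      ORDER BY lo.[time] DESC))
    FROM #SensorHistory hi;

    SET STATISTICS IO OFF;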
