How to find gaps between date ranges

  • Thanks Michael, Peso and everyone! My question is now resolved

    regards,

    Abhishek

  • I hope you were able to use the SS 2005 solution.

    When limited to SS 2000, our queries really start to slow down after a while.

  • Here is a script for testing, Jeff.

    Results for the three previous methods and a new method is included below

    Peso 1 - Traditional t-sqlSQL Profiler - Rowcount 949, reads 43,512, CPU 688 ms, duration 1,711 ms

    Peso 2 - SQL Server 2005 CTESQL Profiler - Rowcount 949, reads 31,826, CPU 1,312 ms, duration 3,518 ms

    Peso 3 - Clustered index trickSQL Profiler - Rowcount 949, reads 26, CPU 0 ms, duration 118 ms

    Michael - Traditional t-sqlSQL Profiler - Rowcount 571, reads 1,012,187, CPU 66,438 ms, duration 197,139 ms

    The fastest here (Peso 3) is ~ 1,100 times faster than the slowest.

    ** Peso 1

    Table '#ProcessCellAllocation'. Scan count 19064, logical reads 43628, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    >> Peso 1 - 750 milliseconds.

    ** Peso 2

    Table 'Worktable'. Scan count 4, logical reads 11902, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 2002, logical reads 16016, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    >> Peso 2 - 1390 milliseconds.

    ** Peso 3

    Table '#ProcessCellAllocation'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    >> Peso 3 - 80 milliseconds.

    ** Michael

    Table 'Worktable'. Scan count 1001, logical reads 1012497, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 3, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    >> Michael - 87713 milliseconds.

    I mean no harm in posting the results. I just find it interesting that different methods and approaches to this problem have significantly different timings.


    N 56°04'39.16"
    E 12°55'05.25"

  • For 10,000 date pairs, I couldn't run Peso 1 and Michael in a timely fashion.

    Here are however the results for Peso 2 and Peso 3

    ** Peso 2

    Table 'Worktable'. Scan count 4, logical reads 119902, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 20002, logical reads 1080108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    >> Peso 2 - 138890 milliseconds.

    ** Peso 3

    Table '#ProcessCellAllocation'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    Table '#ProcessCellAllocation'. Scan count 2, logical reads 108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

    >> Peso 3 - 360 milliseconds.


    N 56°04'39.16"
    E 12°55'05.25"

  • Michael Meierruth (5/13/2008)


    I hope you were able to use the SS 2005 solution.

    When limited to SS 2000, our queries really start to slow down after a while.

    Try the Peso 3 solution and you'll change your mind 😀

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Peso (5/13/2008)


    Here is a script for testing, Jeff.

    ...

    I mean no harm in posting the results. I just find it interesting that different methods and approaches to this problem have significantly different timings.

    Thanks, Peter...

    I, and many others on this and the "other" forum, frequently post such results just to show people the proof and the proof is in the performance of the code. The only harm that has come is that you stole my thunder with the Peso 3 code... The Clustered Index Update is exactly what I had in mind. You did it a bit differently than I would have (I'm still going to give it a whirl because it's an interesting problem), but the concept is precisely the same. Well done, Peter.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks Jeff.

    It sure is an interesting problem!

    I tested with 100,000 date pairs and Peso 3 run in 900 ms. With 1,000,000 date pairs Peso 3 run in 2.2 seconds.

    I will try to modify the code to get the "islands" or "collapsed date ranges".


    N 56°04'39.16"
    E 12°55'05.25"

  • Turned out to be easier than I thought.

    The "collapsed" date ranges are already sequenced 🙂

    -- Get the collapsed date ranges

    SELECTProcessCell,

    MIN(DateFrom) AS DateFrom,

    MAX(DateTo) AS DateTo

    FROM#ProcessCellAllocation

    GROUP BYProcessCell,

    Seq

    ORDER BYSeq


    N 56°04'39.16"
    E 12°55'05.25"

  • Peso, taking a hard look at your query and wondering why you are linking on ProcessCell I realized that I had misunderstood the op's problem. I didn't understand that each ProcessCell is treated as a unit. Thus how long does you query take on your test data of 1000 rows with all ProcessCell values set to 'A'?

  • Michael Meierruth (5/13/2008)


    Thus how long does you query take on your test data of 1000 rows with all ProcessCell values set to 'A'?

    With Peso 3 it takes about 76 milliseconds.

    With Peso 2 it takes about 1670 milliseconds.

    I took the liberty to make a blog topic about this discussion


    N 56°04'39.16"
    E 12°55'05.25"

  • And Peso1? Or are you still waiting for it to come back?:D

    What do you mean by 'making blog'. Where and how?

  • Heh... I never use the first release of anything... not even from Peter 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • The "blog topic" text is a link to the actual blog topic entry.

    http://weblogs.sqlteam.com/peterl/archive/2008/05/13/Lightning-fast-collapsed-date-ranges-and-missing-date-ranges.aspx

    With Peso 1 it takes about 458,426 milliseconds for all 1,000 ProcessCell set to same value

    Give or take few milliseconds 🙂


    N 56°04'39.16"
    E 12°55'05.25"

  • So now I know the speed of your computer - give or take a milisecond. Pretty soon I'll know what color it is.;)

    I finally understand your 'wicked' update in Peso3. The clustered index makes the update be performed in a certain row order. BTW, is this guaranteed? I think there was a thread somewhere that discussed whether this is always true or never fair or sometimes unfair , etc.

    And then you essentially loop sequentially through the date pairs in DateFrom,DateTo order using the @variables to pass data to the next row - discovering the gaps as you go.

    As a technique, I will remember this.

    As for your term 'documented trick' what precisely are you refering to? The use of @variables or the order guaranteed by the clustered index, or what?

  • Michael,

    They don't come right out with an example that I can easily find in BOL, but if you put a couple of bread-crumbs together, you come up with the following trail...

    1. From "variables, assigning, SELECT @local_variable" in BOL 2000

    SELECT { @local_variable = expression }

    expression

    Is any valid Microsoft® SQL Server™ expression...

    SELECT @local_variable is usually used to return a single value into the variable. It can return multiple values if, for example, expression is the name of a column. If the SELECT statement returns more than one value, the variable is assigned the last value returned.

    2. From "UPDATE, UPDATE (described)" in BOL 2000

    SET @variable = column = expression sets the variable to the same value as the column. This differs from SET @variable = column, column = expression, which sets the variable to the pre-update value of the column.

    From the same BOL article... and I guess this one is what you're really looking for...

    expression

    Is a variable, literal value, expression, or a parenthesized subSELECT statement that returns a single value. The value returned by expression replaces the existing value in column_name or @variable.

    The keys in the above to what is being done are...

    "If the SELECT statement returns more than one value, the variable is assigned the last value returned." That means that if a variable is used as a "target" more than once, however it is done, the variable will retain the last value for that particular "row" or "interation" of the SELECT (and make no mistake about it, SELECT is actually an iteration behind the scenes).

    The other important key is that "expression" can be just about anything including the same variable that you just used.

    So far as the "order of processing" being guaranteed, you have to read about the behind the scenes construction of a clustered index. It can take a long time to figure out that if the clustered index is used in a query, the processing will occur in that order. To force the use of a clustered index, you can use the WITH(INDEX(clustered_index_name)) "hint"... it's NOT a hint, it's a DIRECTIVE that will always force the use... they even warn that it overrides the optimizer and it may have an impact on performance because it WILL use the indexes given in the "hint".

    The "wicked" or "trick" or "quirky" update is the fastest way to process data in a "procedural" fashion using a "set based" technique. ROW_NUMBER and RANK functionality from SQL Server 2005 can be done in SQL Server 2000 using this technique... please see the following article that explains how.

    http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

    Obviously, it's not just limited to solving those problems...

    The part I can't find is the part that says a "Select list is executed in the order that it appears", just like a Case statement would be. The last time I had to look that up on the MS site was many years ago and the URL I had dried up with, apparently, no replacement. My appologies...

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply