How to find gaps between date ranges

Question

Post reply

How to find gaps between date ranges

Viewing 15 posts - 16 through 30 (of 38 total)

You must be logged in to reply to this topic. Login to reply

Abhishek Jain-593926 Old Hand Points: 322 More actions · Answer 1

Thanks Michael, Peso and everyone! My question is now resolved

regards,

Abhishek

Michael Meierruth SSChampion Points: 10051 More actions · Answer 2

I hope you were able to use the SS 2005 solution.

When limited to SS 2000, our queries really start to slow down after a while.

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 3

Here is a script for testing, Jeff.

Results for the three previous methods and a new method is included below

Peso 1 - Traditional t-sqlSQL Profiler - Rowcount 949, reads 43,512, CPU 688 ms, duration 1,711 ms

Peso 2 - SQL Server 2005 CTESQL Profiler - Rowcount 949, reads 31,826, CPU 1,312 ms, duration 3,518 ms

Peso 3 - Clustered index trickSQL Profiler - Rowcount 949, reads 26, CPU 0 ms, duration 118 ms

Michael - Traditional t-sqlSQL Profiler - Rowcount 571, reads 1,012,187, CPU 66,438 ms, duration 197,139 ms

The fastest here (Peso 3) is ~ 1,100 times faster than the slowest.

** Peso 1

Table '#ProcessCellAllocation'. Scan count 19064, logical reads 43628, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

>> Peso 1 - 750 milliseconds.

** Peso 2

Table 'Worktable'. Scan count 4, logical reads 11902, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 2002, logical reads 16016, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

>> Peso 2 - 1390 milliseconds.

** Peso 3

Table '#ProcessCellAllocation'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

>> Peso 3 - 80 milliseconds.

** Michael

Table 'Worktable'. Scan count 1001, logical reads 1012497, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 3, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

>> Michael - 87713 milliseconds.

I mean no harm in posting the results. I just find it interesting that different methods and approaches to this problem have significantly different timings.

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 4

For 10,000 date pairs, I couldn't run Peso 1 and Michael in a timely fashion.

Here are however the results for Peso 2 and Peso 3

** Peso 2

Table 'Worktable'. Scan count 4, logical reads 119902, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 20002, logical reads 1080108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

>> Peso 2 - 138890 milliseconds.

** Peso 3

Table '#ProcessCellAllocation'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

Table '#ProcessCellAllocation'. Scan count 2, logical reads 108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

>> Peso 3 - 360 milliseconds.

N 56°04'39.16"
E 12°55'05.25"

Jeff Moden SSC Guru Points: 1004365 More actions · Answer 5

Michael Meierruth (5/13/2008)
I hope you were able to use the SS 2005 solution.
When limited to SS 2000, our queries really start to slow down after a while.

Try the Peso 3 solution and you'll change your mind 😀

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004365 More actions · Answer 6

Peso (5/13/2008)
Here is a script for testing, Jeff.
...
I mean no harm in posting the results. I just find it interesting that different methods and approaches to this problem have significantly different timings.

Thanks, Peter...

I, and many others on this and the "other" forum, frequently post such results just to show people the proof and the proof is in the performance of the code. The only harm that has come is that you stole my thunder with the Peso 3 code... The Clustered Index Update is exactly what I had in mind. You did it a bit differently than I would have (I'm still going to give it a whirl because it's an interesting problem), but the concept is precisely the same. Well done, Peter.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 7

Thanks Jeff.

It sure is an interesting problem!

I tested with 100,000 date pairs and Peso 3 run in 900 ms. With 1,000,000 date pairs Peso 3 run in 2.2 seconds.

I will try to modify the code to get the "islands" or "collapsed date ranges".

N 56°04'39.16"
E 12°55'05.25"

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 8

Turned out to be easier than I thought.

The "collapsed" date ranges are already sequenced 🙂

-- Get the collapsed date ranges

SELECTProcessCell,

MIN(DateFrom) AS DateFrom,

MAX(DateTo) AS DateTo

FROM#ProcessCellAllocation

GROUP BYProcessCell,

Seq

ORDER BYSeq

N 56°04'39.16"
E 12°55'05.25"

Michael Meierruth SSChampion Points: 10051 More actions · Answer 9

Peso, taking a hard look at your query and wondering why you are linking on ProcessCell I realized that I had misunderstood the op's problem. I didn't understand that each ProcessCell is treated as a unit. Thus how long does you query take on your test data of 1000 rows with all ProcessCell values set to 'A'?

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 10

Michael Meierruth (5/13/2008)
Thus how long does you query take on your test data of 1000 rows with all ProcessCell values set to 'A'?

With Peso 3 it takes about 76 milliseconds.

With Peso 2 it takes about 1670 milliseconds.

I took the liberty to make a blog topic about this discussion

N 56°04'39.16"
E 12°55'05.25"

Michael Meierruth SSChampion Points: 10051 More actions · Answer 11

And Peso1? Or are you still waiting for it to come back?:D

What do you mean by 'making blog'. Where and how?

Jeff Moden SSC Guru Points: 1004365 More actions · Answer 12

Heh... I never use the first release of anything... not even from Peter 😛

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

SwePeso SSC-Dedicated Points: 39757 More actions · Answer 13

The "blog topic" text is a link to the actual blog topic entry.

http://weblogs.sqlteam.com/peterl/archive/2008/05/13/Lightning-fast-collapsed-date-ranges-and-missing-date-ranges.aspx

With Peso 1 it takes about 458,426 milliseconds for all 1,000 ProcessCell set to same value

Give or take few milliseconds 🙂

N 56°04'39.16"
E 12°55'05.25"

Michael Meierruth SSChampion Points: 10051 More actions · Answer 14

So now I know the speed of your computer - give or take a milisecond. Pretty soon I'll know what color it is.;)

I finally understand your 'wicked' update in Peso3. The clustered index makes the update be performed in a certain row order. BTW, is this guaranteed? I think there was a thread somewhere that discussed whether this is always true or never fair or sometimes unfair , etc.

And then you essentially loop sequentially through the date pairs in DateFrom,DateTo order using the @variables to pass data to the next row - discovering the gaps as you go.

As a technique, I will remember this.

As for your term 'documented trick' what precisely are you refering to? The use of @variables or the order guaranteed by the clustered index, or what?

Jeff Moden SSC Guru Points: 1004365 More actions · Answer 15

Michael,

They don't come right out with an example that I can easily find in BOL, but if you put a couple of bread-crumbs together, you come up with the following trail...

1. From "variables, assigning, SELECT @local_variable" in BOL 2000

SELECT { @local_variable = expression }

expression

Is any valid Microsoft® SQL Server™ expression...

SELECT @local_variable is usually used to return a single value into the variable. It can return multiple values if, for example, expression is the name of a column. If the SELECT statement returns more than one value, the variable is assigned the last value returned.

2. From "UPDATE, UPDATE (described)" in BOL 2000

SET @variable = column = expression sets the variable to the same value as the column. This differs from SET @variable = column, column = expression, which sets the variable to the pre-update value of the column.

From the same BOL article... and I guess this one is what you're really looking for...

expression

Is a variable, literal value, expression, or a parenthesized subSELECT statement that returns a single value. The value returned by expression replaces the existing value in column_name or @variable.

The keys in the above to what is being done are...

"If the SELECT statement returns more than one value, the variable is assigned the last value returned." That means that if a variable is used as a "target" more than once, however it is done, the variable will retain the last value for that particular "row" or "interation" of the SELECT (and make no mistake about it, SELECT is actually an iteration behind the scenes).

The other important key is that "expression" can be just about anything including the same variable that you just used.

So far as the "order of processing" being guaranteed, you have to read about the behind the scenes construction of a clustered index. It can take a long time to figure out that if the clustered index is used in a query, the processing will occur in that order. To force the use of a clustered index, you can use the WITH(INDEX(clustered_index_name)) "hint"... it's NOT a hint, it's a DIRECTIVE that will always force the use... they even warn that it overrides the optimizer and it may have an impact on performance because it WILL use the indexes given in the "hint".

The "wicked" or "trick" or "quirky" update is the fastest way to process data in a "procedural" fashion using a "set based" technique. ROW_NUMBER and RANK functionality from SQL Server 2005 can be done in SQL Server 2000 using this technique... please see the following article that explains how.

http://www.sqlservercentral.com/articles/Advanced+Querying/61716/

Obviously, it's not just limited to solving those problems...

The part I can't find is the part that says a "Select list is executed in the order that it appears", just like a Case statement would be. The last time I had to look that up on the MS site was many years ago and the URL I had dried up with, apparently, no replacement. My appologies...

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)