May 13, 2008 at 4:26 am
Thanks Michael, Peso and everyone! My question is now resolved
regards,
Abhishek
May 13, 2008 at 4:50 am
I hope you were able to use the SS 2005 solution.
When limited to SS 2000, our queries really start to slow down after a while.
May 13, 2008 at 6:47 am
Here is a script for testing, Jeff.
Results for the three previous methods and a new method is included below
Peso 1 - Traditional t-sqlSQL Profiler - Rowcount 949, reads 43,512, CPU 688 ms, duration 1,711 ms
Peso 2 - SQL Server 2005 CTESQL Profiler - Rowcount 949, reads 31,826, CPU 1,312 ms, duration 3,518 ms
Peso 3 - Clustered index trickSQL Profiler - Rowcount 949, reads 26, CPU 0 ms, duration 118 ms
Michael - Traditional t-sqlSQL Profiler - Rowcount 571, reads 1,012,187, CPU 66,438 ms, duration 197,139 ms
The fastest here (Peso 3) is ~ 1,100 times faster than the slowest.
** Peso 1
Table '#ProcessCellAllocation'. Scan count 19064, logical reads 43628, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
>> Peso 1 - 750 milliseconds.
** Peso 2
Table 'Worktable'. Scan count 4, logical reads 11902, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 2002, logical reads 16016, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
>> Peso 2 - 1390 milliseconds.
** Peso 3
Table '#ProcessCellAllocation'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 1, logical reads 8, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 2, logical reads 16, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
>> Peso 3 - 80 milliseconds.
** Michael
Table 'Worktable'. Scan count 1001, logical reads 1012497, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 3, logical reads 24, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
>> Michael - 87713 milliseconds.
I mean no harm in posting the results. I just find it interesting that different methods and approaches to this problem have significantly different timings.
N 56°04'39.16"
E 12°55'05.25"
May 13, 2008 at 7:19 am
For 10,000 date pairs, I couldn't run Peso 1 and Michael in a timely fashion.
Here are however the results for Peso 2 and Peso 3
** Peso 2
Table 'Worktable'. Scan count 4, logical reads 119902, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 20002, logical reads 1080108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
>> Peso 2 - 138890 milliseconds.
** Peso 3
Table '#ProcessCellAllocation'. Scan count 1, logical reads 15, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 1, logical reads 2, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 1, logical reads 54, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table '#ProcessCellAllocation'. Scan count 2, logical reads 108, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
>> Peso 3 - 360 milliseconds.
N 56°04'39.16"
E 12°55'05.25"
May 13, 2008 at 7:33 am
Michael Meierruth (5/13/2008)
I hope you were able to use the SS 2005 solution.When limited to SS 2000, our queries really start to slow down after a while.
Try the Peso 3 solution and you'll change your mind 😀
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 7:39 am
Peso (5/13/2008)
Here is a script for testing, Jeff....
I mean no harm in posting the results. I just find it interesting that different methods and approaches to this problem have significantly different timings.
Thanks, Peter...
I, and many others on this and the "other" forum, frequently post such results just to show people the proof and the proof is in the performance of the code. The only harm that has come is that you stole my thunder with the Peso 3 code... The Clustered Index Update is exactly what I had in mind. You did it a bit differently than I would have (I'm still going to give it a whirl because it's an interesting problem), but the concept is precisely the same. Well done, Peter.
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 7:51 am
Thanks Jeff.
It sure is an interesting problem!
I tested with 100,000 date pairs and Peso 3 run in 900 ms. With 1,000,000 date pairs Peso 3 run in 2.2 seconds.
I will try to modify the code to get the "islands" or "collapsed date ranges".
N 56°04'39.16"
E 12°55'05.25"
May 13, 2008 at 7:53 am
Turned out to be easier than I thought.
The "collapsed" date ranges are already sequenced 🙂
-- Get the collapsed date ranges
SELECTProcessCell,
MIN(DateFrom) AS DateFrom,
MAX(DateTo) AS DateTo
FROM#ProcessCellAllocation
GROUP BYProcessCell,
Seq
ORDER BYSeq
N 56°04'39.16"
E 12°55'05.25"
May 13, 2008 at 8:17 am
Peso, taking a hard look at your query and wondering why you are linking on ProcessCell I realized that I had misunderstood the op's problem. I didn't understand that each ProcessCell is treated as a unit. Thus how long does you query take on your test data of 1000 rows with all ProcessCell values set to 'A'?
May 13, 2008 at 8:20 am
Michael Meierruth (5/13/2008)
Thus how long does you query take on your test data of 1000 rows with all ProcessCell values set to 'A'?
With Peso 3 it takes about 76 milliseconds.
With Peso 2 it takes about 1670 milliseconds.
I took the liberty to make a blog topic about this discussion
N 56°04'39.16"
E 12°55'05.25"
May 13, 2008 at 8:28 am
And Peso1? Or are you still waiting for it to come back?:D
What do you mean by 'making blog'. Where and how?
May 13, 2008 at 8:37 am
Heh... I never use the first release of anything... not even from Peter 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
May 13, 2008 at 8:45 am
The "blog topic" text is a link to the actual blog topic entry.
With Peso 1 it takes about 458,426 milliseconds for all 1,000 ProcessCell set to same value
Give or take few milliseconds 🙂
N 56°04'39.16"
E 12°55'05.25"
May 14, 2008 at 5:52 am
So now I know the speed of your computer - give or take a milisecond. Pretty soon I'll know what color it is.;)
I finally understand your 'wicked' update in Peso3. The clustered index makes the update be performed in a certain row order. BTW, is this guaranteed? I think there was a thread somewhere that discussed whether this is always true or never fair or sometimes unfair , etc.
And then you essentially loop sequentially through the date pairs in DateFrom,DateTo order using the @variables to pass data to the next row - discovering the gaps as you go.
As a technique, I will remember this.
As for your term 'documented trick' what precisely are you refering to? The use of @variables or the order guaranteed by the clustered index, or what?
May 14, 2008 at 8:52 am
Michael,
They don't come right out with an example that I can easily find in BOL, but if you put a couple of bread-crumbs together, you come up with the following trail...
1. From "variables, assigning, SELECT @local_variable" in BOL 2000
SELECT { @local_variable = expression }
expression
Is any valid Microsoft® SQL Server™ expression...
SELECT @local_variable is usually used to return a single value into the variable. It can return multiple values if, for example, expression is the name of a column. If the SELECT statement returns more than one value, the variable is assigned the last value returned.
2. From "UPDATE, UPDATE (described)" in BOL 2000
SET @variable = column = expression sets the variable to the same value as the column. This differs from SET @variable = column, column = expression, which sets the variable to the pre-update value of the column.
From the same BOL article... and I guess this one is what you're really looking for...
expression
Is a variable, literal value, expression, or a parenthesized subSELECT statement that returns a single value. The value returned by expression replaces the existing value in column_name or @variable.
The keys in the above to what is being done are...
"If the SELECT statement returns more than one value, the variable is assigned the last value returned." That means that if a variable is used as a "target" more than once, however it is done, the variable will retain the last value for that particular "row" or "interation" of the SELECT (and make no mistake about it, SELECT is actually an iteration behind the scenes).
The other important key is that "expression" can be just about anything including the same variable that you just used.
So far as the "order of processing" being guaranteed, you have to read about the behind the scenes construction of a clustered index. It can take a long time to figure out that if the clustered index is used in a query, the processing will occur in that order. To force the use of a clustered index, you can use the WITH(INDEX(clustered_index_name)) "hint"... it's NOT a hint, it's a DIRECTIVE that will always force the use... they even warn that it overrides the optimizer and it may have an impact on performance because it WILL use the indexes given in the "hint".
The "wicked" or "trick" or "quirky" update is the fastest way to process data in a "procedural" fashion using a "set based" technique. ROW_NUMBER and RANK functionality from SQL Server 2005 can be done in SQL Server 2000 using this technique... please see the following article that explains how.
http://www.sqlservercentral.com/articles/Advanced+Querying/61716/
Obviously, it's not just limited to solving those problems...
The part I can't find is the part that says a "Select list is executed in the order that it appears", just like a Case statement would be. The last time I had to look that up on the MS site was many years ago and the URL I had dried up with, apparently, no replacement. My appologies...
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 15 posts - 16 through 30 (of 38 total)
You must be logged in to reply to this topic. Login to reply