October 24, 2010 at 5:02 pm
ChrisM@home (10/24/2010)
Thing is, it's so darned easy to get the correct results out of a rCTE. It's only when you want it to run quickly that you have to resort to the same warm cosy cosseted environment that the quirky update requires
.
Hey Chris,
I really do like your recursive CTE method - I remember being amazed by how fast it was when I first saw you use it (http://www.sqlservercentral.com/Forums/FindPost873955.aspx).
It does have a big advantage in simplicity (especially over the SQLCLR solution, which took me over an hour to put together) and for very many problems I'd say it was the method of choice - easily fast enough, fully standard techniques, and quick and easy to modify.
700ms! Paul you do make me laugh! 24hrs --> 5mins --> 16secs --> (Quirky Update, about 2secs) --> 700ms. Not bad at all.
Cheers 🙂
For those edge cases where absolute performance is critical (perhaps you have a few billion rows to process) the SQLCLR solution really shines. As far as I know, it is the only 'running total' solution that can make (effective) use of parallelism - and therein lies the whole of its advantage.
October 24, 2010 at 10:19 pm
ChrisM@home (10/24/2010)
Jeff Moden (10/22/2010)
... we'll have a drag race. 🙂
Quirky Update would win this one Jeff, by a factor of about 7 - we both know that 🙂
I don't think we'll find a requirement which can be met by both methods and won on performance by a rCTE. Had the OP's requirement been for an update then I wouldn't have bothered but since it was for a select, it was marginally worth it. Thing is, it's so darned easy to get the correct results out of a rCTE. It's only when you want it to run quickly that you have to resort to the same warm cosy cosseted environment that the quirky update requires.
700ms! Paul you do make me laugh! 24hrs --> 5mins --> 16secs --> (Quirky Update, about 2secs) --> 700ms. Not bad at all.
Heh... it's actually a darned clever method that you've come up with. And, it has the advantage of being documented! 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
October 24, 2010 at 10:24 pm
Paul White NZ (10/23/2010)
Jeff Moden (10/23/2010)
Do the source (##temp) and destination (##Results) tables need to be global tables in order for the solution that you posted to work?Hi Jeff,
The most accurate answer is to say that #tables and table variables won't work because they are private to a particular session. Normal tables and ##tables work fine. The other reason I chose a ##table is because tempdb is always in SIMPLE mode, so bulk loading will be minimally logged (the SQLCLR code takes a table lock).
The multiple reading and writing threads use different sessions, so a #table or variable would not be visible.
Paul
That's what I thought. My recommendation would be to avoid global temp tables because (as you know) they're visible and usable by anyone. The big problem with their use for stored procedures is that if two people try to run the same thing at the same time (admittedly, kind of tough to do at 700 ms), BOOM!!! It doesn't even have to be the same thing running... just two sprocs using the same names for the global temp tables.
Of course, if this were the real world and I needed to solve a bazillion row problem and needed your good CLR proc to do it, I'd modify the code to "reserve" the global temp names in a "token" table to be able to run. If it couldn't save the name in the table because it was already present, it would be because someone already had the code running from somewhere else. There's always a workaround.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 24, 2010 at 10:50 pm
Jeff Moden (10/24/2010)
That's what I thought. My recommendation would be to avoid global temp tables because (as you know) they're visible and usable by anyone. The big problem with their use for stored procedures is that if two people try to run the same thing at the same time (admittedly, kind of tough to do at 700 ms), BOOM!!! It doesn't even have to be the same thing running... just two sprocs using the same names for the global temp tables.
Yes, the global temporary tables are just to make the demonstration easier.
In the real world, I would use a real table, perhaps with a GUID name.
October 25, 2010 at 2:15 pm
Jeff Moden (10/24/2010)
Heh... it's actually a darned clever method that you've come up with. And, it has the advantage of being documented! 😛
Sorry it's taken so long to reply to this, had to leave work early with man-flu :/
Thanks Jeff. Not sure I can take credit for coming up with this, but I'll take a little for plugging it from time to time 🙂
FWIW I use both methods in anger at work, whichever seems appropriate.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
October 25, 2010 at 2:18 pm
Paul White NZ (10/24/2010)
For those edge cases where absolute performance is critical (perhaps you have a few billion rows to process) the SQLCLR solution really shines. As far as I know, it is the only 'running total' solution that can make (effective) use of parallelism - and therein lies the whole of its advantage.
You're a fast-jet kinda guy Paul 😀 best I can hope to do is a lumbering transport - slow and steady, lots of work.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
October 28, 2010 at 5:05 pm
WayneS (10/23/2010)
gmrose (10/23/2010)
Out of curiosity, how many records are you processing? And did you try out the script that I posted?
Wayne
Thank you (and all of the others) for your suggestions. There were about 300,000 records processed. I looked at your script and saw that it was centered on a field you acctnum. There really isn't any such field in my table. Perhaps you saw that field in my table called BankAcctId which one could could think is an account number. For my application, that field contains the same value for all of my 300,000 records. (It had a second value for the other 100,000+ records that I later excluded.)
Also, since my script might be run by more than one user at the same time, I wanted to avoid having it create and then drop permanent tables.
gmrose
Thanks for the feedback.
In the script I posted here, I was only using the ckid and stat fields. However, you would need to populate a temp table with the data you want processed, and put a clustered index on it. (The other script I posted was to try to get ColdCoffee to practice this, and doesn't apply to your issue.)
Thanks again for everyone's suggestions. I ended up rewriting the script to use Wayne's "quirky update solution". That script processed over 300,000 records in 18 seconds. I don't need anything faster than that. 🙂
October 28, 2010 at 7:40 pm
gmrose (10/28/2010)
Thanks again for everyone's suggestions. I ended up rewriting the script to use Wayne's "quirky update solution". That script processed over 300,000 records in 18 seconds. I don't need anything faster than that. 🙂
No worries - an interesting problem will often get more than one great solution posted for it. I think my preference would have been for the recursive CTE solution overall, but there you go. I'm also a little surprised that the QU solution takes that long on such a small number of records.
Make sure you understand exactly how the QU method works, and that it is an undocumented quirk of SQL Server that might disappear at some point. Be sure to test it explicitly whenever you apply a hotfix, cumulative update, service pack, or upgrade to a new version. You should also plan to handle the 'divide by zero' error that may be thrown if something breaks the QU method.
Viewing 8 posts - 31 through 37 (of 37 total)
You must be logged in to reply to this topic. Login to reply