January 2, 2009 at 9:29 am
Lawrence (1/1/2009)
Jeff, I've been following this post as I'm a SQL noob and walked into a position where cursors are everywhere, and I'm trying to understand when and when not to apply them. I ran your speed trials and got proportionately similar times. The XML example was very fast... but I haven't seen that example brought up elsewhere in this thread or anywhere in my company's code. Is there a reason the XML example isn't applied more often for that speed benefit?
Lawrence
It is a problem... Usually, you just don't concatenate data in a database... not for CSVs, not for display, not for anything, because outputting CSV information to files is much better done by tools like BCP, and display formatting is much better done by reporting or GUI apps. About the only things left after that are building dynamic SQL or building dynamic commands across multiple tables or databases, and not enough people even consider a good set-based solution there because, as you've seen in some of the posts, there's just not that much of a performance advantage when creating fewer than, say, a thousand similar commands to be executed dynamically. And the truth is, so few people actually know about the high-speed concatenation methods like using ISNULL and XML that I'm not surprised to see them as a scarcity in threads like this or in code like you have at work.
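For anyone who hasn't seen the XML method Lawrence asked about, here's a minimal sketch of the FOR XML PATH concatenation technique (the table and column names are made up for illustration):

```sql
-- Build one CSV string per group using a correlated subquery with
-- FOR XML PATH(''); dbo.Orders(CustomerId, ProductName) is hypothetical.
SELECT  o.CustomerId,
        STUFF((SELECT ',' + o2.ProductName
               FROM dbo.Orders o2
               WHERE o2.CustomerId = o.CustomerId
               ORDER BY o2.ProductName
               FOR XML PATH('')), 1, 1, '') AS ProductCsv
FROM dbo.Orders o
GROUP BY o.CustomerId
```

The empty PATH does the actual concatenation without any looping, and the STUFF call just strips the leading comma.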
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 9:33 am
Hmmm. Just wrote a massive reply and the session timed out!!
Oh well. The long and short of it was:
1) It's pretty trendy to bash Microsoft, but I'm going to have to get out the Bermuda shorts and square glasses and stick up for them. They do try; they don't always get it right, but they do push things forward and force debate. That can only be a good thing - like this thread has done - the info wasn't 100% to start with, but I am personally enriched from having had a dialogue with some very knowledgeable and astute people.
2) To the C# loop guy - have you done any performance tests to back up your thoughts? I'd be interested to see the results - particularly 'C# app loop' vs. 'C# SQL Server CLR loop'. Got to say, though, my gut feel is the same as Jeff's - I'm pretty sure properly tuned SQL would win.
3) Directly on the UPDATE ... FROM issue - I don't think anyone said that UPDATE ... FROM statements weren't possible in another form - but I think the consensus is that UPDATE ... FROM is possibly the most efficient. I could do a roast dinner one Brussels sprout at a time, but I don't - it's not that I can't - it's just that it wouldn't be the most efficient. Every language lets you do things in ways ranging from frankly piss-poor to really very quick to OMG WTF (e.g. Jeff's FOR XML string concatenation). T-SQL is no exception.
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 9:34 am
Jeff Moden (1/2/2009)
Alexander Kuznetsov (12/24/2008)
Actually in most cases the best alternative is a loop written in a modern language such as C#, not a T-SQL WHILE loop or a CURSOR. Not only do you have the advantage of using a convenient modern language, you may get better performance too. I believe that all these WHILE loop vs. CURSOR debates are becoming irrelevant...
Except in cases of RegExReplace, I guarantee you that it won't be faster in nearly every case. The problem is that you've added another layer and have virtually guaranteed a RBAR solution. Yes, behind the scenes, SQL Server is all RBAR... but it's RBAR at the compiled machine language level, whereas the best you can do in C# is go through an API.
I've got an example where the number of visits to a web site is to be counted. There is a table, WebLogEvents, that contains a row with a Datetime and the SessionId. A visit is defined as a SessionId that hasn't been seen on the web site for over 30 minutes. So the data is processed sorted by SessionId and Datetime, and the time between each pair of events for a session is measured to see whether it is a new visit or a continuation of the same one.
-- *******************************************************************
-- FUNCTION SiteVisits
-- Description: This function creates a table with a row for each site visit
-- for a day.
-- *******************************************************************
CREATE FUNCTION [dbo].[SiteVisits]
(
@Date datetime
)
RETURNS @Table TABLE
(
[Hour] tinyint, -- 0 to 23
[FiveMinute] tinyint -- 0 to 11
)
AS
BEGIN
DECLARE @ASPString nvarchar(100)
DECLARE @Len int
DECLARE @DateMinus1 datetime
DECLARE @ElevenThirty datetime
SET @ElevenThirty = '1899-12-30 23:30:00.000'
SET @ASPString = 'ASP.NET_SessionId='
SET @Len = Len(@ASPString)
SET @Date = Convert(varchar, @Date, 112)
SET @DateMinus1 = DateAdd(dd, -1, @Date)
DECLARE @PrevDateTime datetime
DECLARE @PrevSessionId nvarchar(50)
SET @PrevDateTime = '1999-01-01' --Initialise
SET @PrevSessionId = 'xxxxxxxx' --Initialise
DECLARE @CurrDateTime datetime
DECLARE @CurrSessionId nvarchar(50)
DECLARE @NewVisit bit
DECLARE SessionCursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE (Date = @Date
OR (Date = @DateMinus1
AND [Time] >= @ElevenThirty)
)
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
ORDER BY 2, 1
OPEN SessionCursor
FETCH NEXT -- Get the first values into the @Curr variables
FROM SessionCursor
INTO @CurrDateTime,
@CurrSessionId
WHILE (@@Fetch_Status = 0)
BEGIN
SET @NewVisit = 0 --Initialise
IF @CurrSessionId <> @PrevSessionId -- New @CurrSessionId
BEGIN
IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
BEGIN
SET @NewVisit = 1
END
--END IF
END
ELSE --It is the same session Id so test at least 30 mins since last logged
BEGIN
IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
BEGIN
IF DateDiff(minute, @PrevDateTime, @CurrDateTime) > 30 -- earlier time first, or the result is negative and never exceeds 30
BEGIN
SET @NewVisit = 1
END
--END IF
END
--END IF
END
--END IF
IF (@NewVisit = 1) --Insert a row into the table
BEGIN
INSERT INTO @Table
(
[Hour],
[FiveMinute]
)
VALUES
(
Cast(DatePart(hh, @CurrDateTime) as tinyint),
Cast(DatePart(minute, @CurrDateTime)/5 as tinyint)
)
END
--END IF
SET @PrevDateTime = @CurrDateTime
SET @PrevSessionId = @CurrSessionId
FETCH NEXT -- Get the next values into the @Curr variables
FROM SessionCursor
INTO @CurrDateTime,
@CurrSessionId
END
--END WHILE
CLOSE SessionCursor
DEALLOCATE SessionCursor
RETURN
END
January 2, 2009 at 9:41 am
jacroberts, I am pretty sure (>90%) that your problem can be solved with a self-join set-based solution. If the table is properly indexed that solution could well be faster than the cursor one.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on Google's mail service
January 2, 2009 at 9:42 am
Matt Whitfield (1/2/2009)
Hmmm. Just wrote a massive reply and the session timed out!!
Oh well. The long and short of it was:
1) It's pretty trendy to bash Microsoft, but I'm going to have to get out the Bermuda shorts and square glasses and stick up for them. They do try; they don't always get it right, but they do push things forward and force debate. That can only be a good thing - like this thread has done - the info wasn't 100% to start with, but I am personally enriched from having had a dialogue with some very knowledgeable and astute people.
2) To the C# loop guy - have you done any performance tests to back up your thoughts? I'd be interested to see the results - particularly 'C# app loop' vs. 'C# SQL Server CLR loop'. Got to say, though, my gut feel is the same as Jeff's - I'm pretty sure properly tuned SQL would win.
3) Directly on the UPDATE ... FROM issue - I don't think anyone said that UPDATE ... FROM statements weren't possible in another form - but I think the consensus is that UPDATE ... FROM is possibly the most efficient. I could do a roast dinner one Brussels sprout at a time, but I don't - it's not that I can't - it's just that it wouldn't be the most efficient. Every language lets you do things in ways ranging from frankly piss-poor to really very quick to OMG WTF (e.g. Jeff's FOR XML string concatenation). T-SQL is no exception.
Heh... Your short version is pretty much spot on and to the point. I like it...
By the way, just before I try to post, I always make a copy of what I've typed, because I've been burned by the timeout way too often myself. I shouldn't need to do that, but there it is. Ctrl-A, Ctrl-C saves me a lot of retyping.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 9:44 am
Jeff Moden (1/2/2009)
Ctrl-A, Ctrl-C saves me a lot of retyping.
I did that the second time! 🙂 I didn't know Ctrl-A, Ctrl-C was a shortcut for 'work this time please'. 🙂
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 9:50 am
TheSQLGuru (1/2/2009)
jacroberts, I am pretty sure (>90%) that your problem can be solved with a self-join set-based solution. If the table is properly indexed that solution could well be faster than the cursor one.
The reason the SiteVisits problem is not possible, or at least very difficult, to do with a set-based solution is that the process looks at the difference in data between each pair of rows parsed by the cursor in a particular order. That's not something set-based logic is good at.
January 2, 2009 at 9:50 am
TheSQLGuru (1/2/2009)
jacroberts, I am pretty sure (>90%) that your problem can be solved with a self-join set-based solution. If the table is properly indexed that solution could well be faster than the cursor one.
I agree here - just thinking off the top of my head, I would create a temp table with an identity column, insert into it in the correct sort order, then left outer join from temp table identity to temp table identity - 1, and include the time-differential check on that join.
That way, the join would only match the first entry for each 'visit' - so you could count a visit as any row where the left outer join didn't return a match.
Just thinking out loud, feel free to shoot me!
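Roughly, the idea above might look like this - a sketch only, with a simplified, assumed schema (SessionId/EventTime stand in for whatever the real ordered query returns):

```sql
-- Number the events in (SessionId, EventTime) order.
CREATE TABLE #Ordered
(
    RowId     int IDENTITY(1,1) PRIMARY KEY,
    SessionId nvarchar(50),
    EventTime datetime
)

INSERT INTO #Ordered (SessionId, EventTime)
SELECT SessionId, EventTime
FROM dbo.WebLogEvents       -- stand-in for the cursor's source query
ORDER BY SessionId, EventTime

-- A row starts a new visit when the preceding row (RowId - 1) is a
-- different session, or the same session more than 30 minutes earlier.
SELECT curr.SessionId, curr.EventTime
FROM #Ordered curr
LEFT OUTER JOIN #Ordered prev
  ON  prev.RowId = curr.RowId - 1
  AND prev.SessionId = curr.SessionId
  AND DateDiff(minute, prev.EventTime, curr.EventTime) <= 30
WHERE prev.RowId IS NULL
```

One caveat: relying on INSERT ... SELECT ... ORDER BY to drive the identity numbering is the usual approach, but it deserves verification on your own version before trusting it.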
Another thing with the CLR/cursor debate - I think the point may be being missed. From where I'm standing, properly tuned SQL is set-based - and I would say that in probably 95% of cases that cursor would be better refactored as set-based SQL than moved to a C# CLR loop. And for the cases where you *really* need to RBAR it, you're probably going to be doing an UPDATE or something in that loop anyway, or some long-running operation (like the original post), in which case it will make very little difference.
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 9:54 am
Alexander Kuznetsov (12/24/2008)
Actually in most cases the best alternative is a loop written in a modern language such as C#,
not a T-SQL WHILE loop or a CURSOR. Not only do you have the advantage of using a convenient
modern language, you may get better performance too. I believe that all these WHILE loop vs.
CURSOR debates are becoming irrelevant...
[p]C'mon Alex. I'll agree that CLR is a neat way of extending the language for special purposes, and spatial data in 2008 is a great example of a case where it is of great value, but until hell freezes over, or C# gets enshrined into the SQL Standard, I'll never agree that it is a good way of avoiding cursors. Think of it! Using 'avoidance of cursors' as the sole reason for recourse to CLR, with all its maintenance and debugging overhead... I think that would be a horrible use of CLR. I don't even accept that C# is any more modern than SQL. From a logical perspective, that is like saying that apples are more modern than tomatoes.[/p]
Best wishes,
Phil Factor
January 2, 2009 at 10:04 am
jacroberts (1/2/2009)
TheSQLGuru (1/2/2009)
jacroberts, I am pretty sure (>90%) that your problem can be solved with a self-join set-based solution. If the table is properly indexed that solution could well be faster than the cursor one.
The reason the SiteVisits problem is not possible, or at least very difficult, to do with a set-based solution is that the process looks at the difference in data between each pair of rows parsed by the cursor in a particular order. That's not something set-based logic is good at.
I'll take that challenge and I'll take all comers... CLRs, C# externals, cursors, WHILE loops, and "Poor Man's Cursors". And, no, I'm not trying to be a smart a$$, put anyone down, or show off... I'm interested in showing folks a better way and how to use some of the true, raw, nasty-fast power that can be found in T-SQL - power that's been available since the days of Sybase and that so few even know about, never mind use.
Now, here's the catch... I don't even know how to spell "C#" and I sure won't allow CLR's (I don't trust them) on any of my computers at home or at work, so I need some good and honest souls to do the actual testing. I'll be happy to provide a simple script that will generate a million rows of randomized test data and then we can see what we can see. I think it'll be a great learning experience for everyone.
Any volunteers to help out in this very worthwhile bit of computational heaven?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 10:11 am
TheSQLGuru (1/2/2009)
Jeff Moden (1/2/2009)
Alexander Kuznetsov (12/24/2008)
Actually in most cases the best alternative is a loop written in a modern language such as C#, not a T-SQL WHILE loop or a CURSOR. Not only do you have the advantage of using a convenient modern language, you may get better performance too. I believe that all these WHILE loop vs. CURSOR debates are becoming irrelevant...
Except in cases of RegExReplace, I guarantee you that it won't be faster in nearly every case. The problem is that you've added another layer and have virtually guaranteed a RBAR solution. Yes, behind the scenes, SQL Server is all RBAR... but it's RBAR at the compiled machine language level, whereas the best you can do in C# is go through an API.
I think you are wrong on this one, Jeff. In the cases where you are replacing a T-SQL cursor or WHILE loop iteration (which is what I think Alex was espousing) that does simple things like concatenation, sums, etc. (NOT updates), the CLR loop WILL be faster. See here: http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/28/poor-men-see-sharp-ndash-more-cursor-optimization.aspx. Adam is one of the best guys out there for CLR stuff, and Hugo's tests seem to bear out Adam's statement that CLR 'cursoring' is the fastest.
I understand why you say that... I agree with them that the "poor man's cursor" methods pretty much suck and, in the case of things like running totals, are actually thousands of times worse than a cursor. Even Adam went back to the cursor method for running totals.
What most of them haven't tried - and what those who have tried insist will fail - is "pseudo-cursors". They've been available in SQL Server since it was Sybase, and very few people even know about them, never mind actually use them. I'll use one to solve the challenge that JacRoberts put up with his web site session counter. I can pretty much guarantee that it'll beat everything, including a C# program.
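For readers who haven't seen it, this is roughly the shape of the "quirky update" pseudo-cursor being alluded to here - a sketch only, since it depends on the UPDATE honoring the clustered index order, which is undocumented behavior and needs careful testing (table and column names are illustrative):

```sql
-- Carry state between rows in variables during a single UPDATE pass
-- over a table clustered in the processing order.
CREATE TABLE #Sessions
(
    SessionId  nvarchar(50),
    EventTime  datetime,
    IsNewVisit bit,
    PRIMARY KEY CLUSTERED (SessionId, EventTime)
)

DECLARE @PrevSessionId nvarchar(50), @PrevTime datetime, @IsNewVisit bit
SET @PrevSessionId = ''  -- so the first row compares as a new session

UPDATE s
SET @IsNewVisit = s.IsNewVisit =
        CASE WHEN s.SessionId <> @PrevSessionId
               OR DateDiff(minute, @PrevTime, s.EventTime) > 30
             THEN 1 ELSE 0 END,
    @PrevTime      = s.EventTime,
    @PrevSessionId = s.SessionId
FROM #Sessions s WITH (TABLOCKX)
OPTION (MAXDOP 1)
```

The TABLOCKX and MAXDOP 1 hints are part of the usual safety checklist for this technique; without them (and the clustered index) the row order, and therefore the answer, is not guaranteed.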
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 10:13 am
Jeff Moden (1/2/2009)
jacroberts (1/2/2009)
TheSQLGuru (1/2/2009)
jacroberts, I am pretty sure (>90%) that your problem can be solved with a self-join set-based solution. If the table is properly indexed that solution could well be faster than the cursor one.
The reason the SiteVisits problem is not possible, or at least very difficult, to do with a set-based solution is that the process looks at the difference in data between each pair of rows parsed by the cursor in a particular order. That's not something set-based logic is good at.
I'll take that challenge and I'll take all comers... CLRs, C# externals, cursors, WHILE loops, and "Poor Man's Cursors". And, no, I'm not trying to be a smart a$$, put anyone down, or show off... I'm interested in showing folks a better way and how to use some of the true, raw, nasty-fast power that can be found in T-SQL - power that's been available since the days of Sybase and that so few even know about, never mind use.
Now, here's the catch... I don't even know how to spell "C#" and I sure won't allow CLR's (I don't trust them) on any of my computers at home or at work, so I need some good and honest souls to do the actual testing. I'll be happy to provide a simple script that will generate a million rows of randomized test data and then we can see what we can see. I think it'll be a great learning experience for everyone.
Any volunteers to help out in this very worthwhile bit of computational heaven?
I'd like to do one based on the set-based way I suggested above. I think we should work from jacroberts' script as a starting point - all solutions must come, reliably, to the same result.
I would also like to see three data sets - 10,000, 100,000 and 1,000,000 as I think that would be an important part in identifying scaling problems.
Code off! Code off! Code off!
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 10:13 am
Jeff
I'll take that challenge and I'll take all comers.
[p]I've got a 'quirky' Update solution that motors. I'd love some real data to run comparative timings on.[/p]
Best wishes,
Phil Factor
January 2, 2009 at 10:17 am
Phil Factor (1/2/2009)
Jeff
I'll take that challenge and I'll take all comers.
[p]I've got a 'quirky' Update solution that motors. I'd love some real data to run comparative timings on.[/p]
Heh... not fair, Phil... that's the same solution I'm going to use. 😉
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 10:46 am
Matt Whitfield (1/2/2009)
Jeff Moden (1/2/2009)
jacroberts (1/2/2009)
TheSQLGuru (1/2/2009)
jacroberts, I am pretty sure (>90%) that your problem can be solved with a self-join set-based solution. If the table is properly indexed that solution could well be faster than the cursor one.
The reason the SiteVisits problem is not possible, or at least very difficult, to do with a set-based solution is that the process looks at the difference in data between each pair of rows parsed by the cursor in a particular order. That's not something set-based logic is good at.
I'll take that challenge and I'll take all comers... CLRs, C# externals, cursors, WHILE loops, and "Poor Man's Cursors". And, no, I'm not trying to be a smart a$$, put anyone down, or show off... I'm interested in showing folks a better way and how to use some of the true, raw, nasty-fast power that can be found in T-SQL - power that's been available since the days of Sybase and that so few even know about, never mind use.
Now, here's the catch... I don't even know how to spell "C#" and I sure won't allow CLR's (I don't trust them) on any of my computers at home or at work, so I need some good and honest souls to do the actual testing. I'll be happy to provide a simple script that will generate a million rows of randomized test data and then we can see what we can see. I think it'll be a great learning experience for everyone.
Any volunteers to help out in this very worthwhile bit of computational heaven?
I'd like to do one based on the set-based way I suggested above. I think we should work from jacroberts' script as a starting point - all solutions must come, reliably, to the same result.
I would also like to see three data sets - 10,000, 100,000 and 1,000,000 as I think that would be an important part in identifying scaling problems.
Code off! Code off! Code off!
I agree... JacRoberts, would you provide the CREATE TABLE and related statements for the dbo.WebLogEvents table, please? From that, I'll make a million-row table covering a period of 10 years and 50,000 different session IDs... unless you have a different "spread" that more closely approximates your data.
This will let us know not only what the different datatypes are for the Date and Time columns, but it'll also let us know if and what the PK and other indexes are.
--Jeff Moden
Change is inevitable... Change for the better is not.