January 2, 2009 at 10:52 am
And, just to be 100% clear... would you give a couple of actual examples of the various cookie formats you're expecting so I can make a good cross-section of cookie types, as well? Thanks.
Last but not least, would you explain to folks why you're adding two days to the date/time of each cookie?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 11:33 am
Jeff Moden (1/2/2009)
I agree... JacRoberts, would you provide the CREATE TABLE statement for the dbo.WebLogEvents table, please. From that, I'll make a million row table over a period of 10 years for 50,000 different session id's... unless you have a different "spread" that more closely approximates your data.
The sessionId is hidden in the [cs(Cookie)] column.
An example row is:
perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;+__utma=131431056.2740713761679478300.1230240763.1230240763.1230240763.1;+__utmz=131431056.1230240763.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=trains%20to%20leeds;+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55
And the sessionId for that value is iyjuwx45r2rl5455fmdqpp55, which can be located anywhere in the string, hence all the CharIndex stuff in the SQL.
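For anyone following along, here's a minimal standalone sketch of the CharIndex/SubString extraction being described, run against a shortened version of the sample row (the variable names are mine; the real function appears later in the thread):
DECLARE @Cookie varchar(2048), @Key varchar(100), @Start int, @End int
SET @Cookie = 'perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55'
SET @Key = 'ASP.NET_SessionId='
SET @Start = CharIndex(@Key, @Cookie) + Len(@Key) -- first character after the key name
SET @End = CharIndex(';', @Cookie, @Start) -- next delimiter; 0 when the key is last
SELECT CASE WHEN @End > 0
            THEN SubString(@Cookie, @Start, @End - @Start)
            ELSE SubString(@Cookie, @Start, 100)
       END AS ASPSessionId -- returns iyjuwx45r2rl5455fmdqpp55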
The table has about 10,000,000 rows added per day and about 60,000 site visits per day. It stores just 8 days' worth of data, as all data over 8 days old is deleted each night, and there is only one non-unique index, on the [c-ip] column. There are about 80 million rows in the table altogether.
The 'date' column is just the date without the time part, and the 'time' column is just a time of day stored on the base date '1899-12-30'.
You probably don't need all the columns but here is the actual table definition:
CREATE TABLE [dbo].[WebLogEvents](
[date] [datetime] NULL,
[time] [datetime] NULL,
[s-sitename] [varchar](255) NULL,
[s-ip] [varchar](50) NULL,
[cs-method] [varchar](50) NULL,
[cs-uri-stem] [varchar](255) NULL,
[cs-uri-query] [varchar](2048) NULL,
[s-port] [varchar](50) NULL,
[cs-username] [varchar](50) NULL,
[c-ip] [varchar](50) NULL,
[cs(User-Agent)] [varchar](255) NULL,
[cs(Cookie)] [varchar](2048) NULL,
[cs(Referer)] [varchar](2048) NULL,
[sc-status] [int] NULL,
[sc-substatus] [int] NULL,
[sc-win32-status] [int] NULL,
[sc-bytes] [int] NULL,
[cs-bytes] [int] NULL,
[time-taken] [int] NULL
) ON [PRIMARY]
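As an aside on that [time] column, here's a minimal sketch (with made-up values) of the base-date arithmetic, which is why two days have to be added back when [date] and [time] are recombined - '1899-12-30' sits two days before datetime's zero point of '1900-01-01':
DECLARE @d datetime, @t datetime
SELECT @d = '2008-12-26', -- the [date] column: date part only
       @t = '1899-12-30 14:35:00' -- the [time] column: a time on the base date
SELECT @d + @t AS TwoDaysEarly, -- 2008-12-24 14:35:00
       DateAdd(dd, 2, @d + @t) AS Corrected -- 2008-12-26 14:35:00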
January 2, 2009 at 11:51 am
Cool... that 1899-12-30 thingy about the time also explains why you add 2 days to the combination of date and time. Thanks, Jac.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 12:09 pm
TheSQLGuru (1/2/2009)
1) Tom, can you or anyone else give an example of UPDATEs that cannot be done without using the FROM clause that can be done with its use?
2) I interacted with Hugo on a number of occasions at last year's MVP summit and I can tell you from first-hand experience he is a very smart (and thorough) guy. I for one will never dismiss out of hand anything that he blogs about.
Clearly it isn't possible to do anything with the clause that is impossible without it if one ignores performance: SQL is Turing-complete. There may be examples where the FROM clause can't be replaced by a string of select subclauses so that the same effect (ignoring legibility and performance) can't be achieved in a single SQL statement, but I doubt it. If you want an example where the cost in legibility and performance is unacceptably high, look in HK's blog, where he gives a fine example. The performance issue is that the optimiser can't easily see what's going on - the FROM clause is much easier to understand mechanically. The clarity/legibility issue is a language expressiveness issue - there are other ways of getting clarity. HK suggests using row assignments, which I first came across in discussions in the 1980s about what we (ICL, Bull, Siemens, various academic institutions) hoped "SQL3" might contain (and might include in our joint massively parallel SQL implementation experiment - but we left this one out); in my view this is the correct approach in many cases where "FROM" is used in UPDATE, but no MS implementation - not even SQLS 2008 - supports it.
HK himself gives a nice example where there is a vast performance difference in the very blog entry that deprecates the FROM clause. The blog entry does not say starkly "let's abolish FROM in UPDATE in TSQL", it says something like "Remember that FROM in UPDATE is not part of the SQL Standard and use it only when gains in performance or clarity are such as to offset the resulting non-portability" - something with which I'm wholly in agreement. HK himself says he would use the FROM clause in the example he gives in TSQL releases that don't have the MERGE feature ("For that performance gain, I will gladly choose the proprietary syntax over the standard!" are the words he uses). So while I maintain that anyone who said "let's abolish this feature" would be a nut, and the impression I got from Jeff's post was that Hugo was saying that, I certainly don't think that what he actually does say is the slightest bit nutty.
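For readers who haven't seen the two forms side by side, here's a hedged sketch of what's being compared (the table and column names are invented):
-- The proprietary T-SQL form:
UPDATE t
SET t.Price = s.Price
FROM dbo.Products t
INNER JOIN dbo.PriceFeed s ON s.ProductID = t.ProductID
-- The ANSI-standard rewrite with a correlated subquery; the EXISTS guard stops
-- unmatched rows being set to NULL. (If PriceFeed had duplicate ProductIDs, the
-- subquery form would raise an error where UPDATE FROM silently picks one row.)
UPDATE dbo.Products
SET Price = (SELECT s.Price FROM dbo.PriceFeed s
             WHERE s.ProductID = dbo.Products.ProductID)
WHERE EXISTS (SELECT 1 FROM dbo.PriceFeed s
              WHERE s.ProductID = dbo.Products.ProductID)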
Tom
January 2, 2009 at 1:08 pm
Heh... and I'll maintain that what he said is nutty for more than 1 reason. Consider the title and the first sentence...
Let's deprecate UPDATE FROM!
I guess that many people using UPDATE … FROM on a daily basis do so without being aware that they are violating all SQL standards.
Now, if that doesn't sound like he'd like to get rid of it, then perhaps this other thing he said does...
With this alternative available, I fail to see any reason why the proprietary UPDATE FROM syntax should be maintained. In my opinion, it can safely be marked as deprecated in SQL Server 2008.
... and, maybe even that doesn't do it for you. But this sure as hell does for me...
but it should be marked as deprecated, and it should eventually be removed from the product.
My answer to all of that is that code portability is a myth (and a nutty one, at that). Every RDBMS has its own extensions to the cruddy and way-behind-the-real-world ANSI standards. In order to have true code portability, every database vendor must meet all of the standards (which are pretty useless for some things) and everyone must ignore some of the better features the various vendors have offered in the form of extensions to the language. If that's the case, then why even bother having different database vendors?
On the fringe of chaos lives innovation and new technology... the myth of code portability being realized for SQL would destroy both because there would be no need for any form of competition between vendors. We'd all end up using the same crap and, yes, it would all be crap.
That would be like saying there shall not be Java or C# anymore... instead, we'll only have one language called Cava so that everyone can write portable code. BWAAA-HAAA!!!! That's when I'll go back to writing code in Power Basic and blow everyone's doors off... 😛
I'd also like Microsoft to stop deprecating stuff... they're taking away stuff that's not broken and is frequently better than what it's being replaced by. :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 1:26 pm
Ok... from JacRoberts' description, Create Table statement, and a couple of good guesses (like the fact that since all the columns are nullable, there's no primary key), here's a chunk of code that'll build a million-row test table.
Now, based on what I saw in JacRoberts' code, I've changed the scope of the test code generator just a tiny bit. Instead of a million rows across ten years, it's a million rows across a single month for 50,000 session ID's to simulate a million "hits" on a web site in a month. Here's the code... lemme know if we need any changes so that we can all work from the same thing. As always, details are in the comments...
--=================================================================================================
-- Create and populate a 1,000,000 row WebLogEvents test table.
-- Jeff Moden
--=================================================================================================
--===== First, change to a nice safe place that we all have...
USE TempDb
--===== If the test table already exists here, drop it so we can rebuild it for reruns.
IF OBJECT_ID('TempDB.dbo.WebLogEvents','U') IS NOT NULL
DROP TABLE TempDB.dbo.WebLogEvents
GO
--===== Declare some local variables to handle all the presets we'll need...
DECLARE @StartDate DATETIME, --First date in events table
@NumberOfDays INT, --Max number of different days in the events table
@TotalRows INT, --Total number of rows in the events table
@CookieMain VARCHAR(2048), --All the same... will have a session number appended
@Sessions INT --The max number of unique sessions in the events table
--===== ... and then assign them the presets we want.
SELECT @StartDate = '2008-12-01',
@NumberOfDays = 31,
@TotalRows = 1000000,
@CookieMain = 'perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;'
+ '+__utma=131431056.2740713761679478300.1230240763.1230240763.1230240763.1;'
+ '+__utmz=131431056.1230240763.1.1.utmcsr=google|utmccn=(organic)'
+ '|utmcmd=organic|utmctr=trains%20to%20leeds;'
+ '+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;'
+ '+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55',
@Sessions = 50000
--===== All set... build the table and populate it with random data according to the presets
-- We'll also measure how long this takes, just for grins... see the "Messages" tab
SET STATISTICS TIME ON
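-- Date: a random whole day within the month; Time: a random time-of-day on the
-- 1899-12-30 base date; Cookie: the fixed prefix with a random session number
-- (1 to @Sessions) appended to produce distinct session IDs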
SELECT TOP (@TotalRows)
[Date] = @StartDate + ABS(CHECKSUM(NEWID())) % @NumberOfDays,
[Time] = CAST('1899-12-30' AS DATETIME) + RAND(CHECKSUM(NEWID())),
[cs(Cookie)] = CAST(@CookieMain + CAST(ABS(CHECKSUM(NEWID()))%@Sessions+1 AS VARCHAR(10)) AS VARCHAR(2048))
INTO dbo.WebLogEvents
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
SET STATISTICS TIME OFF
GO
--===== And, finally, let's see what some of the data looks like and what
-- the min and max for each column are as a sanity check. This takes some time.
SELECT TOP 100 * FROM dbo.WebLogEvents
SELECT MIN(Date) AS MinDate,
MAX(Date) AS MaxDate,
MIN(Time) AS MinTime,
MAX(Time) AS MaxTime,
MIN([cs(Cookie)]) AS MinCookie,
MAX([cs(Cookie)]) AS MaxCookie
FROM dbo.WebLogEvents
Hey, Phil Factor... do you want to do the honors on the "quirky update" method, or do you want me to?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 1:47 pm
Jeff Moden (1/2/2009)
My answer to all of that is that code portability is a myth (and a nutty one, at that). ...snip... If that's the case, then why even bother having different database vendors?
Funny - that was in the long post I lost. Agree 100%.
Jeff Moden (1/2/2009)
I'd also like Microsoft to stop deprecating stuff... they're taking away stuff that's not broken and is frequently better than what it's being replaced by. :w00t:
I don't agree here. Yes - when they're deprecating stuff that's better than the replacement, then that sucks, and deprecation for deprecation's sake is also lame... but...
I have to say I am a big fan of the breaking change in the right places. Some stuff from .NET 1.1 was just poorly thought out, for example. I am glad that someone at MS thought 'you know what guys, let's not live with this forever - let's deprecate, accept our mistakes and move on'.
Similarly - I think the idea of defined rules and defaults, and having them bound to columns / data types, sucks in SQL Server. One of the things I am doing is a schema replication program, and working with those types doesn't do it for me. I think CHECK / DEFAULT constraints are a definite step forward in that respect, as the sketch below shows.
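To make that concrete, here's a small sketch contrasting the two approaches (all object names invented):
-- The legacy way: a rule created once, then bound to columns (or data types):
CREATE RULE dbo.PositiveQty AS @value > 0
GO
EXEC sp_bindrule 'dbo.PositiveQty', 'dbo.Orders.Qty'
GO
-- The replacement: a constraint declared on the table itself:
ALTER TABLE dbo.Orders ADD CONSTRAINT CK_Orders_Qty CHECK (Qty > 0)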
And lastly - one of the reasons Linux hacks me off is that compatibility with the 1970s seems to be an important factor. I know, it's a contentious point criticising Linux. But really, there are some amazingly smart people working on/with that OS, and I just wonder what they would come up with if they were given a clean slate and put usability at the core of their architecture... I think it could be brilliant. Similarly, I think Singularity has the promise of a great OS - but some marketing dude at MS is going to have a benny when they realise it's not binary compatible, and that will be where the layers of mediocrity creep in.
Semi-thinking out loud here.
wrt the script - top job. I'll get on with the set-based method later on - I'm pretty sure it will come in second place, but I want to do it anyway for completeness and as an exercise. For now I'm just putting the nippers to bed...
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 3:21 pm
Hey, Phil Factor... do you want to do the honors on the "quirky update" method, or do you want me to?
--Jeff Moden
Over to you Jeff. I had to nip off to do DBW. I'm now off to bed! I'll see how you got on tomorrow. Your build script looks very cool!
Best wishes,
Phil Factor
January 2, 2009 at 3:44 pm
Jeff Moden (1/2/2009)
Ok... from JacRoberts' description, Create Table statement, and a couple of good guesses (like the fact that since all the columns are nullable, there's no primary key), here's a chunk of code that'll build a million-row test table. Now, based on what I saw in JacRoberts' code, I've changed the scope of the test code generator just a tiny bit. Instead of a million rows across ten years, it's a million rows across a single month for 50,000 session ID's to simulate a million "hits" on a web site in a month. Here's the code... lemme know if we need any changes so that we can all work from the same thing. As always, details are in the comments...
The table has about 10,000,000 rows added per day and about 60,000 site visits per day. It stores just 8 days' worth of data, as all data over 8 days old is deleted each night, and there is only one non-unique index, on the [c-ip] column; this column is not used in the TVF. There are about 80 million rows in the table altogether. The average time a user spends on the site is about 20 minutes, with about 2 percent spending over 30 minutes. For the test it probably doesn't matter too much as long as there are a large number of rows.
The process to count site visits takes place once a day, for the previous day, so you only need to aggregate the data for 1 day.
I can test the end result on real data if you make the input and output of the table valued function the same as the one I pasted in earlier.
January 2, 2009 at 5:23 pm
Ok - here's one for you then...
Before I start - a couple of things. Given the volume of data you're talking about, I'm not confident my first method would be quicker. Because it's a function I was forced to use table variables, and not temp tables as I'd have preferred for such a large data set.
Also, I'm not convinced the logic is right in your function. I think the datediff(minute, @CurrDateTime, @PrevDateTime) should be datediff(minute, @PrevDateTime, @CurrDateTime). As the code you pasted stands, if @CurrDateTime is later than @PrevDateTime (which it always would be) then the value returned would be negative in the parameter order specified. This would mean it would never be > 30, although it may be < -30. So it could well be that you are undercounting there?
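A quick demo of the parameter-order point, with made-up times:
SELECT DateDiff(minute, '2009-01-02 10:00', '2009-01-02 09:00') AS CurrToPrev, -- -60, so never > 30
       DateDiff(minute, '2009-01-02 09:00', '2009-01-02 10:00') AS PrevToCurr -- 60, counted correctly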
Another thing was that I found that, because of the lack of indexing, reading from the table was more efficient in two parts. I altered the cursor declaration (pasted below) and I found that on my machine the existing way took 31.828 seconds and the new way took 4.437 seconds. Anyway, here's some code.
Direct functional replacement, best I could get on Jeff's test data
Exec time 7.046 seconds
-- *******************************************************************
-- FUNCTION SiteVisits
-- Description: This function creates a table with a row for each site visit
-- for a day.
-- *******************************************************************
CREATE FUNCTION [dbo].[SiteVisitsMattTestData]
(
@Date datetime
)
RETURNS @Table TABLE
(
[Hour] tinyint, -- 0 to 23
[FiveMinute] tinyint -- 0 to 11
)
AS
BEGIN
DECLARE @ASPString nvarchar(100)
DECLARE @Len int
DECLARE @DateMinus1 datetime
DECLARE @ElevenThirty datetime
SET @ElevenThirty = '1899-12-30 23:30:00.000'
SET @ASPString = 'ASP.NET_SessionId='
SET @Len = Len(@ASPString)
SET @Date = Convert(varchar, @Date, 112)
SET @DateMinus1 = DateAdd(dd, -1, @Date)
-- ok - so get out the data for today and yesterday into separate temp tables
-- - the query processor and OR are not best of friends
DECLARE @sessionstoday TABLE (EventDateTime datetime, ASPSessionID varchar(40))
DECLARE @sessionsyesterday TABLE (EventDateTime datetime, ASPSessionID varchar(40))
-- todays
INSERT INTO @sessionstoday (EventDateTime, ASPSessionID)
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @date
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
-- yesterdays
INSERT INTO @sessionsyesterday (EventDateTime, ASPSessionID)
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
-- now create a unified table with a clustered PK for both sets of data
DECLARE @sessions TABLE (ID int identity(1,1) primary key clustered, EventDateTime datetime, ASPSessionID varchar(40))
-- and insert in the right order
insert INTO @sessions (EventDateTime, ASPSessionID)
SELECT EventDateTime, ASPSessionID FROM
(
SELECT EventDateTime, ASPSessionID FROM
@sessionstoday
UNION ALL
SELECT EventDateTime, ASPSessionID FROM
@sessionsyesterday
) itbl
ORDER BY ASPSessionID, EventDateTime
-- now insert into the functional result table
INSERT INTO @Table
(
[Hour],
[FiveMinute]
)
-- the hour and five minute marker
select datepart(hour, eventdatetime), datepart(minute, eventdatetime) / 5 FROM
(
-- from a subquery that finds the previous hit by inner joining onto the previous ID
-- and selects out the current hit's ID, evendatetime, the minutes between them, and the session IDs from each
SELECT s1.id, s1.eventdatetime, datediff(minute, s2.eventdatetime, s1.eventdatetime) as timediff,
s1.aspsessionid as sid1, s2.aspsessionid as sid2
FROM @sessions s1 INNER JOIN
@sessions s2
on s1.ID = s2.ID + 1
UNION ALL
-- union the first row in, as the above join will never pick it up
SELECT s1.id, s1.eventdatetime, 999 as timediff, s1.aspsessionid as sid1, '' as sid2
FROM @sessions s1 where id = 1
) itbl
-- where the hit is on the right day
WHERE convert(datetime,convert(varchar, eventdatetime, 112)) = @date
-- and the session id is not the same as the previous
and ((sid1 <> sid2) or
-- or the time differential is > 30 for the same session id
(timediff > 30))
-- order by ID because we want the result sets to be *exactly* the same
ORDER BY id
RETURN
END
Temp table version of the above, outside the scope of a function
Exec time 3.843 seconds
DECLARE @Date datetime
SET @Date = '20081220'
DECLARE @Table TABLE
(
[Hour] tinyint, -- 0 to 23
[FiveMinute] tinyint -- 0 to 11
)
BEGIN
DECLARE @ASPString nvarchar(100)
DECLARE @Len int
DECLARE @DateMinus1 datetime
DECLARE @ElevenThirty datetime
SET @ElevenThirty = '1899-12-30 23:30:00.000'
SET @ASPString = 'ASP.NET_SessionId='
SET @Len = Len(@ASPString)
SET @Date = Convert(varchar, @Date, 112)
SET @DateMinus1 = DateAdd(dd, -1, @Date)
IF OBJECT_ID('tempdb..#sessions','U') IS NOT NULL DROP TABLE #sessions -- allow rerunning in the same session
CREATE TABLE #sessions(ID int IDENTITY(1,1) primary key clustered, EventDateTime datetime, ASPSessionID varchar(40))
INSERT INTO #sessions (EventDateTime, ASPSessionID)
SELECT EventDateTime, ASPSessionID FROM
(
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @date
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
UNION ALL
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
) itbl
ORDER BY ASPSessionID, EventDateTime
INSERT INTO @Table
(
[Hour],
[FiveMinute]
)
select datepart(hour, eventdatetime), datepart(minute, eventdatetime) / 5 FROM
(
SELECT s1.id, s1.eventdatetime, datediff(minute, s2.eventdatetime, s1.eventdatetime) as timediff,
s1.aspsessionid as sid1, s2.aspsessionid as sid2
FROM #sessions s1 INNER JOIN
#sessions s2
on s1.ID = s2.ID + 1
UNION ALL
-- union the first row in, as the above join will never pick it up
SELECT s1.id, s1.eventdatetime, 999 as timediff, s1.aspsessionid as sid1, '' as sid2
FROM #sessions s1 where id = 1
) itbl
WHERE convert(datetime,convert(varchar, eventdatetime, 112)) = @date
and ((sid1 <> sid2) or
(timediff > 30))
ORDER BY id
END
Revised cursor declaration; this reduced the exec time of the cursor method from 31.828 to 4.437 seconds
/****** Object: UserDefinedFunction [dbo].[SiteVisits] Script Date: 01/03/2009 00:06:48 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- *******************************************************************
-- FUNCTION SiteVisits
-- Description: This function creates a table with a row for each site visit
-- for a day.
-- *******************************************************************
CREATE FUNCTION [dbo].[SiteVisitsNewCursor]
(
@Date datetime
)
RETURNS @Table TABLE
(
[Hour] tinyint, -- 0 to 23
[FiveMinute] tinyint -- 0 to 11
)
AS
BEGIN
DECLARE @ASPString nvarchar(100)
DECLARE @Len int
DECLARE @DateMinus1 datetime
DECLARE @ElevenThirty datetime
SET @ElevenThirty = '1899-12-30 23:30:00.000'
SET @ASPString = 'ASP.NET_SessionId='
SET @Len = Len(@ASPString)
SET @Date = Convert(varchar, @Date, 112)
SET @DateMinus1 = DateAdd(dd, -1, @Date)
DECLARE @PrevDateTime datetime
DECLARE @PrevSessionId nvarchar(50)
SET @PrevDateTime = '1999-01-01' --Initialise
SET @PrevSessionId = 'xxxxxxxx' --Initialise
DECLARE @CurrDateTime datetime
DECLARE @CurrSessionId nvarchar(50)
DECLARE @NewVisit bit
DECLARE SessionCursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR -- Fulfilment site Downtime Cursor
SELECT EventDateTime, ASPSessionID FROM
(
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @DateMinus1
AND [Time] >= @ElevenThirty
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
UNION ALL
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @Date
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
) itbl
ORDER BY ASPSessionID, EventDateTime
OPEN SessionCursor
FETCH NEXT -- Get the first values into the @Curr variables
FROM SessionCursor
INTO @CurrDateTime,
@CurrSessionId
WHILE (@@Fetch_Status = 0)
BEGIN
SET @NewVisit = 0 --Initialise
IF @CurrSessionId <> @PrevSessionId -- New @CurrSessionId
BEGIN
IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
BEGIN
SET @NewVisit = 1
END
--END IF
END
ELSE --It is the same session Id so test at least 30 mins since last logged
BEGIN
IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
BEGIN
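-- NB: the parameter order on the next line is the suspected bug discussed
-- above (it yields negative values); left as-is because only the cursor
-- declaration was revised here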
IF DateDiff(minute, @CurrDateTime, @PrevDateTime) > 30
BEGIN
SET @NewVisit = 1
END
--END IF
END
--END IF
END
--END IF
IF (@NewVisit = 1) --Insert a row into the table
BEGIN
INSERT INTO @Table
(
[Hour],
[FiveMinute]
)
VALUES
(
Cast(DatePart(hh, @CurrDateTime) as tinyint),
Cast(DatePart(minute, @CurrDateTime)/5 as tinyint)
)
END
--END IF
SET @PrevDateTime = @CurrDateTime
SET @PrevSessionId = @CurrSessionId
FETCH NEXT -- Get the next values into the @Curr variables
FROM SessionCursor
INTO @CurrDateTime,
@CurrSessionId
END
--END WHILE
CLOSE SessionCursor
DEALLOCATE SessionCursor
RETURN
END
GO
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 6:04 pm
Matt Whitfield (1/2/2009)
Before I start - a couple of things. Given the volume of data you're talking about, I'm not confident my first method would be quicker. Because it's a function I was forced to use table variables, and not temp tables as I'd have preferred for such a large data set.
I am not able to test the performance of the different methods until Monday.
The stored procedure that calls the function just takes the result table from the TVF and inserts the number of new site visits for every 5-minute period in the day into a reporting table; it does this by inserting the result of:
SELECT @Date, [Hour], [FiveMinute], Count(*)
FROM SiteVisits(@Date)
GROUP BY [Hour], [FiveMinute]
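Presumably that sits inside something like this - the reporting table name and columns here are only a guess:
INSERT INTO dbo.SiteVisitReport ([Date], [Hour], [FiveMinute], VisitCount)
SELECT @Date, [Hour], [FiveMinute], Count(*)
FROM dbo.SiteVisits(@Date)
GROUP BY [Hour], [FiveMinute]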
There is no reason why it shouldn't all be done inside one stored procedure without a TVF.
Also, I'm not convinced the logic is right in your function. I think the datediff(minute, @CurrDateTime, @PrevDateTime) should be datediff(minute, @PrevDateTime, @CurrDateTime). As the code you pasted stands, if @CurrDateTime is later than @PrevDateTime (which it always would be) then the value returned would be negative in the parameter order specified. This would mean it would never be > 30, although it may be < -30. So it could well be that you are undercounting there?
Thanks, looks like you've spotted a bug!
Another thing was that I found that, because of the lack of indexing, reading from the table was more efficient in two parts. I altered the cursor declaration (pasted below) and I found that on my machine the existing way took 31.828 seconds and the new way took 4.437 seconds. Anyway, here's some code.
That's interesting. I can't see why the single query should take longer, as your method requires 2 full table scans as opposed to just 1 table scan if an OR were used in the WHERE clause. Maybe it's calculating:
CharIndex(@ASPString, [cs(Cookie)]) > 0
for the whole table first, rather than first evaluating:
Date = @Date OR (Date = @DateMinus1 AND [Time] >= @ElevenThirty)
and then afterwards evaluating:
CharIndex(@ASPString, [cs(Cookie)]) > 0
January 2, 2009 at 7:01 pm
Matt Whitfield (1/2/2009)
Temp table version of the above, outside the scope of a function. Exec time 3.843 seconds
Oh, now I know I need to upgrade my poor ol' 6 year old work horse... that little slice of computational heaven took over 28 seconds on my machine with no index on the test table.
Still, if I can get close to your time on my box, it should absolutely wail on yours... I vote you be the final tester for this soiree. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 7:52 pm
Flipping between watching Alabama get stomped, playing with my 3yo daughter, and surfing, so not much detailed review. But is it possible that some good ol' fashioned indexing will help out here??
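For instance, something like this covering index might be worth testing against the extra write cost on inserts and the nightly delete (the index name and column choice are just a guess):
CREATE NONCLUSTERED INDEX IX_WebLogEvents_Date_Time
ON dbo.WebLogEvents ([date], [time])
INCLUDE ([cs(Cookie)])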
Look forward to the continuation of this thread!
Hey, good job to whomever found the bug in the logic too!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 2, 2009 at 10:06 pm
You seem pretty sure. Have you tested it?
Tested / Worked with = Yes
– Albert Einstein
January 3, 2009 at 9:18 am
jacroberts (1/2/2009)
The stored procedure that calls the function just takes the result table from the TVF and inserts the number of new site visits for every 5-minute period in the day into a reporting table; it does this by inserting the result of:
SELECT @Date, [Hour], [FiveMinute], Count(*)
FROM SiteVisits(@Date)
GROUP BY [Hour], [FiveMinute]
There is no reason why it shouldn't all be done inside one stored procedure without a TVF.
Then, if no one has any serious objections, there are certain huge performance benefits to NOT using a TVF in this case... I'm just gonna do it as a sproc.
--Jeff Moden
Change is inevitable... Change for the better is not.