January 2, 2009 at 10:52 am
And, just to be 100% clear... would you give a couple of actual examples of the various cookie formats you're expecting so I can make a good cross-section of cookie types, as well? Thanks.
Last but not least, would you explain to folks why you're adding two days to the date/time of each cookie?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 11:33 am
Jeff Moden (1/2/2009)
I agree... JacRoberts, would you provide the CREATE TABLE statement for the dbo.WebLogEvents table, please. From that, I'll make a million row table over a period of 10 years for 50,000 different session id's... unless you have a different "spread" that more closely approximates your data.
The sessionId is hidden in the [cs(Cookie)] column.
An example row is:
perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;+__utma=131431056.2740713761679478300.1230240763.1230240763.1230240763.1;+__utmz=131431056.1230240763.1.1.utmcsr=google|utmccn=(organic)|utmcmd=organic|utmctr=trains%20to%20leeds;+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55
And the sessionId for that value is iyjuwx45r2rl5455fmdqpp55, which can be located anywhere in the string, hence all the CharIndex stuff in the SQL.
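For anyone following along, here's a minimal standalone sketch of the CharIndex/SubString extraction being described, run against a shortened version of the sample row (the variable names are mine; the real function appears later in the thread):
DECLARE @Cookie varchar(2048), @Key varchar(100), @Start int, @End int
SET @Cookie = 'perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55'
SET @Key = 'ASP.NET_SessionId='
SET @Start = CharIndex(@Key, @Cookie) + Len(@Key) -- first character after the key name
SET @End = CharIndex(';', @Cookie, @Start) -- next delimiter; 0 when the key is last
SELECT CASE WHEN @End > 0
            THEN SubString(@Cookie, @Start, @End - @Start)
            ELSE SubString(@Cookie, @Start, 100)
       END AS ASPSessionId -- returns iyjuwx45r2rl5455fmdqpp55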
The table has about 10,000,000 rows added per day and about 60,000 site visits per day. It stores just 8 days' worth of data, as all data over 8 days old is deleted each night, and there is only one non-unique index, on the [c-ip] column. There are about 80 million rows in the table altogether.
The 'date' column is just the date without the time part, and the 'time' column is just a time of day stored on the base date '1899-12-30'.
You probably don't need all the columns but here is the actual table definition:
CREATE TABLE [dbo].[WebLogEvents](
[date] [datetime] NULL,
[time] [datetime] NULL,
[s-sitename] [varchar](255) NULL,
[s-ip] [varchar](50) NULL,
[cs-method] [varchar](50) NULL,
[cs-uri-stem] [varchar](255) NULL,
[cs-uri-query] [varchar](2048) NULL,
[s-port] [varchar](50) NULL,
[cs-username] [varchar](50) NULL,
[c-ip] [varchar](50) NULL,
[cs(User-Agent)] [varchar](255) NULL,
[cs(Cookie)] [varchar](2048) NULL,
[cs(Referer)] [varchar](2048) NULL,
[sc-status] [int] NULL,
[sc-substatus] [int] NULL,
[sc-win32-status] [int] NULL,
[sc-bytes] [int] NULL,
[cs-bytes] [int] NULL,
[time-taken] [int] NULL
) ON [PRIMARY]
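As an aside on that [time] column, here's a minimal sketch (with made-up values) of the base-date arithmetic, which is why two days have to be added back when [date] and [time] are recombined - '1899-12-30' sits two days before datetime's zero point of '1900-01-01':
DECLARE @d datetime, @t datetime
SELECT @d = '2008-12-26', -- the [date] column: date part only
       @t = '1899-12-30 14:35:00' -- the [time] column: a time on the base date
SELECT @d + @t AS TwoDaysEarly, -- 2008-12-24 14:35:00
       DateAdd(dd, 2, @d + @t) AS Corrected -- 2008-12-26 14:35:00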
January 2, 2009 at 11:51 am
Cool... that 1899-12-30 thingy about the time also explains why you add 2 days to the combination of date and time. Thanks, Jac.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 12:09 pm
TheSQLGuru (1/2/2009)
1) Tom, can you or anyone else give an example of UPDATEs that cannot be done without using the FROM clause that can be done with its use?
2) I interacted with Hugo on a number of occasions at last year's MVP summit and I can tell you from first-hand experience he is a very smart (and thorough) guy. I for one will never dismiss out of hand anything that he blogs about.
Clearly it isn't possible to do anything with the clause that is impossible without it if one ignores performance: SQL is Turing-complete. There may be examples where the FROM clause can't be replaced by a string of select subclauses so that the same effect (ignoring legibility and performance) can't be achieved in a single SQL statement, but I doubt it. If you want an example where the cost in legibility and performance is unacceptably high, look in HK's blog, where he gives a fine example. The performance issue is that the optimiser can't easily see what's going on - the FROM clause is much easier to understand mechanically. The clarity/legibility issue is a language expressiveness issue - there are other ways of getting clarity. HK suggests using row assignments, which I first came across in discussions in the 1980s about what we (ICL, Bull, Siemens, various academic institutions) hoped "SQL3" might contain (and might include in our joint massively parallel SQL implementation experiment - but we left this one out); in my view this is the correct approach in many cases where "FROM" is used in UPDATE, but no MS implementation - not even SQLS 2008 - supports it.
HK himself gives a nice example where there is a vast performance difference in the very blog entry that deprecates the FROM clause. The blog entry does not say starkly "let's abolish FROM in UPDATE in TSQL", it says something like "Remember that FROM in UPDATE is not part of the SQL Standard and use it only when gains in performance or clarity are such as to offset the resulting non-portability" - something with which I'm wholly in agreement. HK himself says he would use the FROM clause in the example he gives in TSQL releases that don't have the MERGE feature ("For that performance gain, I will gladly choose the proprietary syntax over the standard!" are the words he uses). So while I maintain that anyone who said "let's abolish this feature" would be a nut, and the impression I got from Jeff's post was that Hugo was saying that, I certainly don't think that what he actually does say is the slightest bit nutty.
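For readers who haven't seen the two forms side by side, here's a hedged sketch of what's being compared (the table and column names are invented):
-- The proprietary T-SQL form:
UPDATE t
SET t.Price = s.Price
FROM dbo.Products t
INNER JOIN dbo.PriceFeed s ON s.ProductID = t.ProductID
-- The ANSI-standard rewrite with a correlated subquery; the EXISTS guard stops
-- unmatched rows being set to NULL. (If PriceFeed had duplicate ProductIDs, the
-- subquery form would raise an error where UPDATE FROM silently picks one row.)
UPDATE dbo.Products
SET Price = (SELECT s.Price FROM dbo.PriceFeed s
             WHERE s.ProductID = dbo.Products.ProductID)
WHERE EXISTS (SELECT 1 FROM dbo.PriceFeed s
              WHERE s.ProductID = dbo.Products.ProductID)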
Tom
January 2, 2009 at 1:08 pm
Heh... and I'll maintain that what he said is nutty for more than 1 reason. Consider the title and the first sentence...
Let's deprecate UPDATE FROM!
I guess that many people using UPDATE … FROM on a daily basis do so without being aware that they are violating all SQL standards.
Now, if that doesn't sound like he'd like to get rid of it, then perhaps this other thing he said does...
With this alternative available, I fail to see any reason why the proprietary UPDATE FROM syntax should be maintained. In my opinion, it can safely be marked as deprecated in SQL Server 2008.
... and, maybe even that doesn't do it for you. But this sure as hell does for me...
but it should be marked as deprecated, and it should eventually be removed from the product.
My answer to all of that is that code portability is a myth (and a nutty one, at that). Every RDBMS has its own extensions to the cruddy and way-behind-the-real-world ANSI standards. In order to have true code portability, every database vendor must meet all of the standards (which are pretty useless for some things) and everyone must ignore some of the better features the various vendors have offered in the form of extensions to the language. If that's the case, then why even bother having different database vendors?
On the fringe of chaos lives innovation and new technology... the myth of code portability being realized for SQL would destroy both because there would be no need for any form of competition between vendors. We'd all end up using the same crap and, yes, it would all be crap.
That would be like saying there shall not be Java or C# anymore... instead, we'll only have one language called Cava so that everyone can write portable code. BWAAA-HAAA!!!! That's when I'll go back to writing code in Power Basic and blow everyone's doors off... 😛
I'd also like Microsoft to stop deprecating stuff... they're taking away stuff that's not broken and is frequently better than what it's being replaced by. :w00t:
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 1:26 pm
Ok... from JacRoberts' description, Create Table statement, and a couple of good guesses (like the fact that since all the columns are nullable, there's no primary key), here's a chunk of code that'll build a million-row test table.
Now, based on what I saw in JacRoberts' code, I've changed the scope of the test code generator just a tiny bit. Instead of a million rows across ten years, it's a million rows across a single month for 50,000 session ID's to simulate a million "hits" on a web site in a month. Here's the code... lemme know if we need any changes so that we can all work from the same thing. As always, details are in the comments...
--=================================================================================================
-- Create and populate a 1,000,000 row WebLogEvents test table.
-- Jeff Moden
--=================================================================================================
--===== First, change to a nice safe place that we all have...
USE TempDb
--===== If the test table already exists here, drop it so we can rebuild it for reruns.
IF OBJECT_ID('TempDB.dbo.WebLogEvents','U') IS NOT NULL
DROP TABLE TempDB.dbo.WebLogEvents
GO
--===== Declare some local variables to handle all the presets we'll need...
DECLARE @StartDate DATETIME, --First date in events table
@NumberOfDays INT, --Max number of different days in the events table
@TotalRows INT, --Total number of rows in the events table
@CookieMain VARCHAR(2048), --All the same... will have a session number appended
@Sessions INT --The max number of unique sessions in the events table
--===== ... and then assign them the presets we want.
SELECT @StartDate = '2008-12-01',
@NumberOfDays = 31,
@TotalRows = 1000000,
@CookieMain = 'perm_track=9be22f85-0aeb-492e-a1cb-0ef74043bf41;'
+ '+__utma=131431056.2740713761679478300.1230240763.1230240763.1230240763.1;'
+ '+__utmz=131431056.1230240763.1.1.utmcsr=google|utmccn=(organic)'
+ '|utmcmd=organic|utmctr=trains%20to%20leeds;'
+ '+sess_track=20c12566-eddb-43cf-a9a7-974b027a3834;'
+ '+ASP.NET_SessionId=iyjuwx45r2rl5455fmdqpp55',
@Sessions = 50000
--===== All set... build the table and populate it with random data according to the presets
-- We'll also measure how long this takes, just for grins... see the "Messages" tab
SET STATISTICS TIME ON
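-- Date: a random whole day within the month; Time: a random time-of-day on the
-- 1899-12-30 base date; Cookie: the fixed prefix with a random session number
-- (1 to @Sessions) appended to produce distinct session IDs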
SELECT TOP (@TotalRows)
[Date] = @StartDate + ABS(CHECKSUM(NEWID())) % @NumberOfDays,
[Time] = CAST('1899-12-30' AS DATETIME) + RAND(CHECKSUM(NEWID())),
[cs(Cookie)] = CAST(@CookieMain + CAST(ABS(CHECKSUM(NEWID()))%@Sessions+1 AS VARCHAR(10)) AS VARCHAR(2048))
INTO dbo.WebLogEvents
FROM Master.dbo.SysColumns t1
CROSS JOIN Master.dbo.SysColumns t2
SET STATISTICS TIME OFF
GO
--===== And, finally, let's see what some of the data looks like and what
-- the min and max for each column are as a sanity check. This takes some time.
SELECT TOP 100 * FROM dbo.WebLogEvents
SELECT MIN(Date) AS MinDate,
MAX(Date) AS MaxDate,
MIN(Time) AS MinTime,
MAX(Time) AS MaxTime,
MIN([cs(Cookie)]) AS MinCookie,
MAX([cs(Cookie)]) AS MaxCookie
FROM dbo.WebLogEvents
Hey, Phil Factor... do you want to do the honors on the "quirky update" method, or do you want me to?
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 1:47 pm
Jeff Moden (1/2/2009)
My answer to all of that is that code portability is a myth (and a nutty one, at that). ...snip... If that's the case, then why even bother having different database vendors?
Funny - that was in the long post I lost. Agree 100%.
Jeff Moden (1/2/2009)
I'd also like Microsoft to stop deprecating stuff... they're taking away stuff that's not broken and is frequently better than what it's being replaced by. :w00t:
I don't agree here. Yes - when they're deprecating stuff that's better than the replacement, then that sucks, and deprecation for deprecation's sake is also lame... but...
I have to say I am a big fan of the breaking change in the right places. Some stuff from .NET 1.1 was just poorly thought out, for example. I am glad that someone at MS thought 'you know what guys, let's not live with this forever - let's deprecate, accept our mistakes and move on'.
Similarly - I think the idea of defined rules and defaults, and having them bound to columns / data types, sucks in SQL Server. One of the things I am doing is a schema replication program, and working with those types doesn't do it for me. I think CHECK / DEFAULT constraints are a definite step forward in that respect, as the sketch below shows.
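To make that concrete, here's a small sketch contrasting the two approaches (all object names invented):
-- The legacy way: a rule created once, then bound to columns (or data types):
CREATE RULE dbo.PositiveQty AS @value > 0
GO
EXEC sp_bindrule 'dbo.PositiveQty', 'dbo.Orders.Qty'
GO
-- The replacement: a constraint declared on the table itself:
ALTER TABLE dbo.Orders ADD CONSTRAINT CK_Orders_Qty CHECK (Qty > 0)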
And lastly - one of the reasons Linux hacks me off is that compatibility with the 1970s seems to be an important factor. I know, it's a contentious point criticising Linux. But really, there are some amazingly smart people working on/with that OS, and I just wonder what they would come up with if they were given a clean slate and put usability at the core of their architecture... I think it could be brilliant. Similarly, I think Singularity has the promise of a great OS - but some marketing dude at MS is going to have a benny when they realise it's not binary compatible, and that will be where the layers of mediocrity creep in.
Semi-thinking out loud here.
wrt the script - top job. I'll get on with the set-based method later on - I'm pretty sure it will come in second place, but I want to do it anyway for completeness and as an exercise. For now I'm just putting the nippers to bed...
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 3:21 pm
Hey, Phil Factor... do you want to do the honors on the "quirky update" method, or do you want me to?
--Jeff Moden
Over to you Jeff. I had to nip off to do DBW. I'm now off to bed! I'll see how you got on tomorrow. Your build script looks very cool!
Best wishes,
Phil Factor
January 2, 2009 at 3:44 pm
Jeff Moden (1/2/2009)
Ok... from JacRoberts' description, Create Table statement, and a couple of good guesses (like the fact that since all the columns are nullable, there's no primary key), here's a chunk of code that'll build a million-row test table. Now, based on what I saw in JacRoberts' code, I've changed the scope of the test code generator just a tiny bit. Instead of a million rows across ten years, it's a million rows across a single month for 50,000 session ID's to simulate a million "hits" on a web site in a month. Here's the code... lemme know if we need any changes so that we can all work from the same thing. As always, details are in the comments...
The table has about 10,000,000 rows added per day and about 60,000 site visits per day. It stores just 8 days' worth of data, as all data over 8 days old is deleted each night, and there is only one non-unique index, on the [c-ip] column; this column is not used in the TVF. There are about 80 million rows in the table altogether. The average time a user spends on the site is about 20 minutes, with about 2 percent spending over 30 minutes. For the test it probably doesn't matter too much as long as there are a large number of rows.
The process to count site visits takes place once a day, for the previous day, so you only need to aggregate the data for 1 day.
I can test the end result on real data if you make the input and output of the table valued function the same as the one I pasted in earlier.
January 2, 2009 at 5:23 pm
Ok - here's one for you then...
Before I start - a couple of things. Given the volume of data you're talking about, I'm not confident my first method would be quicker. Because it's a function I was forced to use table variables, and not temp tables as I'd have preferred for such a large data set.
Also, I'm not convinced the logic is right in your function. I think the datediff(minute, @CurrDateTime, @PrevDateTime) should be datediff(minute, @PrevDateTime, @CurrDateTime). As the code you pasted stands, if @CurrDateTime is later than @PrevDateTime (which it always would be) then the value returned would be negative in the parameter order specified. This would mean it would never be > 30, although it may be < -30. So it could well be that you are undercounting there?
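A quick demo of the parameter-order point, with made-up times:
SELECT DateDiff(minute, '2009-01-02 10:00', '2009-01-02 09:00') AS CurrToPrev, -- -60, so never > 30
       DateDiff(minute, '2009-01-02 09:00', '2009-01-02 10:00') AS PrevToCurr -- 60, counted correctly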
Another thing was that I found that, because of the lack of indexing, reading from the table was more efficient in two parts. I altered the cursor declaration (pasted below) and I found that on my machine the existing way took 31.828 seconds and the new way took 4.437 seconds. Anyway, here's some code.
Direct functional replacement, best I could get on Jeff's test data
Exec time 7.046 seconds
-- *******************************************************************
-- FUNCTION SiteVisits
-- Description: This function creates a table with a row for each site visit
-- for a day.
-- *******************************************************************
CREATE FUNCTION [dbo].[SiteVisitsMattTestData]
(
@Date datetime
)
RETURNS @Table TABLE
(
[Hour] tinyint, -- 0 to 23
[FiveMinute] tinyint -- 0 to 11
)
AS
BEGIN
DECLARE @ASPString nvarchar(100)
DECLARE @Len int
DECLARE @DateMinus1 datetime
DECLARE @ElevenThirty datetime
SET @ElevenThirty = '1899-12-30 23:30:00.000'
SET @ASPString = 'ASP.NET_SessionId='
SET @Len = Len(@ASPString)
SET @Date = Convert(varchar, @Date, 112)
SET @DateMinus1 = DateAdd(dd, -1, @Date)
-- ok - so get out the data for today and yesterday into separate temp tables
-- - the query processor and OR are not best of friends
DECLARE @sessionstoday TABLE (EventDateTime datetime, ASPSessionID varchar(40))
DECLARE @sessionsyesterday TABLE (EventDateTime datetime, ASPSessionID varchar(40))
-- todays
INSERT INTO @sessionstoday (EventDateTime, ASPSessionID)
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @date
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
-- yesterdays
INSERT INTO @sessionsyesterday (EventDateTime, ASPSessionID)
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
-- now create a unified table with a clustered PK for both sets of data
DECLARE @sessions TABLE (ID int identity(1,1) primary key clustered, EventDateTime datetime, ASPSessionID varchar(40))
-- and insert in the right order
insert INTO @sessions (EventDateTime, ASPSessionID)
SELECT EventDateTime, ASPSessionID FROM
(
SELECT EventDateTime, ASPSessionID FROM
@sessionstoday
UNION ALL
SELECT EventDateTime, ASPSessionID FROM
@sessionsyesterday
) itbl
ORDER BY ASPSessionID, EventDateTime
-- now insert into the functional result table
INSERT INTO @Table
(
[Hour],
[FiveMinute]
)
-- the hour and five minute marker
select datepart(hour, eventdatetime), datepart(minute, eventdatetime) / 5 FROM
(
-- from a subquery that finds the previous hit by inner joining onto the previous ID
-- and selects out the current hit's ID, evendatetime, the minutes between them, and the session IDs from each
SELECT s1.id, s1.eventdatetime, datediff(minute, s2.eventdatetime, s1.eventdatetime) as timediff,
s1.aspsessionid as sid1, s2.aspsessionid as sid2
FROM @sessions s1 INNER JOIN
@sessions s2
on s1.ID = s2.ID + 1
UNION ALL
-- union the first row in, as the above join will never pick it up
SELECT s1.id, s1.eventdatetime, 999 as timediff, s1.aspsessionid as sid1, '' as sid2
FROM @sessions s1 where id = 1
) itbl
-- where the hit is on the right day
WHERE convert(datetime,convert(varchar, eventdatetime, 112)) = @date
-- and the session id is not the same as the previous
and ((sid1 <> sid2) or
-- or the time differential is > 30 for the same session id
(timediff > 30))
-- order by ID because we want the result sets to be *exactly* the same
ORDER BY id
RETURN
END
Temp table version of the above, outside the scope of a function
Exec time 3.843 seconds
DECLARE @Date datetime
SET @Date = '20081220'
DECLARE @Table TABLE
(
[Hour] tinyint, -- 0 to 23
[FiveMinute] tinyint -- 0 to 11
)
BEGIN
DECLARE @ASPString nvarchar(100)
DECLARE @Len int
DECLARE @DateMinus1 datetime
DECLARE @ElevenThirty datetime
SET @ElevenThirty = '1899-12-30 23:30:00.000'
SET @ASPString = 'ASP.NET_SessionId='
SET @Len = Len(@ASPString)
SET @Date = Convert(varchar, @Date, 112)
SET @DateMinus1 = DateAdd(dd, -1, @Date)
IF OBJECT_ID('tempdb..#sessions','U') IS NOT NULL DROP TABLE #sessions -- allow rerunning in the same session
CREATE TABLE #sessions(ID int IDENTITY(1,1) primary key clustered, EventDateTime datetime, ASPSessionID varchar(40))
INSERT INTO #sessions (EventDateTime, ASPSessionID)
SELECT EventDateTime, ASPSessionID FROM
(
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @date
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
UNION ALL
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @DateMinus1 AND [Time] >= @ElevenThirty
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
) itbl
ORDER BY ASPSessionID, EventDateTime
INSERT INTO @Table
(
[Hour],
[FiveMinute]
)
select datepart(hour, eventdatetime), datepart(minute, eventdatetime) / 5 FROM
(
SELECT s1.id, s1.eventdatetime, datediff(minute, s2.eventdatetime, s1.eventdatetime) as timediff,
s1.aspsessionid as sid1, s2.aspsessionid as sid2
FROM #sessions s1 INNER JOIN
#sessions s2
on s1.ID = s2.ID + 1
UNION ALL
-- union the first row in, as the above join will never pick it up
SELECT s1.id, s1.eventdatetime, 999 as timediff, s1.aspsessionid as sid1, '' as sid2
FROM #sessions s1 where id = 1
) itbl
WHERE convert(datetime,convert(varchar, eventdatetime, 112)) = @date
and ((sid1 <> sid2) or
(timediff > 30))
ORDER BY id
END
Revised cursor declaration; this reduced the exec time of the cursor method from 31.828 to 4.437 seconds
/****** Object: UserDefinedFunction [dbo].[SiteVisits] Script Date: 01/03/2009 00:06:48 ******/
SET ANSI_NULLS ON
GO
SET QUOTED_IDENTIFIER ON
GO
-- *******************************************************************
-- FUNCTION SiteVisits
-- Description: This function creates a table with a row for each site visit
-- for a day.
-- *******************************************************************
CREATE FUNCTION [dbo].[SiteVisitsNewCursor]
(
@Date datetime
)
RETURNS @Table TABLE
(
[Hour] tinyint, -- 0 to 23
[FiveMinute] tinyint -- 0 to 11
)
AS
BEGIN
DECLARE @ASPString nvarchar(100)
DECLARE @Len int
DECLARE @DateMinus1 datetime
DECLARE @ElevenThirty datetime
SET @ElevenThirty = '1899-12-30 23:30:00.000'
SET @ASPString = 'ASP.NET_SessionId='
SET @Len = Len(@ASPString)
SET @Date = Convert(varchar, @Date, 112)
SET @DateMinus1 = DateAdd(dd, -1, @Date)
DECLARE @PrevDateTime datetime
DECLARE @PrevSessionId nvarchar(50)
SET @PrevDateTime = '1999-01-01' --Initialise
SET @PrevSessionId = 'xxxxxxxx' --Initialise
DECLARE @CurrDateTime datetime
DECLARE @CurrSessionId nvarchar(50)
DECLARE @NewVisit bit
DECLARE SessionCursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY FOR -- Fulfilment site Downtime Cursor
SELECT EventDateTime, ASPSessionID FROM
(
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @DateMinus1
AND [Time] >= @ElevenThirty
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
UNION ALL
SELECT DateAdd(dd, 2, Date + [Time]) EventDateTime,
CASE
WHEN CharIndex(';', [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)])) > 1 THEN
SubString([cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)])
+ @Len, CharIndex(';', [cs(Cookie)], CharIndex(@ASPString, [cs(Cookie)]))
- CharIndex(@ASPString, [cs(Cookie)])- @Len)
ELSE
SubString( [cs(Cookie)], CharIndex(@ASPString , [cs(Cookie)]) + @Len, 100)
END ASPSessionId
FROM dbo.WebLogEvents WITH (NOLOCK)
WHERE Date = @Date
AND CharIndex(@ASPString , [cs(Cookie)]) > 0
) itbl
ORDER BY ASPSessionID, EventDateTime
OPEN SessionCursor
FETCH NEXT -- Get the first values into the @Curr variables
FROM SessionCursor
INTO @CurrDateTime,
@CurrSessionId
WHILE (@@Fetch_Status = 0)
BEGIN
SET @NewVisit = 0 --Initialise
IF @CurrSessionId <> @PrevSessionId -- New @CurrSessionId
BEGIN
IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
BEGIN
SET @NewVisit = 1
END
--END IF
END
ELSE --It is the same session Id so test at least 30 mins since last logged
BEGIN
IF Convert(varchar, @CurrDateTime, 112) = @Date -- It's for the day we are looking at
BEGIN
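-- NB: the parameter order on the next line is the suspected bug discussed
-- above (it yields negative values); left as-is because only the cursor
-- declaration was revised here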
IF DateDiff(minute, @CurrDateTime, @PrevDateTime) > 30
BEGIN
SET @NewVisit = 1
END
--END IF
END
--END IF
END
--END IF
IF (@NewVisit = 1) --Insert a row into the table
BEGIN
INSERT INTO @Table
(
[Hour],
[FiveMinute]
)
VALUES
(
Cast(DatePart(hh, @CurrDateTime) as tinyint),
Cast(DatePart(minute, @CurrDateTime)/5 as tinyint)
)
END
--END IF
SET @PrevDateTime = @CurrDateTime
SET @PrevSessionId = @CurrSessionId
FETCH NEXT -- Get the next values into the @Curr variables
FROM SessionCursor
INTO @CurrDateTime,
@CurrSessionId
END
--END WHILE
CLOSE SessionCursor
DEALLOCATE SessionCursor
RETURN
END
GO
Atlantis Interactive - SQL Server Tools
My blog
Why I wrote a sql query analyzer clone
January 2, 2009 at 6:04 pm
Matt Whitfield (1/2/2009)
Before I start - a couple of things. Given the volume of data you're talking about, I'm not confident my first method would be quicker. Because it's a function I was forced to use table variables, and not temp tables as I'd have preferred for such a large data set.
I am not able to test the performance of the different methods until Monday.
The stored procedure that calls the function just takes the result table from the TVF and inserts the number of new site visits for every 5-minute period in the day into a reporting table; it does this by inserting the result of:
SELECT @Date, [Hour], [FiveMinute], Count(*)
FROM SiteVisits(@Date)
GROUP BY [Hour], [FiveMinute]
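Presumably that sits inside something like this - the reporting table name and columns here are only a guess:
INSERT INTO dbo.SiteVisitReport ([Date], [Hour], [FiveMinute], VisitCount)
SELECT @Date, [Hour], [FiveMinute], Count(*)
FROM dbo.SiteVisits(@Date)
GROUP BY [Hour], [FiveMinute]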
There is no reason why it shouldn't all be done inside one stored procedure without a TVF.
Also, I'm not convinced the logic is right in your function. I think the datediff(minute, @CurrDateTime, @PrevDateTime) should be datediff(minute, @PrevDateTime, @CurrDateTime). As the code you pasted stands, if @CurrDateTime is later than @PrevDateTime (which it always would be) then the value returned would be negative in the parameter order specified. This would mean it would never be > 30, although it may be < -30. So it could well be that you are undercounting there?
Thanks, looks like you've spotted a bug!
Another thing was that I found that, because of the lack of indexing, reading from the table was more efficient in two parts. I altered the cursor declaration (pasted below) and I found that on my machine the existing way took 31.828 seconds and the new way took 4.437 seconds. Anyway, here's some code.
That's interesting. I can't see why the single query should take longer, as your method requires 2 full table scans as opposed to just 1 table scan if an OR were used in the WHERE clause. Maybe it's calculating:
CharIndex(@ASPString, [cs(Cookie)]) > 0
for the whole table first, rather than first evaluating:
Date = @Date OR (Date = @DateMinus1 AND [Time] >= @ElevenThirty)
and then afterwards evaluating:
CharIndex(@ASPString, [cs(Cookie)]) > 0
January 2, 2009 at 7:01 pm
Matt Whitfield (1/2/2009)
Temp table version of the above, outside the scope of a function. Exec time 3.843 seconds
Oh, now I know I need to upgrade my poor ol' 6 year old work horse... that little slice of computational heaven took over 28 seconds on my machine with no index on the test table.
Still, if I can get close to your time on my box, it should absolutely wail on yours... I vote you be the final tester for this soiree. 😛
--Jeff Moden
Change is inevitable... Change for the better is not.
January 2, 2009 at 7:52 pm
Flipping between watching Alabama get stomped, playing with my 3yo daughter, and surfing, so not much detailed review. But is it possible that some good ol' fashioned indexing will help out here??
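For instance, something like this covering index might be worth testing against the extra write cost on inserts and the nightly delete (the index name and column choice are just a guess):
CREATE NONCLUSTERED INDEX IX_WebLogEvents_Date_Time
ON dbo.WebLogEvents ([date], [time])
INCLUDE ([cs(Cookie)])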
Look forward to the continuation of this thread!
Hey, good job to whomever found the bug in the logic too!
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
January 2, 2009 at 10:06 pm
You seem pretty sure. Have you tested it?
Tested / Worked with = Yes
– Albert Einstein
January 3, 2009 at 9:18 am
jacroberts (1/2/2009)
The stored procedure that calls the function just takes the result table from the TVF and inserts the number of new site visits for every 5-minute period in the day into a reporting table; it does this by inserting the result of:
SELECT @Date, [Hour], [FiveMinute], Count(*)
FROM SiteVisits(@Date)
GROUP BY [Hour], [FiveMinute]
There is no reason why it shouldn't all be done inside one stored procedure without a TVF.
Then, if no one has any serious objections, there are certain huge performance benefits to NOT using a TVF in this case... I'm just gonna do it as a sproc.
--Jeff Moden
Change is inevitable... Change for the better is not.