October 14, 2006 at 12:27 am
Mark,
I don't know if I can get it to be a fast as Serqiy says he can, but I did get it down to 8 seconds...
First, add this index...
CREATE INDEX IX_JBMTestCitrixLog_ServerName_TimeGenerated
ON dbo.JBMTestCitrixLog(ServerName,TimeGenerated)
Then run this modification of the same report I provided before...
SELECT SessionDate = CASE
WHEN GROUPING(d.TheDate) = 0
THEN CONVERT(CHAR(11),d.TheDate,100)
ELSE CHAR(254)+'Grand Total'
END,
ServerName = CASE
WHEN GROUPING(d.ServerName) =0
AND GROUPING(d.TheDate)=0
THEN d.ServerName
WHEN GROUPING(d.TheDate)=1
THEN ''
ELSE CHAR(222)+'Daily Total'
END,
'12:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 00 THEN d.TheCount ELSE 0 END),8),
'01:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 01 THEN d.TheCount ELSE 0 END),8),
'02:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 02 THEN d.TheCount ELSE 0 END),8),
'03:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 03 THEN d.TheCount ELSE 0 END),8),
'04:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 04 THEN d.TheCount ELSE 0 END),8),
'05:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 05 THEN d.TheCount ELSE 0 END),8),
'06:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 06 THEN d.TheCount ELSE 0 END),8),
'07:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 07 THEN d.TheCount ELSE 0 END),8),
'08:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 08 THEN d.TheCount ELSE 0 END),8),
'09:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 09 THEN d.TheCount ELSE 0 END),8),
'10:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 10 THEN d.TheCount ELSE 0 END),8),
'11:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 11 THEN d.TheCount ELSE 0 END),8),
'12:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 12 THEN d.TheCount ELSE 0 END),8),
'01:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 13 THEN d.TheCount ELSE 0 END),8),
'02:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 14 THEN d.TheCount ELSE 0 END),8),
'03:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 15 THEN d.TheCount ELSE 0 END),8),
'04:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 16 THEN d.TheCount ELSE 0 END),8),
'05:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 17 THEN d.TheCount ELSE 0 END),8),
'06:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 18 THEN d.TheCount ELSE 0 END),8),
'07:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 19 THEN d.TheCount ELSE 0 END),8),
'08:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 20 THEN d.TheCount ELSE 0 END),8),
'09:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 21 THEN d.TheCount ELSE 0 END),8),
'10:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 22 THEN d.TheCount ELSE 0 END),8),
'11:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 23 THEN d.TheCount ELSE 0 END),8),
Total = STR(COUNT(*),8),
[Avg/Hr] = CASE
WHEN GROUPING(d.TheDate) = 0
THEN STR(COUNT(*)/24,8)
ELSE 'N/A'
END
FROM (--Derived table "d" isolates and preformats the data we want
SELECT DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0) AS TheDate,
DATEPART(hh,TimeGenerated) AS HourSlot,
ServerName,
COUNT(*) AS TheCount
FROM dbo.JBMTestCitrixLog WITH (NOLOCK)
GROUP BY DATEADD(dd,DATEDIFF(dd,0,TimeGenerated),0),
DATEPART(hh,TimeGenerated),
ServerName
) d
GROUP BY d.TheDate, d.ServerName WITH ROLLUP
ORDER BY SessionDate, ServerName
--Jeff Moden
Change is inevitable... Change for the better is not.
October 14, 2006 at 1:34 am
Thanks for the benchmark, Serqiy... the computed columns help a lot especially when you make them both deterministic so they can be indexed... got it down to 7 seconds on 4 million records... I still wouldn't mind seeing what you can do with this. Even an old dog like me can learn new tricks
I added the following computed columns (Serqiy's suggestion) (couldn't use datediff/dateadd because became non-deterministic)
TheDate AS CONVERT(CHAR(8),TimeGenerated,112),
HourSlot AS DATEPART(hh,TimeGenerated)
I dropped the previous index and added this one, instead...
CREATE INDEX IX_JBMTestCitrixLog1
ON dbo.JBMTestCitrixLog(ServerName,TheDate,HourSlot)
... and the modified the report code just a bit...
SELECT SessionDate = CASE
WHEN GROUPING(d.TheDate) = 0
THEN CONVERT(CHAR(11),CAST(d.TheDate AS DATETIME),100)
ELSE CHAR(254)+'Grand Total'
END,
ServerName = CASE
WHEN GROUPING(d.ServerName) =0
AND GROUPING(d.TheDate)=0
THEN d.ServerName
WHEN GROUPING(d.TheDate)=1
THEN ''
ELSE CHAR(222)+'Daily Total'
END,
'12:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 00 THEN d.TheCount ELSE 0 END),8),
'01:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 01 THEN d.TheCount ELSE 0 END),8),
'02:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 02 THEN d.TheCount ELSE 0 END),8),
'03:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 03 THEN d.TheCount ELSE 0 END),8),
'04:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 04 THEN d.TheCount ELSE 0 END),8),
'05:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 05 THEN d.TheCount ELSE 0 END),8),
'06:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 06 THEN d.TheCount ELSE 0 END),8),
'07:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 07 THEN d.TheCount ELSE 0 END),8),
'08:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 08 THEN d.TheCount ELSE 0 END),8),
'09:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 09 THEN d.TheCount ELSE 0 END),8),
'10:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 10 THEN d.TheCount ELSE 0 END),8),
'11:00 AM' = STR(SUM(CASE WHEN d.HourSlot = 11 THEN d.TheCount ELSE 0 END),8),
'12:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 12 THEN d.TheCount ELSE 0 END),8),
'01:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 13 THEN d.TheCount ELSE 0 END),8),
'02:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 14 THEN d.TheCount ELSE 0 END),8),
'03:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 15 THEN d.TheCount ELSE 0 END),8),
'04:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 16 THEN d.TheCount ELSE 0 END),8),
'05:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 17 THEN d.TheCount ELSE 0 END),8),
'06:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 18 THEN d.TheCount ELSE 0 END),8),
'07:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 19 THEN d.TheCount ELSE 0 END),8),
'08:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 20 THEN d.TheCount ELSE 0 END),8),
'09:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 21 THEN d.TheCount ELSE 0 END),8),
'10:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 22 THEN d.TheCount ELSE 0 END),8),
'11:00 PM' = STR(SUM(CASE WHEN d.HourSlot = 23 THEN d.TheCount ELSE 0 END),8),
Total = STR(SUM(d.TheCount),8),
[Avg/Hr] = CASE
WHEN GROUPING(d.TheDate) = 0
THEN STR(SUM(d.TheCount)/24,8)
ELSE 'N/A'
END
FROM (--Derived table "d" isolates and preformats the data we want
SELECT TheDate,
HourSlot,
ServerName,
COUNT(*) AS TheCount
FROM dbo.JBMTestCitrixLog WITH (NOLOCK)
GROUP BY TheDate,
HourSlot,
ServerName
) d
GROUP BY d.TheDate, d.ServerName WITH ROLLUP
ORDER BY d.TheDate, d.ServerName
--Jeff Moden
Change is inevitable... Change for the better is not.
October 14, 2006 at 3:19 pm
Well, I'd just like to say,even before I get to put these suggestions into action, that you folks are amazing, I have used dialup bulletin boards, which gives you an idea of how long I have been around, but I really cannot recall anyone putting so much much effort and knowledge into helping someone, and that someone is me....even better!
Profoundly inspirational, and the next time someone asks a question that I can help with, I certainly won't stint with my efforts.
Jeff / Serqiy /Journeyman
Thank you
October 14, 2006 at 4:30 pm
Thanks for the feedback, Mark... this one was a lot of fun!
Serqiy... I'd still like to know the "magic" behind your 5 second report, if you get a chance.
--Jeff Moden
Change is inevitable... Change for the better is not.
October 14, 2006 at 9:29 pm
Mark, you're welcome.
Hope, you don't take all suggestions above as solutions, you take it just like directions.
Because otherwise you are in trouble.
Actually, none of them is absolutely right.
Yes Jeff, you are right, dateadd(dd, datediff(dd, 0, TimeGenerated)) is not deterministic.
But you were wrong saying "couldn't use datediff/dateadd because became non-deterministic". Check BOL: datediff and dateadd ARE DETERMINISTIC!
It's implicit conversion of zero to datetime makes it non-deterministic!
Include explicit CONVERT into the formula and nothing will stop you from creating index on that column!
This works on my instance of SQL Server:
CREATE TABLE dbo.[citrixlog] (
[TimeGenerated] [datetime] NULL ,
GeneratedOnHour AS (dateadd(hour,datediff(hour,convert(datetime,0,120),[TimeGenerated]),convert(datetime,0,120))) ,
[SessionID] [int] NULL ,
[ClientName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,
[AppName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL ,
[UserName] [varchar] (255) COLLATE Finnish_Swedish_CI_AS NULL
)
create index citrixlog_GeneratedOnHour ON dbo.[citrixlog](GeneratedOnHour)
What about yours?
Note, formula is a little bit shorter than before, am I that smart now or I was just that stupid then?
And about 5 sec. There is no magic.
Trick is nobody will ask for WHOLE report for all accounts for last 3 years. And if somebody will ask there is web page pagination to my service.
So, the task is to filter only those records which will affect figures on screen. Typically it's a continuous range of values to be retrieved.
Problem with non-indexed approach is that you need to create whole report at once, no matter if you suppose to display only small part of it related to the last week.
Effective WHERE clause is the key. Actually I use many "join" tables containing only IDs. Like:
AccountID,
InvoiceID.
Many tables with small number of columns allow me to create many clustered indexes and optimiser then can choose the most effective way to proceed for any particular type of range requested for any particular report.
As I use to say, to writr an effective SELECT you need to starts with CREATE/ALTER TABLE.
_____________
Code for TallyGenerator
October 15, 2006 at 9:03 am
Actually, none of them is absolutely right. |
If you have one that is, would you mind sharing it for this example?
And about 5 sec. There is no magic. Trick is nobody will ask for WHOLE report for all accounts for last 3 years |
Well, ya think? You didn't say that when you bragged about the 5 second report... you only said that you created an executive summary from 4 million records in 5 seconds... that implied that you were processing and reporting on ALL 4 million records. That's what my example did I also regenned the test table to hold 24 million records across 365 days and added a where clause to return a month... does so in, you guessed it, under 5 seconds. Like I've said... thank you for the tip on calculated columns... it works great.
My only concern is when someone has purchased software from some dummy 3rd party that uses bound controls that break when the underlying schema of the table has been added to. I'll have to do a speed test with a view in that case. Might even be one of those spots where an indexed view makes sense.
It's implicit conversion of zero to datetime makes it non-deterministic! |
Outstanding!!! Thank you for the tip!!! I've always had problems with this and never even considered that I am glad to see you made that same mistake that I did when you first posted the formula for the calculated column... let's me know that I'm not the only one that makes "mistrakes".
Note, formula is a little bit shorter than before, am I that smart now or I was just that stupid then? |
Actually, I split the formula into two calculated columns so I wouldn't have to parse the column for simple sorting purposes at all... Worked great... again, thanks for the tip.
TheDate AS CONVERT(CHAR(8),TimeGenerated,112),
HourSlot AS DATEPART(hh,TimeGenerated)
Problem with non-indexed approach is that you need to create whole report at once, no matter if you suppose to display only small part of it related to the last week. |
That's why my latest 2 examples have an index on a new set of calculated columns... that was a good idea you had So good that I took it one step further and included an index on a calculated column that included the date with no time. Made things nasty fast and I've got you to thank for that suggestion.
Hope, you don't take all suggestions above as solutions, you take it just like directions. Because otherwise you are in trouble. |
I really hope you don't think that's true. If so, please post YOUR solution to the problem.
--Jeff Moden
Change is inevitable... Change for the better is not.
Viewing 6 posts - 16 through 20 (of 20 total)
You must be logged in to reply to this topic. Login to reply