July 13, 2009 at 9:09 am
Yes, sorry, done really quickly!
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:13 am
As a first try at indexes, and without seeing any of the other queries against this table...
Add an index:
(PointsCategory, UserID) INCLUDE (Date, PointsScored)
I know there's a similar existing index, but I suspect this one will be more beneficial, with the equality match as the leading column. If you're sure there are no other queries against this table, then the existing index ssc_UserID can have PointsCategory removed and the Date column moved to the INCLUDE. If there may be other queries, leave that index as is.
Having one with the leading column as PointsCategory and a second with leading column of UserID means there's an optimal index for each of the split-up queries.
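As a sketch, using the UserPoints column names from the test scripts later in this thread (the index names here are placeholders):

```sql
-- Hypothetical index names; the equality column (PointsCategory) leads,
-- with Date and PointsScored carried only at the leaf level.
CREATE INDEX idx_UserPoints_Category_User
    ON dbo.UserPoints (PointsCategory, UserID)
    INCLUDE ([Date], PointsScored);

-- Companion index with UserID leading, so each half of the
-- split-up query has an optimal index.
CREATE INDEX idx_UserPoints_User_Category
    ON dbo.UserPoints (UserID, PointsCategory)
    INCLUDE ([Date], PointsScored);
```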
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:28 am
Ok, I've run the query, simply using a temporary table rather than a table variable. I haven't put any indexes on the temporary table. The execution plans are attached.
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:35 am
That looks a little better already.
Try adding an index on the temp table. Put this on before you insert the data into the temp table.
CREATE CLUSTERED INDEX idx_TopScores_RowNumberUserID ON #TopScores (RowNumber, UserID)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:36 am
While I can try splitting the query and looking at the execution plans, making the code changes will require that part of it to pass through our test and release cycles. That doesn't take too long, but it's obviously not as quick as just being able to try things out!
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:40 am
And here's the plans for the temporary table with the index added.
Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:45 am
Mel Harbour (7/13/2009)
Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.
Costs are based on the row count estimates. With a table variable, the optimiser always estimates one row, so the costs are completely untrustworthy, especially when there are actually 100 000 rows in the table variable.
When I said 'looks better' I was basing that on what I know of the query operators. When joining two resultsets that both have thousands of rows, a nested loop join is terrible, because it reads one table (in your case the Users table) once for each row of the other resultset; in this case, over 100 000 times. When the join type changed to a hash join after converting to the temp table, each table is read only once. You'll see the improvement in the query time and in the IO statistics.
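A quick way to compare the two runs, as a sketch:

```sql
-- Show per-table logical reads and elapsed time in the Messages tab;
-- run the table-variable version and the temp-table version and compare.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run each version of the query here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```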
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:46 am
As far as a revamped solution goes, here's what I've come up with so far.
create table dbo.Users (
UserID int primary key,
DisplayName varchar(100));
go
insert into dbo.Users (UserID)
select Number
from dbo.Numbers
where Number <= 1000;
go
CREATE TABLE [dbo].[UserPoints](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[Date] [smalldatetime] NOT NULL,
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
;with
PCats (PCat) as
(select 1
union all select 2),
Dates (Date) as
(select dateadd(day, Number, '1/1/2006')
from dbo.Numbers
where Number <= datediff(day, '1/1/2006', getdate()))
insert into dbo.UserPoints (UserID, Date, PointsScored, PointsCategory)
select UserID, Date, abs(checksum(newid()))%10, PCat
from dbo.Users
cross join Dates
cross join PCats;
go
CREATE TABLE [dbo].[UserPointsArchiveOld](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
CREATE TABLE [dbo].[UserPointsArchive30](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[Date] [smalldatetime] NOT NULL,
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
CREATE TABLE [dbo].[UserPointsToday](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[Date] [smalldatetime] NOT NULL,
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
insert into dbo.UserPointsArchiveOld (UserID, PointsScored, PointsCategory)
select UserID, sum(PointsScored), PointsCategory
from dbo.UserPoints
where Date < dateadd(day, datediff(day, 0, getdate()), 0) - 30
group by UserID, PointsCategory;
go
create view dbo.UserPointsCurrent
as
select UserID, sum(PointsScored) as PointsScored, PointsCategory
from dbo.UserPointsToday
group by UserID, PointsCategory;
go
create view dbo.UserPointsLast30
as
select UserID, sum(PointsScored) as PointsScored, PointsCategory
from
(select UserID, PointsScored, PointsCategory
from dbo.UserPointsCurrent
union all
select UserID, sum(PointsScored), PointsCategory
from dbo.UserPointsArchive30
group by UserID, PointsCategory) Last30
group by UserID, PointsCategory;
go
create view dbo.UserPointsTotal
as
select UserID, sum(PointsScored) as PointsScored, PointsCategory
from
(select UserID, PointsScored, PointsCategory
from dbo.UserPointsLast30
union all
select UserID, PointsScored, PointsCategory
from dbo.UserPointsArchiveOld) AllPoints
group by UserID, PointsCategory;
go
create index IDX_UserPointsArchive30_Main on dbo.UserPointsArchive30
(UserID, PointsCategory, Date)
include
(PointsScored);
go
create index IDX_UserPointsToday_Main on dbo.UserPointsToday
(UserID, PointsCategory, Date)
include
(PointsScored);
go
declare @UserID int;
select @UserID = 1;
;with
CTE1 as
(select top 5 UserID, sum(PointsScored) as Points
from dbo.UserPointsLast30
group by UserID
order by sum(PointsScored) desc),
CTE2 as
(select @UserID as UserID, sum(PointsScored) as Points
from dbo.UserPointsLast30
where UserID = @UserID)
select *
from CTE1
union
select *
from CTE2
order by Points desc;
The last bit is a sample query that would get you most of what you need for the home page. The other queries would be similar.
You'd archive the data from the Archive30 table into the ArchiveOld table with a simple query run each night. Same for archiving from the Today table into Archive30.
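A rough sketch of that nightly job, assuming midnight day boundaries (table and column names as in the scripts above):

```sql
-- Roll rows older than 30 days out of Archive30 into the aggregated
-- ArchiveOld table, then remove them from Archive30.
declare @Cutoff smalldatetime;
set @Cutoff = dateadd(day, -30, dateadd(day, datediff(day, 0, getdate()), 0));

insert into dbo.UserPointsArchiveOld (UserID, PointsScored, PointsCategory)
select UserID, sum(PointsScored), PointsCategory
from dbo.UserPointsArchive30
where Date < @Cutoff
group by UserID, PointsCategory;

delete from dbo.UserPointsArchive30
where Date < @Cutoff;

-- The same pattern moves completed days from Today into Archive30.
```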
I'd run the Today table as an inventory-style solution, where you insert points transactions instead of updating. I.e., if someone earns 5 points for answering a QotD entry, you'd insert a row for that, as opposed to updating their entry. That way, you don't have to run an upsert, you just run an insert, which will be faster and simpler.
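For example (hypothetical IDs), awarding 5 points in category 1 is just a single insert; the views do the summing:

```sql
-- One transaction row per points event; no upsert needed.
insert into dbo.UserPointsToday (UserID, [Date], PointsScored, PointsCategory)
values (1, dateadd(day, datediff(day, 0, getdate()), 0), 5, 1);
```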
I tested this structure with some basic queries, like the top 25 all-time scores, the top 25 all-time forum scores, the top 5 total scores for the last 30 days. Major improvement in query time over the current solution.
For example, getting the top 5 + current user (or just top 5 if current user is in there), using your query, took 822 milliseconds on my desktop machine, but took 28 milliseconds using the final query in my code above. All of that on the sample data I produced as per the scripts I posted.
That might be a bit more extensive than what you're looking for right now, but do take a look at it and see if it's a valid long-term solution.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 13, 2009 at 9:48 am
OK, the index didn't help much, and it's because of the OR.
Can you try changing that index and adding a second one?
CREATE CLUSTERED INDEX idx_TopScores_RowNumber ON #TopScores (RowNumber)
CREATE INDEX idx_TopScores_UserID ON #TopScores (UserID)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:50 am
Mel Harbour (7/13/2009)
And here's the plans for the temporary table with the index added.Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.
Table variables are always estimated as if they contain only one row. That gives you a falsely low cost, and can cause all kinds of other problems in the execution plan.
They're generally okay if you can be certain they will not ever exceed about 100 rows. That's a rule-of-thumb and kind of arbitrary, but it generally works.
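A sketch that shows the difference in the actual execution plans, assuming the dbo.Users table from the scripts above:

```sql
-- Table variable: the plan estimates 1 row regardless of actual contents.
declare @Scores table (UserID int);
insert into @Scores
select UserID from dbo.Users;

select * from @Scores;        -- estimated number of rows: 1

-- Temp table: statistics are created, so the estimate tracks reality.
select UserID
into #Scores
from dbo.Users;

select * from #Scores;        -- estimate based on actual statistics
drop table #Scores;
```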
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 13, 2009 at 9:52 am
On the idea of multiple queries to handle the OR problems, you don't actually have to modify the front end. Have the main proc call sub-procs based on the parameter values. You can optimize the sub-procs, and eliminate scans caused by the IsNull and OR issues.
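A minimal sketch of that pattern, with hypothetical procedure names:

```sql
-- The front end keeps calling the same proc; the wrapper branches on the
-- parameter and hands off to a sub-proc that can be optimised for its
-- case, avoiding the IsNull/OR catch-all predicates.
create procedure dbo.GetTopScores
    @UserID int = null
as
begin
    if @UserID is null
        exec dbo.GetTopScores_AllUsers;
    else
        exec dbo.GetTopScores_SingleUser @UserID = @UserID;
end;
```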
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 13, 2009 at 9:55 am
More plans, with the different indexes suggested by Gail.
@GSquared, yes, indeed we can!
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:56 am
Once we've got a version of the SP that there's general agreement for, I'm happy to put it live so that we can pick up some real data on how it performs over time.
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 10:06 am
Ok, that OR really doesn't like me. Can you change the query as follows?
SELECT t.RowNumber, u.UserID, u.DisplayName, t.RecentPoints, t.AllPoints
FROM #TopScores t
JOIN Users u ON u.[UserID] = t.UserID
WHERE t.RowNumber <= 5 OR t.UserID = 769521
ORDER BY t.RowNumber
I've just changed which table the user filter is applied to. It shouldn't make a difference, but with the OR it might.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 10:28 am
Well, that got a reaction out of it!
Mel Harbour
Project Manager
Red Gate Software