July 13, 2009 at 9:09 am
Yes, sorry, done really quickly!
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:13 am
As a first try at indexes, and without seeing any of the other queries against this table...
Add an index:
(PointsCategory, UserID) INCLUDE (Date, PointsScored)
I know there's a similar existing index, but I suspect this one will be more beneficial, with the equality match as the leading column. If you're sure there are no other queries against this table, then the existing index ssc_UserID can have PointsCategory removed and the Date column moved to the INCLUDE. If there may be other queries, leave that index as is.
Having one with the leading column as PointsCategory and a second with leading column of UserID means there's an optimal index for each of the split-up queries.
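As a sketch, using the UserPoints column names from the test scripts later in this thread (the index names here are placeholders):

```sql
-- Hypothetical index names; the equality column (PointsCategory) leads,
-- with Date and PointsScored carried only at the leaf level.
CREATE INDEX idx_UserPoints_Category_User
    ON dbo.UserPoints (PointsCategory, UserID)
    INCLUDE ([Date], PointsScored);

-- Companion index with UserID leading, so each half of the
-- split-up query has an optimal index.
CREATE INDEX idx_UserPoints_User_Category
    ON dbo.UserPoints (UserID, PointsCategory)
    INCLUDE ([Date], PointsScored);
```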
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:28 am
Ok, I've run the query, simply using a temporary table rather than a table variable. I haven't put any indexes on the temporary table. The execution plans are attached.
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:35 am
That looks a little better already.
Try adding an index on the temp table. Put this on before you insert the data into the temp table.
CREATE CLUSTERED INDEX idx_TopScores_RowNumberUserID ON #TopScores (RowNumber, UserID)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:36 am
While I can try splitting the query and looking at the execution plans, making the code changes will require that part of it to pass through our test and release cycles. That doesn't take too long, but it's obviously not as quick as just being able to try things out!
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:40 am
And here's the plans for the temporary table with the index added.
Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:45 am
Mel Harbour (7/13/2009)
Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.
Costs are based on the row count estimates. With a table variable, the optimiser always estimates one row, so the costs are completely untrustworthy, especially when there are actually 100 000 rows in the table variable.
When I said 'looks better' I was basing that on what I know of the query operators. When joining two resultsets that both have thousands of rows, a nested loop join is terrible, because it reads one table (in your case the Users table) once for each row of the other resultset; in this case, over 100 000 times. When the join type changed to a hash join after converting to the temp table, each table is read only once. You'll see the improvement in the query time and in the IO statistics.
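A quick way to compare the two runs, as a sketch:

```sql
-- Show per-table logical reads and elapsed time in the Messages tab;
-- run the table-variable version and the temp-table version and compare.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run each version of the query here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```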
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:46 am
As far as a revamped solution goes, here's what I've come up with so far.
create table dbo.Users (
UserID int primary key,
DisplayName varchar(100));
go
insert into dbo.Users (UserID)
select Number
from dbo.Numbers
where Number <= 1000;
go
CREATE TABLE [dbo].[UserPoints](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[Date] [smalldatetime] NOT NULL,
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
;with
PCats (PCat) as
(select 1
union all select 2),
Dates (Date) as
(select dateadd(day, Number, '1/1/2006')
from dbo.Numbers
where Number <= datediff(day, '1/1/2006', getdate()))
insert into dbo.UserPoints (UserID, Date, PointsScored, PointsCategory)
select UserID, Date, abs(checksum(newid()))%10, PCat
from dbo.Users
cross join Dates
cross join PCats;
go
CREATE TABLE [dbo].[UserPointsArchiveOld](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
CREATE TABLE [dbo].[UserPointsArchive30](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[Date] [smalldatetime] NOT NULL,
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
CREATE TABLE [dbo].[UserPointsToday](
[UserPointsRecordID] [int] IDENTITY(1,1) primary key,
[UserID] [int] NOT NULL references dbo.Users(UserID),
[Date] [smalldatetime] NOT NULL,
[PointsScored] [int] NOT NULL,
[PointsCategory] [int] NOT NULL);
go
insert into dbo.UserPointsArchiveOld (UserID, PointsScored, PointsCategory)
select UserID, sum(PointsScored), PointsCategory
from dbo.UserPoints
where Date < dateadd(day, datediff(day, 0, getdate()), 0) - 30
group by UserID, PointsCategory;
go
create view dbo.UserPointsCurrent
as
select UserID, sum(PointsScored) as PointsScored, PointsCategory
from dbo.UserPointsToday
group by UserID, PointsCategory;
go
create view dbo.UserPointsLast30
as
select UserID, sum(PointsScored) as PointsScored, PointsCategory
from
(select UserID, PointsScored, PointsCategory
from dbo.UserPointsCurrent
union all
select UserID, sum(PointsScored), PointsCategory
from dbo.UserPointsArchive30
group by UserID, PointsCategory) Last30
group by UserID, PointsCategory;
go
create view dbo.UserPointsTotal
as
select UserID, sum(PointsScored) as PointsScored, PointsCategory
from
(select UserID, PointsScored, PointsCategory
from dbo.UserPointsLast30
union all
select UserID, PointsScored, PointsCategory
from dbo.UserPointsArchiveOld) AllPoints
group by UserID, PointsCategory;
go
create index IDX_UserPointsArchive30_Main on dbo.UserPointsArchive30
(UserID, PointsCategory, Date)
include
(PointsScored);
go
create index IDX_UserPointsToday_Main on dbo.UserPointsToday
(UserID, PointsCategory, Date)
include
(PointsScored);
go
declare @UserID int;
select @UserID = 1;
;with
CTE1 as
(select top 5 UserID, sum(PointsScored) as Points
from dbo.UserPointsLast30
group by UserID
order by sum(PointsScored) desc),
CTE2 as
(select @UserID as UserID, sum(PointsScored) as Points
from dbo.UserPointsLast30
where UserID = @UserID)
select *
from CTE1
union
select *
from CTE2
order by Points desc;
The last bit is a sample query that would get you most of what you need for the home page. The other queries would be similar.
You'd archive the data from the Archive30 table into the ArchiveOld table with a simple query run each night. Same for archiving from the Today table into Archive30.
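A rough sketch of that nightly job, assuming midnight day boundaries (table and column names as in the scripts above):

```sql
-- Roll rows older than 30 days out of Archive30 into the aggregated
-- ArchiveOld table, then remove them from Archive30.
declare @Cutoff smalldatetime;
set @Cutoff = dateadd(day, -30, dateadd(day, datediff(day, 0, getdate()), 0));

insert into dbo.UserPointsArchiveOld (UserID, PointsScored, PointsCategory)
select UserID, sum(PointsScored), PointsCategory
from dbo.UserPointsArchive30
where Date < @Cutoff
group by UserID, PointsCategory;

delete from dbo.UserPointsArchive30
where Date < @Cutoff;

-- The same pattern moves completed days from Today into Archive30.
```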
I'd run the Today table as an inventory-style solution, where you insert points transactions instead of updating. I.e., if someone earns 5 points for answering a QotD entry, you'd insert a row for that, as opposed to updating their entry. That way, you don't have to run an upsert, you just run an insert, which will be faster and simpler.
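For example (hypothetical IDs), awarding 5 points in category 1 is just a single insert; the views do the summing:

```sql
-- One transaction row per points event; no upsert needed.
insert into dbo.UserPointsToday (UserID, [Date], PointsScored, PointsCategory)
values (1, dateadd(day, datediff(day, 0, getdate()), 0), 5, 1);
```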
I tested this structure with some basic queries, like the top 25 all-time scores, the top 25 all-time forum scores, the top 5 total scores for the last 30 days. Major improvement in query time over the current solution.
For example, getting the top 5 + current user (or just top 5 if current user is in there), using your query, took 822 milliseconds on my desktop machine, but took 28 milliseconds using the final query in my code above. All of that on the sample data I produced as per the scripts I posted.
That might be a bit more extensive than what you're looking for right now, but do take a look at it and see if it's a valid long-term solution.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 13, 2009 at 9:48 am
OK, the index didn't help much, and it's because of the OR.
Can you try changing that index and adding a second one?
CREATE CLUSTERED INDEX idx_TopScores_RowNumber ON #TopScores (RowNumber)
CREATE INDEX idx_TopScores_UserID ON #TopScores (UserID)
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 9:50 am
Mel Harbour (7/13/2009)
And here's the plans for the temporary table with the index added.Out of interest, Gail, when you say that it looks better already, what part are you referring to? When I'm looking at the subtree cost of the two queries, the select looks to be massively more costly than it was when it was running with the table variable.
Table variables are always estimated as if they contain only one row. That gives you a falsely low cost, and can cause all kinds of other problems in the execution plan.
They're generally okay if you can be certain they will not ever exceed about 100 rows. That's a rule-of-thumb and kind of arbitrary, but it generally works.
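A sketch that shows the difference in the actual execution plans, assuming the dbo.Users table from the scripts above:

```sql
-- Table variable: the plan estimates 1 row regardless of actual contents.
declare @Scores table (UserID int);
insert into @Scores
select UserID from dbo.Users;

select * from @Scores;        -- estimated number of rows: 1

-- Temp table: statistics are created, so the estimate tracks reality.
select UserID
into #Scores
from dbo.Users;

select * from #Scores;        -- estimate based on actual statistics
drop table #Scores;
```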
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 13, 2009 at 9:52 am
On the idea of multiple queries to handle the OR problems, you don't actually have to modify the front end. Have the main proc call sub-procs based on the parameter values. You can optimize the sub-procs, and eliminate scans caused by the IsNull and OR issues.
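A minimal sketch of that pattern, with hypothetical procedure names:

```sql
-- The front end keeps calling the same proc; the wrapper branches on the
-- parameter and hands off to a sub-proc that can be optimised for its
-- case, avoiding the IsNull/OR catch-all predicates.
create procedure dbo.GetTopScores
    @UserID int = null
as
begin
    if @UserID is null
        exec dbo.GetTopScores_AllUsers;
    else
        exec dbo.GetTopScores_SingleUser @UserID = @UserID;
end;
```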
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 13, 2009 at 9:55 am
More plans, with the different indexes suggested by Gail.
@GSquared, yes, indeed we can!
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 9:56 am
Once we've got a version of the SP that there's general agreement for, I'm happy to put it live so that we can pick up some real data on how it performs over time.
Mel Harbour
Project Manager
Red Gate Software
July 13, 2009 at 10:06 am
Ok, that OR really doesn't like me. Can you change the query as follows?
SELECT t.RowNumber, u.UserID, u.DisplayName, t.RecentPoints, t.AllPoints
FROM #TopScores t
JOIN Users u ON u.[UserID] = t.UserID
WHERE t.RowNumber <= 5 OR t.UserID = 769521
ORDER BY t.RowNumber
I've just changed which table the user filter is applied to. It shouldn't make a difference, but with the OR it might.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
July 13, 2009 at 10:28 am
Well, that got a reaction out of it!
Mel Harbour
Project Manager
Red Gate Software