June 18, 2003 at 3:07 pm
I have a table, tShortTerm, that I need to store a crunched down version of for the long term, tLongTerm.
So the query that crunches it down looks like this:
insert into tLongTerm
select userId, max(date) from tShortTerm
group by userId
On tShortTerm I have a unique clustered index on userId and date desc but looking at the execution plan, I can see that it always does a scan (not a seek) on the clustered index.
I thought that if the unique clustered index was on userId and date desc that maybe query optimizer would be smart enough not to search further once it found the first row with a new userId since that row would be the row with the max(date) for that userID??
How can I make a query with a group by do a seek?
Thanks,
-J
June 18, 2003 at 7:24 pm
In most cases, Index selection is based on SARGs
which is determined by your [WHERE] clause.
The rule is that for an index to be considered by the optimizer, at least
the first column of the index must match the SARG
In your case you can write it the query
select userId, max(date)
from tShortTerm
Where userid IS NOT NULL -- SARG
group by userId
MW
MW
June 19, 2003 at 3:25 am
The covering index (all fields required are in the index) will be helping performance compared to a table scan, especially if the table is wide. This is because the query does not need to referenec the data pages individually, but ony the (more compact)information in the index, which is also sorted ready for the aggregating operation.
But you can't expect an index seek, as the query requires that every date value be inspected for every userid.
You could use a trigger to maintain your summary table, but this is very likely to impact the performance of updates.
This query doesn't look like a killer. It also looks like the kind of thing you might schedule to run at a quiet time (if there is one).
Tim
Tim Wilkinson
"If it doesn't work in practice, you're using the wrong theory"
- Immanuel Kant
June 19, 2003 at 7:30 am
quote:
In most cases, Index selection is based on SARGswhich is determined by your [WHERE] clause.
Forgive the newbility, but WTF is a SARG? I've never seen it in WROX, and BOL is giving me a blank stare.
June 19, 2003 at 8:19 am
From the Sybase site.
quote:
SARG (search argument) the query's qualifying clause(s).
In other words, the WHERE clause ...
June 19, 2003 at 8:25 am
quote:
Tim:Index covering is for non-clustered indexes, but in this case the aggregate
function would force index covering b/c the optimizer in almost all cases
would resort to index covering whenever it encounters aggregates
quote:
talleman:but WTF is a SARG?
What is the point
MW
Edited by - mworku on 06/19/2003 08:26:10 AM
Edited by - mworku on 06/19/2003 08:26:42 AM
MW
June 19, 2003 at 8:48 am
The results are in:
quote:
WTF is a SARG?
That's a searchable argument I think... when you look at the execution plan at the bottom of the bubbles that pop up over each icon there is a section that says arguments. It first lists OBJECT indicating what index it used. Then if a SARG is present it may list SEEK: for instance where userID = @userid. Based off of the where clause, the optimizer will guess which index to use.
I ran the following three statements:
insert into tImpressionViewThroughBuilder
select userID, max(time)
from tImpressionWithID1
where userId is not null
group by userID
truncate table tImpressionViewThroughBuilder
insert into tImpressionViewThroughBuilder
select userID, max(time)
from tImpressionWithID1
group by userID
There are no null's in the userId column so the where clause has no effect on the actual data returned however, it had a big effect on the execution plans.
A clustered index seek was performed in the first block. 58% of the query runtime
A clustered index scan was performed in the second block. 83% of the query runtime
The first then used parallelism when performing the stream aggregate (4 processors on this machine) while the second didn't use parallelism.
At the end the runtime percentage was 41% vs 59% of the total time which cut the query from 3 minutes 43 seconds to 2 minutes 7 seconds... (I know the math doesn't work out, but there are some other things going on on the server)
Thanks all, but I'll be posting some more. Hopefully, my learning can help you all out aswell.
-J
Edited by - jraha on 06/19/2003 08:52:42 AM
June 19, 2003 at 12:19 pm
jraha's query forces the index to be used, but you may want to rethink that composite index.
For example, tShortTerm has a one to many relationship with tLongTerm, but as long as new rows are inserted into tShortTerm sequentially by date then you should add an identity column to tShortTerm and use that instead.
For example, say you have a composite clustered index on identity (column stid) and userid. (having stid first will speed up your inserts to tLongTerm, but having userid first would probably be more useful when selecting.)
insert into tLongTerm
Selectst1.userid, st1.date
From
(
select userId, max(stid) as maxStid
from tShortTerm (nolock)
Where stid is not null--only necessary if stid if 1st in index
Group by userid
) st
JOINtShortTerm st1 (nolock) on st.maxStid= st1.stid
Having a nonclustered index on date is most likely a good idea as well, but experiment. Stay away from max(date) as often as you can!
cl
Signature is NULL
June 19, 2003 at 10:43 pm
Thank you NPeters,
People love to use big vocabulary when it is unecessary. I am with you.
What a stupidity! Let's get real and let's speak in English! Only DBA with complex will start talking with weird language.
June 19, 2003 at 10:47 pm
Hey guys,
This is very interesting. The where statement influencing the quesry optimization.
Let's get real!!!! MIcrosoft is very stupid... they can have had forecasted that; adding a where statement just to improve the performance is a non-sense but well, I guess it pays...sometimes Microsoft gives me a big laugh 🙂
June 20, 2003 at 7:17 am
quote:
but as long as new rows are inserted into tShortTerm sequentially by date then you should add an identity column to tShortTerm and use that instead.
Good point! unless he also frequently has to query on a date-range, in which case I'd think you would want the clustered index on the date column, but a non-clustered index on an identity column would probably still be a good idea for the max() query even then.
Thanks for the idea!
June 20, 2003 at 11:17 am
quote:
Hey guys,This is very interesting. The where statement influencing the quesry optimization.
Let's get real!!!! MIcrosoft is very stupid... they can have had forecasted that; adding a where statement just to improve the performance is a non-sense but well, I guess it pays...sometimes Microsoft gives me a big laugh 🙂
Come down bro.
How would you do it ?
MW
Edited by - mworku on 06/20/2003 11:19:24 AM
MW
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply