November 2, 2005 at 4:05 pm
Hello All,
Consider the following table:
create table MessageTbl
(
msgId int primary key clustered,
src nvarchar(50), -- Source of the message
type int, -- Type of the message.
msg nvarchar(1000), -- Text of the message
dt datetime -- When the message was posted
)
With indexes:
CREATE INDEX IX_MessageTbl_1
ON MessageTbl (src, type, dt DESC)
CREATE INDEX IX_MessageTbl_2
ON MessageTbl (dt DESC)
In the following queries @myTable, @startime , and @endTine are defined as:
declare @myTable table( src nvarchar(50), type int)
declare @startTime datetime
declare @endTime datetime
Consider the following two queries:
------------- #1
SELECT a.*
FROM MessageTbl AS a
INNER JOIN @myTable AS b
ON b.src = a.src
AND b.type = a.type
WHERE a.dt BETWEEN @startTime AND @endTime
AND (SELECT COUNT(*)
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @startTime AND @endTime
AND c.dt <= a.dt) <= 20
------------- #2
SELECT a.*
FROM MessageTbl AS a
INNER JOIN @myTable AS b
ON b.src = a.src
AND b.type = a.type
WHERE a.dt BETWEEN @startTime AND @endTime
AND a.dt IN (SELECT TOP 20 dt
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @startTime AND @endTime
ORDER BY c.dt DESC)
Query #2 runs 6 times faster than #1 under the same conditions. Nevertheles, #2 has a big disadvantage to me because I need to vary the number of elements returned by the TOP clause. I could create the whole query string dynamically but I would consider that only as a last resource.
Frankly, I do not understand why #1 is slower. Both queries seem to favor from the defined indexes and have apparently the same complexity. The only difference is in one of the subqueries in the WHERE clause. While #1 has
AND (SELECT COUNT(*)
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @startTime AND @endTime
AND c.dt <= a.dt) <= 20
query #2 has
AND a.dt IN (SELECT TOP 20 dt
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @startTime AND @endTime
ORDER BY c.dt DESC)
I would appreciate any insight behind the differences in performance as well as any suggestions (new indices, hints, etc) on how to improve the performance of #1.
Thank you
- CD
November 2, 2005 at 5:41 pm
It would be helpful if you provided a description of what you are trying to achieve.
On first glance I'd suggest the following,
1) Move the sub-query from the WHERE clause to the FROM clause
2) Change the primary key to non-clustered. And change your index on the date column to the clustered one.
--------------------
Colt 45 - the original point and click interface
November 3, 2005 at 1:27 am
did you try to create a CLUSTERED index on dt, src, type? In this order. Because you are doing a range seeking (between) and a "LESS THEN" comparison so a clustered index on the dt column at the first position would be more effective
Bye
Gabor
November 3, 2005 at 4:36 am
Both of your subqueries are correlated, which is often a problem, but with the 1st query, SQL probably has to do a lot more evaluation and reading rows to determine counts, and then perform the <= 20 comparison.
I'm trying to work out what you are trying to do, but it's a bit late and I'm distracted by the tellie Could you elaborate what you are trying to accomplish...
I can only assume that you are trying to get messages from a set of sources + types between two dates - furthermore, you only want the top X records.. I can think of a few ways, but before trying other things, you should try adding in the extra indices as described above (clustered index on date, type and source in that order) and also make your query look like this
SELECT a.* FROM MessageTbl AS a INNER JOIN @myTable AS b ON b.src = a.src AND b.type = a.type WHERE a.dt BETWEEN @startTime AND @endTime AND (SELECT COUNT(*) FROM MessageTbl AS c WHERE c.src = a.src AND c.src = b.src AND c.type = a.type AND c.type = b.type AND c.dt BETWEEN @startTime AND @endTime AND a.dt >= @startTime AND c.dt <= a.dt) <= 20
This might give SQL Server a better way of matching up your tables (since A & B have the same values for src & type columns and a's dt field will always be >= @startTime. Also, you could try making @myTable a temp table, rather than a table variable, and add indices to it.
If you let us know your requirements, we can probably come up with something better. Cheers!
November 3, 2005 at 12:12 pm
Ian,
I appologize for not giving a better picture of what I want to accomplish. This is a brief description.
The MessageTbl table accumulates messages of different types (type column) from different sources (src column). A specific source can have many types associated. The dt column is the time the message was recorded.
What I want is equivalent to the union of the top N messages in a time frame for each row (src, type) in the @myTable variable. Our current application performs all the queries very inneficiently from the client side. I am attemting to optimize the performance by creating a SP that speeds up the process.
Regards
- CD
November 5, 2005 at 7:09 pm
No probs CD
How did the other suggestions impact on your query execution speed? If they didn't help, I'll see if I can think of some way of using grouping etc to make it efficient...
Cheers
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply