October 26, 2005 at 12:15 pm
I have a simple table that stores messages of different types from
different sources. The definition of the table is shown below. I need
to devise an efficient query to return a "list of the N more recent
messages for a subset of sources within a specified time frame."
create table MessageTbl
(
src nvarchar(50), -- Source of the message
type nvarchar(50), -- Type of the message.
msg nvarchar(1000), -- Text of the message
dt datetime -- When the message was posted
)
We are given the following parameters:
1. declare @startTime datetime -- Start of the time frame
2. declare @endTime datetime -- End of the time frame
3. declare @myTable( src nvarchar(50), type nvarchar(50)) -- This table
contains a list of sources/types for which we want to obtain the
messages.
4. N -- How many messages per source/type
The following seems to produce the correct result but it is very expensive. It takes almost 30 seconds
when MessageTbl has about 3600 rows and @myTable has 51 rows. I checked
the execution plan and it seems there is a lot of table scanning going
on.
SELECT a.src, a.type, a.msg, a.dt
FROM MessageTbl AS a
INNER JOIN @myTable AS b
ON b.src = a.src
AND b.type = a.type
WHERE a.dt BETWEEN @startTime AND @endTime
AND a.dt IN (SELECT TOP 20 dt
FROM MessageTbl AS c
WHERE c.src = a.src
AND c.type = a.type
AND c.dt BETWEEN @startTime AND @endTime
ORDER BY c.dt DESC)
I would appreciate your suggetions and comments.
Thank you
CD
October 26, 2005 at 12:27 pm
Is the clustered index on the dt column?
October 26, 2005 at 1:40 pm
OK, somebody help me out here...
I can't see why you need to do the embedded query if it's the same table. Why just do the top 20 on the outer query already filtered by start and end?
(and as said an index on datetime can only help)
October 26, 2005 at 2:38 pm
>>4. N -- How many messages per source/type
If N is a variable parameter, you're going to have additional problems above the performance issue, namely that you can't use a variable with TOP (You have a hard-coded value of 20 in the example)
In SQL 2005, the solution to this problem will be easy, using the new RANK T-SQL enhancements and the ability to use a variable with TOP.
However, back in SQL2K ... the requirement for dynamic TOP calls for dynamic SQL, yuck, or use of a temp table and an IDENTITY for ranking. This becomes fugly in a hurry:
-- Create an empty temp table
SELECT Identity(int, 1, 1) As Rank, *
INTO #RankedMsgs
FROM MessageTbl
WHERE 0 = 1
-- Populate, order by source, type & date, for date-sequencial ranking of records
-- within each src/type
INSERT INTO #RankedMsgs (src, type, msg, dt)
SELECT src, type, msg, dt
FROM MessageTbl As a
INNER JOIN @myTable AS b
ON b.src = a.src
AND b.type = a.type
WHERE a.dt BETWEEN @startTime AND @endTime
ORDER BY src, type, dt
-- Return results
SELECT *
FROM #RankedMsgs As r
INNER JOIN
-- Derived table to get the min rank within each src/type
(
SELECT src, type, Min(Rank) As MinRank
FROM #RankedMsgs
GROUP BY src, type
) dtMin
WHERE r.src = dtMin.src
AND r.type = dtMin.type
-- this implements the dynamic TOP, with dynamic value in @N
AND (r.Rank - dtMin.MinRank) <= @N
ORDER BY r.src, r.type
October 27, 2005 at 12:45 am
Hello PW,
Thank you for your reply. Do you think that the query you suggested would benefit from any additional indices. I currently do not have any indices in any of the columns involved.
Thanks
-CD
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply