May 22, 2007 at 3:30 am
Hi,
I've got a query which joins two tables on a user number field and then has a where clause with a greater than and smaller than where clause.
eg:
SELECT *
FROM tableA
LEFT OUTER JOIN tableB on tableA.userID=tableB.userID
WHERE
tableB.LineNumber > tableA.StartLineNumber
AND tableB.LineNumber <= tableA.StopLineNumber
The above query takes much longer than when I actually go and replace TableA's start and stop line number fields with the actual values.
eg:
SELECT *
FROM tableA
LEFT OUTER JOIN tableB on tableA.userID=tableB.userID
WHERE
tableB.LineNumber > 16598712
AND tableB.LineNumber <= 98753132
I've also tried using BETWEEN, and including the WHERE clause in the JOIN statement.
Any Ideas? I'm Baffled.
Robert
May 22, 2007 at 4:08 am
Robert
Please will you provide some DDL, sample data and expected results, since it's hard to see what you're trying to do.
Adding in that WHERE clause turns your OUTER JOIN into an INNER JOIN, so that your query is the equivalent of this:
SELECT *
FROM tableA a
INNER JOIN tableB b
ON a.userID = b.userID
AND b.LineNumber > a.StartLineNumber
AND b.LineNumber <= a.StopLineNumber
The inequality has to be evaluated for each row, causing a triangular join, and this is why your query is taking so long. You can only rewrite it in the way you have done in your second attempt if the values of StartLineNumber and StopLineNumber are each the same in all rows. Is this the case?
How many rows do you have in your tables, and what indexes are there?
John
May 22, 2007 at 5:35 am
Ok, I posted a reply, but it failed for some reason, so lets try again.
Basically, tableB is Sales, and TableA is a Punctuation Table. The idea is when a cashier/waiter does sales, it gets posted to the Sales table, and at the end of aech Shift, they get Cashed-Up, and the days Start Line Number and Stop Line Number is written to the Puntuation Table, so we can get a report of each shift's total sales.
CREATE TABLE [dbo].[Sales] (
[LineNumber] [int] IDENTITY (1, 1) NOT NULL ,
[UserID] [smallint] NULL ,
[Amount] [money] NULL
) ON [PRIMARY]
GO
CREATE TABLE [dbo].[Punctuation] (
[Dated] [datetime] NULL ,
[UserID] [smallint] NULL ,
[StartLineNumber] [int] NULL ,
[StopLineNumber] [int] NULL
) ON [PRIMARY]
GO
Truncate Table Sales
Truncate Table Punctuation
insert into sales VALUES (1,27)
insert into sales VALUES (1,35)
insert into sales VALUES (2,55)
insert into sales VALUES (1,62)
insert into sales VALUES (3,72)
insert into sales VALUES (2,81)
insert into sales VALUES (2,13)
insert into sales VALUES (3,24)
insert into sales VALUES (3,33)
insert into sales VALUES (3,43)
insert into sales VALUES (3,52)
insert into sales VALUES (3,63)
insert into sales VALUES (3,73)
insert into sales VALUES (3,87)
insert into sales VALUES (1,95)
insert into sales VALUES (1,42)
insert into sales VALUES (3,33)
insert into sales VALUES (1,24)
insert into sales VALUES (1,36)
insert into sales VALUES (2,37)
insert into sales VALUES (2,98)
insert into sales VALUES (3,19)
insert into punctuation VALUES (getdate(),1,1,16)
insert into punctuation VALUES (getdate(),2,3,7)
insert into punctuation VALUES (getdate(),3,5,14)
insert into punctuation VALUES (dateadd(d,1,getdate()),1,18,19)
insert into punctuation VALUES (dateadd(d,1,getdate()),2,20,21)
insert into punctuation VALUES (dateadd(d,1,getdate()),3,17,22)
SELECT Punctuation.Dated,Punctuation.UserID,SUM(Sales.Amount) as Amount
FROM Punctuation
INNER JOIN Sales
ON Punctuation.userID = Sales.userID
AND Sales.LineNumber > Punctuation.StartLineNumber
AND Sales.LineNumber <= Punctuation.StopLineNumber
GROUP BY Punctuation.Dated,Punctuation.UserID
ORDER BY Punctuation.Dated,Punctuation.UserID
The actual tables has alot more fields and more rows (2M rows in Sales and 20k in Punctuation), than the above example. All the fields except the amount field has non-clustered indexes. Also, the LineNumbers will not be the same for all users. I actually used the line numbers in the query to test the speed, but I also added a specific userID in the where clause, and still had the same speed diff.
I know this is a mouth full, so let me know if I missed something.
Thanks in advance.
Robert
May 22, 2007 at 6:25 am
Just having a bunch of idexes may not help a thing...
Try these...
CREATE INDEX IX_Sales_Composite1 ON Sales (UserID,LineNumber,Amount) CREATE INDEX IX_Punctuation_Composite1 ON Punctuation (UserID,StartLineNumber,StopLineNumber,Dated)
--Jeff Moden
Change is inevitable... Change for the better is not.
May 22, 2007 at 8:45 am
Hi, Thanks for the post. The tables already have composite indexes on the fields which is used in the where/join clause of the query.
I've found something that might be causing the problem, I'll post it here if I find it.
Thanks.
Robert
May 22, 2007 at 9:14 am
Ok, I found the problem.
This is problably the reason why one should rather provide the complete query and not extracts.
There is floor function being used on one of the join fields.
ie:
left outer join TableB on TableA.UserID=floor(TableA.UserID)
This renders any index useless.
Thanks for your help.
Robert
May 22, 2007 at 12:16 pm
Just add a new calculated field in the table, index it (makes it materialized), make sure that your query uses the correct indexes and optimize accordingly and you'll be set to go.
l8r
May 23, 2007 at 1:11 am
Robert
UserID is an integer, so why are you using the FLOOR function on it anyway?
John
May 23, 2007 at 2:20 am
Sorry, some of the details got lost in translation or in my attempt to simplify the problem for this post (The original Query's size is about 25k).
Thanks for your help.
Robert
May 23, 2007 at 3:21 am
The difference comes from the fact that hardcoded values let optimiser estimate number of index entries to participate in the range and, because the number is quite small, use non-clustered index on the column.
When limits of the range are not explicitly specified optimiser takes more general approach and uses table scan, because non-clustered index seek is less effecive in most cases.
Having clustered index on tableB(userID, LineNumber ) will fix your problem.
And make it a rule - every column used for range selection must have clustered index on it.
_____________
Code for TallyGenerator
Viewing 10 posts - 1 through 9 (of 9 total)
You must be logged in to reply to this topic. Login to reply