Where Clause with >/<= Statement

Question

Post reply

Where Clause with >/<= Statement

hrodberaht

SSC-Addicted

Points: 428
More actions
May 22, 2007 at 3:30 am

#116818

Hi,
I've got a query which joins two tables on a user number field and then has a where clause with a greater than and smaller than where clause.
eg:
SELECT *
FROM tableA
LEFT OUTER JOIN tableB on tableA.userID=tableB.userID
WHERE
tableB.LineNumber > tableA.StartLineNumber
AND tableB.LineNumber <= tableA.StopLineNumber
The above query takes much longer than when I actually go and replace TableA's start and stop line number fields with the actual values.
eg:
SELECT *
FROM tableA
LEFT OUTER JOIN tableB on tableA.userID=tableB.userID
WHERE
tableB.LineNumber > 16598712
AND tableB.LineNumber <= 98753132
I've also tried using BETWEEN, and including the WHERE clause in the JOIN statement.
Any Ideas? I'm Baffled.

Robert

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 1

Robert

Please will you provide some DDL, sample data and expected results, since it's hard to see what you're trying to do.

Adding in that WHERE clause turns your OUTER JOIN into an INNER JOIN, so that your query is the equivalent of this:

SELECT *

FROM tableA a

INNER JOIN tableB b

ON a.userID = b.userID

AND b.LineNumber > a.StartLineNumber

AND b.LineNumber <= a.StopLineNumber

The inequality has to be evaluated for each row, causing a triangular join, and this is why your query is taking so long. You can only rewrite it in the way you have done in your second attempt if the values of StartLineNumber and StopLineNumber are each the same in all rows. Is this the case?

How many rows do you have in your tables, and what indexes are there?

John

hrodberaht SSC-Addicted Points: 428 More actions · Answer 2

Ok, I posted a reply, but it failed for some reason, so lets try again.

Basically, tableB is Sales, and TableA is a Punctuation Table. The idea is when a cashier/waiter does sales, it gets posted to the Sales table, and at the end of aech Shift, they get Cashed-Up, and the days Start Line Number and Stop Line Number is written to the Puntuation Table, so we can get a report of each shift's total sales.

CREATE TABLE [dbo].[Sales] (

[LineNumber] [int] IDENTITY (1, 1) NOT NULL ,

[UserID] [smallint] NULL ,

[Amount] [money] NULL

) ON [PRIMARY]

GO

CREATE TABLE [dbo].[Punctuation] (

[Dated] [datetime] NULL ,

[UserID] [smallint] NULL ,

[StartLineNumber] [int] NULL ,

[StopLineNumber] [int] NULL

) ON [PRIMARY]

GO

Truncate Table Sales

Truncate Table Punctuation

insert into sales VALUES (1,27)

insert into sales VALUES (1,35)

insert into sales VALUES (2,55)

insert into sales VALUES (1,62)

insert into sales VALUES (3,72)

insert into sales VALUES (2,81)

insert into sales VALUES (2,13)

insert into sales VALUES (3,24)

insert into sales VALUES (3,33)

insert into sales VALUES (3,43)

insert into sales VALUES (3,52)

insert into sales VALUES (3,63)

insert into sales VALUES (3,73)

insert into sales VALUES (3,87)

insert into sales VALUES (1,95)

insert into sales VALUES (1,42)

insert into sales VALUES (3,33)

insert into sales VALUES (1,24)

insert into sales VALUES (1,36)

insert into sales VALUES (2,37)

insert into sales VALUES (2,98)

insert into sales VALUES (3,19)

insert into punctuation VALUES (getdate(),1,1,16)

insert into punctuation VALUES (getdate(),2,3,7)

insert into punctuation VALUES (getdate(),3,5,14)

insert into punctuation VALUES (dateadd(d,1,getdate()),1,18,19)

insert into punctuation VALUES (dateadd(d,1,getdate()),2,20,21)

insert into punctuation VALUES (dateadd(d,1,getdate()),3,17,22)

SELECT Punctuation.Dated,Punctuation.UserID,SUM(Sales.Amount) as Amount

FROM Punctuation

INNER JOIN Sales

ON Punctuation.userID = Sales.userID

AND Sales.LineNumber > Punctuation.StartLineNumber

AND Sales.LineNumber <= Punctuation.StopLineNumber

GROUP BY Punctuation.Dated,Punctuation.UserID

ORDER BY Punctuation.Dated,Punctuation.UserID

The actual tables has alot more fields and more rows (2M rows in Sales and 20k in Punctuation), than the above example. All the fields except the amount field has non-clustered indexes. Also, the LineNumbers will not be the same for all users. I actually used the line numbers in the query to test the speed, but I also added a specific userID in the where clause, and still had the same speed diff.

I know this is a mouth full, so let me know if I missed something.

Thanks in advance.

Robert

Jeff Moden SSC Guru Points: 1004318 More actions · Answer 3

Just having a bunch of idexes may not help a thing...

Try these...

CREATE INDEX IX_Sales_Composite1 ON Sales (UserID,LineNumber,Amount) CREATE INDEX IX_Punctuation_Composite1 ON Punctuation (UserID,StartLineNumber,StopLineNumber,Dated)

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

hrodberaht SSC-Addicted Points: 428 More actions · Answer 4

Hi, Thanks for the post. The tables already have composite indexes on the fields which is used in the where/join clause of the query.

I've found something that might be causing the problem, I'll post it here if I find it.

Thanks.

Robert

hrodberaht SSC-Addicted Points: 428 More actions · Answer 5

Ok, I found the problem.

This is problably the reason why one should rather provide the complete query and not extracts.

There is floor function being used on one of the join fields.

ie:

left outer join TableB on TableA.UserID=floor(TableA.UserID)

This renders any index useless.

Thanks for your help.

Robert

Ninja's_RGR'us SSC Guru Points: 294069 More actions · Answer 6

Just add a new calculated field in the table, index it (makes it materialized), make sure that your query uses the correct indexes and optimize accordingly and you'll be set to go.

l8r

John Mitchell-245523 SSC Guru Points: 148809 More actions · Answer 7

Robert

UserID is an integer, so why are you using the FLOOR function on it anyway?

John

hrodberaht SSC-Addicted Points: 428 More actions · Answer 8

Sorry, some of the details got lost in translation or in my attempt to simplify the problem for this post (The original Query's size is about 25k).

Thanks for your help.

Robert

Sergiy SSC Guru Points: 110208 More actions · Answer 9

The difference comes from the fact that hardcoded values let optimiser estimate number of index entries to participate in the range and, because the number is quite small, use non-clustered index on the column.

When limits of the range are not explicitly specified optimiser takes more general approach and uses table scan, because non-clustered index seek is less effecive in most cases.

Having clustered index on tableB(userID, LineNumber ) will fix your problem.

And make it a rule - every column used for range selection must have clustered index on it.

_____________
Code for TallyGenerator