Where Clause with >/<= Statement

  • Hi,

    I've got a query which joins two tables on a user number field and then has a where clause with a greater than and smaller than where clause.

    eg:

    SELECT * 

    FROM tableA 

    LEFT OUTER JOIN tableB on tableA.userID=tableB.userID

    WHERE 

    tableB.LineNumber > tableA.StartLineNumber

    AND tableB.LineNumber <= tableA.StopLineNumber

    The above query takes much longer than when I actually go and replace TableA's start and stop line number fields with the actual values.

    eg:

    SELECT * 

    FROM tableA 

    LEFT OUTER JOIN tableB on tableA.userID=tableB.userID

    WHERE 

    tableB.LineNumber > 16598712

    AND tableB.LineNumber <= 98753132

    I've also tried using BETWEEN, and including the WHERE clause in the JOIN statement.

    Any Ideas? I'm Baffled.


    Robert

  • Robert

    Please will you provide some DDL, sample data and expected results, since it's hard to see what you're trying to do.

    Adding in that WHERE clause turns your OUTER JOIN into an INNER JOIN, so that your query is the equivalent of this:

    SELECT     *

    FROM       tableA a

    INNER JOIN tableB b

    ON         a.userID = b.userID

    AND        b.LineNumber > a.StartLineNumber

    AND        b.LineNumber <= a.StopLineNumber

    The inequality has to be evaluated for each row, causing a triangular join, and this is why your query is taking so long.  You can only rewrite it in the way you have done in your second attempt if the values of StartLineNumber and StopLineNumber are each the same in all rows.  Is this the case?

    How many rows do you have in your tables, and what indexes are there?

    John

  • Ok, I posted a reply, but it failed for some reason, so lets try again.

    Basically, tableB is Sales, and TableA is a Punctuation Table. The idea is when a cashier/waiter does sales, it gets posted to the Sales table, and at the end of aech Shift, they get Cashed-Up, and the days Start Line Number and Stop Line Number is written to the Puntuation Table, so we can get a report of each shift's total sales.

    CREATE TABLE [dbo].[Sales] (

     [LineNumber] [int] IDENTITY (1, 1) NOT NULL ,

     [UserID] [smallint] NULL ,

     [Amount] [money] NULL

    ) ON [PRIMARY]

    GO

    CREATE TABLE [dbo].[Punctuation] (

     [Dated] [datetime] NULL ,

     [UserID] [smallint] NULL ,

     [StartLineNumber] [int] NULL ,

     [StopLineNumber] [int] NULL

    ) ON [PRIMARY]

    GO

    Truncate Table Sales

    Truncate Table Punctuation

    insert into sales VALUES (1,27)

    insert into sales VALUES (1,35)

    insert into sales VALUES (2,55)

    insert into sales VALUES (1,62)

    insert into sales VALUES (3,72)

    insert into sales VALUES (2,81)

    insert into sales VALUES (2,13)

    insert into sales VALUES (3,24)

    insert into sales VALUES (3,33)

    insert into sales VALUES (3,43)

    insert into sales VALUES (3,52)

    insert into sales VALUES (3,63)

    insert into sales VALUES (3,73)

    insert into sales VALUES (3,87)

    insert into sales VALUES (1,95)

    insert into sales VALUES (1,42)

    insert into sales VALUES (3,33)

    insert into sales VALUES (1,24)

    insert into sales VALUES (1,36)

    insert into sales VALUES (2,37)

    insert into sales VALUES (2,98)

    insert into sales VALUES (3,19)

    insert into punctuation VALUES (getdate(),1,1,16)

    insert into punctuation VALUES (getdate(),2,3,7)

    insert into punctuation VALUES (getdate(),3,5,14)

    insert into punctuation VALUES (dateadd(d,1,getdate()),1,18,19)

    insert into punctuation VALUES (dateadd(d,1,getdate()),2,20,21)

    insert into punctuation VALUES (dateadd(d,1,getdate()),3,17,22)

    SELECT     Punctuation.Dated,Punctuation.UserID,SUM(Sales.Amount) as Amount

    FROM       Punctuation

    INNER JOIN Sales

    ON         Punctuation.userID = Sales.userID

    AND        Sales.LineNumber > Punctuation.StartLineNumber

    AND        Sales.LineNumber <= Punctuation.StopLineNumber

    GROUP BY Punctuation.Dated,Punctuation.UserID

    ORDER BY Punctuation.Dated,Punctuation.UserID

    The actual tables has alot more fields and more rows (2M rows in Sales and 20k in Punctuation), than the above example. All the fields except the amount field has non-clustered indexes. Also, the LineNumbers will not be the same for all users. I actually used the line numbers in the query to test the speed, but I also added a specific userID in the where clause, and still had the same speed diff.

    I know this is a mouth full, so let me know if I missed something.

    Thanks in advance.


    Robert

  • Just having a bunch of idexes may not help a thing...

    Try these...

    CREATE INDEX IX_Sales_Composite1 ON Sales (UserID,LineNumber,Amount)
    CREATE INDEX IX_Punctuation_Composite1 ON Punctuation (UserID,StartLineNumber,StopLineNumber,Dated)

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Hi, Thanks for the post. The tables already have composite indexes on the fields which is used in the where/join clause of the query.

    I've found something that might be causing the problem, I'll post it here if I find it.

    Thanks. 


    Robert

  • Ok, I found the problem.

    This is problably the reason why one should rather provide the complete query and not extracts.

    There is floor function being used on one of the join fields.

    ie:

    left outer join TableB on TableA.UserID=floor(TableA.UserID)

    This renders any index useless.

    Thanks for your help.


    Robert

  • Just add a new calculated field in the table, index it (makes it materialized), make sure that your query uses the correct indexes and optimize accordingly and you'll be set to go.

     

    l8r

  • Robert

    UserID is an integer, so why are you using the FLOOR function on it anyway?

    John

  • Sorry, some of the details got lost in translation or in my attempt to simplify the problem for this post (The original Query's size is about 25k).

    Thanks for your help.

     

     


    Robert

  • The difference comes from the fact that hardcoded values let optimiser estimate number of index entries to participate in the range and, because the number is quite small, use non-clustered index on the column.

    When limits of the range are not explicitly specified optimiser takes more general approach and uses table scan, because non-clustered index seek is less effecive in most cases.

    Having clustered index on tableB(userID, LineNumber ) will fix your problem.

    And make it a rule - every column used for range selection must have clustered index on it.

    _____________
    Code for TallyGenerator

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply