>= as part of a join?

  • Hello,

    I'm wondering about the use of >= as part of a join.

    With the example below is the use of >= and <= the most efficient way of accomplishing the final query? Is there a better way to store the bonus data which would improve the performance of the query? The only way i could think of would be to have a record for each date between the start and end date so you could do an inner join on the Saledate and a single Bonusdate.

    Create Table BONUSES (Account varchar(30), Product varchar(30), StarDate datetime,EndDate datetime,Bonus int)

    '20943','A','26/01/2005','31/01/2005',10

    would become

    Create Table BONUSES (Account varchar(30), Product varchar(30), BonusDate datetime, Bonus int)

    '20943','A','26/01/2005',10

    '20943','A','27/01/2005',10

    '20943','A','28/01/2005',10

    '20943','A','29/01/2005',10

    '20943','A','30/01/2005',10

    '20943','A','31/01/2005',10

    With a unique index on Account Product BonusDate

    I just get the impression the use of >= and <= in this situation is producing a partial cartesian query.

    The "real world" example has 1.5 million rows of sales data and 6000 rows of bonuses, my tests so far return results in 3 seconds on a pretty basic box. The tables are well indexed - the index analyser has been unable to find any improvements at least!

    This is a reporting DB so i can pretty much do what i like to the tables.

    Any advice gratefully received.

    Thanks

    K.

    SET DATEFORMAT dmy

    Create Table BONUSES (Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime,Bonus int)

    Create Table SALES (Account varchar(30), Product varchar(30), SaleDate datetime)

    Go

    DECLARE @sample1-2 Table(Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime,Bonus int)

    INSERT INTO @sample1-2

    SELECT '20943','A','26/01/2005','31/01/2005',10 UNION

    SELECT '20943','A','01/02/2005','30/04/2007',10 UNION

    SELECT '20943','A','01/05/2007','01/05/2007', 0 UNION

    SELECT '20943','B','01/04/2003','01/10/2007',30 UNION

    SELECT '20943','C','01/04/2003','31/07/2004',20 UNION

    SELECT '20943','C','01/08/2004','15/02/2006',25 UNION

    SELECT '20943','C','16/02/2006','28/02/2006',35 UNION

    SELECT '20943','C','01/03/2006','01/10/2007',35 UNION

    SELECT '20943','D','01/04/2003','31/07/2004',15 UNION

    SELECT '20943','D','01/08/2004','01/10/2007',30 UNION

    SELECT '21000','A','28/01/2005','31/03/2005',10 UNION

    SELECT '21000','A','01/04/2005','01/10/2007',10 UNION

    SELECT '21000','B','01/04/2006','30/06/2006',30 UNION

    SELECT '21000','B','01/07/2006','01/10/2007',25 UNION

    SELECT '21000','C','15/04/2003','31/07/2004',20 UNION

    SELECT '21000','C','01/08/2004','15/02/2006',25 UNION

    SELECT '21000','C','16/02/2006','28/02/2006',35 UNION

    SELECT '21000','C','01/03/2006','01/10/2007',35 UNION

    SELECT '21000','D','01/04/2003','31/07/2004',15 UNION

    SELECT '21000','D','01/08/2004','01/10/2007',30

    Insert into BONUSES

    SELECT Account,Product,StartDate, EndDate, Bonus from @sample1-2

    DECLARE @Sample2 Table(Account varchar(30), Product varchar(30), SaleDate datetime)

    INSERT INTO @Sample2

    SELECT '20943','A', '27/01/2005' UNION

    SELECT '20943','A','02/02/2005' UNION

    SELECT '20943','A','02/05/2007' UNION

    SELECT '20943','B','02/04/2003' UNION

    SELECT '20943','C','02/04/2003' UNION

    SELECT '20943','C','02/08/2004' UNION

    SELECT '20943','C','17/02/2006' UNION

    SELECT '20943','C','02/03/2006' UNION

    SELECT '20943','D','02/04/2003' UNION

    SELECT '20943','D','02/08/2004' UNION

    SELECT '21000','A','29/01/2005' UNION

    SELECT '21000','A','02/04/2005' UNION

    SELECT '21000','B','02/04/2006' UNION

    SELECT '21000','B','02/07/2006' UNION

    SELECT '21000','C','16/04/2003' UNION

    SELECT '21000','C','02/08/2004' UNION

    SELECT '21000','C','17/02/2006' UNION

    SELECT '21000','C','02/03/2006' UNION

    SELECT '21000','D','02/04/2003' UNION

    SELECT '21000','D','02/08/2004'

    Insert into SALES

    SELECT Account,Product,SaleDate from @Sample2

    SELECT dbo.SALES.Account, dbo.SALES.Product, dbo.SALES.SaleDate, dbo.BONUSES.Bonus

    FROM dbo.SALES INNER JOIN

    dbo.BONUSES ON dbo.SALES.Account = dbo.BONUSES.Account AND dbo.SALES.Product = dbo.BONUSES.Product AND

    dbo.SALES.SaleDate >= dbo.BONUSES.StartDate AND dbo.SALES.SaleDate <= dbo.BONUSES.EndDate

  • Well - it's certainly not a cartesian product, since you have a "real" set of join criteria.

    I don't think it will make a lot (if any) difference, but I'm somehow seeing the BETWEEN syntax as something belonging in the WHERE clause rather than the join clause. Either way - assuming you have the right indexes like you mentioned - you shouldn't have too many perf issues.

    This is all assuming that ONLY the very last select is what you plan on using in Production. Your test process is a little interesting (you do two separate table inserts for each test table - why? What's the table variable for?).

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • IMO one should always work with deterministic data, meaning always specify an end date (even if it is way up in the future).

    It avoids handling a NULL enddate and urges developers to read and interprete ranges.

    Also provide a trigger to check only one bonus can be "active" at a certain point in time.

    I also tend to use aliasses and write the date usage in chronological order:

    well, nag, nag, nag ... that's me 😉

    SELECT S.Account, S.Product, S.SaleDate, B.Bonus

    FROM dbo.SALES S

    INNER JOIN dbo.BONUSES B

    ON S.Account = B.Account

    AND S.Product = B.Product

    AND B.StartDate <= S.SaleDate

    AND S.SaleDate <= B.EndDate ;

    To support this query I would suggest :

    Offcourse indexes on the join predicates ( Account and Product ) but I guess those will already be in place as FK-indexes.

    - for the Sale table an index for column Salesdate.

    - for the Bonus table an index for columns (StartDate, EndDate)

    (one index containing both columns !)

    You could indeed concider a "bonus_per_date" table, containing one row per date (indexed) for a product.

    This table could be build from your bonus table containing the dateranges for an active bonus.

    CREATE FUNCTION utvf_getDateList ( @startDate datetime, @endDate DATETIME)

    RETURNS @tvfResult TABLE ( effective_date DATETIME NOT NULL )

    AS

    BEGIN

    WITH cteNumberOfDays AS

    (

    SELECT 0 as num

    UNION ALL

    SELECT num + 1

    FROM cteNumberOfDays

    WHERE num < (SELECT DATEDIFF(dd, @startDate, @endDate))

    )

    INSERT @tvfResult

    SELECT DATEADD(d, num , @startDate)

    FROM cteNumberOfDays

    OPTION (MAXRECURSION 0); -- MAXRECURSION: The server-wide default is 100. When 0 is specified, no limit is applied. 0 - 32767

    RETURN

    END

    go

    SELECT B.Account, B.Product, effective_date, B.Bonus

    FROM #BONUSES3 B

    OUTER APPLY utvf_getDateList (B.StartDate , B.EndDate) ;

    go

    As you have noticed, testing is the only way to actualy figure out what it (sqlserver) is and will be doing with your data.

    Multiple processors, disks, files, may also help optimize parallelism.

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Oops SQL7-2000 forum ..... offcouse CTE's will not be available for you, nor will outerapply :blush:

    CREATE FUNCTION utvf_getDateList_2000 ( @startDate datetime, @endDate DATETIME)

    RETURNS @tvfResult TABLE ( effective_date DATETIME NOT NULL )

    AS

    BEGIN

    DECLARE @num AS INT

    , @counter AS INT

    SELECT @num = DATEDIFF(dd, @startDate, @endDate)

    , @counter = 0

    WHILE @counter < @num

    BEGIN

    INSERT @tvfResult values( DATEADD(d, @counter , @startDate) )

    SET @counter = @counter + 1

    END

    RETURN

    END

    GO

    SELECT B.Account, B.Product, effective_date, B.Bonus

    FROM #BONUSES3 B

    INNER join utvf_getDateList_2000 ('2007-01-01', '2010-01-01')

    ON effective_date BETWEEN B.StartDate and B.EndDate

    ORDER BY 1,2,3 ;

    Johan

    Learn to play, play to learn !

    Dont drive faster than your guardian angel can fly ...
    but keeping both feet on the ground wont get you anywhere :w00t:

    - How to post Performance Problems
    - How to post data/code to get the best help[/url]

    - How to prevent a sore throat after hours of presenting ppt

    press F1 for solution, press shift+F1 for urgent solution 😀

    Need a bit of Powershell? How about this

    Who am I ? Sometimes this is me but most of the time this is me

  • Hi,

    Sorry for the lateness of reply.

    Matt: I tried the where clause. Interestingly - to me at least - if you try this in a view it will automatically re-write it as a join. The estimated execution plan seems to be indentical as well.

    As for the test data, i ripped this from a previous post which had the data in - saved a bit of time.

    ALZDBA: I'm happy the tables are correctly indexed, certainly the index tuning wizard had no helpful suggestions. I have been using the data from the sales table for well over a year now so the indexes on that are as good as i can get them. The indexes on the bonus table seemed fairly obvious. Your use of CTE's is interesting. As you realized i'm using 2000 but it's always worth investigating what is available. I don't think we're likely to upgrade until 2008 sp1!

    This data comes from a rather strange transactional database that i have to report on. That database takes care of the business logic and validation.

    As i've learned more about SQL i've come to realize just how easy it is to write poor queries. There was something about this that just didn't feel right. I got bitten by a triangular join in one report that i didn't spot and something about this just didn't feel right :ermm:

    I'm going to give the bonus_per_date table a go and see what difference that makes.

    Thanks for your help

    K.

Viewing 5 posts - 1 through 4 (of 4 total)

You must be logged in to reply to this topic. Login to reply