October 26, 2007 at 5:32 am
Hello,
I'm wondering about the use of >= as part of a join.
With the example below is the use of >= and <= the most efficient way of accomplishing the final query? Is there a better way to store the bonus data which would improve the performance of the query? The only way i could think of would be to have a record for each date between the start and end date so you could do an inner join on the Saledate and a single Bonusdate.
Create Table BONUSES (Account varchar(30), Product varchar(30), StarDate datetime,EndDate datetime,Bonus int)
'20943','A','26/01/2005','31/01/2005',10
would become
Create Table BONUSES (Account varchar(30), Product varchar(30), BonusDate datetime, Bonus int)
'20943','A','26/01/2005',10
'20943','A','27/01/2005',10
'20943','A','28/01/2005',10
'20943','A','29/01/2005',10
'20943','A','30/01/2005',10
'20943','A','31/01/2005',10
With a unique index on Account Product BonusDate
I just get the impression the use of >= and <= in this situation is producing a partial cartesian query.
The "real world" example has 1.5 million rows of sales data and 6000 rows of bonuses, my tests so far return results in 3 seconds on a pretty basic box. The tables are well indexed - the index analyser has been unable to find any improvements at least!
This is a reporting DB so i can pretty much do what i like to the tables.
Any advice gratefully received.
Thanks
K.
SET DATEFORMAT dmy
Create Table BONUSES (Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime,Bonus int)
Create Table SALES (Account varchar(30), Product varchar(30), SaleDate datetime)
Go
DECLARE @sample1-2 Table(Account varchar(30), Product varchar(30), StartDate datetime,EndDate datetime,Bonus int)
INSERT INTO @sample1-2
SELECT '20943','A','26/01/2005','31/01/2005',10 UNION
SELECT '20943','A','01/02/2005','30/04/2007',10 UNION
SELECT '20943','A','01/05/2007','01/05/2007', 0 UNION
SELECT '20943','B','01/04/2003','01/10/2007',30 UNION
SELECT '20943','C','01/04/2003','31/07/2004',20 UNION
SELECT '20943','C','01/08/2004','15/02/2006',25 UNION
SELECT '20943','C','16/02/2006','28/02/2006',35 UNION
SELECT '20943','C','01/03/2006','01/10/2007',35 UNION
SELECT '20943','D','01/04/2003','31/07/2004',15 UNION
SELECT '20943','D','01/08/2004','01/10/2007',30 UNION
SELECT '21000','A','28/01/2005','31/03/2005',10 UNION
SELECT '21000','A','01/04/2005','01/10/2007',10 UNION
SELECT '21000','B','01/04/2006','30/06/2006',30 UNION
SELECT '21000','B','01/07/2006','01/10/2007',25 UNION
SELECT '21000','C','15/04/2003','31/07/2004',20 UNION
SELECT '21000','C','01/08/2004','15/02/2006',25 UNION
SELECT '21000','C','16/02/2006','28/02/2006',35 UNION
SELECT '21000','C','01/03/2006','01/10/2007',35 UNION
SELECT '21000','D','01/04/2003','31/07/2004',15 UNION
SELECT '21000','D','01/08/2004','01/10/2007',30
Insert into BONUSES
SELECT Account,Product,StartDate, EndDate, Bonus from @sample1-2
DECLARE @Sample2 Table(Account varchar(30), Product varchar(30), SaleDate datetime)
INSERT INTO @Sample2
SELECT '20943','A', '27/01/2005' UNION
SELECT '20943','A','02/02/2005' UNION
SELECT '20943','A','02/05/2007' UNION
SELECT '20943','B','02/04/2003' UNION
SELECT '20943','C','02/04/2003' UNION
SELECT '20943','C','02/08/2004' UNION
SELECT '20943','C','17/02/2006' UNION
SELECT '20943','C','02/03/2006' UNION
SELECT '20943','D','02/04/2003' UNION
SELECT '20943','D','02/08/2004' UNION
SELECT '21000','A','29/01/2005' UNION
SELECT '21000','A','02/04/2005' UNION
SELECT '21000','B','02/04/2006' UNION
SELECT '21000','B','02/07/2006' UNION
SELECT '21000','C','16/04/2003' UNION
SELECT '21000','C','02/08/2004' UNION
SELECT '21000','C','17/02/2006' UNION
SELECT '21000','C','02/03/2006' UNION
SELECT '21000','D','02/04/2003' UNION
SELECT '21000','D','02/08/2004'
Insert into SALES
SELECT Account,Product,SaleDate from @Sample2
SELECT dbo.SALES.Account, dbo.SALES.Product, dbo.SALES.SaleDate, dbo.BONUSES.Bonus
FROM dbo.SALES INNER JOIN
dbo.BONUSES ON dbo.SALES.Account = dbo.BONUSES.Account AND dbo.SALES.Product = dbo.BONUSES.Product AND
dbo.SALES.SaleDate >= dbo.BONUSES.StartDate AND dbo.SALES.SaleDate <= dbo.BONUSES.EndDate
October 26, 2007 at 7:07 am
Well - it's certainly not a cartesian product, since you have a "real" set of join criteria.
I don't think it will make a lot (if any) difference, but I'm somehow seeing the BETWEEN syntax as something belonging in the WHERE clause rather than the join clause. Either way - assuming you have the right indexes like you mentioned - you shouldn't have too many perf issues.
This is all assuming that ONLY the very last select is what you plan on using in Production. Your test process is a little interesting (you do two separate table inserts for each test table - why? What's the table variable for?).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 26, 2007 at 7:56 am
IMO one should always work with deterministic data, meaning always specify an end date (even if it is way up in the future).
It avoids handling a NULL enddate and urges developers to read and interprete ranges.
Also provide a trigger to check only one bonus can be "active" at a certain point in time.
I also tend to use aliasses and write the date usage in chronological order:
well, nag, nag, nag ... that's me 😉
SELECT S.Account, S.Product, S.SaleDate, B.Bonus
FROM dbo.SALES S
INNER JOIN dbo.BONUSES B
ON S.Account = B.Account
AND S.Product = B.Product
AND B.StartDate <= S.SaleDate
AND S.SaleDate <= B.EndDate ;
To support this query I would suggest :
Offcourse indexes on the join predicates ( Account and Product ) but I guess those will already be in place as FK-indexes.
- for the Sale table an index for column Salesdate.
- for the Bonus table an index for columns (StartDate, EndDate)
(one index containing both columns !)
You could indeed concider a "bonus_per_date" table, containing one row per date (indexed) for a product.
This table could be build from your bonus table containing the dateranges for an active bonus.
CREATE FUNCTION utvf_getDateList ( @startDate datetime, @endDate DATETIME)
RETURNS @tvfResult TABLE ( effective_date DATETIME NOT NULL )
AS
BEGIN
WITH cteNumberOfDays AS
(
SELECT 0 as num
UNION ALL
SELECT num + 1
FROM cteNumberOfDays
WHERE num < (SELECT DATEDIFF(dd, @startDate, @endDate))
)
INSERT @tvfResult
SELECT DATEADD(d, num , @startDate)
FROM cteNumberOfDays
OPTION (MAXRECURSION 0); -- MAXRECURSION: The server-wide default is 100. When 0 is specified, no limit is applied. 0 - 32767
RETURN
END
go
SELECT B.Account, B.Product, effective_date, B.Bonus
FROM #BONUSES3 B
OUTER APPLY utvf_getDateList (B.StartDate , B.EndDate) ;
go
As you have noticed, testing is the only way to actualy figure out what it (sqlserver) is and will be doing with your data.
Multiple processors, disks, files, may also help optimize parallelism.
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
October 26, 2007 at 8:12 am
Oops SQL7-2000 forum ..... offcouse CTE's will not be available for you, nor will outerapply :blush:
CREATE FUNCTION utvf_getDateList_2000 ( @startDate datetime, @endDate DATETIME)
RETURNS @tvfResult TABLE ( effective_date DATETIME NOT NULL )
AS
BEGIN
DECLARE @num AS INT
, @counter AS INT
SELECT @num = DATEDIFF(dd, @startDate, @endDate)
, @counter = 0
WHILE @counter < @num
BEGIN
INSERT @tvfResult values( DATEADD(d, @counter , @startDate) )
SET @counter = @counter + 1
END
RETURN
END
GO
SELECT B.Account, B.Product, effective_date, B.Bonus
FROM #BONUSES3 B
INNER join utvf_getDateList_2000 ('2007-01-01', '2010-01-01')
ON effective_date BETWEEN B.StartDate and B.EndDate
ORDER BY 1,2,3 ;
Johan
Learn to play, play to learn !
Dont drive faster than your guardian angel can fly ...
but keeping both feet on the ground wont get you anywhere :w00t:
- How to post Performance Problems
- How to post data/code to get the best help[/url]
- How to prevent a sore throat after hours of presenting ppt
press F1 for solution, press shift+F1 for urgent solution 😀
Need a bit of Powershell? How about this
Who am I ? Sometimes this is me but most of the time this is me
October 30, 2007 at 3:30 am
Hi,
Sorry for the lateness of reply.
Matt: I tried the where clause. Interestingly - to me at least - if you try this in a view it will automatically re-write it as a join. The estimated execution plan seems to be indentical as well.
As for the test data, i ripped this from a previous post which had the data in - saved a bit of time.
ALZDBA: I'm happy the tables are correctly indexed, certainly the index tuning wizard had no helpful suggestions. I have been using the data from the sales table for well over a year now so the indexes on that are as good as i can get them. The indexes on the bonus table seemed fairly obvious. Your use of CTE's is interesting. As you realized i'm using 2000 but it's always worth investigating what is available. I don't think we're likely to upgrade until 2008 sp1!
This data comes from a rather strange transactional database that i have to report on. That database takes care of the business logic and validation.
As i've learned more about SQL i've come to realize just how easy it is to write poor queries. There was something about this that just didn't feel right. I got bitten by a triangular join in one report that i didn't spot and something about this just didn't feel right :ermm:
I'm going to give the bonus_per_date table a go and see what difference that makes.
Thanks for your help
K.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply