July 5, 2008 at 10:18 am
Hi all
I have this script that calculates the best product combinations based on their sale price.
Anyway, I use recursive CTE to generate all possible combinations, from which I only need the cheapest one.
Running the select top 1 is quick, but adding the ORDER BY TotalPrice takes query down to 17 seconds.
Here is how to reproduce it:
First, create the demo table. It has 92 records.
CREATE TABLE [dbo].[DemoData](
[BasketSaleRowID] [int] IDENTITY(1000,1) NOT NULL,
[RowMasks] [bigint] NULL,
[SalePrice] [numeric](38, 8) NULL
)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,32.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,40.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (10,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (12,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (128,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (129,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (130,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (132,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (136,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (144,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (16,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (160,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (17,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (18,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (192,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,157.25000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (20,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (24,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (3,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (32,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (33,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (34,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (36,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (4,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (40,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (48,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (5,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (6,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (64,120.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (65,120.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (66,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,32.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,40.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (10,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (12,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (128,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (129,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (130,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (132,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (136,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (144,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (16,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (160,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (17,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (18,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (192,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,157.25000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (20,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (24,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (3,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (32,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (33,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (34,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (36,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (4,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (40,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (48,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (5,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (6,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (64,120.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (65,120.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (66,185.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)
INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)
Now, for the CTE itself:
WITH cte
(
TotalPrice,
RowID,
Combination,
PermMask
)
AS
(
SELECT
TotalPrice = A.SalePrice,
RowID = CAST(BasketSaleRowID as int),
Combination = CAST(A.BasketSaleRowID as varchar(4000)),
PermMask = RowMasks
FROM
(
Select
BasketSaleRowID, SalePrice, RowMasks
from
DemoData WITH(NOLOCK)
) A
UNION ALL
SELECT
TotalPrice = A.SalePrice + CTE.TotalPrice,
RowID = CAST(BasketSaleRowID as int),
Combination = CAST(CAST(A.BasketSaleRowID as varchar(10))+','+ISNULL(cte.Combination,'') as varchar(4000)),
PermMask = RowMasks|PermMask
FROM
(
Select
BasketSaleRowID, SalePrice, RowMasks
from
DemoData WITH(NOLOCK)
) A
JOIN cte
ON
A.BasketSaleRowID<cte.RowID
AND
CTE.PermMask&A.RowMasks=0
)
select top 1 * from CTE WHERE permmask=255 order by TotalPrice
the CTE generates 544424 records in a few MS, but the ORDER BY kills it...
Any help will be apreciated.
Thanks
Tal
July 7, 2008 at 7:25 am
OK. I'm stumped. I tried mucking with the indexes and loading the data into a temp table. The temp table fixed the query time on the sort, but it took almost as much time to insert the data.
Does this data need to be dynamically determined in near real time? You could consider using a materialized view, generate the data at night and then have it available all day long.
One question, why do you do this:
(
Select
BasketSaleRowID, SalePrice, RowMasks
from
DemoData WITH(NOLOCK)
) A
Instead of this:
FROM DemoData AS A WITH(NOLOCK)
Yours isn't broken or anything, but that's a lot of extra typing (and I'm lazy).
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 7, 2008 at 8:47 am
Hi
Thanks for replying...
The reason for the misterious select is that the demo data table is a bit more complex in real world, but this does not affect execution time and merely belongs to inner line of business rules, so I made it simpler.
But, hey, thanks for that tip too.
As for offline comutations, I'm afraid it's not possible.. It is an E-commerce application and users will be adding/removing products all the time...
These will affect the data the CTE is processing..
Isn't it weired that such complex CTE takes no time and a (stupid) sort ruines it?!
I'm doomed...
July 7, 2008 at 9:03 am
Actually no, it's not that weird. The optimizer and the query engine are pretty amazing pieces of engineering. They're able to identify that you only need a single row returned and therefor adjust accordingly. It's the sort that forces it to ignore the TOP 1 and instead attack the entire data set...
Just for the purposes of an exeriment, I really didn't think it would make a difference, I tried using the FAST 1 query hint. No joy at all.
You know, looking at the execution plans, what I see is, the optimizer immediately only selects 76 rows instead of 544,000+ in the second part of the query... The key would be to somehow arrive at that same method of filitration...
Um, does this change the query too much? I moved the WHERE clause up into the recursion:
JOIN cte
ON
A.BasketSaleRowID<cte.RowID
AND
CTE.PermMask&A.RowMasks=0
WHERE permmask = 255
Now the ORDER BY query runs VERY fast. I'm just not sure it's returning the right data.
"The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
- Theodore Roosevelt
Author of:
SQL Server Execution Plans
SQL Server Query Performance Tuning
July 7, 2008 at 9:34 am
For a moment, I thought you caught me...
No, unfortunately it will not return the right data, since the recursion will not take place correctly...
Just to make sure I tried it and it returned 32$ instead of 3297$ which is the right deal.. However, I'm sure the buyers will be hapy to pay only 1% of the real price...:)
I'll keep drilling dipper. Maybe I could get a driller like the ones in ARMAGEDDON
July 7, 2008 at 9:36 am
I tried some tests on this. Not sure why the Order By is adding so much time to it, but it sure is.
Best I can suggest is pre-calculate the data and store it (indexed view or separate table). That'll only work if the data and the select criteria don't change more frequently than the pre-calculation time.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 7, 2008 at 9:52 am
I have some thoughts on this.
Are there rules on which items can be comibined with which other items? Like:
A and B
B and C
A and B and C
but not
A and C
Or anything else?
Also, for "best pricing", I'm assuming that if A costs $10 and B costs $5 and C costs $2, and if you buy all three, you get one of them free, which means that A and B = 5, B and C = 2, A and B and C = $7. Is that correct?
If I add in item D at $3:
A and B = $5 (least of 2)
A and C = $2
A and D = $3
B and C = $2
B and D = $3
C and D = $2
A and B and C = $7 (2 least of 3)
A and B and D = $8
B and C and D = $5
A and B and C and D = $5 (2 least of 4; basically, combination of A and C, and B and D)
Are these assumptions correct?
Also, how often do the prices on these items change?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 7, 2008 at 10:20 am
It goes like this:
Any roduct can have one of the following prices:
1 - Ordinary price (no discount)
2 - some discount (X$, X%, etc)
3 - 1+1 (if you take two items (specified list, not necessarily of same product.))
and some more rules.
for example:
a=10$
B = 20$
C = 30$
D = 40$
assuming all combinations could be of type 1+1:
a+b = 20$ (least expensive is waived, you save 10$)
c+d = 40$ (you save 30$)
that's 40$ waived of =f original price
but if you take products
a+c that's 10$ off
b+d that's 20$
you save only 30$...
combinations are always per only two products and never more, but it gets complex when you take 4 producs of A and 6 products of B, etc...
Yeah, I know I have a lousy job...:)
July 7, 2008 at 12:32 pm
Creating an index like the following reduces the time somewhat (on my PC, went down to 12 seconds from 19 seconds), but the performance is still bad.
Create Clustered Index idx_DemoData On DemoData (BasketSaleRowID, RowMasks)
Rupendra
July 8, 2008 at 11:44 am
What are the bitwise operators in there for? What rule are they enforcing?
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 9, 2008 at 10:59 am
Hi
They force you to select unique items. Since this is a CTE that generates permutations, you should select only the combinations that contain all items and each item once only. This way you're sure that no items is missing or appears twice.
Although moving to bitwise limited my basket to only 31 items and was made for performance test purposes, when I solve the big problem, I'll go back to my old method of enforcing this rule.
Tal
July 9, 2008 at 12:22 pm
Hey GSquared...
I think I just solved my own puzzle...
I added the following filter to the end of the CTE:
WHERE CTE.permmask>A.RowMasks
WITH cte
(
TotalPrice,
RowID,
Combination,
PermMask
)
AS
(
SELECT
TotalPrice = cast(A.SalePrice as int),
RowID = CAST(BasketSaleRowID as int),
Combination = CAST(A.BasketSaleRowID as varchar(4000)),
PermMask = RowMasks
FROM
(
Select
BasketSaleRowID, SalePrice, RowMasks
from
DemoData WITH(NOLOCK)
) A
UNION ALL
SELECT
TotalPrice = CAST(A.SalePrice as int) + CTE.TotalPrice,
RowID = CAST(BasketSaleRowID as int),
Combination = CAST(CAST(A.BasketSaleRowID as varchar(10))+','+ISNULL(cte.Combination,'') as varchar(4000)),
PermMask = RowMasks|PermMask
FROM
(
Select
BasketSaleRowID, SalePrice, RowMasks
from
DemoData WITH(NOLOCK)
) A
JOIN cte
ON
A.BasketSaleRowID<cte.RowID
AND
CTE.PermMask&A.RowMasks=0
WHERE CTE.permmask>A.RowMasks
)
SELECT
TOP 1 *
FROM
CTE
WHERE
permmask=255
ORDER BY
TotalPrice
I'm starting to check if all other tests yield correct results too...
YIPIKAYE #&^%&#@^%$@
Tal
July 9, 2008 at 3:35 pm
Cool biz. Hope that gets you what you need. 🙂
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 9, 2008 at 9:50 pm
Hey GSquared
After a few more tests, I'm now facing a funny situation where my demo code runs fine, but my original code doesn't benefit from that improvement at the same scale. At least the data is correct, so it should be a small roblem...
It was night so I went to sleep and maybe things will be more clear when I'm not yawning...
Anyways, I really appreciate your help and must tell you that the solution is abased on your idea to put the WHERE clause higher in the hierarchy.
Thanks for accompanying me with this one.
Tal
July 10, 2008 at 11:47 am
You're welcome.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 15 posts - 1 through 14 (of 14 total)
You must be logged in to reply to this topic. Login to reply