Selecting just one record using ORDER BY

  • Hi all

    I have this script that calculates the best product combinations based on their sale price.

    Anyway, I use recursive CTE to generate all possible combinations, from which I only need the cheapest one.

    Running the select top 1 is quick, but adding the ORDER BY TotalPrice takes query down to 17 seconds.

    Here is how to reproduce it:

    First, create the demo table. It has 92 records.

    CREATE TABLE [dbo].[DemoData](

    [BasketSaleRowID] [int] IDENTITY(1000,1) NOT NULL,

    [RowMasks] [bigint] NULL,

    [SalePrice] [numeric](38, 8) NULL

    )

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,32.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,40.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (10,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (12,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (128,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (129,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (130,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (132,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (136,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (144,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (16,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (160,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (17,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (18,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (192,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,157.25000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (20,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (24,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (3,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (32,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (33,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (34,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (36,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (4,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (40,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (48,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (5,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (6,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (64,120.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (65,120.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (66,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,32.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (1,40.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (10,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (12,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (128,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (129,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (130,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (132,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (136,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (144,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (16,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (160,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (17,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (18,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (192,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,157.25000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (2,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (20,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (24,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (3,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (32,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (33,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (34,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (36,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (4,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (40,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (48,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (5,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (6,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (64,120.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (65,120.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (66,185.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (68,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (72,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (8,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (80,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (9,1496.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,150.00000000)

    INSERT INTO DemoData(RowMasks,SalePrice) VALUES (96,210.00000000)

    Now, for the CTE itself:

    WITH cte

    (

    TotalPrice,

    RowID,

    Combination,

    PermMask

    )

    AS

    (

    SELECT

    TotalPrice = A.SalePrice,

    RowID = CAST(BasketSaleRowID as int),

    Combination = CAST(A.BasketSaleRowID as varchar(4000)),

    PermMask = RowMasks

    FROM

    (

    Select

    BasketSaleRowID, SalePrice, RowMasks

    from

    DemoData WITH(NOLOCK)

    ) A

    UNION ALL

    SELECT

    TotalPrice = A.SalePrice + CTE.TotalPrice,

    RowID = CAST(BasketSaleRowID as int),

    Combination = CAST(CAST(A.BasketSaleRowID as varchar(10))+','+ISNULL(cte.Combination,'') as varchar(4000)),

    PermMask = RowMasks|PermMask

    FROM

    (

    Select

    BasketSaleRowID, SalePrice, RowMasks

    from

    DemoData WITH(NOLOCK)

    ) A

    JOIN cte

    ON

    A.BasketSaleRowID<cte.RowID

    AND

    CTE.PermMask&A.RowMasks=0

    )

    select top 1 * from CTE WHERE permmask=255 order by TotalPrice

    the CTE generates 544424 records in a few MS, but the ORDER BY kills it...

    Any help will be apreciated.

    Thanks

    Tal

  • OK. I'm stumped. I tried mucking with the indexes and loading the data into a temp table. The temp table fixed the query time on the sort, but it took almost as much time to insert the data.

    Does this data need to be dynamically determined in near real time? You could consider using a materialized view, generate the data at night and then have it available all day long.

    One question, why do you do this:

    (

    Select

    BasketSaleRowID, SalePrice, RowMasks

    from

    DemoData WITH(NOLOCK)

    ) A

    Instead of this:

    FROM DemoData AS A WITH(NOLOCK)

    Yours isn't broken or anything, but that's a lot of extra typing (and I'm lazy).

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • Hi

    Thanks for replying...

    The reason for the misterious select is that the demo data table is a bit more complex in real world, but this does not affect execution time and merely belongs to inner line of business rules, so I made it simpler.

    But, hey, thanks for that tip too.

    As for offline comutations, I'm afraid it's not possible.. It is an E-commerce application and users will be adding/removing products all the time...

    These will affect the data the CTE is processing..

    Isn't it weired that such complex CTE takes no time and a (stupid) sort ruines it?!

    I'm doomed...

  • Actually no, it's not that weird. The optimizer and the query engine are pretty amazing pieces of engineering. They're able to identify that you only need a single row returned and therefor adjust accordingly. It's the sort that forces it to ignore the TOP 1 and instead attack the entire data set...

    Just for the purposes of an exeriment, I really didn't think it would make a difference, I tried using the FAST 1 query hint. No joy at all.

    You know, looking at the execution plans, what I see is, the optimizer immediately only selects 76 rows instead of 544,000+ in the second part of the query... The key would be to somehow arrive at that same method of filitration...

    Um, does this change the query too much? I moved the WHERE clause up into the recursion:

    JOIN cte

    ON

    A.BasketSaleRowID<cte.RowID

    AND

    CTE.PermMask&A.RowMasks=0

    WHERE permmask = 255

    Now the ORDER BY query runs VERY fast. I'm just not sure it's returning the right data.

    "The credit belongs to the man who is actually in the arena, whose face is marred by dust and sweat and blood"
    - Theodore Roosevelt

    Author of:
    SQL Server Execution Plans
    SQL Server Query Performance Tuning

  • For a moment, I thought you caught me...

    No, unfortunately it will not return the right data, since the recursion will not take place correctly...

    Just to make sure I tried it and it returned 32$ instead of 3297$ which is the right deal.. However, I'm sure the buyers will be hapy to pay only 1% of the real price...:)

    I'll keep drilling dipper. Maybe I could get a driller like the ones in ARMAGEDDON

  • I tried some tests on this. Not sure why the Order By is adding so much time to it, but it sure is.

    Best I can suggest is pre-calculate the data and store it (indexed view or separate table). That'll only work if the data and the select criteria don't change more frequently than the pre-calculation time.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • I have some thoughts on this.

    Are there rules on which items can be comibined with which other items? Like:

    A and B

    B and C

    A and B and C

    but not

    A and C

    Or anything else?

    Also, for "best pricing", I'm assuming that if A costs $10 and B costs $5 and C costs $2, and if you buy all three, you get one of them free, which means that A and B = 5, B and C = 2, A and B and C = $7. Is that correct?

    If I add in item D at $3:

    A and B = $5 (least of 2)

    A and C = $2

    A and D = $3

    B and C = $2

    B and D = $3

    C and D = $2

    A and B and C = $7 (2 least of 3)

    A and B and D = $8

    B and C and D = $5

    A and B and C and D = $5 (2 least of 4; basically, combination of A and C, and B and D)

    Are these assumptions correct?

    Also, how often do the prices on these items change?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • It goes like this:

    Any roduct can have one of the following prices:

    1 - Ordinary price (no discount)

    2 - some discount (X$, X%, etc)

    3 - 1+1 (if you take two items (specified list, not necessarily of same product.))

    and some more rules.

    for example:

    a=10$

    B = 20$

    C = 30$

    D = 40$

    assuming all combinations could be of type 1+1:

    a+b = 20$ (least expensive is waived, you save 10$)

    c+d = 40$ (you save 30$)

    that's 40$ waived of =f original price

    but if you take products

    a+c that's 10$ off

    b+d that's 20$

    you save only 30$...

    combinations are always per only two products and never more, but it gets complex when you take 4 producs of A and 6 products of B, etc...

    Yeah, I know I have a lousy job...:)

  • Creating an index like the following reduces the time somewhat (on my PC, went down to 12 seconds from 19 seconds), but the performance is still bad.

    Create Clustered Index idx_DemoData On DemoData (BasketSaleRowID, RowMasks)

    Rupendra

  • What are the bitwise operators in there for? What rule are they enforcing?

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hi

    They force you to select unique items. Since this is a CTE that generates permutations, you should select only the combinations that contain all items and each item once only. This way you're sure that no items is missing or appears twice.

    Although moving to bitwise limited my basket to only 31 items and was made for performance test purposes, when I solve the big problem, I'll go back to my old method of enforcing this rule.

    Tal

  • Hey GSquared...

    I think I just solved my own puzzle...

    I added the following filter to the end of the CTE:

    WHERE CTE.permmask>A.RowMasks

    WITH cte

    (

    TotalPrice,

    RowID,

    Combination,

    PermMask

    )

    AS

    (

    SELECT

    TotalPrice = cast(A.SalePrice as int),

    RowID = CAST(BasketSaleRowID as int),

    Combination = CAST(A.BasketSaleRowID as varchar(4000)),

    PermMask = RowMasks

    FROM

    (

    Select

    BasketSaleRowID, SalePrice, RowMasks

    from

    DemoData WITH(NOLOCK)

    ) A

    UNION ALL

    SELECT

    TotalPrice = CAST(A.SalePrice as int) + CTE.TotalPrice,

    RowID = CAST(BasketSaleRowID as int),

    Combination = CAST(CAST(A.BasketSaleRowID as varchar(10))+','+ISNULL(cte.Combination,'') as varchar(4000)),

    PermMask = RowMasks|PermMask

    FROM

    (

    Select

    BasketSaleRowID, SalePrice, RowMasks

    from

    DemoData WITH(NOLOCK)

    ) A

    JOIN cte

    ON

    A.BasketSaleRowID<cte.RowID

    AND

    CTE.PermMask&A.RowMasks=0

    WHERE CTE.permmask>A.RowMasks

    )

    SELECT

    TOP 1 *

    FROM

    CTE

    WHERE

    permmask=255

    ORDER BY

    TotalPrice

    I'm starting to check if all other tests yield correct results too...

    YIPIKAYE #&^%&#@^%$@

    Tal

  • Cool biz. Hope that gets you what you need. 🙂

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

  • Hey GSquared

    After a few more tests, I'm now facing a funny situation where my demo code runs fine, but my original code doesn't benefit from that improvement at the same scale. At least the data is correct, so it should be a small roblem...

    It was night so I went to sleep and maybe things will be more clear when I'm not yawning...

    Anyways, I really appreciate your help and must tell you that the solution is abased on your idea to put the WHERE clause higher in the hierarchy.

    Thanks for accompanying me with this one.

    Tal

  • You're welcome.

    - Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
    Property of The Thread

    "Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon

Viewing 15 posts - 1 through 14 (of 14 total)

You must be logged in to reply to this topic. Login to reply