FIND column that hits criteria

  • hi all,

  • I can help you, just post sample data with CREATE TABLE and INSERT statements, so I don't waste time recreating it. You can find out how to do that in the links from my signature.

    Luis C.
    General Disclaimer:
    Are you seriously taking the advice and code from someone from the internet without testing it? Do you at least understand it? Or can it easily kill your server?

    How to post data/code on a forum to get the best help: Option 1 / Option 2
  • you can run the following with

  • also many theres a way to have a loop counter rather than me duplicating the same code 52 times with a new column name???

    your help is much appreciated.

  • Sorry. I've been looking at this, but I have no clue at all as to what you are trying to do.

    Is there a way for you to repost your problem, but without using RAND - so just using INSERT ... VALUES statements? And then add the results you want to have returned, along with an explanation. Using random data makes it much harder to explain the issue, because I will not have the same data you have.

    Also, try to use the minimum amount of data required to explain the problem. But do use enough to cover all possible exceptions and special cases!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • hope this is clear.

  • Randomly pick? There is no algorithm to use to decide where an order is filled? Sorry, this makes no sense.

  • it does make sense.

    ignore everything else ive said, just look at my previous post, create the table orders and insert those values into orders.

    and i would like the output as previous post.

    i have written the algorithm to decide where the orders appear in which zone, but ive removed that from the create orders table statement.

    the most simplest answer is order 1 where orderline 1,2 and 3 are fulfilled in zones 2,4,6,7, or 9. fulfilled meaning number 1 appears down that column for each orderline. THEN is choose a random number from 2,4,6,7 or 9. i chose 6 hence output table has orderID 1 and 6 under onePick.

    order 2 can be filled by zones 1 and 6, or 6 and 7. this is because order 2 has 4 lines. zone 1/6/7 sum is 2. therefore i choose 6 and 7 to fill order 2.

    in excel you would do 'for each cell in range' etc.

    surely this has to make some sense now.......

  • Okay, two disclaimers up front. First, I agree with Lynn that "random" does not sound as a very precise specification. It is also very hard to pull off in SQL Server, because of how the RAND() function behaves in a set-based query. Also, even if you can work around that, you still cannot expect a very good distribution of random numbers; I have been told that the algorithm RAND() uses for the pseudo-random number generation is not very good. That being said, if by "random" you actually mean "if there are multiple possible combinations I don't care which one is used", then it can be done.

    Second, even though I managed to find a solution in T-SQL, I am not at all convinced that the database is the right place for this type of logic. If you can, consider moving the logic to the application tier.

    That being said, the problem was interesting enough that I decided to give it a go. If I understand your problem correctly, the Orders table you posted has a row for each orderline, with a column for each zone (where a zone is probably like a warehouse location??) where that orderline can be picked from. Your query tries to find a single zone that has all the items for all orderlines of an order, and failing that tries to find the minimum amount of zones; with a maximum of four orderlines per order, the worst case is having to use four zones for fulfilling the order. Is that interpretation correct?

    The first major problem is that your table is absolutely not normalized. The ten columns z1 through z10 are a repeating group, which violates first normal form. The effect of this violation is that any attempt to solve this issue on the table as given will have to repeat the same code 10 times, plus add a lot of awkawrd extra stuff. So instead of going down that path, I decided to as a first step normalize the table, using the code below:

    CREATE TABLE #Orders

    (OrderID int NOT NULL,

    OrderLine int NOT NULL,

    Zone tinyint NOT NULL,

    CountOfLines tinyint NOT NULL,

    PRIMARY KEY (Zone, OrderID, OrderLine)

    );

    INSERT INTO #Orders (OrderID, OrderLine, Zone, CountOfLines)

    SELECT o.OrderID,

    o.OrderLine,

    v.Zone,

    MAX(OrderLine) OVER (PARTITION BY OrderID) -- This assumes that OrderLine is always sequentially from 1 up for each order.

    -- If that is not the case, this query becomes more complex.

    FROM dbo.Orders AS o

    CROSS APPLY(VALUES (1, o.z1),

    (2, o.z2),

    (3, o.z3),

    (4, o.z4),

    (5, o.z5),

    (6, o.z6),

    (7, o.z7),

    (8, o.z8),

    (9, o.z9),

    (10, o.z10)) AS v(Zone, Flag)

    WHERE v.Flag = 1;

    Note that I could have done this in a CTE instead of creating a temporary table, but because the normalized data is referenced multiple time it is more efficient to do the logic once and store and reuse the results. For the sake of performance and understandability of later queries, I have also added one non-normalized column for the count of orderlines in each order.

    The order of columns in the primary key is not intuitive, but I think that this will result in the best performance for the queries below. However, do test other orders of the columns in your database and with your data.

    Also, if you have any control over the schema in your database, then I would recommend permanently replacing the existing unnormalized design with the normalized version above. If you do that, then remove the CountOfLines column. I will gladly help you modify the queries below to compensate for losing that column.

    The query below uses a CTE to find all zones that are relevant (zone 10 is not used because the flag is not set on any orderline), then joins them to the normalized orders table to find all orderlines that can be served from any zone. After grouping by zone and order, I count the number of lines and compare it to the CountOfLines column. If they are equal, then all orderlines of that order can be served from that zone. So this query returns all OrderID/Zone combinations that can be used as a single zone - for your test data, that would be OrderID 1, Zones 2, 4, 6, 7, or 9.

    (To help you understand the query, comment out the GROUP BY and HAVING, add * in the SELECT list and run it; then remove the *, uncomment the HAVING, and add the two expressions used in it to the SELECT list, and run it again).

    WITH Zones

    AS (SELECT DISTINCT Zone FROM #Orders)

    SELECT o.OrderID, z.Zone

    FROM Zones AS z

    INNER JOIN #Orders AS o

    ON o.Zone = z.Zone

    GROUP BY z.Zone, o.OrderID

    HAVING COUNT(*) = MAX(o.CountOfLines);

    This returns all zones for the order and you need just one. You cannot add a TOP because with other data multiple orders might be returned from this query. But instead of that you can add a ROW_NUMBER function to the query, as follows, and you will always get the lowest-numbered zone:

    WITH Zones

    AS (SELECT DISTINCT Zone FROM #Orders),

    Solutions

    AS (SELECT o.OrderID, z.Zone,

    rn = ROW_NUMBER() OVER (PARTITION BY o.OrderID ORDER BY z.Zone)

    FROM Zones AS z

    INNER JOIN #Orders AS o

    ON o.Zone = z.Zone

    GROUP BY z.Zone, o.OrderID

    HAVING COUNT(*) = MAX(o.CountOfLines))

    SELECT OrderID, Zone

    FROM Solutions

    WHERE rn = 1

    ORDER BY OrderID;

    The above is nice for single-zone orders. Now move on to the two-zone orders. The query below uses a variation on the technique above, by doing what is effectively a cross join between two copies of the table of zones (using greater than to ensure that a zone is never combined with itself, and that the same combination of zones is never used twice - e.g. (1,2) is used, but (2,1) is not). The query as I post it also has a very crude way to exclude orders that have a single-zone solution; without that you would also return all possible combinations of two zones that can server order 1 - and that's a long list! You should of course replace this with something that does not hardcode the OrderID. Perhaps you can do a NOT EXISTS to a table where you store the single-zone solutions you already have? Anyway, the query that returns all possible two-zone solutions is as follows (and you can use the same trick as above to make it return just a single solution for each order that has a two-zone solution):

    WITH Zones

    AS (SELECT DISTINCT Zone FROM #Orders)

    SELECT o.OrderID, z1.Zone, z2.Zone

    FROM Zones AS z1

    INNER JOIN Zones AS z2

    ON z2.Zone > z1.Zone -- Effectively a cross join

    INNER JOIN #Orders AS o

    ON o.Zone IN (z1.Zone, z2.Zone)

    WHERE o.OrderID <> 1 -- Replace this with a good way

    -- to exclude orders selected in the first query

    GROUP BY z1.Zone, z2.Zone, o.OrderID

    HAVING COUNT(DISTINCT o.OrderLine) = MAX(o.CountOfLines)

    ORDER BY o.OrderID;

    Since the maximum number of zones is 4, you can now create two more copies of this query and adapt them to present three-zone and four-zone solutions. Just follow the same pattern and I'm sure you'll do fine. Just make sure to exclude all orders that have a solution with less zones, not just the orders with a one-zone solution.

    The above is not a single set-based solution. The code uses a technique that I invented almost five years ago, and that I call set-based iteration - instead of trying to process all the orders in one single query (set-based), or processing one order at a time (iteration), I do a much smaller amount of iterations that each process a subset of the work - in this case four iterations, for every possible number of zones. In this case, it is quite likely that this will be the fastest possible solution for you.

    However, I also created a fully set-based alternative. The query below is the starting point for this - as you can see, I now always join four zones, but instead of using greater than I use greater than or equal - so I use for instance four copies of the same zone to represent a single-zone solution. The third and fourth zone become a bit awkward because I do want zone combinations such as (1, 2, 3, 3) [the first three zones], but not (1, 1, 2, 3) [which would represent the same combination of zones in a different way]. So once a second or third zone is equal to the previous zone, all later zones have to be equal as well.

    WITH Zones

    AS (SELECT DISTINCT Zone FROM #Orders)

    SELECT o.OrderID, z1.Zone, z2.Zone

    FROM Zones AS z1

    INNER JOIN Zones AS z2

    ON z2.Zone > z1.Zone -- Effectively a cross join

    INNER JOIN #Orders AS o

    ON o.Zone IN (z1.Zone, z2.Zone)

    WHERE o.OrderID <> 1 -- Replace this with a good way

    -- to exclude orders selected in the first query

    GROUP BY z1.Zone, z2.Zone, o.OrderID

    HAVING COUNT(DISTINCT o.OrderLine) = MAX(o.CountOfLines)

    ORDER BY o.OrderID;

    If you run this query, you will see that it returns a lot of rows. For your small testset, you already get 758 rows. That is all possible combinations of 1, 2, 3, or 4 zones for each order. Obviously, we need to water this down.

    The query below represents the next step. I have added a CTE "PossibleSolutions" that is effectively the previous query, with some smart use of NULLIF to replace repeated zones with NULL (so that you can more easy see how many zones are actually used). The outer query then uses an ORDER BY to ensure that for each order, the solutions with the least zones are presented first:

    WITH Zones

    AS (SELECT DISTINCT Zone FROM #Orders),

    PossibleSolutions

    AS (SELECT o.OrderID,

    z1.Zone AS onePick,

    NULLIF(z2.Zone, z1.Zone) AS twoPicks,

    NULLIF(z3.Zone, z2.Zone) AS threePicks,

    NULLIF(z4.Zone, z3.Zone) AS fourPicks

    FROM Zones AS z1

    INNER JOIN Zones AS z2

    ON z2.Zone >= z1.Zone

    INNER JOIN Zones AS z3

    ON z3.Zone >= z2.Zone

    AND NOT (z2.Zone = z1.Zone AND z3.Zone > z2.Zone)

    INNER JOIN Zones AS z4

    ON z4.Zone >= z3.Zone

    AND NOT (z3.Zone = z2.Zone AND z4.Zone > z3.Zone)

    INNER JOIN #Orders AS o

    ON o.Zone IN (z1.Zone, z2.Zone, z3.Zone, z4.Zone)

    GROUP BY z1.Zone, z2.Zone, z3.Zone, z4.Zone, o.OrderID

    HAVING COUNT(DISTINCT o.OrderLine) = MAX(o.CountOfLines))

    SELECT * FROM PossibleSolutions

    ORDER BY OrderID,

    CASE WHEN twoPicks IS NULL THEN 0 ELSE 1 END,

    CASE WHEN threePicks IS NULL THEN 0 ELSE 1 END,

    CASE WHEN fourPicks IS NULL THEN 0 ELSE 1 END;

    Now as a last trick, we can use the logic in the ORDER BY above in a RANK() function. Embed everything in yet another CTE and then select on rank 1 to get, for each order, only the solutions with the lowest amount of zones.

    WITH Zones

    AS (SELECT DISTINCT Zone FROM #Orders),

    PossibleSolutions

    AS (SELECT o.OrderID,

    z1.Zone AS onePick,

    NULLIF(z2.Zone, z1.Zone) AS twoPicks,

    NULLIF(z3.Zone, z2.Zone) AS threePicks,

    NULLIF(z4.Zone, z3.Zone) AS fourPicks

    FROM Zones AS z1

    INNER JOIN Zones AS z2

    ON z2.Zone >= z1.Zone

    INNER JOIN Zones AS z3

    ON z3.Zone >= z2.Zone

    AND NOT (z2.Zone = z1.Zone AND z3.Zone > z2.Zone)

    INNER JOIN Zones AS z4

    ON z4.Zone >= z3.Zone

    AND NOT (z3.Zone = z2.Zone AND z4.Zone > z3.Zone)

    INNER JOIN #Orders AS o

    ON o.Zone IN (z1.Zone, z2.Zone, z3.Zone, z4.Zone)

    GROUP BY z1.Zone, z2.Zone, z3.Zone, z4.Zone, o.OrderID

    HAVING COUNT(DISTINCT o.OrderLine) = MAX(o.CountOfLines)),

    RankedSolutions

    AS (SELECT OrderID,

    onePick,

    twoPicks,

    threePicks,

    fourPicks,

    rnk = RANK() OVER (PARTITION BY OrderID

    ORDER BY CASE WHEN twoPicks IS NULL THEN 0 ELSE 1 END,

    CASE WHEN threePicks IS NULL THEN 0 ELSE 1 END,

    CASE WHEN fourPicks IS NULL THEN 0 ELSE 1 END)

    FROM PossibleSolutions)

    SELECT OrderID,

    onePick,

    twoPicks,

    threePicks,

    fourPicks

    FROM RankedSolutions

    WHERE rnk = 1

    ORDER BY OrderID;

    To get just a single solution for each order, replace RANK() by ROW_NUMBER(). This will return just one of the solutions for each order. Since the ORDER BY in ROW_NUMBER() is not a unique column combination, the zones returned are not deterministic, however this is also not random - the result will probably always be the same for the same input data, but that is not guaranteed. You can make it deterministic by adding ", onePick, twoPicks, threePicks, fourPicks" at the end of the ORDER BY specification to know for sure that you will always get the solution with the lowest-numbered bins.

    Once more, I am not convinced that this single set-based version will actually perform better than the set-based iteration method. To find out, test both on a table that has sufficient data. And don't forget to play around with the order of the columns in the primary key constraint of the temporary #Orders table, since that might also effect performance.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • EDIT: I forgot to mention that my code returns some three-zone options for order 4; your expected results show four zones. I hope that this is because you made a mistake in posting, otherwise I have misunderstood the requirement and typed that novella for nought. :crying:


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Holy moly Hugo - you're too good!!!!

    I will have to read it slowly and understand the logic and test your code. I am on a charity run tomorrow but it might take a few days to get back to you.

    so much appreciated. :-D:-D:-D

  • Yes, please do take the time to ensure you understand what the code is doing. Code-writiing by copy/paste has been the downfall of many companies already. If you don't understand it, you'll be unable to maintain it.

    Do post back if you have any questions!

    Good luck on the charity run - I don't know what charity you are running for, but I hope that you make them loads of cash!


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • i will reply to you once ive done it.

  • Hi Hugo,

  • Let's see if I can do this without messing up - I want to reply to some of your questions in a different order, so I have to move bits and pieces around in the quote block.

    tajsohal (3/1/2016)


    I then went onto the full set based version. Excellent coding, and it took me a while to understand and research WITH and creating CTE.

    After a number of test cycles, i am confident the results are correct. Brilliant.

    This was a small number of orders and zones. I have 40 zones, and orders go to 100+. I extended the code to use 40 zones and a test run on 100 orders. My pre code produced 100 orders, 277 order lines (orders having an average of 3 order lines, so an order can have 2/3/4 lines). This produces around 5000 lines of combinations in the #Orders temp table.

    This dramatically slowed down the execution of the code. it produced the right results but it took 64mins. I will now play around with the temp table and look into pk constraint. But is there any way to improve performance?

    Yes, especially with this higher number of zones and rows the execution times will go up dramatically. That's because it will find all combinations of zones first before trimming it down to the ones with few zones. So if for instance you have a single order for a single item that is on stock in 20 zones, there will be 20 single-zone solutions, but also (20*19=) 380 two-zone solutions, (20*19*18=) 6840 three-zone solutions, and (20*19*18*17=) 116280 four-zone solutions - and that's just one single order! (And be glad that you capped the problem to never look at more than four zones...)

    The "all-at-once" query will first find all those solutions, then order them by number of zones used and throw away most of the intermediate results.

    That's why I posted the "set-based iteration" approach that uses just four steps, and each step only processes the orders not yet served by the previous steps. That should be a lot faster, because it's only the orders that have a "simpler" solution that create such huge amount of possible "more complex" solutions.

    I am having trouble with the iteration version when duplicating for third and four zone picks:

    WHERE o.OrderID <> 1 -- Replace this with a good way

    -- to exclude orders selected in the first query

    GROUP BY z1.Zone, z2.Zone, o.OrderID

    what if order 1 was a two or more picked zone? The where clause excludes orderID not if orderID was one picked.

    Hence the comment to replace this with a good way to exclude orders selected in the previous query. The hardcoded exclusion of OrderID 1 was just a simple Proof of Concept to show the idea.

    I figure if you go by the set-based iteration approach, your first step (that finds the single-zone solutions) will not immediately return the solutions to the client, but store it in an intermediate table. You then in the second step add the two-zone solutions, then the three-zones, and finally the four-zones. And you only return the results to the client when the results are complete.

    So the actual WHERE to exclude orders that already have a solution would be a NOT EXISTS into that intermediate table. To make this efficient, ensure that the intermediate table has an index either on just the OrderID column or with OrderID as the first column.

    Please write out the full code for the set-based iteration approach (I could do it for you, but I think you'll understand it a lot better if you at least try to do it yourself - which is important if you need to maintain it later, or to explain it to your successor). If you get stuck working out the code, post what you have and I'll help you over the bump in the road.

    I cannot guarantee that the set-babsed iteration will be fast enough for your needs, because you still do some pretty complex joining, but I am confident that it will be lot faster than the single set-based solution.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply