June 19, 2012 at 1:29 pm
This is something that may be better handled in an application of perhaps a CLR aggregate function of some sort.
Reminds me of something used for packaging parts, looking for the max number of items that would fit in a specific size box for example.
June 19, 2012 at 1:48 pm
And so, a link to Hugo Kornelis's bin packing series, which may spark an idea or 2.
http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/30/bin-packing-part-1-setting-a-baseline.aspx
June 19, 2012 at 1:59 pm
Les Cardwell (6/19/2012)
Lynn,It's a redistricting (route) requirement, where we're looking to contain each district to roughly the same number of meters (service locations). Urban growth and all that.
That's interesting, because that sounds like it would impose some reasonably important links between data points. After all you wouldn't want to superimpose districts, so you wouldn't be looking for ANY set, but sets of "adjacent" data points. This would severely reduce the amount of combinations to evaluate.
Is there adjacency data missing from the description?
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 19, 2012 at 2:48 pm
Les Cardwell (6/19/2012)
Chasing,>>All the possible combinations of those rows that added to 5?
All the possible DISTINCT combinations.
In the vein of being pedantic (because that's the only way I can contribute on this one π ), probabilistically speaking, combinations are distinct. Permutations have repeats, in all possible orders.
So, the permutation count for David's scenario is 50*49*48*47*46 = 50!/45! = 254,251,200
That would leave 120 (5*4*3*2*1) permutations for each distinct combination, so the combination count = 50!/(45!*5!) = 2,118,760 records.
I hope this is being printed on green-bar paper on a dot-matrix printer.
I'm going to go to my corner now.
June 19, 2012 at 2:58 pm
>>And so, a link to Hugo Kornelis's bin packing series, which may spark an idea or 2.
Thanks. It does have similarities. I'll have to give it a thorough read.
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 19, 2012 at 3:01 pm
>>Is there adjacency data missing from the description?
The adjacencies have already been resolved.
Thx,
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 19, 2012 at 3:40 pm
There's a simple solution for this. Anyone remember the challenge of obtaining every possible combination of the letters of the alphabet, starting A, AB, AC.....BC, BD, BE......ABCDEFGHIJKLMNOPQRSTUVWXYZ....ZY, ZZ?
Generate instead a list of integer rowIDs of the original data. Think CROSS APPLY and IN, and you've cracked it.
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
June 19, 2012 at 3:46 pm
So, the permutation count for David's scenario is 50*49*48*47*46 = 50!/45! = 254,251,200
That would leave 120 (5*4*3*2*1) permutations for each distinct combination, so the combination count = 50!/(45!*5!) = 2,118,760 records.
LOL...a math major I am not, but that would explain the problem, since there are ~240 possible combinations to resolve to 30. I do have a 'good enough' result by breaking it into smaller sections, but was curious about other possible algorithms to the seemingly simple problem initially stated, or maybe overlooking the obvious.
Thx,
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 19, 2012 at 3:56 pm
Chris,
Oddly...I haven't seen that one. Any idea where I might track it down?
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 19, 2012 at 7:29 pm
Les Cardwell (6/19/2012)
Lynn,It's a redistricting (route) requirement, where we're looking to contain each district to roughly the same number of meters (service locations). Urban growth and all that.
How many "routes" do you have to do this problem for, Les? To be sure, how many routes do you have and how many to you want? Also, just in case I have an epiphany, what are the values for the smallest and largest routes?
--Jeff Moden
Change is inevitable... Change for the better is not.
June 19, 2012 at 7:38 pm
While I am loathe to quote myself (well, not really, just saying), have you seen this article?
http://www.sqlservercentral.com/articles/sql+n-Tuples/89809/
A minor modification to calculate the value of each n-tuple and then select for those that equal 10 I believe gives you the result you're looking for.
DECLARE @t TABLE (Prod CHAR(10), value INT)
INSERT INTO @t
SELECT '1000', 3 UNION ALL SELECT '1001', 5 UNION ALL SELECT '1002', 6
UNION ALL SELECT '1003', 5 UNION ALL SELECT '1004', 8 UNION ALL SELECT '1005', 6
;WITH UNIQUEnTuples (n, Tuples, value) AS (
SELECT DISTINCT 1, CAST(Prod AS VARCHAR(max)), value
FROM @t
UNION ALL
SELECT 1 + n.n, t.Prod + ',' + n.Tuples, t.value + n.value
FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Tuples
WHERE CHARINDEX(t.Prod, n.Tuples) = 0
)
SELECT *
FROM UNIQUEnTuples
WHERE value = 10
ORDER BY n, Tuples
If your Prod column is not-unique, just remove the DISTINCT.
Also, with 240 combinations, you'll probably need to set a leveling factor to limit to 1, 2, 3, etc. combinations of items, otherwise this algorithm will run a long time. Like this:
WHERE CHARINDEX(t.Prod, n.Tuples) = 0 AND n <= 3
Can't seem to make it say n less than or equal 3 in the SQL above.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 19, 2012 at 7:46 pm
I keep associating districts with a geographical area, so I cannot help but think that using distance alone is not going to serve you well in this case. Would it make more sens to turn it into a map-reduce type problem?
As in:
- figure out your total coverage area
- split your area in 2. and calculate the density. if the density it substantially higher on one side, split it again.
- keep running a split until you have the number of districts needed or the right average size of districts needed.
- normalize once you have the rough cut.
With coordinates for each point, should be easy enough to do the rough groupings.
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
June 20, 2012 at 6:10 am
OK sports fans, I've now read through this entire thread and I'm now 90% the solution I posted before will work for the OP. I was a bit concerned about speed though, so I ran a little test with the following (modified) version of the code.
Constraints:
1. Max tuples combined is 3
2. Value must sum to 10 for combination
3. Total number of items = 250
Here is the test harness and the code.
DECLARE @t TABLE (Prod VARCHAR(4), value INT)
;WITH Tally (n) AS (
SELECT TOP 250 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns)
INSERT INTO @t
SELECT RIGHT('000' + CAST(n AS VARCHAR), 4), ABS(CHECKSUM(NEWID())) % 10 + 1
FROM Tally
;WITH UNIQUEnTuples (n, Tuples, value, [check]) AS (
SELECT 1, CAST(Prod AS VARCHAR(max)), value, CAST(value AS VARCHAR(MAX))
FROM @t
UNION ALL
SELECT 1 + n.n, t.Prod + ',' + n.Tuples, n.value + t.value, [check]+'+'+CAST(t.value AS VARCHAR(MAX))
FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Tuples
WHERE CHARINDEX(t.Prod, n.Tuples) = 0 AND n <= 3 AND t.value + n.value <= 10
)
SELECT *
FROM UNIQUEnTuples
WHERE value = 10
ORDER BY n, Tuples
Results:
1. Rows returned = 122000 give or take
2. Elapsed time = ~ 2 minutes
Improvements to the code:
1. I eliminated the DISTINCT clause on the anchor leg of the CTE because I didn't think it would be necessary.
2. Limited the number of levels examined to 3 or less
3. Removed rows where the value of the Tuples is > 10 (no longer need to be considered at level = n+1)
4. Added a check column to show you the amounts added together
By the way, this is similar to the traditional OR Knapsack problem (http://en.wikipedia.org/wiki/Knapsack_problem) except that the constraint on the KP is less than or equal, whereas the constraint here is equal.
The KP is traditionally solved using a Dynamic programming algorithm, which is also recursive. In this case, my solution is brute-force but sufficient for a problem of this small size.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 20, 2012 at 10:07 am
Sheesh...quite the brain trust up here. It's going to take me a bit to wade through these. I'll report the results after I come up for air π
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 21, 2012 at 6:13 am
David Webb-200187 (6/19/2012)
Hmmmm....Let's say you had 50 rows, all with a quantity of 1. The target quantity was 5.
How would you want that brought back? All the possible combinations of those rows that added to 5?
Just the rows that could possibly be a component of the 5? You'd end up bringing them all back. What would that tell you? Would you want all the combinations grouped by some artificial designator (like a solution_number)?
I agree with the idea that t-sql isn't the best vehicle for generating solution combinations like this. A number crunching package like SAS or SPSS would be a better bet ( or find someone who has an old APL compiler).
1,125,899,906,842,624 distinct combinations in 50 rows:
DROP TABLE #Sample
CREATE TABLE #Sample (Item INT, Qty INT)
INSERT INTO #Sample (Item, Qty)
SELECT 1, 3 UNION ALL
SELECT 2, 5 UNION ALL
SELECT 3, 6 UNION ALL
SELECT 4, 5 UNION ALL
SELECT 5, 8 UNION ALL
SELECT 6, 6
-----------------------------------------------------
-- Generate every row combination from the input set, concatenate into a string:
-- SELECT UniquePerms = POWER(2,n)-1
-- FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) d (n)
-- where n = number of rows (six, in this case)
-- the rCTE generates about a million rows in 30 seconds, corresponding to 20 input rows.
-----------------------------------------------------
;WITH Calculator AS(
SELECT
[Level]= 1,
MyString= CAST('01' AS VARCHAR(40)),
LastDigit= CAST(1 AS BIGINT),
Pos= 1,
MaxLevels= (SELECT COUNT(*) FROM #Sample)
UNION ALL
SELECT
[Level]= [Level] + 1,
MyString= y.MyString,
LastDigit= y.NewDigit,
Pos= y.Newpos,
MaxLevels
FROM Calculator c
CROSS APPLY (
SELECT Newpos, NewDigit,
MyString = CAST(LEFT(c.MyString,d.Newpos-1) + RIGHT('0'+CAST(d.NewDigit AS VARCHAR),2) AS VARCHAR(40))
FROM (
SELECT Newpos, NewDigit = CASE
WHEN c.LastDigit = c.MaxLevels THEN 1+CAST(SUBSTRING(c.MyString,x.Newpos,2) AS BIGINT)
ELSE 1+c.LastDigit END
FROM (SELECT Newpos = CASE WHEN c.LastDigit = c.MaxLevels THEN c.Pos-2 ELSE c.Pos+2 END) x (Newpos)
) d (Newpos, NewDigit)
) y (Newpos, NewDigit, MyString)
WHERE c.[Level] < POWER(2,MaxLevels)-1
)
SELECT
c.MyString,
s.SUMQuantity
FROM Calculator c
CROSS APPLY ( -- unpivot the string MyString and use to probe table #Sample
SELECT SUM(qty)
FROM #Sample s
INNER JOIN (
SELECT n, RowID = CAST(SUBSTRING(c.MyString,(n*2)-1,2) AS INT)
FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) d (n)
WHERE n <= Pos+1
) d ON d.RowID = s.Item
) s (SUMQuantity)
WHERE s.SUMQuantity = 10
OPTION (MAXRECURSION 0);
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
Viewing 15 posts - 16 through 30 (of 109 total)
You must be logged in to reply to this topic. Login to reply