Aggregate Query question

Question

Aggregate Query question

Viewing 15 posts - 16 through 30 (of 109 total)

You must be logged in to reply to this topic. Login to reply

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 1

This is something that may be better handled in an application of perhaps a CLR aggregate function of some sort.

Reminds me of something used for packaging parts, looking for the max number of items that would fit in a specific size box for example.

David Webb-CDS SSCoach Points: 17398 More actions · Answer 2

And so, a link to Hugo Kornelis's bin packing series, which may spark an idea or 2.

http://sqlblog.com/blogs/hugo_kornelis/archive/2007/11/30/bin-packing-part-1-setting-a-baseline.aspx

And then again, I might be wrong ...
David Webb

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 3

Les Cardwell (6/19/2012)
Lynn,
It's a redistricting (route) requirement, where we're looking to contain each district to roughly the same number of meters (service locations). Urban growth and all that.

That's interesting, because that sounds like it would impose some reasonably important links between data points. After all you wouldn't want to superimpose districts, so you wouldn't be looking for ANY set, but sets of "adjacent" data points. This would severely reduce the amount of combinations to evaluate.

Is there adjacency data missing from the description?

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

jeffem Ten Centuries Points: 1252 More actions · Answer 4

Les Cardwell (6/19/2012)
Chasing,
>>All the possible combinations of those rows that added to 5?
All the possible DISTINCT combinations.

In the vein of being pedantic (because that's the only way I can contribute on this one 😉 ), probabilistically speaking, combinations are distinct. Permutations have repeats, in all possible orders.

So, the permutation count for David's scenario is 50*49*48*47*46 = 50!/45! = 254,251,200

That would leave 120 (5*4*3*2*1) permutations for each distinct combination, so the combination count = 50!/(45!*5!) = 2,118,760 records.

I hope this is being printed on green-bar paper on a dot-matrix printer.

I'm going to go to my corner now.

Les Cardwell Ten Centuries Points: 1251 More actions · Answer 5

>>And so, a link to Hugo Kornelis's bin packing series, which may spark an idea or 2.

Thanks. It does have similarities. I'll have to give it a thorough read.

Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD

Les Cardwell Ten Centuries Points: 1251 More actions · Answer 6

>>Is there adjacency data missing from the description?

The adjacencies have already been resolved.

Thx,

~Les

Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD

ChrisM@home SSC-Insane Points: 24260 More actions · Answer 7

There's a simple solution for this. Anyone remember the challenge of obtaining every possible combination of the letters of the alphabet, starting A, AB, AC.....BC, BD, BE......ABCDEFGHIJKLMNOPQRSTUVWXYZ....ZY, ZZ?

Generate instead a list of integer rowIDs of the original data. Think CROSS APPLY and IN, and you've cracked it.

[font="Arial"]^{Low-hanging fruit picker and defender of the moggies}[/font]

For better assistance in answering your questions, please read this[/url].

Understanding and using APPLY, (I)[/url] and (II)[/url] Paul White[/url]

Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]

Les Cardwell Ten Centuries Points: 1251 More actions · Answer 8

So, the permutation count for David's scenario is 50*49*48*47*46 = 50!/45! = 254,251,200
That would leave 120 (5*4*3*2*1) permutations for each distinct combination, so the combination count = 50!/(45!*5!) = 2,118,760 records.

LOL...a math major I am not, but that would explain the problem, since there are ~240 possible combinations to resolve to 30. I do have a 'good enough' result by breaking it into smaller sections, but was curious about other possible algorithms to the seemingly simple problem initially stated, or maybe overlooking the obvious.

Thx,

~Les

Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD

Les Cardwell Ten Centuries Points: 1251 More actions · Answer 9

Chris,

Oddly...I haven't seen that one. Any idea where I might track it down?

~Les

Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD

Jeff Moden SSC Guru Points: 1004434 More actions · Answer 10

Les Cardwell (6/19/2012)
Lynn,
It's a redistricting (route) requirement, where we're looking to contain each district to roughly the same number of meters (service locations). Urban growth and all that.

How many "routes" do you have to do this problem for, Les? To be sure, how many routes do you have and how many to you want? Also, just in case I have an epiphany, what are the values for the smallest and largest routes?

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Dwain Camps SSC Guru Points: 86978 More actions · Answer 11

While I am loathe to quote myself (well, not really, just saying), have you seen this article?

http://www.sqlservercentral.com/articles/sql+n-Tuples/89809/

A minor modification to calculate the value of each n-tuple and then select for those that equal 10 I believe gives you the result you're looking for.

DECLARE @t TABLE (Prod CHAR(10), value INT)

INSERT INTO @t

SELECT '1000', 3 UNION ALL SELECT '1001', 5 UNION ALL SELECT '1002', 6

UNION ALL SELECT '1003', 5 UNION ALL SELECT '1004', 8 UNION ALL SELECT '1005', 6

;WITH UNIQUEnTuples (n, Tuples, value) AS (

SELECT DISTINCT 1, CAST(Prod AS VARCHAR(max)), value

FROM @t

UNION ALL

SELECT 1 + n.n, t.Prod + ',' + n.Tuples, t.value + n.value

FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Tuples

WHERE CHARINDEX(t.Prod, n.Tuples) = 0

)

SELECT *

FROM UNIQUEnTuples

WHERE value = 10

ORDER BY n, Tuples

If your Prod column is not-unique, just remove the DISTINCT.

Also, with 240 combinations, you'll probably need to set a leveling factor to limit to 1, 2, 3, etc. combinations of items, otherwise this algorithm will run a long time. Like this:

WHERE CHARINDEX(t.Prod, n.Tuples) = 0 AND n <= 3

Can't seem to make it say n less than or equal 3 in the SQL above.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Matt Miller (4) SSC Guru Points: 124210 More actions · Answer 12

I keep associating districts with a geographical area, so I cannot help but think that using distance alone is not going to serve you well in this case. Would it make more sens to turn it into a map-reduce type problem?

As in:

- figure out your total coverage area

- split your area in 2. and calculate the density. if the density it substantially higher on one side, split it again.

- keep running a split until you have the number of districts needed or the right average size of districts needed.

- normalize once you have the rough cut.

With coordinates for each point, should be easy enough to do the rough groupings.

----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

Dwain Camps SSC Guru Points: 86978 More actions · Answer 13

OK sports fans, I've now read through this entire thread and I'm now 90% the solution I posted before will work for the OP. I was a bit concerned about speed though, so I ran a little test with the following (modified) version of the code.

Constraints:

1. Max tuples combined is 3

2. Value must sum to 10 for combination

3. Total number of items = 250

Here is the test harness and the code.

DECLARE @t TABLE (Prod VARCHAR(4), value INT)

;WITH Tally (n) AS (

SELECT TOP 250 ROW_NUMBER() OVER (ORDER BY (SELECT NULL))

FROM sys.all_columns)

INSERT INTO @t

SELECT RIGHT('000' + CAST(n AS VARCHAR), 4), ABS(CHECKSUM(NEWID())) % 10 + 1

FROM Tally

;WITH UNIQUEnTuples (n, Tuples, value, [check]) AS (

SELECT 1, CAST(Prod AS VARCHAR(max)), value, CAST(value AS VARCHAR(MAX))

FROM @t

UNION ALL

SELECT 1 + n.n, t.Prod + ',' + n.Tuples, n.value + t.value, [check]+'+'+CAST(t.value AS VARCHAR(MAX))

FROM @t t JOIN UNIQUEnTuples n ON t.Prod < n.Tuples

WHERE CHARINDEX(t.Prod, n.Tuples) = 0 AND n <= 3 AND t.value + n.value <= 10

)

SELECT *

FROM UNIQUEnTuples

WHERE value = 10

ORDER BY n, Tuples

Results:

1. Rows returned = 122000 give or take

2. Elapsed time = ~ 2 minutes

Improvements to the code:

1. I eliminated the DISTINCT clause on the anchor leg of the CTE because I didn't think it would be necessary.

2. Limited the number of levels examined to 3 or less

3. Removed rows where the value of the Tuples is > 10 (no longer need to be considered at level = n+1)

4. Added a check column to show you the amounts added together

By the way, this is similar to the traditional OR Knapsack problem (http://en.wikipedia.org/wiki/Knapsack_problem) except that the constraint on the KP is less than or equal, whereas the constraint here is equal.

The KP is traditionally solved using a Dynamic programming algorithm, which is also recursive. In this case, my solution is brute-force but sufficient for a problem of this small size.

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

Les Cardwell Ten Centuries Points: 1251 More actions · Answer 14

Sheesh...quite the brain trust up here. It's going to take me a bit to wade through these. I'll report the results after I come up for air 🙂

~Les

Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 15

David Webb-200187 (6/19/2012)
Hmmmm....
Let's say you had 50 rows, all with a quantity of 1. The target quantity was 5.
How would you want that brought back? All the possible combinations of those rows that added to 5?
Just the rows that could possibly be a component of the 5? You'd end up bringing them all back. What would that tell you? Would you want all the combinations grouped by some artificial designator (like a solution_number)?
I agree with the idea that t-sql isn't the best vehicle for generating solution combinations like this. A number crunching package like SAS or SPSS would be a better bet ( or find someone who has an old APL compiler).

1,125,899,906,842,624 distinct combinations in 50 rows:

DROP TABLE #Sample

CREATE TABLE #Sample (Item INT, Qty INT)

INSERT INTO #Sample (Item, Qty)

SELECT 1, 3 UNION ALL

SELECT 2, 5 UNION ALL

SELECT 3, 6 UNION ALL

SELECT 4, 5 UNION ALL

SELECT 5, 8 UNION ALL

SELECT 6, 6

-----------------------------------------------------

-- Generate every row combination from the input set, concatenate into a string:

-- SELECT UniquePerms = POWER(2,n)-1

-- FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) d (n)

-- where n = number of rows (six, in this case)

-- the rCTE generates about a million rows in 30 seconds, corresponding to 20 input rows.

-----------------------------------------------------

;WITH Calculator AS(

SELECT

[Level]= 1,

MyString= CAST('01' AS VARCHAR(40)),

LastDigit= CAST(1 AS BIGINT),

Pos= 1,

MaxLevels= (SELECT COUNT(*) FROM #Sample)

UNION ALL

SELECT

[Level]= [Level] + 1,

MyString= y.MyString,

LastDigit= y.NewDigit,

Pos= y.Newpos,

MaxLevels

FROM Calculator c

CROSS APPLY (

SELECT Newpos, NewDigit,

MyString = CAST(LEFT(c.MyString,d.Newpos-1) + RIGHT('0'+CAST(d.NewDigit AS VARCHAR),2) AS VARCHAR(40))

FROM (

SELECT Newpos, NewDigit = CASE

WHEN c.LastDigit = c.MaxLevels THEN 1+CAST(SUBSTRING(c.MyString,x.Newpos,2) AS BIGINT)

ELSE 1+c.LastDigit END

FROM (SELECT Newpos = CASE WHEN c.LastDigit = c.MaxLevels THEN c.Pos-2 ELSE c.Pos+2 END) x (Newpos)

) d (Newpos, NewDigit)

) y (Newpos, NewDigit, MyString)

WHERE c.[Level] < POWER(2,MaxLevels)-1

)

SELECT

c.MyString,

s.SUMQuantity

FROM Calculator c

CROSS APPLY ( -- unpivot the string MyString and use to probe table #Sample

SELECT SUM(qty)

FROM #Sample s

INNER JOIN (

SELECT n, RowID = CAST(SUBSTRING(c.MyString,(n*2)-1,2) AS INT)

FROM (VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10),(11),(12),(13),(14),(15),(16),(17),(18),(19),(20)) d (n)

WHERE n <= Pos+1

) d ON d.RowID = s.Item

) s (SUMQuantity)

WHERE s.SUMQuantity = 10

OPTION (MAXRECURSION 0);

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden