June 27, 2012 at 3:46 pm
Les Cardwell (6/27/2012)
Jeff,So, given 66 original districts (Sweden), that resolve into 151 candidates (population count between 5,900-6,000) resulting in 28 new districts (1 single, 16 doubles, and 11 triples), what would be the permutations for all possible orders if calculated without divide and conquer? From the outset, one could estimate the number of new districts to be ~28 by dividing the Sweden population by the desired population of the new districts
167,976 / 5,950 = 28.23
So would it be...
66!/(66-28)! = 1.0407674943659650459224651367354e+48
(???)
First, I will confess that I have not been following closely enough to fully understand the scenario (for instance, I don't know how the 151 number plays into it), but I will work off of this paragraph and hope you can correct any misunderstandings I have.
Since the 1 single, 16 doubles, and 11 triples add up to 66 (and based on your maths), I'm going to assume we're taking 66 districts, NOT breaking any apart, and reassembling them into 28 larger districts.
If there are absolutely NO conditions on which districts may be paired/tripled up (including a requirement that they be adjacent in order to be combined), then the numbers should be such:
Since all 66 districts will be used, and none twice, then we would have 66!/(66-28)! permutations as you show.
However, this counts order distinctly. For instance, if #29 and #17 were paired, then 29+17 would be a distinct value from 17+29. But it seems like we don't want that, since we're working with real objects. So, to reduce the permutations to combinations:
We have 16 pairs, so there are 16*(2!) dupes there; we have 11 triplets, so we have 11*(3!) dupes there. So now we show:
66!/[(66-28)! * (16*2!) * (11*3!)] = A mere 4.927876x10^44 combinations. :hehe:
June 27, 2012 at 6:19 pm
Les Cardwell (6/27/2012)
Dwain,I must be doing something wrong here. I'm using the CTE to attempt the same logic as SW7 to resolve those new Sweden districts with two old districts (doubles), but it doesn't complete, or maybe takes a very long time to complete.
;
WITH UNIQUEnTuples ( n, Tuples, [Population], [check] )
AS ( SELECT 1 ,
CAST(OID AS VARCHAR(MAX)) ,
[Population] ,
CAST([Population] AS VARCHAR(MAX))
FROM dbo.Remaining
UNION ALL
SELECT 1 + n.n ,
t.OID + ',' + CAST(n.Tuples AS VARCHAR(MAX)) ,
n.[Population] + t.[Population] ,
[check] + '+' + CAST(t.[Population] AS VARCHAR(MAX))
FROM dbo.Remaining t
JOIN UNIQUEnTuples n ON t.OID < n.Tuples
WHERE CHARINDEX(t.OID, n.Tuples) = 0
AND n <= 12 --24 doubles / 2 per new district
AND t.[Population] + n.[Population] <= 71930 --population of doubles (D1)
AND t.District3 = 0 --only select doubles
)
SELECT *
FROM UNIQUEnTuples
WHERE [Population] = 71930 --population of doubles (D1)
ORDER BY n , Tuples
--SELECT SUM(population) FROM dbo.D1 AS d = 71930
Does anything jump out?
Les,
I too have not had the opportunity to look closely at this problem. I was planning on doing so this weekend though. Stay tuned - hopefully I'll come up with something.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 28, 2012 at 12:42 am
Les,
I did take a quick 15 minute look. It appears you didn't implement Chris's speed improvement suggestions. However I doubt that will be sufficient for the size of this problem to help that much in the run time.
Admittedly, my 15 minute look can't possibly compare to 22 years of effort (albeit part time), but I will as I said take a long hard look at it this weekend. I already downloaded your attachments with the setup code (thanks for that).
[/analysis-pending]
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 28, 2012 at 6:50 am
Hello again Les!
Since I couldn't stop thinking about this today, I decided to learn what I could so as to ask you my questions prior to the weekend. To do so, I needed to go back to the beginning (sorry but I haven't been studying this problem for 22 years!). So, I proceeded to recreate your Remaining table using my logic, which is clearly different than yours in some respect. Which leads me to question
#1: In my Remaining table (logic is below), I come up with a couple of rows that you don't. Specifically this one:
nd1d2d3population
20613005997
Why would this row be eliminated from your Remaining table?
My logic results in 98 rows in the Remaining table while yours has 95:
;WITH Singles AS (
SELECT district
,d1=RIGHT('00'+CAST(district AS VARCHAR), 2)
,population
FROM Sweden),
Doubles AS (
SELECT m.district, neighbor
,d1=RIGHT('00'+CAST(m.district AS VARCHAR), 2)
,n1=RIGHT('00'+CAST(neighbor AS VARCHAR), 2)
,population=SUM(population)
FROM Map m
INNER JOIN Sweden s
ON m.district = s.district or m.neighbor = s.district
GROUP BY m.district, neighbor),
Combine2and3 ( n, d1, d2, d3, dall, [Population], [check]) AS (
SELECT 2
,d1
,n1
,'00'
,CAST(d1 + ',' + n1 AS VARCHAR(8000))
,population
,CAST(population AS VARCHAR(8000))
FROM Doubles
WHERE population < 6000
UNION ALL
SELECT 1 + n.n
,y.d1
,y.d2
,y.d3
,CASE WHEN d < n.d1 THEN d + ',' + n.dall
WHEN d > n.d2 THEN n.dall + ',' + d
ELSE n.d1 + ',' + d + ',' + n.d2 END
,n.population + s.population
,[check] + '+' + CAST(s.population AS VARCHAR)
FROM Combine2and3 n
INNER JOIN Doubles d ON (d.d1 IN (n.d1, n.d2) or d.n1 IN (n.d1, n.d2)) --AND
--d.d1 > LEFT(n.dall, 3)
CROSS APPLY (
SELECT CASE WHEN d.d1 IN (n.d1, n.d2) THEN d.n1 ELSE d.d1 END) x(d)
INNER JOIN Singles s ON s.d1 = d
CROSS APPLY (
SELECT d1=CASE WHEN d < n.d1 THEN d WHEN d > d2 THEN n.d1 ELSE n.d1 END
,d2=CASE WHEN d < n.d1 THEN n.d1 WHEN d > d2 THEN n.d2 ELSE d END
,d3=CASE WHEN d < n.d1 THEN n.d2 WHEN d > d2 THEN d ELSE n.d2 END) y
WHERE n < 3 AND y.d1 <> y.d2 AND y.d2 <> y.d3
),
Remaining AS (
SELECT n, d1, d2, d3, [population], [check]
,rn=ROW_NUMBER() OVER (PARTITION BY n, d1, d2, d3 ORDER BY (SELECT NULL))
FROM Combine2and3
WHERE population BETWEEN 5900 AND 6000)
SELECT *
FROM Remaining WHERE rn=1
ORDER BY n, population
Yes, I know, we haven't gotten to the fun part yet!
You may want to note that this logic orders the districts (mine are d1, d2, d3) so that they're in ascending sequence across the columns. I'm not sure why but somehow I think this will come in handy.
In your CTE, you reference OID, which is not in the Remaining table setup. Presumably this is just district1 + ',' + district2 + ',' district3 (if district 3 is zero ignore the last concatenation)? Making that assumption, I added a couple of CROSS APPLYs to get your rCTE to run.
Now, analyzing what you're doing in the rCTE you posted, what you're trying to do is combine 2,3-tuples to tuples to get combinations of tuples. And I don't think the logic being used to compare and flesh out duplicates is going to work in this case. At least not the way you coded it. Nonetheless, I believe there may be more that can be done.
For the record, this is my version of your rCTE looks like this, where I've commented out the code that I used to create the OID. This version is optimized according to the performance results posted by ChrisM. Removing the check column would also help performance.
;
WITH UNIQUEnTuples ( n, Tuples, [Population], [check] )
AS ( SELECT TOP 3 1 ,
CAST(OID AS VARCHAR(MAX)) ,
[Population] ,
CAST([Population] AS VARCHAR(MAX))
FROM dbo.Remaining
--CROSS APPLY (
-- SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +
-- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +
-- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) x (OID)
UNION ALL
SELECT 1 + n.n ,
t.OID + ',' + CAST(n.Tuples AS VARCHAR(MAX)) ,
n.[Population] + t.[Population] ,
[check] + '+' + CAST(t.[Population] AS VARCHAR(MAX))
FROM UNIQUEnTuples n
JOIN --(SELECT * FROM dbo.Remaining
--CROSS APPLY (
--SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +
-- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +
-- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) y (OID) ) t
dbo.Remaining) t
ON t.OID < n.Tuples
--JOIN UNIQUEnTuples n ON y.OID < n.Tuples
WHERE -- CHARINDEX(y.OID, n.Tuples) = 0 AND
n <= 12 --24 doubles / 2 per new district
AND t.[Population] + n.[Population] <= 71930 --population of doubles (D1)
AND t.District3 = 0 --only select doubles
)
SELECT *
FROM UNIQUEnTuples
WHERE [Population] = 71930 --population of doubles (D1)
ORDER BY n , Tuples
The performance hit is occuring because this logic includes the same district multiple times, thus creating unnecessary rows, within an n-Tuple. Here's an example where district 01 is repeated:
nTuplesPopulation
401,12,01,12,01,12,01,2823981
I'll work on it some more perhaps tonight or definitely over the weekend. I need to now rest a spell and collect my thoughts.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 28, 2012 at 7:39 am
jeffem (6/27/2012)
Les Cardwell (6/27/2012)
Jeff,So, given 66 original districts (Sweden), that resolve into 151 candidates (population count between 5,900-6,000) resulting in 28 new districts (1 single, 16 doubles, and 11 triples), what would be the permutations for all possible orders if calculated without divide and conquer? From the outset, one could estimate the number of new districts to be ~28 by dividing the Sweden population by the desired population of the new districts
167,976 / 5,950 = 28.23
So would it be...
66!/(66-28)! = 1.0407674943659650459224651367354e+48
(???)
First, I will confess that I have not been following closely enough to fully understand the scenario (for instance, I don't know how the 151 number plays into it), but I will work off of this paragraph and hope you can correct any misunderstandings I have.
Since the 1 single, 16 doubles, and 11 triples add up to 66 (and based on your maths), I'm going to assume we're taking 66 districts, NOT breaking any apart, and reassembling them into 28 larger districts.
If there are absolutely NO conditions on which districts may be paired/tripled up (including a requirement that they be adjacent in order to be combined), then the numbers should be such:
Since all 66 districts will be used, and none twice, then we would have 66!/(66-28)! permutations as you show.
However, this counts order distinctly. For instance, if #29 and #17 were paired, then 29+17 would be a distinct value from 17+29. But it seems like we don't want that, since we're working with real objects. So, to reduce the permutations to combinations:
We have 16 pairs, so there are 16*(2!) dupes there; we have 11 triplets, so we have 11*(3!) dupes there. So now we show:
66!/[(66-28)! * (16*2!) * (11*3!)] = A mere 4.927876x10^44 combinations. :hehe:
Okay, wow. Now that I'm thinking about this after some sleep, I see where I have erred.
We must divide by each combination reduction factor (2! for couples, 3! for triples) for each occurrence, which means we're dividing by 2 16 times and by 3 11 times. :/
66!/[(66-28)! * (2!^16) * (3!^11)] = 4.377375 x 10^34. My apologies!
June 28, 2012 at 8:46 am
I think this SQL is the correct algorithm, however I have not been able to arrive at a way yet to prune the resultset of unwanted results to pass to the next recursion. If it runs in real time, I'm fairly certain the result set will be what you want. Empasis on if there.
;WITH Singles AS (
SELECT district
,d1=RIGHT('00'+CAST(district AS VARCHAR), 2)
,population
FROM Sweden),
Doubles AS (
SELECT m.district, neighbor
,d1=RIGHT('00'+CAST(m.district AS VARCHAR), 2)
,n1=RIGHT('00'+CAST(neighbor AS VARCHAR), 2)
,population=SUM(population)
FROM Map m
INNER JOIN Sweden s
ON m.district = s.district or m.neighbor = s.district
GROUP BY m.district, neighbor),
Combine2and3 ( n, d1, d2, d3, dall, [Population], [check]) AS (
SELECT 2
,d1
,n1
,'00'
,CAST(d1 + ',' + n1 AS VARCHAR(8000))
,population
,CAST(population AS VARCHAR(8000))
FROM Doubles
WHERE population < 6000
UNION ALL
SELECT 1 + n.n
,y.d1
,y.d2
,y.d3
,CASE WHEN d < n.d1 THEN d + ',' + n.dall
WHEN d > n.d2 THEN n.dall + ',' + d
ELSE n.d1 + ',' + d + ',' + n.d2 END
,n.population + s.population
,[check] + '+' + CAST(s.population AS VARCHAR)
FROM Combine2and3 n
INNER JOIN Doubles d ON (d.d1 IN (n.d1, n.d2) or d.n1 IN (n.d1, n.d2)) --AND
--d.d1 > LEFT(n.dall, 3)
CROSS APPLY (
SELECT CASE WHEN d.d1 IN (n.d1, n.d2) THEN d.n1 ELSE d.d1 END) x(d)
INNER JOIN Singles s ON s.d1 = d
CROSS APPLY (
SELECT d1=CASE WHEN d < n.d1 THEN d WHEN d > d2 THEN n.d1 ELSE n.d1 END
,d2=CASE WHEN d < n.d1 THEN n.d1 WHEN d > d2 THEN n.d2 ELSE d END
,d3=CASE WHEN d < n.d1 THEN n.d2 WHEN d > d2 THEN d ELSE n.d2 END) y
WHERE n < 3 AND y.d1 <> y.d2 AND y.d2 <> y.d3
),
Remaining AS (
SELECT n, d1, d2, d3, dall, [population], [check]
FROM (
SELECT n, d1, d2, d3, dall, [population], [check]
,rn=ROW_NUMBER() OVER (PARTITION BY n, d1, d2, d3 ORDER BY (SELECT NULL))
FROM Combine2and3
WHERE population BETWEEN 5900 AND 6000) x
WHERE rn=1),
UNIQUEnTuples AS (
SELECT n=1, tuples='[' + d1 + ']', [Population]
FROM Singles
UNION ALL
SELECT n=1
,tuples='[' + r1.dall + ']'
,[Population]=r1.[Population]
FROM Remaining r1
UNION ALL
SELECT 1 + n.n
,tuples + ',[' + t.dall + ']'
,n.[Population] + t.[Population]
FROM UNIQUEnTuples n
INNER JOIN Remaining t
ON CHARINDEX(d1, tuples) + CHARINDEX(d2, tuples) + CHARINDEX(d3, tuples) = 0
WHERE n.n <= 2 AND t.[Population] + n.[Population] <= 167976
)
SELECT *
FROM UNIQUEnTuples
WHERE [Population] = 167976
ORDER BY n, population
It produces Tuples that look like this (just a few example rows):
ntuplesPopulation
<snip>
3[03],[21,33,51],[07,14,61]14998
3[03],[21,41,51],[14,56,61]14998
3[03],[21,41,51],[16,44,54]14998
3[03],[21,41,51],[20,29,39]14998
3[03],[21,41,51],[20,39,60]14998
<snip>
3[03,52],[34,38,44],[30,41,51]17997
3[03,52],[34,38,44],[31,62]17997
3[03,52],[30,41,51],[31,62]17997
3[03,52],[30,41,51],[34,38,44]17997
3[03,52],[31,62],[34,38,44]17997
3[03,52],[31,62],[30,41,51]17997
3[03,52],[30,41,51],[04,16,38]17998
3[03,52],[04,16,38],[30,41,51]17998
3[03,52],[04,16,38],[31,62]17998
3[03,52],[31,62],[04,16,38]17998
Note the lack of duplicated district IDs within a Tuple.
I'm going to try an abbreviated record set over night.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 28, 2012 at 12:24 pm
jeffem (6/28/2012)
Okay, wow. Now that I'm thinking about this after some sleep, I see where I have erred.We must divide by each combination reduction factor (2! for couples, 3! for triples) for each occurrence, which means we're dividing by 2 16 times and by 3 11 times. :/
66!/[(66-28)! * (2!^16) * (3!^11)] = 4.377375 x 10^34. My apologies!
LOL...I hardly think an apology is necessary 🙂 I wasn't sure how to account for the doubles/triples. Divide and conquer wins again...at least, if we want to compute it 'IN P' 😎
Thanks...
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 28, 2012 at 12:32 pm
dwain.c (6/28/2012)
Les,I did take a quick 15 minute look. It appears you didn't implement Chris's speed improvement suggestions. However I doubt that will be sufficient for the size of this problem to help that much in the run time.
Admittedly, my 15 minute look can't possibly compare to 22 years of effort (albeit part time), but I will as I said take a long hard look at it this weekend. I already downloaded your attachments with the setup code (thanks for that).
[/analysis-pending]
I did run the differences through Toad, and it only showed a 5% improvement, and I did intend to implement those if I could get the original to work first. I'm beginning to think this may be the Transactional vs. Set based dynamic since, I think, CTE is 'transactional'?
I wrote the code in a day back in '98, though I did spend a couple more days putzing with it back then. I've tossed it up from time to time to see if it would complete on the faster servers, and also at times to stress test a server. Ironically, I made a change in the test back then that would have increased the permutations rather than decrease them, but I didn't catch it at the time. Dropping from 33 (triples) to 30 (for 10 ordered sets) is exponentially significant, and may have kept it from resolving sooner.
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 28, 2012 at 1:20 pm
dwain.c (6/28/2012)
Hello again Les!Since I couldn't stop thinking about this today, I decided to learn what I could so as to ask you my questions prior to the weekend. To do so, I needed to go back to the beginning (sorry but I haven't been studying this problem for 22 years!). So, I proceeded to recreate your Remaining table using my logic, which is clearly different than yours in some respect. Which leads me to question
#1: In my Remaining table (logic is below), I come up with a couple of rows that you don't. Specifically this one:
nd1d2d3population
20613005997
Why would this row be eliminated from your Remaining table?
If you look in NewDistricts (should have been named U1), you'll see that this set exists there. These were sets from Candidates in which there was only one possible solution (eg. --Populate with those Candidates where the Old District only appears once). As such, there are six sets.
d1d2d3population
35205999
61305997
92606013
356306053
48005976
584726080
My logic results in 98 rows in the Remaining table while yours has 95:
I'm showing 139...
103 - triples
36 - doubles
0 - uniques
-------------
139 in Remaining
+ 6 NewDistricts (uniques)
--------
= 145
The original number of 'Candidates' is 151, but the result set was reduced by 6 in the code that generates Remaining (sw5) because it eliminates from Candidates any of the districts that exist in NewDistricts (final results for those districts that can only exist in one set).
You may want to note that this logic orders the districts (mine are d1, d2, d3) so that they're in ascending sequence across the columns. I'm not sure why but somehow I think this will come in handy.
Ordering the districts reduces Candidates from 234 to 151 (hence the commented out code in sw3). I 'intuitively' think it's important in coming up with a multidimensional 'key', which I've yet to manifest. The next milepost I suppose :ermm:
In your CTE, you reference OID, which is not in the Remaining table setup. Presumably this is just district1 + ',' + district2 + ',' district3 (if district 3 is zero ignore the last concatenation)? Making that assumption, I added a couple of CROSS APPLYs to get your rCTE to run.
I added OID for the CTE purposes. Had a difficult time to get it to work as an INT, so changed it to VARCHAR.
Now, analyzing what you're doing in the rCTE you posted, what you're trying to do is combine 2,3-tuples to tuples to get combinations of tuples. And I don't think the logic being used to compare and flesh out duplicates is going to work in this case. At least not the way you coded it. Nonetheless, I believe there may be more that can be done.
For the record, this is my version of your rCTE looks like this, where I've commented out the code that I used to create the OID. This version is optimized according to the performance results posted by ChrisM. Removing the check column would also help performance.
;
WITH UNIQUEnTuples ( n, Tuples, [Population], [check] )
AS ( SELECT TOP 3 1 ,
CAST(OID AS VARCHAR(MAX)) ,
[Population] ,
CAST([Population] AS VARCHAR(MAX))
FROM dbo.Remaining
--CROSS APPLY (
-- SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +
-- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +
-- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) x (OID)
UNION ALL
SELECT 1 + n.n ,
t.OID + ',' + CAST(n.Tuples AS VARCHAR(MAX)) ,
n.[Population] + t.[Population] ,
[check] + '+' + CAST(t.[Population] AS VARCHAR(MAX))
FROM UNIQUEnTuples n
JOIN --(SELECT * FROM dbo.Remaining
--CROSS APPLY (
--SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +
-- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +
-- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) y (OID) ) t
dbo.Remaining) t
ON t.OID < n.Tuples
--JOIN UNIQUEnTuples n ON y.OID < n.Tuples
WHERE -- CHARINDEX(y.OID, n.Tuples) = 0 AND
n <= 12 --24 doubles / 2 per new district
AND t.[Population] + n.[Population] <= 71930 --population of doubles (D1)
AND t.District3 = 0 --only select doubles
)
SELECT *
FROM UNIQUEnTuples
WHERE [Population] = 71930 --population of doubles (D1)
ORDER BY n , Tuples
The performance hit is occuring because this logic includes the same district multiple times, thus creating unnecessary rows, within an n-Tuple. Here's an example where district 01 is repeated:
nTuplesPopulation
401,12,01,12,01,12,01,2823981
I'll work on it some more perhaps tonight or definitely over the weekend. I need to now rest a spell and collect my thoughts.
That makes sense, and I considered it after testing against Remaining. The CTE logic works better against, say, T1, since it only reads 1 district at a time, rather than 'sets of sets' (doubles, triples). I'll work on this approach some more as well.
Thx,
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 28, 2012 at 1:24 pm
dwain.c (6/28/2012)
I think this SQL is the correct algorithm, however I have not been able to arrive at a way yet to prune the resultset of unwanted results to pass to the next recursion. If it runs in real time, I'm fairly certain the result set will be what you want. Empasis on if there....
It produces Tuples that look like this (just a few example rows):
ntuplesPopulation
<snip>
3[03],[21,33,51],[07,14,61]14998
3[03],[21,41,51],[14,56,61]14998
3[03],[21,41,51],[16,44,54]14998
3[03],[21,41,51],[20,29,39]14998
3[03],[21,41,51],[20,39,60]14998
<snip>
3[03,52],[34,38,44],[30,41,51]17997
3[03,52],[34,38,44],[31,62]17997
3[03,52],[30,41,51],[31,62]17997
3[03,52],[30,41,51],[34,38,44]17997
3[03,52],[31,62],[34,38,44]17997
3[03,52],[31,62],[30,41,51]17997
3[03,52],[30,41,51],[04,16,38]17998
3[03,52],[04,16,38],[30,41,51]17998
3[03,52],[04,16,38],[31,62]17998
3[03,52],[31,62],[04,16,38]17998
Note the lack of duplicated district IDs within a Tuple.
I'm going to try an abbreviated record set over night.
I responded, then went back over the code again. Impressive CTE! 😎 ....and I think the right approach. Should prove interesting.
Thx,
~Les
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 28, 2012 at 8:04 pm
Les,
I'll need to go over your posts carefully to thoroughly understand but I think what you're saying I can grok.
Last night when I was working on this, I realized that the rCTE last step is simply creating too many rows that are not useful in the final solution set except to generate more combinations of districts. These need to be pruned and I don't think it's possible to do so within the rCTE structure. While the SQL I gave you may eventually run, I wouldn't count on it doing so in any reasonable period of time.
My limited test ran for one hour but it was on a very small subset. It did at least produce the expected results.
As such things will be, sleeping on it I may have come up with an alternate approach. I ran some initial tests this morning and it appears promising, albeit also pretty slow. Before reaching the ultimate result set, I'm needing to tweak it a bit to eek every microsecond of performance out of it. I am hopeful it might work though.
Once again, stay tuned as I am now hooked.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 29, 2012 at 9:57 am
This is a great CTE study for me...
BTW, I ran it against this server, and it ran in 7min...however, no rows were returned.
Singles - you can limit this to just one tuple by adding...
WITH Singles
AS ( SELECT district ,
d1 = RIGHT('00' + CAST(district AS VARCHAR), 2) ,
population
FROM Sweden
WHEREpopulation BETWEEN 5900 AND 6000
),
Doubles
Still looking through the rest.
Dr. Les Cardwell, DCS-DSS
Enterprise Data Architect
Central Lincoln PUD
June 30, 2012 at 5:39 am
Les Cardwell (6/29/2012)
This is a great CTE study for me...BTW, I ran it against this server, and it ran in 7min...however, no rows were returned.
Singles - you can limit this to just one tuple by adding...
WITH Singles
AS ( SELECT district ,
d1 = RIGHT('00' + CAST(district AS VARCHAR), 2) ,
population
FROM Sweden
WHEREpopulation BETWEEN 5900 AND 6000
),
Doubles
Still looking through the rest.
Now this is interesting. I'm still working on my new approach. I've been able to get up to 24 combined districts, combining the 2 and 3 tuples according to the map. Row counts are too high to proceed using the same technique so I wasn't sure how to proceed until I saw this (new idea!).
Not sure why my original CTE returned now rows.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 30, 2012 at 5:52 am
Sorry but I do know why the CTE returns 0 rows. You must still have the population equality check on the last line. Try taking that out.
You'll find that the reason it only takes 7 minutes to run is the limiter I put in (n <= 3 I think it was). To make it solve the problem, you'd need to take that out. But I'd suggest just incrementing the 3 by 1 a couple of times and watch the impact it has on the run times before taking it out entirely.
My thought question: Have you ever been told that your query runs too fast?
My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.
Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St
June 30, 2012 at 6:02 am
You can take advantage of cube subclause . This generates all the permutation and combinations...
See the below example...
If the dynamic code is becoming bigger.. You can have two part..first till beforeCTe code..other one is just for finalcte..
Issues with this : Grouping sets generates a maximum number of 4096 expression so that is one of the main disadvantage. Simialry you can have just 4096 columns in a select statement. Other limitations are below.
http://msdn.microsoft.com/en-us/library/ms143432.aspx
One workaorund if the number of rows are too much is that you break up into a multple problems..Like you mentioned there are around 160 dist. You can break it into 5 sets of 28..Put the results into 5 tables..Then corss join these tables to come up with final solution..
There are way too many answer to this post so following up is quite difficult.Do you want to have dup districts?
Means can a tupple be like this 'dist1,dist1,dist1' = 300 if dist1 has 100 rows?
drop table a
go
create table a( id int not null,dst varchar(2) not null,)
go
insert into a(id,dst)
select 10,'A'
union all
select 11,'B'
union all
select 20,'C'
union all
select 100,'D'
go
with mycte
(
id,a,b,c,d
)as
(
select id,[A],,[C],[D]
from (
select id,dst,dst as dst1
from a
) a
pivot
(
max(dst1)
for dst in ([A],,[C],[D] )
) as pvt
),beforecte
as
(
select a,B,C,d,SUM(id)sm ,GROUPING_ID(a,B,C,d) grp
from mycte
group by cube(a,B,C,d)
),
finalcte as
(
select case
when grp = 0 then coalesce(a,b,c,d)
when grp = 1 then 'd'
when grp = 2 then 'c'
when grp = 3 then 'c' + ',' + 'd'
when grp = 4 then 'b'
when grp = 5 then 'b' + ',' + 'd'
when grp = 6 then 'b' + ',' + 'c'
when grp = 7 then 'b' + ',' + 'c' + ',' + 'd'
when grp = 8 then 'a'
when grp = 9 then 'a' + ',' + 'd'
when grp = 10 then 'a' + ',' + 'c'
when grp = 11 then 'a' + ',' + 'c' + ',' + 'd'
when grp = 12 then 'a' + ',' + 'b'
when grp = 13 then 'a' + ',' + 'b' + ',' + 'd'
when grp = 14 then 'a' + ',' + 'b' + ',' + 'c'
when grp = 15 then 'a' + ',' + 'b' + ',' + 'c' + ',' + 'd'
end dstlist
,* from beforecte
where coalesce(a,b,c,d,'!!') = '!!'
)
select distinct dstlist,sm,grp
--,a,b,c,d
from finalcte where dstlist is not null
order by dstlist asc
GulliMeel
Finding top n Worst Performing queries[/url]
Improve the performance of Merge Join(special case)
How to Post Performance Problem -Gail Shaw[/url]
Viewing 15 posts - 61 through 75 (of 109 total)
You must be logged in to reply to this topic. Login to reply