Aggregate Query question

  • Les Cardwell (6/27/2012)


    Jeff,

    So, given 66 original districts (Sweden), that resolve into 151 candidates (population count between 5,900-6,000) resulting in 28 new districts (1 single, 16 doubles, and 11 triples), what would be the permutations for all possible orders if calculated without divide and conquer? From the outset, one could estimate the number of new districts to be ~28 by dividing the Sweden population by the desired population of the new districts

    167,976 / 5,950 = 28.23

    So would it be...

    66!/(66-28)! = 1.0407674943659650459224651367354e+48

    (???)

    First, I will confess that I have not been following closely enough to fully understand the scenario (for instance, I don't know how the 151 number plays into it), but I will work off of this paragraph and hope you can correct any misunderstandings I have.

    Since the 1 single, 16 doubles, and 11 triples add up to 66 (and based on your maths), I'm going to assume we're taking 66 districts, NOT breaking any apart, and reassembling them into 28 larger districts.

    If there are absolutely NO conditions on which districts may be paired/tripled up (including a requirement that they be adjacent in order to be combined), then the numbers should be such:

    Since all 66 districts will be used, and none twice, then we would have 66!/(66-28)! permutations as you show.

    However, this counts order distinctly. For instance, if #29 and #17 were paired, then 29+17 would be a distinct value from 17+29. But it seems like we don't want that, since we're working with real objects. So, to reduce the permutations to combinations:

    We have 16 pairs, so there are 16*(2!) dupes there; we have 11 triplets, so we have 11*(3!) dupes there. So now we show:

    66!/[(66-28)! * (16*2!) * (11*3!)] = A mere 4.927876x10^44 combinations. :hehe:

  • Les Cardwell (6/27/2012)


    Dwain,

    I must be doing something wrong here. I'm using the CTE to attempt the same logic as SW7 to resolve those new Sweden districts with two old districts (doubles), but it doesn't complete, or maybe takes a very long time to complete.

    ;

    WITH UNIQUEnTuples ( n, Tuples, [Population], [check] )

    AS ( SELECT 1 ,

    CAST(OID AS VARCHAR(MAX)) ,

    [Population] ,

    CAST([Population] AS VARCHAR(MAX))

    FROM dbo.Remaining

    UNION ALL

    SELECT 1 + n.n ,

    t.OID + ',' + CAST(n.Tuples AS VARCHAR(MAX)) ,

    n.[Population] + t.[Population] ,

    [check] + '+' + CAST(t.[Population] AS VARCHAR(MAX))

    FROM dbo.Remaining t

    JOIN UNIQUEnTuples n ON t.OID < n.Tuples

    WHERE CHARINDEX(t.OID, n.Tuples) = 0

    AND n <= 12 --24 doubles / 2 per new district

    AND t.[Population] + n.[Population] <= 71930 --population of doubles (D1)

    AND t.District3 = 0 --only select doubles

    )

    SELECT *

    FROM UNIQUEnTuples

    WHERE [Population] = 71930 --population of doubles (D1)

    ORDER BY n , Tuples

    --SELECT SUM(population) FROM dbo.D1 AS d = 71930

    Does anything jump out?

    Les,

    I too have not had the opportunity to look closely at this problem. I was planning on doing so this weekend though. Stay tuned - hopefully I'll come up with something.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Les,

    I did take a quick 15 minute look. It appears you didn't implement Chris's speed improvement suggestions. However I doubt that will be sufficient for the size of this problem to help that much in the run time.

    Admittedly, my 15 minute look can't possibly compare to 22 years of effort (albeit part time), but I will as I said take a long hard look at it this weekend. I already downloaded your attachments with the setup code (thanks for that).

    [/analysis-pending]


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Hello again Les!

    Since I couldn't stop thinking about this today, I decided to learn what I could so as to ask you my questions prior to the weekend. To do so, I needed to go back to the beginning (sorry but I haven't been studying this problem for 22 years!). So, I proceeded to recreate your Remaining table using my logic, which is clearly different than yours in some respect. Which leads me to question

    #1: In my Remaining table (logic is below), I come up with a couple of rows that you don't. Specifically this one:

    nd1d2d3population

    20613005997

    Why would this row be eliminated from your Remaining table?

    My logic results in 98 rows in the Remaining table while yours has 95:

    ;WITH Singles AS (

    SELECT district

    ,d1=RIGHT('00'+CAST(district AS VARCHAR), 2)

    ,population

    FROM Sweden),

    Doubles AS (

    SELECT m.district, neighbor

    ,d1=RIGHT('00'+CAST(m.district AS VARCHAR), 2)

    ,n1=RIGHT('00'+CAST(neighbor AS VARCHAR), 2)

    ,population=SUM(population)

    FROM Map m

    INNER JOIN Sweden s

    ON m.district = s.district or m.neighbor = s.district

    GROUP BY m.district, neighbor),

    Combine2and3 ( n, d1, d2, d3, dall, [Population], [check]) AS (

    SELECT 2

    ,d1

    ,n1

    ,'00'

    ,CAST(d1 + ',' + n1 AS VARCHAR(8000))

    ,population

    ,CAST(population AS VARCHAR(8000))

    FROM Doubles

    WHERE population < 6000

    UNION ALL

    SELECT 1 + n.n

    ,y.d1

    ,y.d2

    ,y.d3

    ,CASE WHEN d < n.d1 THEN d + ',' + n.dall

    WHEN d > n.d2 THEN n.dall + ',' + d

    ELSE n.d1 + ',' + d + ',' + n.d2 END

    ,n.population + s.population

    ,[check] + '+' + CAST(s.population AS VARCHAR)

    FROM Combine2and3 n

    INNER JOIN Doubles d ON (d.d1 IN (n.d1, n.d2) or d.n1 IN (n.d1, n.d2)) --AND

    --d.d1 > LEFT(n.dall, 3)

    CROSS APPLY (

    SELECT CASE WHEN d.d1 IN (n.d1, n.d2) THEN d.n1 ELSE d.d1 END) x(d)

    INNER JOIN Singles s ON s.d1 = d

    CROSS APPLY (

    SELECT d1=CASE WHEN d < n.d1 THEN d WHEN d > d2 THEN n.d1 ELSE n.d1 END

    ,d2=CASE WHEN d < n.d1 THEN n.d1 WHEN d > d2 THEN n.d2 ELSE d END

    ,d3=CASE WHEN d < n.d1 THEN n.d2 WHEN d > d2 THEN d ELSE n.d2 END) y

    WHERE n < 3 AND y.d1 <> y.d2 AND y.d2 <> y.d3

    ),

    Remaining AS (

    SELECT n, d1, d2, d3, [population], [check]

    ,rn=ROW_NUMBER() OVER (PARTITION BY n, d1, d2, d3 ORDER BY (SELECT NULL))

    FROM Combine2and3

    WHERE population BETWEEN 5900 AND 6000)

    SELECT *

    FROM Remaining WHERE rn=1

    ORDER BY n, population

    Yes, I know, we haven't gotten to the fun part yet!

    You may want to note that this logic orders the districts (mine are d1, d2, d3) so that they're in ascending sequence across the columns. I'm not sure why but somehow I think this will come in handy.

    In your CTE, you reference OID, which is not in the Remaining table setup. Presumably this is just district1 + ',' + district2 + ',' district3 (if district 3 is zero ignore the last concatenation)? Making that assumption, I added a couple of CROSS APPLYs to get your rCTE to run.

    Now, analyzing what you're doing in the rCTE you posted, what you're trying to do is combine 2,3-tuples to tuples to get combinations of tuples. And I don't think the logic being used to compare and flesh out duplicates is going to work in this case. At least not the way you coded it. Nonetheless, I believe there may be more that can be done.

    For the record, this is my version of your rCTE looks like this, where I've commented out the code that I used to create the OID. This version is optimized according to the performance results posted by ChrisM. Removing the check column would also help performance.

    ;

    WITH UNIQUEnTuples ( n, Tuples, [Population], [check] )

    AS ( SELECT TOP 3 1 ,

    CAST(OID AS VARCHAR(MAX)) ,

    [Population] ,

    CAST([Population] AS VARCHAR(MAX))

    FROM dbo.Remaining

    --CROSS APPLY (

    -- SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +

    -- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +

    -- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) x (OID)

    UNION ALL

    SELECT 1 + n.n ,

    t.OID + ',' + CAST(n.Tuples AS VARCHAR(MAX)) ,

    n.[Population] + t.[Population] ,

    [check] + '+' + CAST(t.[Population] AS VARCHAR(MAX))

    FROM UNIQUEnTuples n

    JOIN --(SELECT * FROM dbo.Remaining

    --CROSS APPLY (

    --SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +

    -- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +

    -- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) y (OID) ) t

    dbo.Remaining) t

    ON t.OID < n.Tuples

    --JOIN UNIQUEnTuples n ON y.OID < n.Tuples

    WHERE -- CHARINDEX(y.OID, n.Tuples) = 0 AND

    n <= 12 --24 doubles / 2 per new district

    AND t.[Population] + n.[Population] <= 71930 --population of doubles (D1)

    AND t.District3 = 0 --only select doubles

    )

    SELECT *

    FROM UNIQUEnTuples

    WHERE [Population] = 71930 --population of doubles (D1)

    ORDER BY n , Tuples

    The performance hit is occuring because this logic includes the same district multiple times, thus creating unnecessary rows, within an n-Tuple. Here's an example where district 01 is repeated:

    nTuplesPopulation

    401,12,01,12,01,12,01,2823981

    I'll work on it some more perhaps tonight or definitely over the weekend. I need to now rest a spell and collect my thoughts.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • jeffem (6/27/2012)


    Les Cardwell (6/27/2012)


    Jeff,

    So, given 66 original districts (Sweden), that resolve into 151 candidates (population count between 5,900-6,000) resulting in 28 new districts (1 single, 16 doubles, and 11 triples), what would be the permutations for all possible orders if calculated without divide and conquer? From the outset, one could estimate the number of new districts to be ~28 by dividing the Sweden population by the desired population of the new districts

    167,976 / 5,950 = 28.23

    So would it be...

    66!/(66-28)! = 1.0407674943659650459224651367354e+48

    (???)

    First, I will confess that I have not been following closely enough to fully understand the scenario (for instance, I don't know how the 151 number plays into it), but I will work off of this paragraph and hope you can correct any misunderstandings I have.

    Since the 1 single, 16 doubles, and 11 triples add up to 66 (and based on your maths), I'm going to assume we're taking 66 districts, NOT breaking any apart, and reassembling them into 28 larger districts.

    If there are absolutely NO conditions on which districts may be paired/tripled up (including a requirement that they be adjacent in order to be combined), then the numbers should be such:

    Since all 66 districts will be used, and none twice, then we would have 66!/(66-28)! permutations as you show.

    However, this counts order distinctly. For instance, if #29 and #17 were paired, then 29+17 would be a distinct value from 17+29. But it seems like we don't want that, since we're working with real objects. So, to reduce the permutations to combinations:

    We have 16 pairs, so there are 16*(2!) dupes there; we have 11 triplets, so we have 11*(3!) dupes there. So now we show:

    66!/[(66-28)! * (16*2!) * (11*3!)] = A mere 4.927876x10^44 combinations. :hehe:

    Okay, wow. Now that I'm thinking about this after some sleep, I see where I have erred.

    We must divide by each combination reduction factor (2! for couples, 3! for triples) for each occurrence, which means we're dividing by 2 16 times and by 3 11 times. :/

    66!/[(66-28)! * (2!^16) * (3!^11)] = 4.377375 x 10^34. My apologies!

  • I think this SQL is the correct algorithm, however I have not been able to arrive at a way yet to prune the resultset of unwanted results to pass to the next recursion. If it runs in real time, I'm fairly certain the result set will be what you want. Empasis on if there.

    ;WITH Singles AS (

    SELECT district

    ,d1=RIGHT('00'+CAST(district AS VARCHAR), 2)

    ,population

    FROM Sweden),

    Doubles AS (

    SELECT m.district, neighbor

    ,d1=RIGHT('00'+CAST(m.district AS VARCHAR), 2)

    ,n1=RIGHT('00'+CAST(neighbor AS VARCHAR), 2)

    ,population=SUM(population)

    FROM Map m

    INNER JOIN Sweden s

    ON m.district = s.district or m.neighbor = s.district

    GROUP BY m.district, neighbor),

    Combine2and3 ( n, d1, d2, d3, dall, [Population], [check]) AS (

    SELECT 2

    ,d1

    ,n1

    ,'00'

    ,CAST(d1 + ',' + n1 AS VARCHAR(8000))

    ,population

    ,CAST(population AS VARCHAR(8000))

    FROM Doubles

    WHERE population < 6000

    UNION ALL

    SELECT 1 + n.n

    ,y.d1

    ,y.d2

    ,y.d3

    ,CASE WHEN d < n.d1 THEN d + ',' + n.dall

    WHEN d > n.d2 THEN n.dall + ',' + d

    ELSE n.d1 + ',' + d + ',' + n.d2 END

    ,n.population + s.population

    ,[check] + '+' + CAST(s.population AS VARCHAR)

    FROM Combine2and3 n

    INNER JOIN Doubles d ON (d.d1 IN (n.d1, n.d2) or d.n1 IN (n.d1, n.d2)) --AND

    --d.d1 > LEFT(n.dall, 3)

    CROSS APPLY (

    SELECT CASE WHEN d.d1 IN (n.d1, n.d2) THEN d.n1 ELSE d.d1 END) x(d)

    INNER JOIN Singles s ON s.d1 = d

    CROSS APPLY (

    SELECT d1=CASE WHEN d < n.d1 THEN d WHEN d > d2 THEN n.d1 ELSE n.d1 END

    ,d2=CASE WHEN d < n.d1 THEN n.d1 WHEN d > d2 THEN n.d2 ELSE d END

    ,d3=CASE WHEN d < n.d1 THEN n.d2 WHEN d > d2 THEN d ELSE n.d2 END) y

    WHERE n < 3 AND y.d1 <> y.d2 AND y.d2 <> y.d3

    ),

    Remaining AS (

    SELECT n, d1, d2, d3, dall, [population], [check]

    FROM (

    SELECT n, d1, d2, d3, dall, [population], [check]

    ,rn=ROW_NUMBER() OVER (PARTITION BY n, d1, d2, d3 ORDER BY (SELECT NULL))

    FROM Combine2and3

    WHERE population BETWEEN 5900 AND 6000) x

    WHERE rn=1),

    UNIQUEnTuples AS (

    SELECT n=1, tuples='[' + d1 + ']', [Population]

    FROM Singles

    UNION ALL

    SELECT n=1

    ,tuples='[' + r1.dall + ']'

    ,[Population]=r1.[Population]

    FROM Remaining r1

    UNION ALL

    SELECT 1 + n.n

    ,tuples + ',[' + t.dall + ']'

    ,n.[Population] + t.[Population]

    FROM UNIQUEnTuples n

    INNER JOIN Remaining t

    ON CHARINDEX(d1, tuples) + CHARINDEX(d2, tuples) + CHARINDEX(d3, tuples) = 0

    WHERE n.n <= 2 AND t.[Population] + n.[Population] <= 167976

    )

    SELECT *

    FROM UNIQUEnTuples

    WHERE [Population] = 167976

    ORDER BY n, population

    It produces Tuples that look like this (just a few example rows):

    ntuplesPopulation

    <snip>

    3[03],[21,33,51],[07,14,61]14998

    3[03],[21,41,51],[14,56,61]14998

    3[03],[21,41,51],[16,44,54]14998

    3[03],[21,41,51],[20,29,39]14998

    3[03],[21,41,51],[20,39,60]14998

    <snip>

    3[03,52],[34,38,44],[30,41,51]17997

    3[03,52],[34,38,44],[31,62]17997

    3[03,52],[30,41,51],[31,62]17997

    3[03,52],[30,41,51],[34,38,44]17997

    3[03,52],[31,62],[34,38,44]17997

    3[03,52],[31,62],[30,41,51]17997

    3[03,52],[30,41,51],[04,16,38]17998

    3[03,52],[04,16,38],[30,41,51]17998

    3[03,52],[04,16,38],[31,62]17998

    3[03,52],[31,62],[04,16,38]17998

    Note the lack of duplicated district IDs within a Tuple.

    I'm going to try an abbreviated record set over night.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • jeffem (6/28/2012)


    Okay, wow. Now that I'm thinking about this after some sleep, I see where I have erred.

    We must divide by each combination reduction factor (2! for couples, 3! for triples) for each occurrence, which means we're dividing by 2 16 times and by 3 11 times. :/

    66!/[(66-28)! * (2!^16) * (3!^11)] = 4.377375 x 10^34. My apologies!

    LOL...I hardly think an apology is necessary 🙂 I wasn't sure how to account for the doubles/triples. Divide and conquer wins again...at least, if we want to compute it 'IN P' 😎

    Thanks...

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • dwain.c (6/28/2012)


    Les,

    I did take a quick 15 minute look. It appears you didn't implement Chris's speed improvement suggestions. However I doubt that will be sufficient for the size of this problem to help that much in the run time.

    Admittedly, my 15 minute look can't possibly compare to 22 years of effort (albeit part time), but I will as I said take a long hard look at it this weekend. I already downloaded your attachments with the setup code (thanks for that).

    [/analysis-pending]

    I did run the differences through Toad, and it only showed a 5% improvement, and I did intend to implement those if I could get the original to work first. I'm beginning to think this may be the Transactional vs. Set based dynamic since, I think, CTE is 'transactional'?

    I wrote the code in a day back in '98, though I did spend a couple more days putzing with it back then. I've tossed it up from time to time to see if it would complete on the faster servers, and also at times to stress test a server. Ironically, I made a change in the test back then that would have increased the permutations rather than decrease them, but I didn't catch it at the time. Dropping from 33 (triples) to 30 (for 10 ordered sets) is exponentially significant, and may have kept it from resolving sooner.

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • dwain.c (6/28/2012)


    Hello again Les!

    Since I couldn't stop thinking about this today, I decided to learn what I could so as to ask you my questions prior to the weekend. To do so, I needed to go back to the beginning (sorry but I haven't been studying this problem for 22 years!). So, I proceeded to recreate your Remaining table using my logic, which is clearly different than yours in some respect. Which leads me to question

    #1: In my Remaining table (logic is below), I come up with a couple of rows that you don't. Specifically this one:

    nd1d2d3population

    20613005997

    Why would this row be eliminated from your Remaining table?

    If you look in NewDistricts (should have been named U1), you'll see that this set exists there. These were sets from Candidates in which there was only one possible solution (eg. --Populate with those Candidates where the Old District only appears once). As such, there are six sets.

    d1d2d3population

    35205999

    61305997

    92606013

    356306053

    48005976

    584726080

    My logic results in 98 rows in the Remaining table while yours has 95:

    I'm showing 139...

    103 - triples

    36 - doubles

    0 - uniques

    -------------

    139 in Remaining

    + 6 NewDistricts (uniques)

    --------

    = 145

    The original number of 'Candidates' is 151, but the result set was reduced by 6 in the code that generates Remaining (sw5) because it eliminates from Candidates any of the districts that exist in NewDistricts (final results for those districts that can only exist in one set).

    You may want to note that this logic orders the districts (mine are d1, d2, d3) so that they're in ascending sequence across the columns. I'm not sure why but somehow I think this will come in handy.

    Ordering the districts reduces Candidates from 234 to 151 (hence the commented out code in sw3). I 'intuitively' think it's important in coming up with a multidimensional 'key', which I've yet to manifest. The next milepost I suppose :ermm:

    In your CTE, you reference OID, which is not in the Remaining table setup. Presumably this is just district1 + ',' + district2 + ',' district3 (if district 3 is zero ignore the last concatenation)? Making that assumption, I added a couple of CROSS APPLYs to get your rCTE to run.

    I added OID for the CTE purposes. Had a difficult time to get it to work as an INT, so changed it to VARCHAR.

    Now, analyzing what you're doing in the rCTE you posted, what you're trying to do is combine 2,3-tuples to tuples to get combinations of tuples. And I don't think the logic being used to compare and flesh out duplicates is going to work in this case. At least not the way you coded it. Nonetheless, I believe there may be more that can be done.

    For the record, this is my version of your rCTE looks like this, where I've commented out the code that I used to create the OID. This version is optimized according to the performance results posted by ChrisM. Removing the check column would also help performance.

    ;

    WITH UNIQUEnTuples ( n, Tuples, [Population], [check] )

    AS ( SELECT TOP 3 1 ,

    CAST(OID AS VARCHAR(MAX)) ,

    [Population] ,

    CAST([Population] AS VARCHAR(MAX))

    FROM dbo.Remaining

    --CROSS APPLY (

    -- SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +

    -- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +

    -- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) x (OID)

    UNION ALL

    SELECT 1 + n.n ,

    t.OID + ',' + CAST(n.Tuples AS VARCHAR(MAX)) ,

    n.[Population] + t.[Population] ,

    [check] + '+' + CAST(t.[Population] AS VARCHAR(MAX))

    FROM UNIQUEnTuples n

    JOIN --(SELECT * FROM dbo.Remaining

    --CROSS APPLY (

    --SELECT RIGHT('00'+CAST(district1 AS VARCHAR), 2) + ',' +

    -- RIGHT('00'+CAST(district2 AS VARCHAR), 2) + CASE WHEN knt = 3 THEN ',' +

    -- RIGHT('00'+CAST(district3 AS VARCHAR), 2) ELSE '' END) y (OID) ) t

    dbo.Remaining) t

    ON t.OID < n.Tuples

    --JOIN UNIQUEnTuples n ON y.OID < n.Tuples

    WHERE -- CHARINDEX(y.OID, n.Tuples) = 0 AND

    n <= 12 --24 doubles / 2 per new district

    AND t.[Population] + n.[Population] <= 71930 --population of doubles (D1)

    AND t.District3 = 0 --only select doubles

    )

    SELECT *

    FROM UNIQUEnTuples

    WHERE [Population] = 71930 --population of doubles (D1)

    ORDER BY n , Tuples

    The performance hit is occuring because this logic includes the same district multiple times, thus creating unnecessary rows, within an n-Tuple. Here's an example where district 01 is repeated:

    nTuplesPopulation

    401,12,01,12,01,12,01,2823981

    I'll work on it some more perhaps tonight or definitely over the weekend. I need to now rest a spell and collect my thoughts.

    That makes sense, and I considered it after testing against Remaining. The CTE logic works better against, say, T1, since it only reads 1 district at a time, rather than 'sets of sets' (doubles, triples). I'll work on this approach some more as well.

    Thx,

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • dwain.c (6/28/2012)


    I think this SQL is the correct algorithm, however I have not been able to arrive at a way yet to prune the resultset of unwanted results to pass to the next recursion. If it runs in real time, I'm fairly certain the result set will be what you want. Empasis on if there.

    ...

    It produces Tuples that look like this (just a few example rows):

    ntuplesPopulation

    <snip>

    3[03],[21,33,51],[07,14,61]14998

    3[03],[21,41,51],[14,56,61]14998

    3[03],[21,41,51],[16,44,54]14998

    3[03],[21,41,51],[20,29,39]14998

    3[03],[21,41,51],[20,39,60]14998

    <snip>

    3[03,52],[34,38,44],[30,41,51]17997

    3[03,52],[34,38,44],[31,62]17997

    3[03,52],[30,41,51],[31,62]17997

    3[03,52],[30,41,51],[34,38,44]17997

    3[03,52],[31,62],[34,38,44]17997

    3[03,52],[31,62],[30,41,51]17997

    3[03,52],[30,41,51],[04,16,38]17998

    3[03,52],[04,16,38],[30,41,51]17998

    3[03,52],[04,16,38],[31,62]17998

    3[03,52],[31,62],[04,16,38]17998

    Note the lack of duplicated district IDs within a Tuple.

    I'm going to try an abbreviated record set over night.

    I responded, then went back over the code again. Impressive CTE! 😎 ....and I think the right approach. Should prove interesting.

    Thx,

    ~Les

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Les,

    I'll need to go over your posts carefully to thoroughly understand but I think what you're saying I can grok.

    Last night when I was working on this, I realized that the rCTE last step is simply creating too many rows that are not useful in the final solution set except to generate more combinations of districts. These need to be pruned and I don't think it's possible to do so within the rCTE structure. While the SQL I gave you may eventually run, I wouldn't count on it doing so in any reasonable period of time.

    My limited test ran for one hour but it was on a very small subset. It did at least produce the expected results.

    As such things will be, sleeping on it I may have come up with an alternate approach. I ran some initial tests this morning and it appears promising, albeit also pretty slow. Before reaching the ultimate result set, I'm needing to tweak it a bit to eek every microsecond of performance out of it. I am hopeful it might work though.

    Once again, stay tuned as I am now hooked.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • This is a great CTE study for me...

    BTW, I ran it against this server, and it ran in 7min...however, no rows were returned.

    Singles - you can limit this to just one tuple by adding...

    WITH Singles

    AS ( SELECT district ,

    d1 = RIGHT('00' + CAST(district AS VARCHAR), 2) ,

    population

    FROM Sweden

    WHEREpopulation BETWEEN 5900 AND 6000

    ),

    Doubles

    Still looking through the rest.

    Dr. Les Cardwell, DCS-DSS
    Enterprise Data Architect
    Central Lincoln PUD

  • Les Cardwell (6/29/2012)


    This is a great CTE study for me...

    BTW, I ran it against this server, and it ran in 7min...however, no rows were returned.

    Singles - you can limit this to just one tuple by adding...

    WITH Singles

    AS ( SELECT district ,

    d1 = RIGHT('00' + CAST(district AS VARCHAR), 2) ,

    population

    FROM Sweden

    WHEREpopulation BETWEEN 5900 AND 6000

    ),

    Doubles

    Still looking through the rest.

    Now this is interesting. I'm still working on my new approach. I've been able to get up to 24 combined districts, combining the 2 and 3 tuples according to the map. Row counts are too high to proceed using the same technique so I wasn't sure how to proceed until I saw this (new idea!).

    Not sure why my original CTE returned now rows.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • Sorry but I do know why the CTE returns 0 rows. You must still have the population equality check on the last line. Try taking that out.

    You'll find that the reason it only takes 7 minutes to run is the limiter I put in (n <= 3 I think it was). To make it solve the problem, you'd need to take that out. But I'd suggest just incrementing the 3 by 1 a couple of times and watch the impact it has on the run times before taking it out entirely.


    My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]

    My thought question: Have you ever been told that your query runs too fast?

    My advice:
    INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
    The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

    Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
    Since random numbers are too important to be left to chance, let's generate some![/url]
    Learn to understand recursive CTEs by example.[/url]
    [url url=http://www.sqlservercentral.com/articles/St

  • You can take advantage of cube subclause . This generates all the permutation and combinations...

    See the below example...

    If the dynamic code is becoming bigger.. You can have two part..first till beforeCTe code..other one is just for finalcte..

    Issues with this : Grouping sets generates a maximum number of 4096 expression so that is one of the main disadvantage. Simialry you can have just 4096 columns in a select statement. Other limitations are below.

    http://msdn.microsoft.com/en-us/library/ms143432.aspx

    One workaorund if the number of rows are too much is that you break up into a multple problems..Like you mentioned there are around 160 dist. You can break it into 5 sets of 28..Put the results into 5 tables..Then corss join these tables to come up with final solution..

    There are way too many answer to this post so following up is quite difficult.Do you want to have dup districts?

    Means can a tupple be like this 'dist1,dist1,dist1' = 300 if dist1 has 100 rows?

    drop table a

    go

    create table a( id int not null,dst varchar(2) not null,)

    go

    insert into a(id,dst)

    select 10,'A'

    union all

    select 11,'B'

    union all

    select 20,'C'

    union all

    select 100,'D'

    go

    with mycte

    (

    id,a,b,c,d

    )as

    (

    select id,[A],,[C],[D]

    from (

    select id,dst,dst as dst1

    from a

    ) a

    pivot

    (

    max(dst1)

    for dst in ([A],,[C],[D] )

    ) as pvt

    ),beforecte

    as

    (

    select a,B,C,d,SUM(id)sm ,GROUPING_ID(a,B,C,d) grp

    from mycte

    group by cube(a,B,C,d)

    ),

    finalcte as

    (

    select case

    when grp = 0 then coalesce(a,b,c,d)

    when grp = 1 then 'd'

    when grp = 2 then 'c'

    when grp = 3 then 'c' + ',' + 'd'

    when grp = 4 then 'b'

    when grp = 5 then 'b' + ',' + 'd'

    when grp = 6 then 'b' + ',' + 'c'

    when grp = 7 then 'b' + ',' + 'c' + ',' + 'd'

    when grp = 8 then 'a'

    when grp = 9 then 'a' + ',' + 'd'

    when grp = 10 then 'a' + ',' + 'c'

    when grp = 11 then 'a' + ',' + 'c' + ',' + 'd'

    when grp = 12 then 'a' + ',' + 'b'

    when grp = 13 then 'a' + ',' + 'b' + ',' + 'd'

    when grp = 14 then 'a' + ',' + 'b' + ',' + 'c'

    when grp = 15 then 'a' + ',' + 'b' + ',' + 'c' + ',' + 'd'

    end dstlist

    ,* from beforecte

    where coalesce(a,b,c,d,'!!') = '!!'

    )

    select distinct dstlist,sm,grp

    --,a,b,c,d

    from finalcte where dstlist is not null

    order by dstlist asc

    GulliMeel

    Finding top n Worst Performing queries[/url]
    Improve the performance of Merge Join(special case)
    How to Post Performance Problem -Gail Shaw[/url]

Viewing 15 posts - 61 through 75 (of 109 total)

You must be logged in to reply to this topic. Login to reply