How to divide results into two somewhat-equal piles?

  • I don't even know what to search for to find the answer to my question.

    I have a table that lists items (fires) in categories (fire centres). I have six categories and each can have 0 or more items. The number of items varies on a daily basis.

    What I want to do is write a query that will divide them up in such a way that I have roughly the same number of items in each of two groups (columns on a web page). I've faked it right now by creating another table that lists each of the six categories and assigns them either to column 1 or column 2, but the number of items keeps changing and so I'm always having to adjust that table.

    Can anyone suggest a way I might dynamically code this?

    Here's the web page http://bcwildfire.ca/ if it helps to better understand my problem!

    Thanks!

  • You probably want to look at the NTILE windowed aggregate.

    For more infor look here:

    http://msdn.microsoft.com/en-us/library/ms175126.aspx

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Thanks, that looks promising. Is there a way to tell it to keep all of the items for one category together, though? If it has to be that one column has 30 records and the other 18 because there's no better way to break them up, so be it -- rather than making each of them have 24 records and one category is split over the two columns.

    Doing the query as listed below results in my Coastal fire centre having one record in column 1 and the rest in column 2. I need to keep all of the Coastal records together.

    create table FireList (

    FireCentreName varchar(15)

    , ProjectID int

    , DisplayName varchar(50)

    )

    go

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Prince George', 250, 'Junction of Smith and Liard River')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Kamloops', 260, 'Terrace Mountain')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Kamloops', 263, 'Mount McLean')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 266, 'Wolf River')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Northwest', 270, 'Salmon River Rd. - Kispiox Valley')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Kamloops', 271, 'Momich Lake Fires')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Cariboo', 278, 'Corkscrew Basin')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Southeast', 280, 'Galena Bay')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Kamloops', 281, 'Intlpam')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 283, 'Camel Back East')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 284, 'Copper Mt')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 286, 'Antler Lake')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 288, 'Blackcomb Mountain')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Cariboo', 289, 'Heckman Pass Area')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Cariboo', 290, 'Choelquoit Lake')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Cariboo', 291, '5 Miles E of Sugarloaf Mtn')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 294, 'Uztlius')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 295, 'North End of Pitt Lake')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Cariboo', 296, '1km S of Chilcotin River in Alexis Creek')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 299, 'Mid Slope East Side Saloompt')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Prince George', 300, 'Southwest of MacKenzie along Phillips Creek')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Prince George', 301, '9 km Southwest of McBride')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Kamloops', 302, 'Yalakom Fires')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Northwest', 303, 'West of Pimpernel Mountain')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Northwest', 304, 'North Side of the Morice River')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Kamloops', 305, 'Brookmere')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Northwest', 306, 'NE of Duck Lake')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Cariboo', 307, 'Alexis Creek')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Kamloops', 308, 'Seton Portage')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Prince George', 309, 'Manson Creek')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 310, 'Nuxalk Mountain, Bella Coola')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 311, 'Ruby Bowl, Blackcomb Mountain')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Cariboo', 312, '61 Km on 4000rd')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Southeast', 313, 'Beaton Incomappleux')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 315, 'East side of Nimpkish')

    Insert FireList (FireCentreName, ProjectId, DisplayName) values ('Coastal', 316, 'Spuzzum Creek')

    go

    select FireCentreName, ProjectID, DisplayName

    ,NTILE(2) OVER(ORDER BY FireCentreName DESC) AS 'ColumnNumber'

    from FireList

    --drop table FireList

    Results of query:

    FireCentreNameProjectIDDisplayNameColumnNumber

    Southeast280Galena Bay1

    Southeast313Beaton Incomappleux1

    Prince George309Manson Creek1

    Prince George300Southwest of MacKenzie along Phillips Creek1

    Prince George3019 km Southwest of McBride1

    Prince George250Junction of Smith and Liard River1

    Northwest270Salmon River Rd. - Kispiox Valley1

    Northwest303West of Pimpernel Mountain1

    Northwest304North Side of the Morice River1

    Northwest306NE of Duck Lake1

    Kamloops308Seton Portage1

    Kamloops305Brookmere1

    Kamloops302Yalakom Fires1

    Kamloops271Momich Lake Fires1

    Kamloops260Terrace Mountain1

    Kamloops263Mount McLean1

    Kamloops281Intlpam1

    Coastal283Camel Back East1

    Coastal284Copper Mt2

    Coastal286Antler Lake2

    Coastal288Blackcomb Mountain2

    Coastal266Wolf River2

    Coastal294Uztlius2

    Coastal295North End of Pitt Lake2

    Coastal299Mid Slope East Side Saloompt2

    Coastal310Nuxalk Mountain, Bella Coola2

    Coastal311Ruby Bowl, Blackcomb Mountain2

    Coastal315East side of Nimpkish2

    Coastal316Spuzzum Creek2

    Cariboo31261 Km on 4000rd2

    Cariboo307Alexis Creek2

    Cariboo2961km S of Chilcotin River in Alexis Creek2

    Cariboo278Corkscrew Basin2

    Cariboo289Heckman Pass Area2

    Cariboo290Choelquoit Lake2

    Cariboo2915 Miles E of Sugarloaf Mtn2

  • So what I've done on my website is the ASP code only looks at the column number when it changes categories, and ignores it from then on. So that keeps my categories together, but it doesn't necessarily end up with anything approaching a balanced column (although right now it just happens to come out pretty well)

    However, if category 1 (Coastal) had 30 records, and all the rest together had 32, the NTILE would put the first row for the second category (Cariboo) into column 1, and then I would ignore the fact that the rest of the Cariboo fires were in column 2, so it would come out pretty lopsided.

    So it's still not ideal the way I have it... but I'm getting closer.

  • I should mention that I don't care at all what sequence the categories come out in. Right now the NTILE query is causing them to be alphabetical, but that's unimportant.

  • Using the sample table from the code above, will this do the trick for you? Ordering by newID basically randomizes it. Then the row_number() is assigned and the odd numbers all go in column 1. This ensures as even a split as possible.

    Please note, if you have three rows (or any odd number) column 1 will always get more rows than column 2, because the row numbering begins with 1. We could randomly generate an integer between 0 and 9, but that would sometimes lead to one column having two+ rows more than the other column. Let me know if you would rather use that approach.

    ;with cte1 as

    (select *,row_number() over (partition by fireCentreName order by newid()) as rowno from firelist)

    ,cte2 as

    (select *, case when right(rowNo,1) in (1,3,5,7,9) then 1 else 2 end as colNo

    from cte1)

    select FireCentreName,ProjectID,DisplayName,ColNo

    from cte2

    order by FireCentreName, ProjectID

    __________________________________________________

    Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
    Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills

  • The ntile(2) function will get you close but you still have to worry about the fire center that is in both columns. This tests for that and updates the center to be in 1 or the other column.

    SELECT *, ntile(2) OVER (ORDER BY FireCentreName) AS rowNum

    INTO #temp

    FROM FireList

    DECLARE @misMatch VARCHAR(15)

    SELECT DISTINCT @misMatch = fireCentreName FROM #temp

    WHERE rowNum = 1

    AND fireCentreName IN (SELECT fireCentreName FROM #temp WHERE rowNum = 2)

    IF (SELECT COUNT(*) FROM #temp WHERE fireCentreName = @misMatch AND rowNum = 1) > (SELECT COUNT(*) FROM #temp WHERE fireCentreName = @misMatch AND rowNum = 2)

    UPDATE #temp SET rowNum = 1 WHERE FireCentreName = @misMatch

    ELSE

    UPDATE #temp SET rowNum = 2 WHERE FireCentreName = @misMatch

    SELECT * FROM #temp

  • Not guaranteed to split equally, but groups nicely:

    select FireCentreName, ProjectID, DisplayName, RANK() OVER(ORDER BY FireCentreName) % 2 + 1 AS 'ColumnNumber'

    from FireList

    order by ColumnNumber, FireCentreName -- Not required, just for display clarity

  • Thats nice. I never even thought to do a mod on the rank() function.

  • Matt Wilhoite (8/7/2009)


    Thats nice. I never even thought to do a mod on the rank() function.

    Thank you Matt :w00t:

  • "Not guaranteed to spit evenly" is a mild understatement. If you change one row of the sample data to move a station from Kamloops to Cariboo, all the groups have even counts and the entire list will show up in column 2. Using DENSE_RANK instead works better, but there still may be degenerate cases.

    Some problems call for an actual algorithm that can't be expressed in a single query. You might consider doing the column split in your web code instead of trying to do it in T-SQL, but it can be done. This code splits the sample data into two columns by assigning each group (in descending group size order) to the shortest column. It produces two columns of 18 (even if one Kamloops is changed to Cariboo).

    SELECT IDENTITY(INT,1,1) AS ID, FireCentreName, COUNT(*) AS [Count], 1 AS [Column]

    INTO #fc

    FROM dbo.FireList

    GROUP BY FireCentreName

    ORDER BY COUNT(*) DESC

    DECLARE @ID INT, @n INT, @Col1 INT, @Col2 INT

    SELECT @Col1 = 0, @Col2 = 0, @ID = 1, @n = MAX(ID) FROM #fc

    WHILE @ID <= @n BEGIN

    IF @Col1 <= @Col2

    UPDATE #fc SET [Column] = 1, @Col1 = @Col1 + [Count], @ID = @ID + 1 WHERE ID = @ID

    ELSE

    UPDATE #fc SET [Column] = 2, @Col2 = @Col2 + [Count], @ID = @ID + 1 WHERE ID = @ID

    END

    SELECT [Column], SUM([Count]) FROM #fc GROUP BY [Column]

    SELECT c.[Column], f.FireCentreName, f.ProjectID, f.DisplayName

    FROM FireList AS f

    INNER JOIN #fc AS c ON f.FireCentreName = c.FireCentreName

    ORDER BY c.[Column]

  • Here's an attempt:

    DECLARE@median SMALLINT

    , @denseRank SMALLINT

    SELECT@median = CEILING(MAX([RowNum])/2)

    FROM(SELECTROW_NUMBER() OVER (ORDER BY [FireCentreName]) [RowNum]

    FROM[FireList]) median PRINT @median

    SELECT@denseRank = MAX(dR)

    FROM(SELECTDENSE_RANK() OVER(ORDER BY [FireCentreName]) [dR]

    FROM[FireList]) denseRank PRINT @denseRank

    SELECTf.[FireCentreName]

    , f.[ProjectID]

    , f.[DisplayName]

    , DENSE_RANK() OVER(ORDER BY f.[FireCentreName]) [DenseRanked]

    , RANK() OVER(ORDER BY f.[FireCentreName]) [Ranked]

    , CASE

    WHEN @denseRank <= 2THEN DENSE_RANK() OVER(ORDER BY f.[FireCentreName])

    WHEN RANK() OVER(ORDER BY f.[FireCentreName]) < @medianTHEN 1

    ELSE 2 END [ColumnNum]

    FROM[FireList] f

    [font="Arial Narrow"]bc[/font]

  • Scott Coleman (8/7/2009)


    Using DENSE_RANK instead works better, but there still may be degenerate cases.

    As it happens, originally I did use DENSE_RANK, but changed it on a whim as I posted.

    DENSE_RANK seems to be fine - not bad considering it's one line.

  • Thanks so much for the replies, guys! I'm on holidays for a week but will try these out when I get back!

  • Beverley (8/9/2009)


    Thanks so much for the replies, guys! I'm on holidays for a week but will try these out when I get back!

    Perfect! Hey guys... I've got this really nasty problem... would you do all my work for me so I don't have to worry about it when I go on holiday next week? Thanks so much! 😛

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

Viewing 15 posts - 1 through 15 (of 25 total)

You must be logged in to reply to this topic. Login to reply