Need the Best Solution for splitting into batches

  • I have the below requirement:

    I need to split the one column into 3 columns such that any one row doesn't have the same value for any two columns. I have a solution below, but it is not the minimum rows possible. i need to rearrange them to get as minimum rows as possible.

    Below are the table scripts required for the exercise.

    CREATE TABLE [dbo].[Table_2](

    [a] [int] NULL

    ) ON [PRIMARY]

    INSERT INTO [dbo].[Table_2]([a])

    SELECT 1 UNION ALL

    SELECT 1 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 2 UNION ALL

    SELECT 2 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 3 UNION ALL

    SELECT 1 UNION ALL

    SELECT 2 UNION ALL

    SELECT 4 UNION ALL

    SELECT 5 UNION ALL

    SELECT 6 UNION ALL

    SELECT 7 UNION ALL

    SELECT 8 UNION ALL

    SELECT 9 UNION ALL

    SELECT 10 UNION ALL

    SELECT 3

    Below is the solution I had written:

    DECLARE @maxPID INT

    DECLARE @pID INT=1

    CREATE TABLE #SplitBatches(

    Batch1 VARCHAR(50),

    Batch2 VARCHAR(50),

    Batch3 VARCHAR(50))

    SELECT @maxPID=MAX(pID)

    FROM

    (SELECT ROW_NUMBER() OVER(PARTITION BY a ORDER BY a) AS pID,

    a AS List

    FROM dbo.Table_2

    ) AS cl

    WHILE(@pID<=@maxPID)

    BEGIN

    ;WITH ListCTE AS

    (

    SELECT

    ROW_NUMBER() OVER(PARTITION BY a ORDER BY a) AS pID,

    CAST(a AS VARCHAR(10)) AS List

    FROM dbo.Table_2

    )

    INSERT INTO #SplitBatches

    SELECT

    ISNULL([0],''),

    ISNULL([1],''),

    ISNULL([2],'')

    FROM (SELECT

    (ROW_NUMBER()OVER(ORDER BY List)-1)/3 AS BatchNumber,

    (ROW_NUMBER()OVER(ORDER BY List)-1)%3 AS BatchOrder,

    List AS ClientListing

    FROM ListCTE WHERE pID=@pID)

    ClientListingCTE

    PIVOT (MAX(ClientListing) FOR batchorder IN ([0],[1],[2])) AS pvt

    SET @pID=@pID+1

    END

    SELECT Batch1,Batch2,Batch3 FROM #SplitBatches

    DROP TABLE #SplitBatches

    But I need the solution such that there are as minimum rows as possible in the result set.

    Please let me know if you have any suggestions.

    Thanks and Regards,

    Praveena

  • would just adding a group by to your final statement get desired results

    SELECT Batch1,Batch2,Batch3 FROM #SplitBatches

    GROUP BY Batch1,Batch2,Batch3

    Bob
    -----------------------------------------------------------------------------
    How to post to get the best help[/url]

  • SELECT

    col1 = MAX(CASE WHEN seq = 1 THEN a END),

    col2 = MAX(CASE WHEN seq = 2 THEN a END),

    col3 = MAX(CASE WHEN seq = 3 THEN a END)

    FROM (

    SELECT

    a,

    Batch = CAST(rn/3.1 AS INT),

    Seq = ISNULL(NULLIF(rn%3,0),3)

    FROM (

    SELECT

    a,

    rn = ROW_NUMBER() OVER(ORDER BY (SELECT NULL))

    FROM Table_2

    ) s

    ) b

    GROUP BY Batch

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Bob,

    thank you for the reply. I need all the 19 values even if they are duplicated(count of rows in Table_2). Just rearranged in batches of 3, such that no two columns have the same value for a row. Adding a group by would eliminate some of those values.

  • ChrisM,

    No two columns should have same value within a row. With the query you provided, below is the result set:

    111

    222

    123

    124

    567

    8910

    3NULLNULL

    First two rows have same values for all the columns.

    Thank you very much for the quick responses.

  • The solution I provided would give the results.

    1102

    345

    678

    9

    123

    12

    12

    12

    But rearranging them to something like below would give less number of rows within the results. I removed the second row and added values to the last 3 rows, thus 1 less row in the result set. Not sure if it is possible though.

    1102

    678

    9

    123

    125

    124

    12 3

    Thanks and Regards,

    Praveena.

  • venus.pvr (9/11/2012)


    ChrisM,

    No two columns should have same value within a row. With the query you provided, below is the result set:

    111

    222

    123

    124

    567

    8910

    3NULLNULL

    First two rows have same values for all the columns.

    Thank you very much for the quick responses.

    Yes, you are right. Try this.

    SELECT a.a, b.a, c.a

    FROM (

    SELECT TOP 33 percent a, rn = ROW_NUMBER() OVER(ORDER BY a)

    FROM Table_2 ORDER BY a ASC

    ) a

    LEFT JOIN (

    SELECT TOP 50 percent a, rn = ROW_NUMBER() OVER(ORDER BY a)

    FROM (SELECT TOP 66 percent a FROM Table_2 ORDER BY a DESC) d ORDER BY a ASC

    ) b ON b.rn = a.rn

    LEFT JOIN (

    SELECT a, rn = ROW_NUMBER() OVER(ORDER BY a)

    FROM (SELECT TOP 33 percent a FROM Table_2 ORDER BY a DESC) d

    ) c ON c.rn = a.rn

    It ain't perfect and won't work with every set - it depends upon the distribution of the data. But it works with your sample.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply