SELECTing values on different records based on rules

  • Hey everyone,

I've been trying to wrap my mind around a problem lately, and I can't crack it. Out of desperation, I turn to you for new ideas on how to accomplish the task.

First, a quick recap of my situation:

    I must produce a specific ResultSet from a SQL query. There are rules about that ResultSet.

1 - A column value must not exceed a maximum value defined per column (for example, the value of c30 must not exceed 999.99, the value of c32 must not exceed 9999.99, etc.).

    2 - There must be a maximum of 6 values per record.

    In either case, the solution is to add a new record and copy the rest of the values in there. For instance, if I must write 1500 in c30 and the limit is 999.99, I'll have a record with 999.99 and a second record with 500.01.

I handled the first rule on my own (for the record, I made a side temp table containing the maximum value per column, and I joined it to a Tally table using the result of dividing the actual value by the maximum, thereby creating enough records every time). The one that's been giving me problems is the second: I keep failing to write a query that transforms the data the way I want.
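A minimal sketch of that tally-based split, using the 1500-into-999.99 example above (illustrative names only; @Src and @MaxVal are stand-ins, and dbo.Tally is assumed to hold integers 1, 2, 3, ...):

DECLARE @MaxVal money
SET @MaxVal = 999.99 -- the per-column cap from the example

DECLARE @Src TABLE (Id int, Amount money)
INSERT @Src SELECT 1, 1500.00

SELECT s.Id,
       CASE WHEN s.Amount - (t.N - 1) * @MaxVal > @MaxVal
            THEN @MaxVal                        -- a full slice at the cap
            ELSE s.Amount - (t.N - 1) * @MaxVal -- the remainder
       END AS Slice
FROM @Src AS s
INNER JOIN dbo.Tally AS t
    ON t.N <= FLOOR((s.Amount - .01) / @MaxVal) + 1 -- one tally row per slice needed
-- Returns two rows for Id 1: 999.99 and 500.01.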

Here are two small test cases.

DECLARE @test2 TABLE(c30 money, c32 money, c33 money, c34 money, c40 money, c71 money, c84 money, c85 money)

--Test values #1
DELETE @test2

INSERT @test2
SELECT 9999.99, 98.54, 847.55, 99999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT 2587.4, 0, 0, 18.42, 0, 0, 0, 0

SELECT * FROM @test2

--Expected result for values #1
SELECT 9999.99, 98.54, 847.55, 99999.99, 34.55, 365.74, 0, 0 UNION ALL
SELECT 2587.4, 0, 0, 18.42, 0, 0, 8462.57, 0

--Test values #2
DELETE @test2

INSERT @test2
SELECT 9999.99, 99.99, 999.99, 99999.99, 99.99, 9999.99, 8462.57, 0 UNION ALL
SELECT 2587.4, 98.54, 847.55, 18.42, 34.55, 365.74, 0, 0

SELECT * FROM @test2

--Expected result for values #2
SELECT 9999.99, 99.99, 999.99, 99999.99, 99.99, 9999.99, 0, 0 UNION ALL
SELECT 2587.4, 98.54, 847.55, 18.42, 34.55, 365.74, 0, 0 UNION ALL
SELECT 0, 0, 0, 0, 0, 0, 8462.57, 0

    Does anyone have an idea on how I could tackle this problem?

  • :blink: :blink: :blink: Uhh... Huh? :ermm: :ermm: :ermm:

    In either case, the solution is to add a new record and copy the rest of the values in there. For instance, if I must write 1500 in c30 and the limit is 999.99, I'll have a record with 999.99 and a second record with 500.01.

    That. Makes sense.

    Everything else about the post is extremely confusing. Your rules don't correlate to your test data in any way that I can see (you state c30 has a max of 999.99 but your actual number is 9999.99).

    Did you perhaps forget to include the "rules" for each given column in this example? Cuz I don't see anything indicating the maximum per column.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • OK, I think I see part of it. The 8462.57 value is creating its own line because it is the 7th value (zeroes not counted apparently) in the row and you only allow 6. Correct?

    What in the world do you need something like this for? Before people here spend a bunch of time attempting to do something that should never be done in SQL, I think it'd be good to convince people they're not just wasting their time.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

Other than the confusion, I can't help but think we've got a normalization disaster going on, which is making this problem much harder to handle. Where's the relationship between the values you're breaking into smaller chunks and said smaller chunks (so that when one changes, you can adjust, etc.)? Why do we have the same piece of data repeated 8 times in a single row (a violation of normalization)?

More information as to what you're ACTUALLY doing (vs. this particular implementation) might help get you better ideas (and ones designed to work well with SQL Server).

    ----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part... unless you're my manager... or a director and above... or a really loud-spoken end-user... All right, what was my emergency again?

  • The query is designed to be used by the front-end application to generate a data file. This part is the final portion, where I take the data I've compiled so far and apply the rules to it before returning it.

The columns (c30, c32, c34, c35, etc.) are simply different values of a record I need to send to the application. I have a record per employee. The example values are for a single employee; in the end I'll have to handle multiple employees, but I figure that if I can get something working for a single record, it will be easy to extend to multiple records.

Also, the columns are NOT coming straight from tables - they're computed into a temporary table from a lot of tables using some more logic. The result is a set of values (c30, c32, c34, c35, etc.) per employee, and I must apply the last two rules to them before sending it back to the application.

As for the example I gave for the limit on c30, I'm aware that it does not match the example data I sent. The portion about the limit is already handled; it's explained only to show the case where a value would already be split into two records, without confusing everyone (something that, I fear, I've failed at).

    Sorry for the confusion. I hope it's a bit better now.

  • Jonathan Melo (10/14/2009)


The example values are for a single employee; in the end I'll have to handle multiple employees, but I figure that if I can get something working for a single record, it will be easy to extend to multiple records.

While that is not necessarily a false statement, by not including the details on things like that, and by not giving us the part of the solution you've already written, you've more or less ensured this thing will be massive RBAR. We don't know what you've already done (to see if we can combine steps), and we don't know how this data is keyed per employee, to possibly do something like this at a set level as opposed to trying to make it work for one person. So while you will likely be able to throw the whole thing in a cursor, you're not likely to get a performant method.

    That said, that objection may be mostly academic in nature. I'll think about this for a bit and see what I can come up with.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • So I thought about it for a while, this is what I came up with. I'll state up front that I can already see half a dozen ways to improve this / clean it up / make it better, but I'll leave it here for now. I'll post this in the body for people who want to read it, and as an attachment so that the formatting isn't completely boned.

    One thing of note is that this does not use any logic to determine what the best value to move down a row is when there are more than 6 values in a row. That was not stated in the original problem so I didn't add it in. Without this, you aren't efficiently utilizing all of your columns and could fail to move rows around.

    If anyone wants to take this a bit further and optimize sections or get rid of the cursor, please post back the final script when you're done.

/*
=============================================================================================
CREATE DATE:   10/15/2009
LAST MODIFIED: 10/15/2009
CREATED BY:    SETH PHELABAUM
PURPOSE:       Do crazy things for reasons I don't understand
NOTES:         This assumes a hard-coded limit of $5000 for all columns (can obviously be expanded upon).
               The unpivot could be made dynamic.
               There's way too much hardcoded. Things like Max Values Per Row should be a variable set once at the top.
ISSUES:        Uses a cursor and GOTOs for flow control. This could likely be improved.
               The cursor and high number of updates will likely cause this to scale very poorly.

Revision History:
Date     By  Change Made
-------- --- -------------------------------------
=============================================================================================
*/

    SET ANSI_WARNINGS OFF

    SET NOCOUNT ON

------------------ DECLARE VARIABLES --------------------
DECLARE @Seq        int,
        @Ident      int,
        @ColName    varchar(10),
        @ColNum     int,
        @Amount     money,
        @ColCount   int,
        @OverFlow   int,
        @ChangeMade bit

--------------------- Test Data Setup ---------------------
DECLARE @test2 TABLE(Ident int, c30 money, c32 money, c33 money, c34 money, c40 money, c71 money, c84 money, c85 money)

INSERT @test2
SELECT 1, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT 1, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT 1, 6000, 0, 0, 0, 0, 0, 0, 0 UNION ALL
SELECT 2, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL
SELECT 2, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT 2, 6000, 0, 0, 0, 0, 0, 0, 0

SELECT * FROM @test2

    -----------------------------------------------------------

-------- Add Sequencing Column and Sequence Rows ---------- (Would use RowNumber in 2K5+)
SELECT CAST(0 AS int) seq, *
INTO #Test
FROM @test2

CREATE CLUSTERED INDEX Quirk ON #Test (Ident)

-- "Quirky update": numbers the rows 1..n within each Ident, relying on the clustered index order and MAXDOP 1
SET @Seq = 0
UPDATE #Test
SET @Seq = Seq = CASE WHEN @Ident = Ident THEN @Seq + 1 ELSE 1 END,
    @Ident = Ident
FROM #Test OPTION (MAXDOP 1)

    --SELECT * FROM #Test
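--(Sketch of the ROW_NUMBER() alternative noted above, for SQL 2005+; the ordering
-- within each Ident is arbitrary here, just as it is in the quirky update:)
--UPDATE T
--SET Seq = RN
--FROM (SELECT Seq, ROW_NUMBER() OVER(PARTITION BY Ident ORDER BY Ident) AS RN
--      FROM #Test) AS T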

    -----------------------------------------------------------

    ----------------- Add Additional Rows ---------------------

    INSERT INTO #Test

    SELECT T2.N, T2.Ident, c30, c32, c33, c34, c40, c71, c84, c85

    FROM (SELECT Ident, N

    FROM (SELECT DISTINCT IDent FROM #Test) T, Tally T1

    WHERE T1.N <= 6) T2

    LEFT JOIN #Test T ON T.Ident = T2.Ident AND T.Seq = T2.N

    WHERE T.Ident IS NULL

    --SELECT * FROM #test

    -----------------------------------------------------------

    ----------------- Unpivot the Data ------------------------

CREATE TABLE #UPTest(
    UPIdentity int IDENTITY(1,1),
    Ident      int,
    Seq        int,
    ColName    varchar(10),
    ColNum     int,
    Amount     money,
    ColCount   int,
    OverFlow   int)

    INSERT INTO #UPTest(Ident, Seq, ColName, ColNum, Amount, ColCount, Overflow)

    SELECT Ident, Seq, 'C30', 1, C30, 0, 0 FROM #Test UNION ALL

    SELECT Ident, Seq, 'C32', 2, C32, 0, 0 FROM #Test UNION ALL

    SELECT Ident, Seq, 'C33', 3, C33, 0, 0 FROM #Test UNION ALL

    SELECT Ident, Seq, 'C34', 4, C34, 0, 0 FROM #Test UNION ALL

    SELECT Ident, Seq, 'C40', 5, C40, 0, 0 FROM #Test UNION ALL

    SELECT Ident, Seq, 'C71', 6, C71, 0, 0 FROM #Test UNION ALL

    SELECT Ident, Seq, 'C84', 7, C84, 0, 0 FROM #Test UNION ALL

    SELECT Ident, Seq, 'C85', 8, C85, 0, 0 FROM #Test

    CREATE CLUSTERED INDEX CI_UpdTest ON #UPTest (Ident, Seq, ColName, Amount)

    -----------------------------------------------------------

    ----------------- Identify Problems -----------------------

    UPDATE #UPTest

    SET ColCount = UP2.CC,

    OverFlow = CASE WHEN Amount > 5000 THEN 1 ELSE 0 END

    FROM #UPTest

    INNER JOIN (SELECT ident, Seq, COUNT(ColName) CC

    FROM #UPTest

    WHERE ISNULL(Amount,0.00) <> 0.00

    GROUP BY Ident, Seq) UP2

    ON #UPTest.Ident = UP2.Ident AND #UPTest.Seq = UP2.Seq

    --SELECT * FROM #UPTest

    -----------------------------------------------------------

    ---- If there are any issues, fix, otherwise Finalize -----

    IF EXISTS (SELECT * FROM #UPTest WHERE OverFlow = 1 OR ColCount > 6)

    BEGIN

    SET @ChangeMade = 1

    GOTO ProblemFix

    END

    ELSE

    GOTO Finalize

    ---------------- Fix Problems ----------------------------- (Crazy RBAR :: Focus here for improvement)

    --(I'm not convinced this can't be done in a set based manner, but unfortunately, I'm out of time to spend on this.)

    ProblemFix:

    IF @ChangeMade = 0

    GOTO Finalize

    ELSE

    SET @ChangeMade = 0

    BEGIN

    PRINT 'Cursor Starting'

    DECLARE IDCursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY

    FOR

    SELECT Ident, Seq, ColName, ColNum, Amount, ColCount, OverFlow

    FROM #UPTest

    WHERE (ColCount > 6 AND ColNum > 6)

    OR OverFlow = 1

    ORDER BY Ident, Seq, OverFlow DESC, ColCount

    OPEN IDCursor

    FETCH NEXT FROM IDCursor INTO @Ident, @Seq, @ColName, @ColNum, @Amount, @ColCount, @Overflow

    WHILE @@FETCH_STATUS = 0

    BEGIN

    --------------- Too Many Column Fixes ---------------------

    IF @ColCount > 6 AND @ColNum > 6

    AND EXISTS (SELECT * FROM #UPTEST

    WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)

    BEGIN

    UPDATE #UPTest

SET Amount = 0.00, -- Zero out the value being moved to another row

    ColCount = ColCount - 1,

    OverFlow = 0

    WHERE ColName = @ColName AND Ident = @Ident AND Seq = @Seq

    UPDATE #UPTest

    SET Amount = @Amount,

    OverFlow = CASE WHEN @Amount > 5000 THEN 1 ELSE 0 END

    FROM #UPTest

    WHERE ColName = @ColName

    AND Ident = @Ident

    AND Seq = (SELECT MIN(Seq)

    FROM #UPTest

    WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)

    SET @ChangeMade = 1 -- Additional Flow Control

    SET @OverFlow = 0 -- Need to do this one on another run through

    END

    -----------------------------------------------------------

    --------------- Number Too Big Fixes ----------------------

    IF @OverFlow = 1

    AND EXISTS (SELECT * FROM #UPTEST

    WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)

    BEGIN

    UPDATE #UPTest

    SET Amount = 5000.00, --Max Allowed Value

    OverFlow = 0

    WHERE ColName = @ColName AND Ident = @Ident AND Seq = @Seq

    UPDATE #UPTest

    SET Amount = @Amount - 5000.00,

    OverFlow = CASE WHEN @Amount - 5000.00 > 5000.00 THEN 1 ELSE 0 END

    FROM #UPTest

    WHERE ColName = @ColName

    AND Ident = @Ident

    AND Seq = (SELECT MIN(Seq)

    FROM #UPTest

    WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)

    SET @ChangeMade = 1 -- Additional Flow Control

    END

    -----------------------------------------------------------

    FETCH NEXT FROM IDCursor INTO @Ident, @Seq, @ColName, @ColNum, @Amount, @ColCount, @Overflow

    END

    CLOSE IDCursor

    DEALLOCATE IDCursor

    IF EXISTS (SELECT * FROM #UPTest WHERE OverFlow = 1 OR ColCount > 6)

    GOTO ProblemFix

    ELSE

    GOTO Finalize

    END

    Finalize: -- Repivot

    SELECT Ident, C30, C32, C33, C34, C40, C71, C84, C85

    FROM (

    SELECT Ident, Seq,

    SUM(CASE WHEN ColName = 'C30' THEN Amount ELSE 0 END) C30,

    SUM(CASE WHEN ColName = 'C32' THEN Amount ELSE 0 END) C32,

    SUM(CASE WHEN ColName = 'C33' THEN Amount ELSE 0 END) C33,

    SUM(CASE WHEN ColName = 'C34' THEN Amount ELSE 0 END) C34,

    SUM(CASE WHEN ColName = 'C40' THEN Amount ELSE 0 END) C40,

    SUM(CASE WHEN ColName = 'C71' THEN Amount ELSE 0 END) C71,

    SUM(CASE WHEN ColName = 'C84' THEN Amount ELSE 0 END) C84,

    SUM(CASE WHEN ColName = 'C85' THEN Amount ELSE 0 END) C85,

    SUM(Amount) RowSum

    FROM #UPTest

    GROUP BY Ident, Seq) A

    WHERE RowSum > 0

    ORDER BY Ident, Seq

    DROP TABLE #UPTest

    DROP TABLE #Test

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • First, Garadin, I'd like to thank you a lot for the effort you're putting in this problem. It looks like you went through a lot of trouble to make that script, and I appreciate it!

Now, to the problem itself... the "crazy things for a reason you don't understand" do have a valid reason behind them 😀 This procedure is supposed to gather and format data to send back to the front-end application, so it can generate a file that will be sent to a third party. The file being XML, it has its own rules that I must follow, and for design reasons the application should do the bare minimum (that is, take the recordset and fill the XML file with it, with no extra data manipulation).

I do understand that SQL isn't the perfect choice to handle this kind of limitation, but I know it's feasible, and the design of the solution flows a lot better if it's made that way.

As for the code so far, I've been working on it on my side since I posted. I came up with something that works quite well but doesn't handle every situation it needs to (I'm still having problems when I get over 2 rows generated; the values just won't follow). However, it doesn't make use of any cursor (alright, I cheated just a little, since I loop to execute some code once per column), no GOTO (this time, no cheating), and as far as I know no RBAR.

I've made a cut-down example (using your numbers and limits as test data) that also includes my solution to handle the limits on the values:

    /*

Note: Sys_Sequence is the tally table. It starts at 0 and goes up. Here's a small script to create it if need be. DON'T FORGET TO CLEAN UP!

    CREATE TABLE Sys_Sequence(Number int PRIMARY KEY)

    DECLARE @i int

    SELECT @i = 0

    WHILE @i <= 200

    BEGIN

    INSERT Sys_Sequence

    SELECT @i

    SELECT @i = @i + 1

    END

    */
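-- (A set-based alternative to the WHILE loop above, as a sketch: master's spt_values
--  table already holds a run of integers under type 'P' (0-255 on SQL 2000, 0-2047 on
--  2005 and up), so the same 0-200 sequence can be loaded in one statement.)
--INSERT Sys_Sequence
--SELECT Number
--FROM master.dbo.spt_values
--WHERE type = 'P' AND Number BETWEEN 0 AND 200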

    -- Cautious pre-run cleanup

    IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL

    DROP TABLE #Temp_Values

    IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL

    DROP TABLE #Multiplied_Values

    -- Test data table

    CREATE TABLE #Temp_Values

    (

    FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,

    PeopleId int,

    Case_30 money,

    Case_30_MaxValue money DEFAULT(5000), -- Determines the maximum value for the equivalent column

    Case_30_SlipCount AS Floor((Case_30-.01) / Case_30_MaxValue), -- Determines the number of slips we'll need to write that value

    Case_32 money,

    Case_32_MaxValue money DEFAULT(5000),

    Case_32_SlipCount AS Floor((Case_32-.01) / Case_32_MaxValue),

    Case_33 money,

    Case_33_MaxValue money DEFAULT(5000),

    Case_33_SlipCount AS Floor((Case_33-.01) / Case_33_MaxValue),

    Case_34 money,

    Case_34_MaxValue money DEFAULT(5000),

    Case_34_SlipCount AS Floor((Case_34-.01) / Case_34_MaxValue),

    Case_40 money,

    Case_40_MaxValue money DEFAULT(5000),

    Case_40_SlipCount AS Floor((Case_40-.01) / Case_40_MaxValue),

    Case_71 money,

    Case_71_MaxValue money DEFAULT(5000),

    Case_71_SlipCount AS Floor((Case_71-.01) / Case_71_MaxValue),

    Case_84 money,

    Case_84_MaxValue money DEFAULT(5000),

    Case_84_SlipCount AS Floor((Case_84-.01) / Case_84_MaxValue),

    Case_85 money,

    Case_85_MaxValue money DEFAULT(5000),

    Case_85_SlipCount AS Floor((Case_85-.01) / Case_85_MaxValue),

    PRIMARY KEY(FederalEmployer, PeopleId)

    )
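-- (Worked example of the computed SlipCount above: Case_30 = 9999.99 with a 5000.00
--  max gives Floor((9999.99 - .01) / 5000) = Floor(1.999996) = 1, i.e. the value
--  spans SlipCount + 1 = 2 slips: 5000.00 and 4999.99. The -.01 is there so a value
--  sitting exactly on the limit, such as 5000.00, still counts as a single slip.)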

    -- Final data table

    CREATE TABLE #Multiplied_Values

    (

    FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,

    PeopleId int,

    MultiSlipCounter int,

    OtherCount int DEFAULT(0),

    Case_30 money DEFAULT(0),

    Case_32 money DEFAULT(0),

    Case_33 money DEFAULT(0),

    Case_34 money DEFAULT(0),

    Case_40 money DEFAULT(0),

    Case_71 money DEFAULT(0),

    Case_84 money DEFAULT(0),

    Case_85 money DEFAULT(0),

    PRIMARY KEY(FederalEmployer, PeopleId, MultiSlipCounter)

    )

    -- Insert test data

    INSERT #Temp_Values

    (

    FederalEmployer,

    PeopleId,

    Case_30,

    Case_32,

    Case_33,

    Case_34,

    Case_40,

    Case_71,

    Case_84,

    Case_85

    )

    SELECT '1', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL

    SELECT '1', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL

    SELECT '1', 5, 6000,0,0,0,0,0,0,0 UNION ALL

    SELECT '2', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL

    SELECT '2', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL

    SELECT '2', 5, 6000,0,0,0,0,0,0,0

    SELECT * FROM #Temp_Values

    -- Multiply the records to respect the max value per column

    INSERT #Multiplied_Values

    SELECT

    FederalEmployer,

    PeopleId,

    Sys_Sequence.Number,

    0,

-- If we're past the number of slips we'll need to write the value, write 0

-- If we're exactly on the number of slips, write the value minus the maximum value multiplied by the number of slips already done so far

-- If we're before the number of slips we'll need to write the value, write the maximum value

    CASE WHEN Number > Case_30_SlipCount THEN 0 WHEN Number = Case_30_SlipCount THEN Case_30 - (Case_30_MaxValue * Number) ELSE Case_30_MaxValue END,

    CASE WHEN Number > Case_32_SlipCount THEN 0 WHEN Number = Case_32_SlipCount THEN Case_32 - (Case_32_MaxValue * Number) ELSE Case_32_MaxValue END,

    CASE WHEN Number > Case_33_SlipCount THEN 0 WHEN Number = Case_33_SlipCount THEN Case_33 - (Case_33_MaxValue * Number) ELSE Case_33_MaxValue END,

    CASE WHEN Number > Case_34_SlipCount THEN 0 WHEN Number = Case_34_SlipCount THEN Case_34 - (Case_34_MaxValue * Number) ELSE Case_34_MaxValue END,

    CASE WHEN Number > Case_40_SlipCount THEN 0 WHEN Number = Case_40_SlipCount THEN Case_40 - (Case_40_MaxValue * Number) ELSE Case_40_MaxValue END,

    CASE WHEN Number > Case_71_SlipCount THEN 0 WHEN Number = Case_71_SlipCount THEN Case_71 - (Case_71_MaxValue * Number) ELSE Case_71_MaxValue END,

    CASE WHEN Number > Case_84_SlipCount THEN 0 WHEN Number = Case_84_SlipCount THEN Case_84 - (Case_84_MaxValue * Number) ELSE Case_84_MaxValue END,

    CASE WHEN Number > Case_85_SlipCount THEN 0 WHEN Number = Case_85_SlipCount THEN Case_85 - (Case_85_MaxValue * Number) ELSE Case_85_MaxValue END

    FROM

    #Temp_Values AS Temp_Values

    INNER JOIN Sys_Sequence

    ON Sys_Sequence.Number <= Case_30_SlipCount

    OR Sys_Sequence.Number <= Case_32_SlipCount

    OR Sys_Sequence.Number <= Case_33_SlipCount

    OR Sys_Sequence.Number <= Case_34_SlipCount

    OR Sys_Sequence.Number <= Case_40_SlipCount

    OR Sys_Sequence.Number <= Case_71_SlipCount

    OR Sys_Sequence.Number <= Case_84_SlipCount

    OR Sys_Sequence.Number <= Case_85_SlipCount

    -- Declare the variable for the next part

    DECLARE @MultipliedCounter int

    SELECT @MultipliedCounter = 0

    -- For each of the 8 columns, validate if we need to move the info to another record (max 6 info per record)

    WHILE @MultipliedCounter < 8

    BEGIN

    -- Update the count of info found per record so far

    UPDATE #Multiplied_Values

    SET

    OtherCount = CASE WHEN @MultipliedCounter >= 0 AND Case_30 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN @MultipliedCounter >= 1 AND Case_32 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN @MultipliedCounter >= 2 AND Case_33 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN @MultipliedCounter >= 3 AND Case_34 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN @MultipliedCounter >= 4 AND Case_40 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN @MultipliedCounter >= 5 AND Case_71 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN @MultipliedCounter >= 6 AND Case_84 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN @MultipliedCounter >= 7 AND Case_85 <> 0 THEN 1 ELSE 0 END

    -- Insert an empty record when we detect we'll need one

    INSERT #Multiplied_Values(FederalEmployer, PeopleId, MultiSlipCounter)

    SELECT

    Multi.FederalEmployer,

    Multi.PeopleId,

    Multi.MultiSlipCounter+1

    FROM

    #Multiplied_Values AS Multi

    LEFT JOIN #Multiplied_Values AS Next

    ON Next.FederalEmployer = Multi.FederalEmployer AND

    Next.PeopleId = Multi.PeopleId AND

    Next.MultiSlipCounter = Multi.MultiSlipCounter + 1

    WHERE

    -- Only if there is no next record

    Next.PeopleId IS NULL AND

    -- And the current record is not only full (at 6), but also has another value to add (7 and greater)

    Multi.OtherCount > 6

    -- The "lower records" will pull the data from the record above them if they still have room and the "top record" is full

    -- At the same time, if a "top record" is full and there's a value, we clear it

    UPDATE Multi

    SET

    Case_30 = CASE @MultipliedCounter WHEN 0 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_30 ELSE 0 END ELSE Multi.Case_30 END,

    Case_32 = CASE @MultipliedCounter WHEN 1 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_32 ELSE 0 END ELSE Multi.Case_32 END,

    Case_33 = CASE @MultipliedCounter WHEN 2 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_33 ELSE 0 END ELSE Multi.Case_33 END,

    Case_34 = CASE @MultipliedCounter WHEN 3 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_34 ELSE 0 END ELSE Multi.Case_34 END,

    Case_40 = CASE @MultipliedCounter WHEN 4 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_40 ELSE 0 END ELSE Multi.Case_40 END,

    Case_71 = CASE @MultipliedCounter WHEN 5 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_71 ELSE 0 END ELSE Multi.Case_71 END,

    Case_84 = CASE @MultipliedCounter WHEN 6 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_84 ELSE 0 END ELSE Multi.Case_84 END,

    Case_85 = CASE @MultipliedCounter WHEN 7 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_85 ELSE 0 END ELSE Multi.Case_85 END

    FROM

    #Multiplied_Values AS Multi

    LEFT JOIN #Multiplied_Values AS Prev

    ON Prev.FederalEmployer = Multi.FederalEmployer AND

    Prev.PeopleId = Multi.PeopleId AND

    Prev.MultiSlipCounter + 1 = Multi.MultiSlipCounter AND

    CASE @MultipliedCounter

    WHEN 0 THEN Prev.Case_30

    WHEN 1 THEN Prev.Case_32

    WHEN 2 THEN Prev.Case_33

    WHEN 3 THEN Prev.Case_34

    WHEN 4 THEN Prev.Case_40

    WHEN 5 THEN Prev.Case_71

    WHEN 6 THEN Prev.Case_84

    WHEN 7 THEN Prev.Case_85

    END <> 0

    WHERE

    -- Only if the current record is empty (to import the data) or full (to clear the data)

    NOT (Multi.OtherCount BETWEEN 1 AND 6) OR

    -- Else, there must be no data present if we want to import something in this line

    CASE @MultipliedCounter

    WHEN 0 THEN Multi.Case_30

    WHEN 1 THEN Multi.Case_32

    WHEN 2 THEN Multi.Case_33

    WHEN 3 THEN Multi.Case_34

    WHEN 4 THEN Multi.Case_40

    WHEN 5 THEN Multi.Case_71

    WHEN 6 THEN Multi.Case_84

    WHEN 7 THEN Multi.Case_85

    END = 0

    SELECT @MultipliedCounter = @MultipliedCounter + 1

    END

    SELECT * FROM #Multiplied_Values

    -- Final cleanup

    IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL

    DROP TABLE #Multiplied_Values

    IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL

    DROP TABLE #Temp_Values

As a side note, your script has problems splitting values that are more than twice the limit. It just lets them slip by, it seems ;-).

  • Jonathan Melo (10/16/2009)


    First, Garadin, I'd like to thank you a lot for the effort you're putting in this problem. It looks like you went through a lot of trouble to make that script, and I appreciate it!

It was a fun challenge. At some point, if I have a bit more time, I'll look at removing the cursor and cleaning this up a lot (mainly for my own peace of mind; I understand you are already working on an alternate solution). This was a very rough draft, but I wanted to throw it out there in case you were still struggling.

    Jonathan Melo (10/16/2009)


Now, to the problem itself... the "crazy things for a reason you don't understand" do have a valid reason behind them 😀 This procedure is supposed to gather and format data to send back to the front-end application, so it can generate a file that will be sent to a third party. The file being XML, it has its own rules that I must follow, and for design reasons the application should do the bare minimum (that is, take the recordset and fill the XML file with it, with no extra data manipulation).

I assumed there *were* reasons behind it; I was just noting that I didn't know what they were :hehe:. I tend to be a little frivolous in my script headers, especially if I'm not happy with where a script is in terms of performance or design. I have a fairly large hatred of cursors, so this one falls firmly into that category =).

    Jonathan Melo (10/16/2009)


As for the code so far, I've been working on it on my side since I posted. I came up with something that works quite well but doesn't handle every situation it needs to (I'm still having problems when I get over 2 rows generated; the values just won't follow). However, it doesn't make use of any cursor (alright, I cheated just a little, since I loop to execute some code once per column), no GOTO (this time, no cheating), and as far as I know no RBAR.

I'd definitely be interested to see how they match up on a larger dataset. The tally join in yours is pretty hefty and seems to me to be massive RBAR. Hidden RBAR in code that looks set-based is often more dangerous than straight-out loops, so be careful there =). I can't really test these side by side yet, as yours still returns different results than mine, but yours is actually a bit slower on these 6 rows of test data. Being slower on 6 rows doesn't mean that yours won't blow mine away on a larger dataset, though, which is the normal problem with cursors. Out of curiosity (I haven't had a chance to examine yours in full detail yet): are you attempting to handle moving the best piece of data in any given row so that you don't overflow your 6 allotted rows?

    Jonathan Melo (10/16/2009)


As a side note, your script has problems splitting values that are more than twice the limit. It just lets them slip by, it seems ;-).

It seems to work properly for me. It does stop at 6 rows though and won't error; it just stops processing. For example, changing c30 to 12999.99 instead of 9999.99 generates two 5000 rows and one 2999.99 row in my tests. What test data did you use on mine that is coming out incorrectly?

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • Garadin (10/16/2009)


    I assumed there *were* reasons behind it, I was just noting that I didn't know what they were :hehe:. I tend to be a little frivolous in my script headers, especially if I'm not happy with where it's at in terms of performance or design currently. I have a fairly large hatred of cursors, so this one falls firmly into that category =).

I despise cursors myself. When I'm really, really stuck going row by row, I use a WHILE construct with a key instead. Somewhere in my head, there's a voice telling me that it's safer and possibly faster to do it this way. Go figure.
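For illustration, that key-driven WHILE pattern looks something like this (a sketch; #Work and its columns are hypothetical):

CREATE TABLE #Work (Id int PRIMARY KEY, Amount money)
INSERT #Work
SELECT 1, 10.00 UNION ALL
SELECT 2, 20.00 UNION ALL
SELECT 5, 50.00

DECLARE @Id int
SELECT @Id = MIN(Id) FROM #Work
WHILE @Id IS NOT NULL
BEGIN
    -- row-at-a-time work for the current key goes here
    PRINT @Id

    -- step to the next key; MIN() returns NULL when no rows remain, ending the loop
    SELECT @Id = MIN(Id) FROM #Work WHERE Id > @Id
END

DROP TABLE #Work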

    Garadin (10/16/2009)


    I'd definitely be interested to see how they matched up on a larger dataset. The tally join in yours is pretty hefty and seems to me to be massive RBAR. Hidden RBAR in code that looks set based is often more dangerous than just straight out loops, so be careful there =). I can't really test these side by side yet as yours still returns different results than mine, but yours is actually a bit slower on these 6 rows of test data. Being slower on 6 rows doesn't mean that yours won't blow it away on a larger dataset though, which is the normal problem with cursors.

I was under the impression that joining on the tally table this way was the trick to avoid RBAR, since it joins only one time when that's all that's needed, but will join two or more times when I need to duplicate rows. However, in the second part of the script (where I try to apply the "max 6 values per record" rule), I perform a WHILE over the columns and increment, affecting only the column I want each time. I suppose, then, that this is actually a switched RBAR, or a CBAC (Column-By-Agonizing-Column) of some sort.

    As for the different results, my script uses a FederalEmployer and a PeopleId as a key. I used your test data, but I assigned the keys differently to fit my situation.

    Garadin (10/16/2009)


    Out of curiosity, (I haven't had a chance to examine yours in full detail yet), but are you attempting to handle moving the best piece of data in any given row so that you don't overflow your 6 allotted rows?

Not quite sure what you mean exactly, but my script will move data and fill every record to the max before creating a new one, yes. The only problem is, it has trouble handling a multi-record value correctly when moving it. When running the script, you'll see the behavior if you look at FederalEmployer 1 and PeopleId 3, column Case_84. The value is correctly split by the code into multiple rows, but since it has to shift to a new record and it's composed of multiple rows, there's a 5000 that gets lost. This is where I'm currently stuck.

    Garadin (10/16/2009)


    It seems to work properly for me. It does stop at 6 rows though and won't error, it just stops processing. For example, changing c30 to 12999.99 instead of 9999.99 generates 2 5000 rows and 1 2999.99 row in my tests. What test data did you use on mine that is coming out incorrectly?

    Try putting 16999.99 instead, you'll see a 5000, 5000 and a 6999.99 trailing at the end, while it should be 5000, 5000, 5000, 1999.99. 🙂

  • Jonathan Melo (10/16/2009)


I despise cursors myself. When I'm really, really stuck going row by row, I use a WHILE construct with a key instead. Somewhere in my head, there's a voice telling me that it's safer and possibly faster to do it this way. Go figure.

Eh, for the most part, a loop is a loop IMO. Some RBAR is actually much worse than cursors/while loops would ever be, and it looks completely innocuous. Triangular joins, correlated subqueries, cross joins and certain placements of UDFs come to mind as very common culprits that are often overlooked until your database asplodes.
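A minimal sketch of the first culprit, a triangular join, against a hypothetical #T:

CREATE TABLE #T (Id int PRIMARY KEY, Amount money)
INSERT #T
SELECT 1, 10.00 UNION ALL
SELECT 2, 20.00 UNION ALL
SELECT 3, 30.00

-- Running total via a correlated subquery: it reads as set-based, but the inner
-- query rescans every earlier row for each outer row, i.e. O(n^2) work overall.
SELECT t1.Id,
       (SELECT SUM(t2.Amount) FROM #T t2 WHERE t2.Id <= t1.Id) AS RunningTotal
FROM #T t1
ORDER BY t1.Id

DROP TABLE #T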

    Jonathan Melo (10/16/2009)


I was under the impression that joining on the tally table this way was the trick to avoid RBAR, since it joins only one time when that's all that's needed, but will join two or more times when I need to duplicate rows. However, in the second part of the script (where I try to apply the "max 6 values per record" rule), I perform a WHILE over the columns and increment, affecting only the column I want each time. I suppose, then, that this is actually a switched RBAR, or a CBAC (Column-By-Agonizing-Column) of some sort.

Tally tables *can* be used to avoid loops, but they don't automatically make the query set-based. In my tests, that tally table section is 70% of the total cost. I worry whenever I see <, > or OR in a join. In most cases it may be an unfounded concern, though. In this particular case, I think the issue is that you're not stating a hard value limiter for Number, so it's doing a table scan. Adding a WHERE Number < 50 and putting the rest of your join criteria in parens speeds the entire thing up drastically.
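In code, the reshaped join would look something like this (a sketch against the #Temp_Values / Sys_Sequence objects from the earlier script; 50 is an illustrative bound):

SELECT Temp_Values.FederalEmployer, Temp_Values.PeopleId, Sys_Sequence.Number
FROM #Temp_Values AS Temp_Values
INNER JOIN Sys_Sequence
    ON (Sys_Sequence.Number <= Case_30_SlipCount
        OR Sys_Sequence.Number <= Case_32_SlipCount
        OR Sys_Sequence.Number <= Case_33_SlipCount
        OR Sys_Sequence.Number <= Case_34_SlipCount
        OR Sys_Sequence.Number <= Case_40_SlipCount
        OR Sys_Sequence.Number <= Case_71_SlipCount
        OR Sys_Sequence.Number <= Case_84_SlipCount
        OR Sys_Sequence.Number <= Case_85_SlipCount)
WHERE Sys_Sequence.Number < 50 -- the hard upper bound lets the optimizer use a range seek on Number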

    Jonathan Melo (10/16/2009)


    As for the different results, my script uses a FederalEmployer and a PeopleId as a key. I used your test data, but I assigned the keys differently to fit my situation.

    I guess I'm confused on your output. I see where the issue is with chopping off 5000, but where are the other values like 847.55 going?

    Jonathan Melo (10/16/2009)


Not quite sure what you mean exactly, but my script will move data and fill every record to the max before creating a new one, yes. The only problem is, it has trouble handling a multi-record value correctly when moving it. When running the script, you'll see the behavior if you look at FederalEmployer 1 and PeopleId 3, column Case_84. The value is correctly split by the code into multiple rows, but since it has to shift to a new record and it's composed of multiple rows, there's a 5000 that gets lost. This is where I'm currently stuck.

What I mean is this. Say for this example you have a limit of 3 fields per row, a limit of 3 rows, and a limit of 20 per field.

If the program was really intelligent, it would handle a situation like this:

Start Data
10 10 10 30 <-- 4 values; the 4th column is also too high. Currently this needs 2 extra rows
 0  0  5  5     to resolve: one to move the whole value to, and 1 more to split it.
 0  0 10  0

My Version
10 10 10  0
 0  0  5  5 <-- Moves the fourth value because the row already has 3.
 0  0 10 30 <-- The 30 cannot be split because we are out of rows.

Smarter Version
10  0 10 20 <-- Moves the 10 in col2 down 1 instead of moving col4, splits col4 in place.
 0 10  5  5
 0  0 10 10

That seems like it complicates this task severely though.

    Jonathan Melo (10/16/2009)


    Try putting 16999.99 instead, you'll see a 5000, 5000 and a 6999.99 trailing at the end, while it should be 5000, 5000, 5000, 1999.99. 🙂

    It only does this if you leave 6000 in the third record as well, at which point it cannot resolve that last split because it would put it at 7 rows. If you remove the 6000, it processes properly because it has enough rows to do so.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

How's this coming, John? Get it all figured out?

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

Yeah, I've been thinking about it over the weekend, and I had a flash Monday evening in the shower that made me jump to my current version.

As of now, it works. It will split any values larger than the specified limit into multiple records, and it will make sure there are never more than 6 values on the same record by moving the extra values to another record (creating it if need be). It also meets my other goals: no cursors, no hard limits (besides the hardcoded 20, which I'll patch up to be dynamic in the near future), somewhat easy to maintain (e.g. if I have to add a new column, which is the most probable case), and it optimizes the split to create the fewest possible records.

For the purposes of the stripped-down version I'll show as an example, I again used the values you put into your first script, but I group them by FederalEmployer and PeopleId, which matches my way of handling the data. The results vary a bit from your example, but if you put a unique identifier on every record of your test data (from 1 to 6), they match up (except my script has optimized the split and uses 2 records where yours uses 3).

    /*

    Note : Sys_Sequence is the tally table. It starts at 0 and goes up. Here's a small script to create it if need be. DON'T FORGET TO CLEAN UP!

    CREATE TABLE Sys_Sequence(Number int PRIMARY KEY)

    DECLARE @i int

    SELECT @i = 0

    WHILE @i <= 200

    BEGIN

    INSERT Sys_Sequence

    SELECT @i

    SELECT @i = @i + 1

    END

    */

    -- Cautious pre-run cleanup

    IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL

    DROP TABLE #Temp_Values

    IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL

    DROP TABLE #Multiplied_Values

    IF OBJECT_ID('TempDB..#Columns') IS NOT NULL

    DROP TABLE #Columns

    -- Test data table

    CREATE TABLE #Temp_Values

    (

    FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,

    PeopleId int,

    /*Value*/ /*Maximum value for the column*/ /*Number of slips we'll need to write the current value*/

    Case_30 money, Case_30_MaxValue money DEFAULT(5000), Case_30_SlipCount AS Floor((Case_30-.01) / Case_30_MaxValue),

    Case_32 money, Case_32_MaxValue money DEFAULT(5000), Case_32_SlipCount AS Floor((Case_32-.01) / Case_32_MaxValue),

    Case_33 money, Case_33_MaxValue money DEFAULT(5000), Case_33_SlipCount AS Floor((Case_33-.01) / Case_33_MaxValue),

    Case_34 money, Case_34_MaxValue money DEFAULT(5000), Case_34_SlipCount AS Floor((Case_34-.01) / Case_34_MaxValue),

    Case_40 money, Case_40_MaxValue money DEFAULT(5000), Case_40_SlipCount AS Floor((Case_40-.01) / Case_40_MaxValue),

    Case_71 money, Case_71_MaxValue money DEFAULT(5000), Case_71_SlipCount AS Floor((Case_71-.01) / Case_71_MaxValue),

    Case_84 money, Case_84_MaxValue money DEFAULT(5000), Case_84_SlipCount AS Floor((Case_84-.01) / Case_84_MaxValue),

    Case_85 money, Case_85_MaxValue money DEFAULT(5000), Case_85_SlipCount AS Floor((Case_85-.01) / Case_85_MaxValue),

    PRIMARY KEY(FederalEmployer, PeopleId)

    )

    -- Final data table

    CREATE TABLE #Multiplied_Values

    (

    FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,

    PeopleId int,

    MultiSlipCounter int,

    OtherCount int DEFAULT(0),

    Case_30 money DEFAULT(0),

    Case_32 money DEFAULT(0),

    Case_33 money DEFAULT(0),

    Case_34 money DEFAULT(0),

    Case_40 money DEFAULT(0),

    Case_71 money DEFAULT(0),

    Case_84 money DEFAULT(0),

    Case_85 money DEFAULT(0),

    PRIMARY KEY(FederalEmployer, PeopleId, MultiSlipCounter)

    )

    -- Helper table, to handle each column by People/FederalEmployer, from the column having the highest amount to the lowest

    CREATE TABLE #Columns

    (

    MyKey int IDENTITY(1, 1),

    FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,

    PeopleId int,

    KeyByPeople int,

    Col int,

    SlipCount int,

    PRIMARY KEY (FederalEmployer, PeopleId, MyKey)

    )

    -- Index to help performance on larger datasets

    CREATE INDEX IX_Temp_Columns_ColByPeople ON #Columns(KeyByPeople, Col)

    -- Insert test data

    INSERT #Temp_Values

    (

    FederalEmployer,

    PeopleId,

    Case_30,

    Case_32,

    Case_33,

    Case_34,

    Case_40,

    Case_71,

    Case_84,

    Case_85

    )

    SELECT '1', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL

    SELECT '1', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL

    SELECT '1', 5, 6000,0,0,0,0,0,0,0 UNION ALL

    SELECT '2', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL

    SELECT '2', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL

    SELECT '2', 5, 6000,0,0,0,0,0,0,0

    SELECT * FROM #Temp_Values

    -- Multiply the records to respect the max value per column

    INSERT #Multiplied_Values

    SELECT

    FederalEmployer,

    PeopleId,

    Sys_Sequence.Number,

    0,

-- If we're past the number of slips we'll need to write the value, write 0

-- If we're exactly on the number of slips, write the value minus the maximum value multiplied by the number of slips already done so far

-- If we're before the number of slips we'll need to write the value, write the maximum value

    CASE WHEN Number > Case_30_SlipCount THEN 0 WHEN Number = Case_30_SlipCount THEN Case_30 - (Case_30_MaxValue * Number) ELSE Case_30_MaxValue END,

    CASE WHEN Number > Case_32_SlipCount THEN 0 WHEN Number = Case_32_SlipCount THEN Case_32 - (Case_32_MaxValue * Number) ELSE Case_32_MaxValue END,

    CASE WHEN Number > Case_33_SlipCount THEN 0 WHEN Number = Case_33_SlipCount THEN Case_33 - (Case_33_MaxValue * Number) ELSE Case_33_MaxValue END,

    CASE WHEN Number > Case_34_SlipCount THEN 0 WHEN Number = Case_34_SlipCount THEN Case_34 - (Case_34_MaxValue * Number) ELSE Case_34_MaxValue END,

    CASE WHEN Number > Case_40_SlipCount THEN 0 WHEN Number = Case_40_SlipCount THEN Case_40 - (Case_40_MaxValue * Number) ELSE Case_40_MaxValue END,

    CASE WHEN Number > Case_71_SlipCount THEN 0 WHEN Number = Case_71_SlipCount THEN Case_71 - (Case_71_MaxValue * Number) ELSE Case_71_MaxValue END,

    CASE WHEN Number > Case_84_SlipCount THEN 0 WHEN Number = Case_84_SlipCount THEN Case_84 - (Case_84_MaxValue * Number) ELSE Case_84_MaxValue END,

    CASE WHEN Number > Case_85_SlipCount THEN 0 WHEN Number = Case_85_SlipCount THEN Case_85 - (Case_85_MaxValue * Number) ELSE Case_85_MaxValue END

    FROM

    #Temp_Values AS Temp_Values

    INNER JOIN Sys_Sequence

    ON Sys_Sequence.Number <= Case_30_SlipCount

    OR Sys_Sequence.Number <= Case_32_SlipCount

    OR Sys_Sequence.Number <= Case_33_SlipCount

    OR Sys_Sequence.Number <= Case_34_SlipCount

    OR Sys_Sequence.Number <= Case_40_SlipCount

    OR Sys_Sequence.Number <= Case_71_SlipCount

    OR Sys_Sequence.Number <= Case_84_SlipCount

    OR Sys_Sequence.Number <= Case_85_SlipCount

    WHERE

-- Currently limiting to 20 slips maximum, to enable a better execution plan. Change to the real calculated max in the future to remove the limitation.

    Sys_Sequence.Number < 20

    -- Load every People/FederalEmployer, and determine the order in which the columns will be handled (highest amount to lowest amount)

    INSERT #Columns(FederalEmployer, PeopleId, Col, SlipCount)

    SELECT

    FederalEmployer,

    PeopleId,

    ColKey,

    SlipCount

    FROM

    (

    SELECT FederalEmployer, PeopleId, 0 AS ColKey, Case_30_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_30_SlipCount UNION ALL

    SELECT FederalEmployer, PeopleId, 1 AS ColKey, Case_32_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_32_SlipCount UNION ALL

    SELECT FederalEmployer, PeopleId, 2 AS ColKey, Case_33_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_33_SlipCount UNION ALL

    SELECT FederalEmployer, PeopleId, 3 AS ColKey, Case_34_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_34_SlipCount UNION ALL

    SELECT FederalEmployer, PeopleId, 4 AS ColKey, Case_40_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_40_SlipCount UNION ALL

    SELECT FederalEmployer, PeopleId, 5 AS ColKey, Case_71_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_71_SlipCount UNION ALL

    SELECT FederalEmployer, PeopleId, 6 AS ColKey, Case_84_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_84_SlipCount UNION ALL

    SELECT FederalEmployer, PeopleId, 7 AS ColKey, Case_85_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_85_SlipCount

    ) AS Val

    WHERE

    SlipCount >= 0 -- Quickly remove any combo where no work needs to be done

    ORDER BY

    FederalEmployer,

    PeopleId,

    SlipCount DESC,

    ColKey

    -- Give an index (from 0 to 7) to all columns in the combo, per Employee/FederalEmployer

    UPDATE #Columns

    SET

    KeyByPeople = MyKey - MinByPeople.MinKey

    FROM

    #Columns AS Columns

    INNER JOIN (SELECT FederalEmployer, PeopleId, Min(MyKey) AS MinKey FROM #Columns GROUP BY FederalEmployer, PeopleId) AS MinByPeople

    ON Columns.FederalEmployer = MinByPeople.FederalEmployer AND

    Columns.PeopleId = MinByPeople.PeopleId

    -- Declare the variables for the next part

    DECLARE @ColumnCounter int, @MaxColumnCounter int

    SELECT @ColumnCounter = Min(KeyByPeople), @MaxColumnCounter = Max(KeyByPeople) FROM #Columns

    -- For each of the columns, validate if we need to move the info to another record (max 6 info per record)

    WHILE @ColumnCounter <= @MaxColumnCounter

    BEGIN

    -- Update the count of info found per record so far

    UPDATE #Multiplied_Values

    SET

    OtherCount = UpdatedCount.CountSoFar

    FROM

    #Multiplied_Values AS Multi

    INNER JOIN

    (

    SELECT

    Multi.FederalEmployer,

    Multi.PeopleId,

    Multi.MultiSlipCounter,

    Sum(CASE WHEN Columns.Col = 0 AND Case_30 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN Columns.Col = 1 AND Case_32 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN Columns.Col = 2 AND Case_33 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN Columns.Col = 3 AND Case_34 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN Columns.Col = 4 AND Case_40 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN Columns.Col = 5 AND Case_71 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN Columns.Col = 6 AND Case_84 <> 0 THEN 1 ELSE 0 END+

    CASE WHEN Columns.Col = 7 AND Case_85 <> 0 THEN 1 ELSE 0 END) AS CountSoFar

    FROM

    #Multiplied_Values AS Multi

    INNER JOIN #Columns AS Columns

    ON Multi.FederalEmployer = Columns.FederalEmployer AND

    Multi.PeopleId = Columns.PeopleId AND

    Columns.KeyByPeople <= @ColumnCounter

    GROUP BY

    Multi.FederalEmployer,

    Multi.PeopleId,

    Multi.MultiSlipCounter

    ) AS UpdatedCount

    ON Multi.FederalEmployer = UpdatedCount.FederalEmployer AND

    Multi.PeopleId = UpdatedCount.PeopleId AND

    Multi.MultiSlipCounter = UpdatedCount.MultiSlipCounter

    -- Insert an empty record when we detect we'll need one

    INSERT #Multiplied_Values(FederalEmployer, PeopleId, MultiSlipCounter)

    SELECT

    Multi.FederalEmployer,

    Multi.PeopleId,

-- Adds one line, either for half a value expressed on multiple lines or for multiple lines at once, without violating the primary key

    NextMultiSlip.NextMultiSlipCounter+Columns.SlipCount-Multi.MultiSlipCounter

    FROM

    #Multiplied_Values AS Multi

    INNER JOIN (SELECT FederalEmployer, PeopleId, Max(MultiSlipCounter) + 1 AS NextMultiSlipCounter FROM #Multiplied_Values GROUP BY FederalEmployer, PeopleId) AS NextMultiSlip

    ON Multi.FederalEmployer = NextMultiSlip.FederalEmployer AND

    Multi.PeopleId = NextMultiSlip.PeopleId

    INNER JOIN #Columns AS Columns

    ON Multi.FederalEmployer = Columns.FederalEmployer AND

    Multi.PeopleId = Columns.PeopleId AND

    Columns.KeyByPeople = @ColumnCounter

    LEFT JOIN #Multiplied_Values AS Next

    ON Next.FederalEmployer = Multi.FederalEmployer AND

    Next.PeopleId = Multi.PeopleId AND

    -- Next is after Multi

    Next.MultiSlipCounter > Multi.MultiSlipCounter AND

    -- Multi and Next are "synchronized" to handle values expressed on multiple lines

    ((Next.MultiSlipCounter - Multi.MultiSlipCounter) % (Columns.SlipCount+1) = 0) AND

    -- Multi must "export" data

    Multi.OtherCount > 6 AND

    -- Next can "get" data

    Next.OtherCount < 6

    WHERE

    -- Only if there is no next record

    Next.PeopleId IS NULL AND

    -- And the current record is not only full (at 6), but also has another value to add (7 and greater)

    Multi.OtherCount > 6
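-- (Worked example of the "synchronized" test above: with SlipCount = 1 a value
--  occupies 2 consecutive lines, so a record only lines up with records whose
--  MultiSlipCounter differs by a multiple of SlipCount + 1 = 2; both halves of
--  the value can then jump together.)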

    -- The "lower records" will pull the data from the record above them if they still have room and the "top record" is full

    -- At the same time, if a "top record" is full and there's a value, we clear it

    UPDATE Multi

    SET

    Case_30 = CASE DataJumpInfo.Col WHEN 0 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_30 ELSE 0 END ELSE Multi.Case_30 END,

    Case_32 = CASE DataJumpInfo.Col WHEN 1 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_32 ELSE 0 END ELSE Multi.Case_32 END,

    Case_33 = CASE DataJumpInfo.Col WHEN 2 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_33 ELSE 0 END ELSE Multi.Case_33 END,

    Case_34 = CASE DataJumpInfo.Col WHEN 3 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_34 ELSE 0 END ELSE Multi.Case_34 END,

    Case_40 = CASE DataJumpInfo.Col WHEN 4 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_40 ELSE 0 END ELSE Multi.Case_40 END,

    Case_71 = CASE DataJumpInfo.Col WHEN 5 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_71 ELSE 0 END ELSE Multi.Case_71 END,

    Case_84 = CASE DataJumpInfo.Col WHEN 6 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_84 ELSE 0 END ELSE Multi.Case_84 END,

    Case_85 = CASE DataJumpInfo.Col WHEN 7 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_85 ELSE 0 END ELSE Multi.Case_85 END

    FROM

    (-- Find the lower difference before a valid "export" target

    SELECT

    Multi.FederalEmployer,

    Multi.PeopleId,

    Columns.Col,

    Min(Multi.MultiSlipCounter - Prev.MultiSlipCounter) AS MultiSlipDifference

    FROM

    #Multiplied_Values AS Multi

    INNER JOIN #Columns AS Columns

    ON Multi.FederalEmployer = Columns.FederalEmployer AND

    Multi.PeopleId = Columns.PeopleId AND

    Columns.KeyByPeople = @ColumnCounter

    LEFT JOIN #Multiplied_Values AS Prev

    ON Prev.FederalEmployer = Multi.FederalEmployer AND

    Prev.PeopleId = Multi.PeopleId AND

    -- Prev is before Multi

    Prev.MultiSlipCounter < Multi.MultiSlipCounter AND

    -- Multi and Prev are "synchronized" to handle values expressed on multiple lines

    ((Multi.MultiSlipCounter - Prev.MultiSlipCounter) % (Columns.SlipCount+1) = 0) AND

    -- Prev must "export" data

    Prev.OtherCount > 6 AND

    -- Multi can "get" data

    Multi.OtherCount < 6

    WHERE

    -- Only if we are already full and a value now needs to be "cleared"

    Multi.OtherCount > 6 OR

    -- Else, there must be data to import AND there must be no data present if we want to import something in this line

    (Prev.OtherCount > 6 AND

    CASE Columns.Col

    WHEN 0 THEN Multi.Case_30

    WHEN 1 THEN Multi.Case_32

    WHEN 2 THEN Multi.Case_33

    WHEN 3 THEN Multi.Case_34

    WHEN 4 THEN Multi.Case_40

    WHEN 5 THEN Multi.Case_71

    WHEN 6 THEN Multi.Case_84

    WHEN 7 THEN Multi.Case_85

    END = 0)

    GROUP BY

    Multi.FederalEmployer,

    Multi.PeopleId,

    Columns.Col

    ) AS DataJumpInfo

    INNER JOIN #Multiplied_Values AS Multi

    ON Multi.FederalEmployer = DataJumpInfo.FederalEmployer AND

    Multi.PeopleId = DataJumpInfo.PeopleId

    LEFT JOIN #Multiplied_Values AS Prev

    ON Prev.FederalEmployer = Multi.FederalEmployer AND

    Prev.PeopleId = Multi.PeopleId AND

    -- Make sure we move the data only to the first place available

    Multi.MultiSlipCounter - Prev.MultiSlipCounter = DataJumpInfo.MultiSlipDifference

    SELECT @ColumnCounter = @ColumnCounter + 1

    END

    SELECT * FROM #Multiplied_Values

    -- Final cleanup

    IF OBJECT_ID('TempDB..#Columns') IS NOT NULL

    DROP TABLE #Columns

    IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL

    DROP TABLE #Multiplied_Values

    IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL

    DROP TABLE #Temp_Values

As an aside, what made me end up with the final solution was a lot of pieces I had already seen in possible solutions, but the driving force was the helper table to optimize the split (which I had been thinking about ever since you asked whether I was optimizing the split). Once this was in place, the rest was much easier to do :-).

    Thanks a lot for your help, Garadin!
