October 13, 2009 at 1:12 pm
Hey everyone,
I've been trying to wrap my mind around a problem lately, and I can't crack it. Out of desperation, I'm turning to you for new ideas on how to accomplish the task.
First, a quick recap of my situation:
I must produce a specific ResultSet from a SQL query. There are rules about that ResultSet.
1 - A column value must not exceed a maximum value defined by column (for example, the value of c30 must not exceed 999.99, the value of c32 must not exceed 9999.99, etc).
2 - There must be a maximum of 6 values per record.
In either case, the solution is to add a new record and copy the rest of the values in there. For instance, if I must write 1500 in c30 and the limit is 999.99, I'll have a record with 999.99 and a second record with 500.01.
I handled the first rule on my own (for the record, I built a separate temp table containing the maximum value per column, and joined it to a Tally table using the result of dividing the actual value by the maximum, thereby creating enough records each time). The second rule is the one giving me problems: I can't come up with a query that transforms the data the way I want.
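For anyone following along, the tally-based split for rule 1 can be sketched roughly like this. The variable and table names here are hypothetical, and it assumes a Tally table whose N column starts at 1:

```sql
-- Hypothetical sketch of the rule-1 split: one value, one per-column maximum.
DECLARE @val money, @max money
SELECT @val = 1500.00, @max = 999.99

SELECT CASE WHEN N * @max <= @val
            THEN @max                      -- a "full" piece
            ELSE @val - (N - 1) * @max END -- the remainder piece
       AS Piece
FROM Tally
WHERE (N - 1) * @max < @val
-- for 1500.00 with a 999.99 limit this returns two rows: 999.99 and 500.01
```

The WHERE clause generates exactly as many rows as pieces are needed, which is the same idea as the temp-table join described above.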
Here are two small test cases.
DECLARE @test2 TABLE(c30 money, c32 money, c33 money, c34 money, c40 money, c71 money, c84 money, c85 money)
--Test values #1
DELETE @test2
INSERT @test2
SELECT 9999.99, 98.54, 847.55, 99999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT 2587.4, 0, 0, 18.42, 0, 0, 0, 0
SELECT * FROM @test2
--Expected result for values #1
SELECT 9999.99, 98.54, 847.55, 99999.99, 34.55, 365.74, 0, 0 UNION ALL
SELECT 2587.4, 0, 0, 18.42, 0, 0, 8462.57, 0
--Test values #2
DELETE @test2
INSERT @test2
SELECT 9999.99, 99.99, 999.99, 99999.99, 99.99, 9999.99, 8462.57, 0 UNION ALL
SELECT 2587.4, 98.54, 847.55, 18.42, 34.55, 365.74, 0, 0
SELECT * FROM @test2
--Expected result for values #2
SELECT 9999.99, 99.99, 999.99, 99999.99, 99.99, 9999.99, 0, 0 UNION ALL
SELECT 2587.4, 98.54, 847.55, 18.42, 34.55, 365.74, 0, 0 UNION ALL
SELECT 0, 0, 0, 0, 0, 0, 8462.57, 0
Does anyone have an idea on how I could tackle this problem?
October 13, 2009 at 8:35 pm
:blink: :blink: :blink: Uhh... Huh? :ermm: :ermm: :ermm:
In either case, the solution is to add a new record and copy the rest of the values in there. For instance, if I must write 1500 in c30 and the limit is 999.99, I'll have a record with 999.99 and a second record with 500.01.
That. Makes sense.
Everything else about the post is extremely confusing. Your rules don't correlate to your test data in any way that I can see (you state c30 has a max of 999.99 but your actual number is 9999.99).
Did you perhaps forget to include the "rules" for each given column in this example? Cuz I don't see anything indicating the maximum per column.
October 13, 2009 at 8:57 pm
OK, I think I see part of it. The 8462.57 value is creating its own line because it is the 7th value (zeroes not counted apparently) in the row and you only allow 6. Correct?
What in the world do you need something like this for? Before people here spend a bunch of time attempting to do something that should never be done in SQL, I think it'd be good to convince people they're not just wasting their time.
October 13, 2009 at 10:18 pm
Other than the confusion - I can't help but think we've got a normalization disaster going on which is making this problem much harder to handle. Where's the relationship between the values you're breaking into smaller chunks and said smaller chunks (so when one changes, you can adjust, etc.)? Why do we have the same piece of data repeated 8 times in a single row (a violation of normalization)?
More information as to what you're ACTUALLY doing (vs. this particular implementation) might help get you better ideas (and ones designed to work well with SQL server).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 14, 2009 at 6:51 am
The query is designed to be used by the front-end application to generate a data file. This part is the final portion, where I take the data I've compiled so far and apply the rules to it before returning it.
The columns (c30, c32, c34, c35, etc) are simply different values of a record I need to send to the application. I have a record per employee. The example values are for a single employee, in the end I'll have to handle multiple employees, but I figure if I can get something working with a single record, it's gonna be easy to do so for multiple records.
Also, the columns are NOT coming straight from tables - they're computed in a temporary table from a lot of tables using some more logic. The result is a set of values (c30, c32, c34, c35, etc) per employee, and I must apply the last two rules to them before sending them back to the application.
As for the example I gave for the limit on c30, I'm aware it doesn't match the sample data I posted. The limit portion is already handled; it's explained only to show the case where a value has already been split into two records, without confusing everyone (something that, I fear, I've failed at).
Sorry for the confusion. I hope it's a bit better now.
October 14, 2009 at 7:26 am
Jonathan Melo (10/14/2009)
The example values are for a single employee, in the end I'll have to handle multiple employees, but I figure if I can get something working with a single record, it's gonna be easy to do so for multiple records.
While that is not necessarily a false statement, by not including the details on things like that, and not giving us the part of the solution you've already written, you've more or less ensured this thing will be massive RBAR. We don't know what you've already done (to see if we can combine steps) and we don't know how this data is keyed per employee to possibly do something like this at a set level as opposed to trying to make it work for one person. So while you will likely be able to throw the whole thing in a cursor, you're not likely to get a performant method.
That said, that objection may be mostly academic in nature. I'll think about this for a bit and see what I can come up with.
October 15, 2009 at 4:03 pm
So I thought about it for a while, this is what I came up with. I'll state up front that I can already see half a dozen ways to improve this / clean it up / make it better, but I'll leave it here for now. I'll post this in the body for people who want to read it, and as an attachment so that the formatting isn't completely boned.
One thing of note is that this does not use any logic to determine what the best value to move down a row is when there are more than 6 values in a row. That was not stated in the original problem so I didn't add it in. Without this, you aren't efficiently utilizing all of your columns and could fail to move rows around.
If anyone wants to take this a bit further and optimize sections or get rid of the cursor, please post back the final script when you're done.
/*
=============================================================================================
CREATE DATE: 10/15/2009
LAST MODIFIED: 10/15/2009
CREATED BY:    SETH PHELABAUM
PURPOSE:       Do crazy things for reasons I don't understand
NOTES:         This assumes a hard-coded limit of $5000 for all columns (can obviously be expanded upon).
               The unpivot could be made dynamic.
               There's way too much hardcoded. Things like Max Values Per Row should be a variable set once at the top.
ISSUES:        Uses a cursor and GOTOs for flow control. This could likely be improved.
               The cursor and high number of updates will likely cause this to scale very poorly.
Revision History:
Date      By   Change Made
--------  ---  -------------------------------------
=============================================================================================
*/
SET ANSI_WARNINGS OFF
SET NOCOUNT ON
------------------ DECLARE VARIABLES --------------------
DECLARE @Seq        int,
        @Ident      int,
        @ColName    varchar(10),
        @ColNum     int,
        @Amount     money,
        @ColCount   int,
        @OverFlow   int,
        @ChangeMade bit
--------------------- Test Data Setup ---------------------
DECLARE @test2 TABLE(Ident int, c30 money, c32 money, c33 money, c34 money, c40 money, c71 money, c84 money, c85 money)
INSERT @test2
SELECT 1, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT 1, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT 1, 6000, 0, 0, 0, 0, 0, 0, 0 UNION ALL
SELECT 2, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL
SELECT 2, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT 2, 6000, 0, 0, 0, 0, 0, 0, 0
SELECT * FROM @test2
-----------------------------------------------------------
-------- Add Sequencing Column and Sequence Rows ---------- (Would use ROW_NUMBER() in 2K5+)
SELECT CAST(0 AS int) seq, *
INTO #Test
FROM @test2
CREATE CLUSTERED INDEX Quirk ON #Test (Ident)
SET @Seq = 0
UPDATE #Test
SET @Seq = Seq = CASE WHEN @Ident = Ident THEN @Seq + 1 ELSE 1 END,
@Ident = Ident
FROM #Test OPTION (MAXDOP 1)
--SELECT * FROM #Test
-----------------------------------------------------------
----------------- Add Additional Rows ---------------------
INSERT INTO #Test
SELECT T2.N, T2.Ident, c30, c32, c33, c34, c40, c71, c84, c85
FROM (SELECT Ident, N
FROM (SELECT DISTINCT Ident FROM #Test) T, Tally T1
WHERE T1.N <= 6) T2
LEFT JOIN #Test T ON T.Ident = T2.Ident AND T.Seq = T2.N
WHERE T.Ident IS NULL
--SELECT * FROM #test
-----------------------------------------------------------
----------------- Unpivot the Data ------------------------
CREATE TABLE #UPTest(
UPIdentity int IDENTITY(1,1),
Ident      int,
Seq        int,
ColName    varchar(10),
ColNum     int,
Amount     money,
ColCount   int,
OverFlow   int)
INSERT INTO #UPTest(Ident, Seq, ColName, ColNum, Amount, ColCount, Overflow)
SELECT Ident, Seq, 'C30', 1, C30, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C32', 2, C32, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C33', 3, C33, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C34', 4, C34, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C40', 5, C40, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C71', 6, C71, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C84', 7, C84, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C85', 8, C85, 0, 0 FROM #Test
CREATE CLUSTERED INDEX CI_UpdTest ON #UPTest (Ident, Seq, ColName, Amount)
-----------------------------------------------------------
----------------- Identify Problems -----------------------
UPDATE #UPTest
SET ColCount = UP2.CC,
OverFlow = CASE WHEN Amount > 5000 THEN 1 ELSE 0 END
FROM #UPTest
INNER JOIN (SELECT ident, Seq, COUNT(ColName) CC
FROM #UPTest
WHERE ISNULL(Amount,0.00) <> 0.00
GROUP BY Ident, Seq) UP2
ON #UPTest.Ident = UP2.Ident AND #UPTest.Seq = UP2.Seq
--SELECT * FROM #UPTest
-----------------------------------------------------------
---- If there are any issues, fix, otherwise Finalize -----
IF EXISTS (SELECT * FROM #UPTest WHERE OverFlow = 1 OR ColCount > 6)
BEGIN
SET @ChangeMade = 1
GOTO ProblemFix
END
ELSE
GOTO Finalize
---------------- Fix Problems ----------------------------- (Crazy RBAR :: Focus here for improvement)
--(I'm not convinced this can't be done in a set based manner, but unfortunately, I'm out of time to spend on this.)
ProblemFix:
IF @ChangeMade = 0
GOTO Finalize
ELSE
SET @ChangeMade = 0
BEGIN
PRINT 'Cursor Starting'
DECLARE IDCursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
FOR
SELECT Ident, Seq, ColName, ColNum, Amount, ColCount, OverFlow
FROM #UPTest
WHERE (ColCount > 6 AND ColNum > 6)
OR OverFlow = 1
ORDER BY Ident, Seq, OverFlow DESC, ColCount
OPEN IDCursor
FETCH NEXT FROM IDCursor INTO @Ident, @Seq, @ColName, @ColNum, @Amount, @ColCount, @Overflow
WHILE @@FETCH_STATUS = 0
BEGIN
--------------- Too Many Column Fixes ---------------------
IF @ColCount > 6 AND @ColNum > 6
AND EXISTS (SELECT * FROM #UPTEST
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
BEGIN
UPDATE #UPTest
SET Amount = 0.00, -- Whole value moves to a later row
ColCount = ColCount - 1,
OverFlow = 0
WHERE ColName = @ColName AND Ident = @Ident AND Seq = @Seq
UPDATE #UPTest
SET Amount = @Amount,
OverFlow = CASE WHEN @Amount > 5000 THEN 1 ELSE 0 END
FROM #UPTest
WHERE ColName = @ColName
AND Ident = @Ident
AND Seq = (SELECT MIN(Seq)
FROM #UPTest
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
SET @ChangeMade = 1 -- Additional Flow Control
SET @OverFlow = 0 -- Need to do this one on another run through
END
-----------------------------------------------------------
--------------- Number Too Big Fixes ----------------------
IF @OverFlow = 1
AND EXISTS (SELECT * FROM #UPTEST
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
BEGIN
UPDATE #UPTest
SET Amount = 5000.00, --Max Allowed Value
OverFlow = 0
WHERE ColName = @ColName AND Ident = @Ident AND Seq = @Seq
UPDATE #UPTest
SET Amount = @Amount - 5000.00,
OverFlow = CASE WHEN @Amount - 5000.00 > 5000.00 THEN 1 ELSE 0 END
FROM #UPTest
WHERE ColName = @ColName
AND Ident = @Ident
AND Seq = (SELECT MIN(Seq)
FROM #UPTest
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
SET @ChangeMade = 1 -- Additional Flow Control
END
-----------------------------------------------------------
FETCH NEXT FROM IDCursor INTO @Ident, @Seq, @ColName, @ColNum, @Amount, @ColCount, @Overflow
END
CLOSE IDCursor
DEALLOCATE IDCursor
IF EXISTS (SELECT * FROM #UPTest WHERE OverFlow = 1 OR ColCount > 6)
GOTO ProblemFix
ELSE
GOTO Finalize
END
Finalize: -- Repivot
SELECT Ident, C30, C32, C33, C34, C40, C71, C84, C85
FROM (
SELECT Ident, Seq,
SUM(CASE WHEN ColName = 'C30' THEN Amount ELSE 0 END) C30,
SUM(CASE WHEN ColName = 'C32' THEN Amount ELSE 0 END) C32,
SUM(CASE WHEN ColName = 'C33' THEN Amount ELSE 0 END) C33,
SUM(CASE WHEN ColName = 'C34' THEN Amount ELSE 0 END) C34,
SUM(CASE WHEN ColName = 'C40' THEN Amount ELSE 0 END) C40,
SUM(CASE WHEN ColName = 'C71' THEN Amount ELSE 0 END) C71,
SUM(CASE WHEN ColName = 'C84' THEN Amount ELSE 0 END) C84,
SUM(CASE WHEN ColName = 'C85' THEN Amount ELSE 0 END) C85,
SUM(Amount) RowSum
FROM #UPTest
GROUP BY Ident, Seq) A
WHERE RowSum > 0
ORDER BY Ident, Seq
DROP TABLE #UPTest
DROP TABLE #Test
October 16, 2009 at 7:09 am
First, Garadin, I'd like to thank you a lot for the effort you're putting in this problem. It looks like you went through a lot of trouble to make that script, and I appreciate it!
Now, to the problem itself... the "crazy things for a reason you don't understand" do have a valid reason behind them 😀 This procedure is supposed to gather and format data to send back to the front-end application, so it can generate a file that will be sent to a third party. The file being XML, it has its own rules that I must follow, and for design reasons, the application should do the bare minimum (which is: take the recordset and fill the XML file with it, with no extra data manipulation).
I do understand that SQL isn't the perfect choice to handle this kind of limitation, but I know it's feasible, and the design of the solution flows a lot better if it's made that way.
As for the code so far, I've been working on it on my side since I posted. I came up with something that works quite well, but doesn't handle every situation it needs to (I'm still having problems when I get over 2 rows generated; the values just won't follow). However, it doesn't make use of any cursor (alright, I cheated just a little, since I loop to execute some code once per column), no GOTO (this time, no cheating), and as far as I know no RBAR.
I've made a cutdown example (using your numbers and limits as test data) that also includes my solution to handle the limits on the values:
/*
Note: Sys_Sequence is the tally table. It starts at 0 and goes up. Here's a small script to create it if need be. DON'T FORGET TO CLEAN UP!
CREATE TABLE Sys_Sequence(Number int PRIMARY KEY)
DECLARE @i int
SELECT @i = 0
WHILE @i <= 200
BEGIN
INSERT Sys_Sequence
SELECT @i
SELECT @i = @i + 1
END
*/
-- Cautious pre-run cleanup
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
-- Test data table
CREATE TABLE #Temp_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
Case_30 money,
Case_30_MaxValue money DEFAULT(5000), -- Determines the maximum value for the equivalent column
Case_30_SlipCount AS Floor((Case_30-.01) / Case_30_MaxValue), -- Number of EXTRA slips needed for that value (0 = fits on one slip)
Case_32 money,
Case_32_MaxValue money DEFAULT(5000),
Case_32_SlipCount AS Floor((Case_32-.01) / Case_32_MaxValue),
Case_33 money,
Case_33_MaxValue money DEFAULT(5000),
Case_33_SlipCount AS Floor((Case_33-.01) / Case_33_MaxValue),
Case_34 money,
Case_34_MaxValue money DEFAULT(5000),
Case_34_SlipCount AS Floor((Case_34-.01) / Case_34_MaxValue),
Case_40 money,
Case_40_MaxValue money DEFAULT(5000),
Case_40_SlipCount AS Floor((Case_40-.01) / Case_40_MaxValue),
Case_71 money,
Case_71_MaxValue money DEFAULT(5000),
Case_71_SlipCount AS Floor((Case_71-.01) / Case_71_MaxValue),
Case_84 money,
Case_84_MaxValue money DEFAULT(5000),
Case_84_SlipCount AS Floor((Case_84-.01) / Case_84_MaxValue),
Case_85 money,
Case_85_MaxValue money DEFAULT(5000),
Case_85_SlipCount AS Floor((Case_85-.01) / Case_85_MaxValue),
PRIMARY KEY(FederalEmployer, PeopleId)
)
-- Final data table
CREATE TABLE #Multiplied_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
MultiSlipCounter int,
OtherCount int DEFAULT(0),
Case_30 money DEFAULT(0),
Case_32 money DEFAULT(0),
Case_33 money DEFAULT(0),
Case_34 money DEFAULT(0),
Case_40 money DEFAULT(0),
Case_71 money DEFAULT(0),
Case_84 money DEFAULT(0),
Case_85 money DEFAULT(0),
PRIMARY KEY(FederalEmployer, PeopleId, MultiSlipCounter)
)
-- Insert test data
INSERT #Temp_Values
(
FederalEmployer,
PeopleId,
Case_30,
Case_32,
Case_33,
Case_34,
Case_40,
Case_71,
Case_84,
Case_85
)
SELECT '1', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT '1', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '1', 5, 6000,0,0,0,0,0,0,0 UNION ALL
SELECT '2', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL
SELECT '2', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '2', 5, 6000,0,0,0,0,0,0,0
SELECT * FROM #Temp_Values
-- Multiply the records to respect the max value per column
INSERT #Multiplied_Values
SELECT
FederalEmployer,
PeopleId,
Sys_Sequence.Number,
0,
-- If we're past the number of slips needed to write the value, write 0
-- If we're exactly on the last slip, write the value minus (the maximum value * the number of slips already written)
-- If we're before the last slip, write the maximum value
CASE WHEN Number > Case_30_SlipCount THEN 0 WHEN Number = Case_30_SlipCount THEN Case_30 - (Case_30_MaxValue * Number) ELSE Case_30_MaxValue END,
CASE WHEN Number > Case_32_SlipCount THEN 0 WHEN Number = Case_32_SlipCount THEN Case_32 - (Case_32_MaxValue * Number) ELSE Case_32_MaxValue END,
CASE WHEN Number > Case_33_SlipCount THEN 0 WHEN Number = Case_33_SlipCount THEN Case_33 - (Case_33_MaxValue * Number) ELSE Case_33_MaxValue END,
CASE WHEN Number > Case_34_SlipCount THEN 0 WHEN Number = Case_34_SlipCount THEN Case_34 - (Case_34_MaxValue * Number) ELSE Case_34_MaxValue END,
CASE WHEN Number > Case_40_SlipCount THEN 0 WHEN Number = Case_40_SlipCount THEN Case_40 - (Case_40_MaxValue * Number) ELSE Case_40_MaxValue END,
CASE WHEN Number > Case_71_SlipCount THEN 0 WHEN Number = Case_71_SlipCount THEN Case_71 - (Case_71_MaxValue * Number) ELSE Case_71_MaxValue END,
CASE WHEN Number > Case_84_SlipCount THEN 0 WHEN Number = Case_84_SlipCount THEN Case_84 - (Case_84_MaxValue * Number) ELSE Case_84_MaxValue END,
CASE WHEN Number > Case_85_SlipCount THEN 0 WHEN Number = Case_85_SlipCount THEN Case_85 - (Case_85_MaxValue * Number) ELSE Case_85_MaxValue END
FROM
#Temp_Values AS Temp_Values
INNER JOIN Sys_Sequence
ON Sys_Sequence.Number <= Case_30_SlipCount
OR Sys_Sequence.Number <= Case_32_SlipCount
OR Sys_Sequence.Number <= Case_33_SlipCount
OR Sys_Sequence.Number <= Case_34_SlipCount
OR Sys_Sequence.Number <= Case_40_SlipCount
OR Sys_Sequence.Number <= Case_71_SlipCount
OR Sys_Sequence.Number <= Case_84_SlipCount
OR Sys_Sequence.Number <= Case_85_SlipCount
-- Declare the variable for the next part
DECLARE @MultipliedCounter int
SELECT @MultipliedCounter = 0
-- For each of the 8 columns, validate if we need to move the info to another record (max 6 info per record)
WHILE @MultipliedCounter < 8
BEGIN
-- Update the count of info found per record so far
UPDATE #Multiplied_Values
SET
OtherCount = CASE WHEN @MultipliedCounter >= 0 AND Case_30 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 1 AND Case_32 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 2 AND Case_33 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 3 AND Case_34 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 4 AND Case_40 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 5 AND Case_71 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 6 AND Case_84 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 7 AND Case_85 <> 0 THEN 1 ELSE 0 END
-- Insert an empty record when we detect we'll need one
INSERT #Multiplied_Values(FederalEmployer, PeopleId, MultiSlipCounter)
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
Multi.MultiSlipCounter+1
FROM
#Multiplied_Values AS Multi
LEFT JOIN #Multiplied_Values AS Next
ON Next.FederalEmployer = Multi.FederalEmployer AND
Next.PeopleId = Multi.PeopleId AND
Next.MultiSlipCounter = Multi.MultiSlipCounter + 1
WHERE
-- Only if there is no next record
Next.PeopleId IS NULL AND
-- And the current record is not only full (at 6), but also has another value to add (7 and greater)
Multi.OtherCount > 6
-- The "lower records" will pull the data from the record above them if they still have room and the "top record" is full
-- At the same time, if a "top record" is full and there's a value, we clear it
UPDATE Multi
SET
Case_30 = CASE @MultipliedCounter WHEN 0 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_30 ELSE 0 END ELSE Multi.Case_30 END,
Case_32 = CASE @MultipliedCounter WHEN 1 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_32 ELSE 0 END ELSE Multi.Case_32 END,
Case_33 = CASE @MultipliedCounter WHEN 2 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_33 ELSE 0 END ELSE Multi.Case_33 END,
Case_34 = CASE @MultipliedCounter WHEN 3 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_34 ELSE 0 END ELSE Multi.Case_34 END,
Case_40 = CASE @MultipliedCounter WHEN 4 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_40 ELSE 0 END ELSE Multi.Case_40 END,
Case_71 = CASE @MultipliedCounter WHEN 5 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_71 ELSE 0 END ELSE Multi.Case_71 END,
Case_84 = CASE @MultipliedCounter WHEN 6 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_84 ELSE 0 END ELSE Multi.Case_84 END,
Case_85 = CASE @MultipliedCounter WHEN 7 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_85 ELSE 0 END ELSE Multi.Case_85 END
FROM
#Multiplied_Values AS Multi
LEFT JOIN #Multiplied_Values AS Prev
ON Prev.FederalEmployer = Multi.FederalEmployer AND
Prev.PeopleId = Multi.PeopleId AND
Prev.MultiSlipCounter + 1 = Multi.MultiSlipCounter AND
CASE @MultipliedCounter
WHEN 0 THEN Prev.Case_30
WHEN 1 THEN Prev.Case_32
WHEN 2 THEN Prev.Case_33
WHEN 3 THEN Prev.Case_34
WHEN 4 THEN Prev.Case_40
WHEN 5 THEN Prev.Case_71
WHEN 6 THEN Prev.Case_84
WHEN 7 THEN Prev.Case_85
END <> 0
WHERE
-- Only if the current record is empty (to import the data) or full (to clear the data)
NOT (Multi.OtherCount BETWEEN 1 AND 6) OR
-- Else, there must be no data present if we want to import something in this line
CASE @MultipliedCounter
WHEN 0 THEN Multi.Case_30
WHEN 1 THEN Multi.Case_32
WHEN 2 THEN Multi.Case_33
WHEN 3 THEN Multi.Case_34
WHEN 4 THEN Multi.Case_40
WHEN 5 THEN Multi.Case_71
WHEN 6 THEN Multi.Case_84
WHEN 7 THEN Multi.Case_85
END = 0
SELECT @MultipliedCounter = @MultipliedCounter + 1
END
SELECT * FROM #Multiplied_Values
-- Final cleanup
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
As a side note, your script has problems splitting values twice over the limit. It just lets them slip by, it seems ;-).
October 16, 2009 at 8:00 am
Jonathan Melo (10/16/2009)
First, Garadin, I'd like to thank you a lot for the effort you're putting in this problem. It looks like you went through a lot of trouble to make that script, and I appreciate it!
It was a fun challenge. At some point, if I have a bit more time, I'll look at removing the cursor and cleaning this up a lot (mainly for my own peace of mind; I understand you are already working on an alternate solution). This was a very rough draft, but I wanted to throw it out there in case you were still struggling.
Jonathan Melo (10/16/2009)
Now, to the problem itself... the "crazy things for a reason you don't understand" do have a valid reason behind them 😀 This procedure is supposed to gather and format data to send back to the front-end application, so it can generate a file that will be sent to a third party. The file being in XML, it has it's own rules that I must follow, and for design reasons, the application should do the bare minimum (which is, take the recordset and fill the XML file with it, no extra data manipulation).
I assumed there *were* reasons behind it, I was just noting that I didn't know what they were :hehe:. I tend to be a little frivolous in my script headers, especially if I'm not happy with where it's at in terms of performance or design currently. I have a fairly large hatred of cursors, so this one falls firmly into that category =).
Jonathan Melo (10/16/2009)
As for the code so far, I've been working on it on my side since I've posted. I came up with something that works quite well, but doesn't handle every situation it needs to (still having problems when I get over 2 rows generated, the values just won't follow). However, it doesn't make use of any cursor (alright, I cheated juste a little, since I loop to execute some code once per column), no GOTO (this time, no cheating), and as far as I know no RBAR.
I'd definitely be interested to see how they match up on a larger dataset. The tally join in yours is pretty hefty and seems to me to be massive RBAR. Hidden RBAR in code that looks set based is often more dangerous than straight-out loops, so be careful there =). I can't really test these side by side yet, as yours still returns different results than mine, but yours is actually a bit slower on these 6 rows of test data. Being slower on 6 rows doesn't mean yours won't blow mine away on a larger dataset, though, which is the normal problem with cursors. Out of curiosity (I haven't had a chance to examine yours in full detail yet), are you attempting to handle moving the best piece of data in any given row so that you don't overflow your 6 allotted rows?
Jonathan Melo (10/16/2009)
As a side note, your script has problems splitting values twice over the limit. It just lets them slip by, it seems ;-).
It seems to work properly for me. It does stop at 6 rows though and won't error, it just stops processing. For example, changing c30 to 12999.99 instead of 9999.99 generates 2 5000 rows and 1 2999.99 row in my tests. What test data did you use on mine that is coming out incorrectly?
October 16, 2009 at 9:00 am
Garadin (10/16/2009)
I assumed there *were* reasons behind it, I was just noting that I didn't know what they were :hehe:. I tend to be a little frivolous in my script headers, especially if I'm not happy with where it's at in terms of performance or design currently. I have a fairly large hatred of cursors, so this one falls firmly into that category =).
I despise cursors myself. When I'm really, really stuck to go row by row, I use a WHILE construct with a key instead. Somewhere, in my head, there's a voice telling me that it's safer and possibly faster to do it this way. Go figure.
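For what it's worth, the keyed WHILE pattern I'm referring to looks roughly like this (the table and key names are made up for illustration):

```sql
-- Sketch only: walk rows by key without declaring a cursor
DECLARE @Id int
SELECT @Id = MIN(Ident) FROM #Test

WHILE @Id IS NOT NULL
BEGIN
    -- ... do the per-row work for @Id here ...
    SELECT @Id = MIN(Ident) FROM #Test WHERE Ident > @Id  -- advance to the next key
END
```

It's still a row-by-row loop, of course; it just avoids the cursor declare/open/fetch/deallocate machinery.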
Garadin (10/16/2009)
I'd definitely be interested to see how they matched up on a larger dataset. The tally join in yours is pretty hefty and seems to me to be massive RBAR. Hidden RBAR in code that looks set based is often more dangerous than just straight out loops, so be careful there =). I can't really test these side by side yet as yours still returns different results than mine, but yours is actually a bit slower on these 6 rows of test data. Being slower on 6 rows doesn't mean that yours won't blow it away on a larger dataset though, which is the normal problem with cursors.
I was under the impression that joining on the tally table this way was the trick to avoid RBAR, since it only joins one time when needed, but will join twice or more times when I need to duplicate rows. However, in the second part of the script (where I try to apply the "max 6 values per record" rule), I perform a WHILE based on columns, and I increment, only affecting the column I want every time. I suppose, then, that this is actually a switched RBAR, or a CBAC (Column-by-Agonizing-Column) of some sort.
As for the different results, my script uses a FederalEmployer and a PeopleId as a key. I used your test data, but I assigned the keys differently to fit my situation.
Garadin (10/16/2009)
Out of curiosity, (I haven't had a chance to examine yours in full detail yet), but are you attempting to handle moving the best piece of data in any given row so that you don't overflow your 6 allotted rows?
Not quite sure what you mean exactly, but my script will move data and fill every record to the max before creating a new one, yes. The only problem is, it has trouble handling a multi-record value correctly. When running the script, you'll see the behavior if you look at FederalEmployer 1 and PeopleId 3, column Case_84. The value is correctly split by the code into multiple rows, but since it has to shift onto a new record and is composed of multiple rows, there's a 5000 that gets lost. This is where I'm currently stuck.
Garadin (10/16/2009)
It seems to work properly for me. It does stop at 6 rows though and won't error, it just stops processing. For example, changing c30 to 12999.99 instead of 9999.99 generates 2 5000 rows and 1 2999.99 row in my tests. What test data did you use on mine that is coming out incorrectly?
Try putting 16999.99 instead, you'll see a 5000, 5000 and a 6999.99 trailing at the end, while it should be 5000, 5000, 5000, 1999.99. 🙂
October 16, 2009 at 10:29 am
Jonathan Melo (10/16/2009)
I despise cursors myself. When I'm really, really stuck to go row by row, I use a WHILE construct with a key instead. Somewhere, in my head, there's a voice telling me that it's safer and possibly faster to do it this way. Go figure.
Eh, for the most part, a loop is a loop IMO. Some RBAR is actually much worse than cursors/while loops would ever be, and it looks completely innocuous. Triangular joins, correlated subqueries, cross joins and certain placements of UDFs come to mind as very common culprits that are often overlooked until your database asplodes.
Jonathan Melo (10/16/2009)
I was under the impression that joining on the tally table this way was the trick to avoid RBAR, since it only joins one time when needed, but will join twice or more times when I need to duplicate rows. However, in the second part of the script (where I try to apply the "max 6 values per record" rule), I perform a WHILE based on columns, and I increment, only affecting the column I want every time. I suppose, then, that this is actually a switched RBAR, or a CBAC (Column-by-Agonizing-Column) of some sort.
Tally tables *can* be used to avoid loops, but they don't automatically make the query set-based. In my tests, that tally table section is 70% of the total cost. I worry whenever I see <, > or 'OR' in a join. In most cases it may be an unfounded concern, though. In this particular case, I think the issue is that you're not stating a hard value limiter for Number, so it's doing a table scan. Adding a WHERE Number < 50 and putting the rest of your join criteria in parens speeds the entire thing up drastically.
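(Something like this shape, to illustrate; column names taken from the script in this thread:)

```sql
-- Hard limiter in the WHERE, then the OR'd join criteria in parens:
SELECT Temp_Values.*, Sys_Sequence.Number
FROM #Temp_Values AS Temp_Values
INNER JOIN Sys_Sequence
    ON (   Sys_Sequence.Number <= Case_30_SlipCount
        OR Sys_Sequence.Number <= Case_32_SlipCount
        -- ...and so on for the other SlipCount columns...
       )
WHERE Sys_Sequence.Number < 50   -- hard upper bound lets the optimizer avoid a full scan
```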
Jonathan Melo (10/16/2009)
As for the different results, my script uses a FederalEmployer and a PeopleId as a key. I used your test data, but I assigned the keys differently to fit my situation.
I guess I'm confused on your output. I see where the issue is with chopping off 5000, but where are the other values like 847.55 going?
Jonathan Melo (10/16/2009)
Not quite sure what you mean exactly, but yes, my script will move data and fill every record to the max before creating a new one. The only problem is that it has trouble handling the move of a multi-record value correctly. When running the script, you'll see the behavior if you look at FederalEmployer 1 and PeopleId 3, column Case_84. The code correctly splits the value into multiple rows, but since this one has to shift onto a new record and is composed of multiple rows, a 5000 gets lost. This is where I'm currently stuck.
What I mean is this. Say for this example you have a limit of 3 fields per row, a limit of 3 rows and a limit of 20 per field.
If the program was really intelligent, it would handle a situation like this:
Start Data
10 10 10 30 <-- 4 values; the 4th column has too high a value. Basically need 2 rows to
 0  0  5  5 <-- resolve this currently. One to move the whole value to and 1 more to split it.
 0  0 10  0
My Version
10 10 10  0
 0  0  5  5
 0  0 10 30 <-- Moves the fourth value here because the first row already has 3; the 30 cannot be split because we are out of rows.
Smarter Version
10  0 10 20 <-- Moves the 10 in col2 down 1 instead of moving col4, splits col4 in place.
 0 10  5  5
 0  0 10 10
That seems like it complicates this task severely though.
Jonathan Melo (10/16/2009)
Try putting 16999.99 instead, you'll see a 5000, 5000 and a 6999.99 trailing at the end, while it should be 5000, 5000, 5000, 1999.99. 🙂
It only does this if you leave 6000 in the third record as well, at which point it cannot resolve that last split because it would put it at 7 rows. If you remove the 6000, it processes properly because it has enough rows to do so.
October 20, 2009 at 5:38 pm
October 21, 2009 at 7:53 am
Yeah, I've been thinking about it over the weekend, and had a flash Monday evening in the shower that led me to my current version.
As of now, it works. It will split any value larger than the specified limit into multiple records, and will make sure there are never more than 6 values on the same record by moving the extra values to another record (creating it if need be). It also meets my other goals: no cursors, no hard limits (besides the hardcoded 20, which I'll patch up to be dynamic in the near future), it's somewhat easy to maintain (e.g. if I have to add a new column, which is the most probable case), and it optimizes the split to create the fewest possible records.
For the purpose of the stripped-down version I'll show as an example, I again used the values you put into your first script, but I group them by FederalEmployer and PeopleId, which matches my way of handling the data. The results vary a bit from your example, but if you put a unique identifier on every record of your test data (from 1 to 6), they match up (except my script has optimized the split and uses 2 records where yours uses 3).
/*
Note : Sys_Sequence is the tally table. It starts at 0 and goes up. Here's a small script to create it if need be. DON'T FORGET TO CLEAN UP!
CREATE TABLE Sys_Sequence(Number int PRIMARY KEY)
DECLARE @i int
SELECT @i = 0
WHILE @i <= 200
BEGIN
INSERT Sys_Sequence
SELECT @i
SELECT @i = @i + 1
END
*/
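(Aside: if you prefer to skip the WHILE when building Sys_Sequence, a loop-free fill works too. Just a sketch, assuming the table is already created as above:)

```sql
-- Set-based fill for 0..200: three small digit sets cross-joined.
INSERT Sys_Sequence(Number)
SELECT Hundreds.i * 100 + Tens.i * 10 + Units.i
FROM (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS Units
CROSS JOIN (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS Tens
CROSS JOIN (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2) AS Hundreds
WHERE Hundreds.i * 100 + Tens.i * 10 + Units.i <= 200
```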
-- Cautious pre-run cleanup
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
IF OBJECT_ID('TempDB..#Columns') IS NOT NULL
DROP TABLE #Columns
-- Test data table
CREATE TABLE #Temp_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
/*Value*/ /*Maximum value for the column*/ /*Number of slips we'll need to write the current value*/
Case_30 money, Case_30_MaxValue money DEFAULT(5000), Case_30_SlipCount AS Floor((Case_30-.01) / Case_30_MaxValue),
Case_32 money, Case_32_MaxValue money DEFAULT(5000), Case_32_SlipCount AS Floor((Case_32-.01) / Case_32_MaxValue),
Case_33 money, Case_33_MaxValue money DEFAULT(5000), Case_33_SlipCount AS Floor((Case_33-.01) / Case_33_MaxValue),
Case_34 money, Case_34_MaxValue money DEFAULT(5000), Case_34_SlipCount AS Floor((Case_34-.01) / Case_34_MaxValue),
Case_40 money, Case_40_MaxValue money DEFAULT(5000), Case_40_SlipCount AS Floor((Case_40-.01) / Case_40_MaxValue),
Case_71 money, Case_71_MaxValue money DEFAULT(5000), Case_71_SlipCount AS Floor((Case_71-.01) / Case_71_MaxValue),
Case_84 money, Case_84_MaxValue money DEFAULT(5000), Case_84_SlipCount AS Floor((Case_84-.01) / Case_84_MaxValue),
Case_85 money, Case_85_MaxValue money DEFAULT(5000), Case_85_SlipCount AS Floor((Case_85-.01) / Case_85_MaxValue),
PRIMARY KEY(FederalEmployer, PeopleId)
)
-- Final data table
CREATE TABLE #Multiplied_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
MultiSlipCounter int,
OtherCount int DEFAULT(0),
Case_30 money DEFAULT(0),
Case_32 money DEFAULT(0),
Case_33 money DEFAULT(0),
Case_34 money DEFAULT(0),
Case_40 money DEFAULT(0),
Case_71 money DEFAULT(0),
Case_84 money DEFAULT(0),
Case_85 money DEFAULT(0),
PRIMARY KEY(FederalEmployer, PeopleId, MultiSlipCounter)
)
-- Helper table, to handle each column by People/FederalEmployer, from the column having the highest amount to the lowest
CREATE TABLE #Columns
(
MyKey int IDENTITY(1, 1),
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
KeyByPeople int,
Col int,
SlipCount int,
PRIMARY KEY (FederalEmployer, PeopleId, MyKey)
)
-- Index to help performance on larger datasets
CREATE INDEX IX_Temp_Columns_ColByPeople ON #Columns(KeyByPeople, Col)
-- Insert test data
INSERT #Temp_Values
(
FederalEmployer,
PeopleId,
Case_30,
Case_32,
Case_33,
Case_34,
Case_40,
Case_71,
Case_84,
Case_85
)
SELECT '1', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT '1', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '1', 5, 6000, 0, 0, 0, 0, 0, 0, 0 UNION ALL
SELECT '2', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL
SELECT '2', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '2', 5, 6000, 0, 0, 0, 0, 0, 0, 0
SELECT * FROM #Temp_Values
-- Multiply the records to respect the max value per column
INSERT #Multiplied_Values
SELECT
FederalEmployer,
PeopleId,
Sys_Sequence.Number,
0,
-- If we're past the number of slips we need to write the value, write 0
-- If we're exactly on that number of slips, write the value minus the maximum value multiplied by the number of slips already done so far
-- If we're before that number of slips, write the maximum value
CASE WHEN Number > Case_30_SlipCount THEN 0 WHEN Number = Case_30_SlipCount THEN Case_30 - (Case_30_MaxValue * Number) ELSE Case_30_MaxValue END,
CASE WHEN Number > Case_32_SlipCount THEN 0 WHEN Number = Case_32_SlipCount THEN Case_32 - (Case_32_MaxValue * Number) ELSE Case_32_MaxValue END,
CASE WHEN Number > Case_33_SlipCount THEN 0 WHEN Number = Case_33_SlipCount THEN Case_33 - (Case_33_MaxValue * Number) ELSE Case_33_MaxValue END,
CASE WHEN Number > Case_34_SlipCount THEN 0 WHEN Number = Case_34_SlipCount THEN Case_34 - (Case_34_MaxValue * Number) ELSE Case_34_MaxValue END,
CASE WHEN Number > Case_40_SlipCount THEN 0 WHEN Number = Case_40_SlipCount THEN Case_40 - (Case_40_MaxValue * Number) ELSE Case_40_MaxValue END,
CASE WHEN Number > Case_71_SlipCount THEN 0 WHEN Number = Case_71_SlipCount THEN Case_71 - (Case_71_MaxValue * Number) ELSE Case_71_MaxValue END,
CASE WHEN Number > Case_84_SlipCount THEN 0 WHEN Number = Case_84_SlipCount THEN Case_84 - (Case_84_MaxValue * Number) ELSE Case_84_MaxValue END,
CASE WHEN Number > Case_85_SlipCount THEN 0 WHEN Number = Case_85_SlipCount THEN Case_85 - (Case_85_MaxValue * Number) ELSE Case_85_MaxValue END
FROM
#Temp_Values AS Temp_Values
INNER JOIN Sys_Sequence
ON Sys_Sequence.Number <= Case_30_SlipCount
OR Sys_Sequence.Number <= Case_32_SlipCount
OR Sys_Sequence.Number <= Case_33_SlipCount
OR Sys_Sequence.Number <= Case_34_SlipCount
OR Sys_Sequence.Number <= Case_40_SlipCount
OR Sys_Sequence.Number <= Case_71_SlipCount
OR Sys_Sequence.Number <= Case_84_SlipCount
OR Sys_Sequence.Number <= Case_85_SlipCount
WHERE
-- Currently limiting to 20 slips maximum, to enable a better execution plan. Change to the real calculated max in the future to remove the limitation.
Sys_Sequence.Number < 20
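(To make the CASE branches concrete with one value from the test data: Case_84 = 8462.57 with a 5000 cap gives a SlipCount of 1, so slip 0 gets the maximum and slip 1 gets the remainder. A standalone check:)

```sql
-- Standalone check of the split for Case_84 = 8462.57 (cap 5000):
DECLARE @Case_84 money, @MaxValue money
SELECT @Case_84 = 8462.57, @MaxValue = 5000

SELECT Number,
       CASE WHEN Number > FLOOR((@Case_84 - .01) / @MaxValue) THEN 0
            WHEN Number = FLOOR((@Case_84 - .01) / @MaxValue) THEN @Case_84 - (@MaxValue * Number)
            ELSE @MaxValue
       END AS SlipValue
FROM Sys_Sequence
WHERE Number < 3
-- Number 0 -> 5000.00, Number 1 -> 3462.57, Number 2 -> 0
```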
-- Load every People/FederalEmployer, and determine the order in which the columns will be handled (highest amount to lowest amount)
INSERT #Columns(FederalEmployer, PeopleId, Col, SlipCount)
SELECT
FederalEmployer,
PeopleId,
ColKey,
SlipCount
FROM
(
SELECT FederalEmployer, PeopleId, 0 AS ColKey, Case_30_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_30_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 1 AS ColKey, Case_32_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_32_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 2 AS ColKey, Case_33_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_33_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 3 AS ColKey, Case_34_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_34_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 4 AS ColKey, Case_40_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_40_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 5 AS ColKey, Case_71_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_71_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 6 AS ColKey, Case_84_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_84_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 7 AS ColKey, Case_85_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_85_SlipCount
) AS Val
WHERE
SlipCount >= 0 -- Quickly remove any combo where no work needs to be done
ORDER BY
FederalEmployer,
PeopleId,
SlipCount DESC,
ColKey
-- Give an index (from 0 to 7) to all columns in the combo, per People/FederalEmployer
UPDATE #Columns
SET
KeyByPeople = MyKey - MinByPeople.MinKey
FROM
#Columns AS Columns
INNER JOIN (SELECT FederalEmployer, PeopleId, Min(MyKey) AS MinKey FROM #Columns GROUP BY FederalEmployer, PeopleId) AS MinByPeople
ON Columns.FederalEmployer = MinByPeople.FederalEmployer AND
Columns.PeopleId = MinByPeople.PeopleId
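(Side note: on SQL Server 2005 and later, the MIN() self-join above could be collapsed into the initial insert with ROW_NUMBER(). Just a sketch, not what the script does:)

```sql
-- ROW_NUMBER() computes the 0-based per-People index directly,
-- using the same ordering as the INSERT into #Columns:
SELECT FederalEmployer,
       PeopleId,
       ROW_NUMBER() OVER (PARTITION BY FederalEmployer, PeopleId
                          ORDER BY SlipCount DESC, Col) - 1 AS KeyByPeople,
       Col,
       SlipCount
FROM #Columns
```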
-- Declare the variables for the next part
DECLARE @ColumnCounter int, @MaxColumnCounter int
SELECT @ColumnCounter = Min(KeyByPeople), @MaxColumnCounter = Max(KeyByPeople) FROM #Columns
-- For each of the columns, check whether we need to move the value to another record (max 6 values per record)
WHILE @ColumnCounter <= @MaxColumnCounter
BEGIN
-- Update the count of values found per record so far
UPDATE #Multiplied_Values
SET
OtherCount = UpdatedCount.CountSoFar
FROM
#Multiplied_Values AS Multi
INNER JOIN
(
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
Multi.MultiSlipCounter,
Sum(CASE WHEN Columns.Col = 0 AND Case_30 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 1 AND Case_32 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 2 AND Case_33 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 3 AND Case_34 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 4 AND Case_40 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 5 AND Case_71 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 6 AND Case_84 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 7 AND Case_85 <> 0 THEN 1 ELSE 0 END) AS CountSoFar
FROM
#Multiplied_Values AS Multi
INNER JOIN #Columns AS Columns
ON Multi.FederalEmployer = Columns.FederalEmployer AND
Multi.PeopleId = Columns.PeopleId AND
Columns.KeyByPeople <= @ColumnCounter
GROUP BY
Multi.FederalEmployer,
Multi.PeopleId,
Multi.MultiSlipCounter
) AS UpdatedCount
ON Multi.FederalEmployer = UpdatedCount.FederalEmployer AND
Multi.PeopleId = UpdatedCount.PeopleId AND
Multi.MultiSlipCounter = UpdatedCount.MultiSlipCounter
-- Insert an empty record when we detect we'll need one
INSERT #Multiplied_Values(FederalEmployer, PeopleId, MultiSlipCounter)
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
-- Adds one line either for half a value expressed on multiple lines, or multiple lines at once, without violating the primary key
NextMultiSlip.NextMultiSlipCounter+Columns.SlipCount-Multi.MultiSlipCounter
FROM
#Multiplied_Values AS Multi
INNER JOIN (SELECT FederalEmployer, PeopleId, Max(MultiSlipCounter) + 1 AS NextMultiSlipCounter FROM #Multiplied_Values GROUP BY FederalEmployer, PeopleId) AS NextMultiSlip
ON Multi.FederalEmployer = NextMultiSlip.FederalEmployer AND
Multi.PeopleId = NextMultiSlip.PeopleId
INNER JOIN #Columns AS Columns
ON Multi.FederalEmployer = Columns.FederalEmployer AND
Multi.PeopleId = Columns.PeopleId AND
Columns.KeyByPeople = @ColumnCounter
LEFT JOIN #Multiplied_Values AS Next
ON Next.FederalEmployer = Multi.FederalEmployer AND
Next.PeopleId = Multi.PeopleId AND
-- Next is after Multi
Next.MultiSlipCounter > Multi.MultiSlipCounter AND
-- Multi and Next are "synchronized" to handle values expressed on multiple lines
((Next.MultiSlipCounter - Multi.MultiSlipCounter) % (Columns.SlipCount+1) = 0) AND
-- Multi must "export" data
Multi.OtherCount > 6 AND
-- Next can "get" data
Next.OtherCount < 6
WHERE
-- Only if there is no next record
Next.PeopleId IS NULL AND
-- And the current record is not only full (at 6), but also has another value to add (7 and greater)
Multi.OtherCount > 6
-- The "lower records" will pull the data from the record above them if they still have room and the "top record" is full
-- At the same time, if a "top record" is full and there's a value, we clear it
UPDATE Multi
SET
Case_30 = CASE DataJumpInfo.Col WHEN 0 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_30 ELSE 0 END ELSE Multi.Case_30 END,
Case_32 = CASE DataJumpInfo.Col WHEN 1 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_32 ELSE 0 END ELSE Multi.Case_32 END,
Case_33 = CASE DataJumpInfo.Col WHEN 2 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_33 ELSE 0 END ELSE Multi.Case_33 END,
Case_34 = CASE DataJumpInfo.Col WHEN 3 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_34 ELSE 0 END ELSE Multi.Case_34 END,
Case_40 = CASE DataJumpInfo.Col WHEN 4 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_40 ELSE 0 END ELSE Multi.Case_40 END,
Case_71 = CASE DataJumpInfo.Col WHEN 5 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_71 ELSE 0 END ELSE Multi.Case_71 END,
Case_84 = CASE DataJumpInfo.Col WHEN 6 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_84 ELSE 0 END ELSE Multi.Case_84 END,
Case_85 = CASE DataJumpInfo.Col WHEN 7 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_85 ELSE 0 END ELSE Multi.Case_85 END
FROM
(-- Find the smallest difference to a valid "export" target
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
Columns.Col,
Min(Multi.MultiSlipCounter - Prev.MultiSlipCounter) AS MultiSlipDifference
FROM
#Multiplied_Values AS Multi
INNER JOIN #Columns AS Columns
ON Multi.FederalEmployer = Columns.FederalEmployer AND
Multi.PeopleId = Columns.PeopleId AND
Columns.KeyByPeople = @ColumnCounter
LEFT JOIN #Multiplied_Values AS Prev
ON Prev.FederalEmployer = Multi.FederalEmployer AND
Prev.PeopleId = Multi.PeopleId AND
-- Prev is before Multi
Prev.MultiSlipCounter < Multi.MultiSlipCounter AND
-- Multi and Prev are "synchronized" to handle values expressed on multiple lines
((Multi.MultiSlipCounter - Prev.MultiSlipCounter) % (Columns.SlipCount+1) = 0) AND
-- Prev must "export" data
Prev.OtherCount > 6 AND
-- Multi can "get" data
Multi.OtherCount < 6
WHERE
-- Only if we are already full and a value now needs to be "cleared"
Multi.OtherCount > 6 OR
-- Else, there must be data to import AND there must be no data present if we want to import something in this line
(Prev.OtherCount > 6 AND
CASE Columns.Col
WHEN 0 THEN Multi.Case_30
WHEN 1 THEN Multi.Case_32
WHEN 2 THEN Multi.Case_33
WHEN 3 THEN Multi.Case_34
WHEN 4 THEN Multi.Case_40
WHEN 5 THEN Multi.Case_71
WHEN 6 THEN Multi.Case_84
WHEN 7 THEN Multi.Case_85
END = 0)
GROUP BY
Multi.FederalEmployer,
Multi.PeopleId,
Columns.Col
) AS DataJumpInfo
INNER JOIN #Multiplied_Values AS Multi
ON Multi.FederalEmployer = DataJumpInfo.FederalEmployer AND
Multi.PeopleId = DataJumpInfo.PeopleId
LEFT JOIN #Multiplied_Values AS Prev
ON Prev.FederalEmployer = Multi.FederalEmployer AND
Prev.PeopleId = Multi.PeopleId AND
-- Make sure we move the data only to the first place available
Multi.MultiSlipCounter - Prev.MultiSlipCounter = DataJumpInfo.MultiSlipDifference
SELECT @ColumnCounter = @ColumnCounter + 1
END
SELECT * FROM #Multiplied_Values
-- Final cleanup
IF OBJECT_ID('TempDB..#Columns') IS NOT NULL
DROP TABLE #Columns
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
As an aside, what made me end up with the final solution was a lot of pieces I already saw in possible solutions, but the driving force was the helper table to optimize the split (which I was thinking about ever since you asked if I was optimizing the split). Once this was in place, the rest was much easier to do :-).
Thanks a lot for your help, Garadin!