October 13, 2009 at 1:12 pm
Hey everyone,
I've been trying to wrap my mind around a problem lately, and I can't crack it. Out of desperation, I'm turning to you for new ideas on how to accomplish the task.
First, a quick recap of my situation:
I must produce a specific ResultSet from a SQL query. There are rules about that ResultSet.
1 - A column value must not exceed a maximum value defined by column (for example, the value of c30 must not exceed 999.99, the value of c32 must not exceed 9999.99, etc).
2 - There must be a maximum of 6 values per record.
In either case, the solution is to add a new record and copy the rest of the values in there. For instance, if I must write 1500 in c30 and the limit is 999.99, I'll have a record with 999.99 and a second record with 500.01.
I handled the first rule on my own (for the record, I built a separate temp table containing the maximum value per column, and joined it to a Tally table using the result of dividing the actual value by the maximum, thereby creating enough records each time). The second rule is the one giving me problems: I can't come up with a query that transforms the data the way I want.
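For anyone following along, the tally-based split for rule 1 can be sketched roughly like this. The variable and table names here are hypothetical, and it assumes a Tally table whose N column starts at 1:

```sql
-- Hypothetical sketch of the rule-1 split: one value, one per-column maximum.
DECLARE @val money, @max money
SELECT @val = 1500.00, @max = 999.99

SELECT CASE WHEN N * @max <= @val
            THEN @max                      -- a "full" piece
            ELSE @val - (N - 1) * @max END -- the remainder piece
       AS Piece
FROM Tally
WHERE (N - 1) * @max < @val
-- for 1500.00 with a 999.99 limit this returns two rows: 999.99 and 500.01
```

The WHERE clause generates exactly as many rows as pieces are needed, which is the same idea as the temp-table join described above.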
Here are two small test cases.
DECLARE @test2 TABLE(c30 money, c32 money, c33 money, c34 money, c40 money, c71 money, c84 money, c85 money)
--Test values #1
DELETE @test2
INSERT @test2
SELECT 9999.99, 98.54, 847.55, 99999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT 2587.4, 0, 0, 18.42, 0, 0, 0, 0
SELECT * FROM @test2
--Expected result for values #1
SELECT 9999.99, 98.54, 847.55, 99999.99, 34.55, 365.74, 0, 0 UNION ALL
SELECT 2587.4, 0, 0, 18.42, 0, 0, 8462.57, 0
--Test values #2
DELETE @test2
INSERT @test2
SELECT 9999.99, 99.99, 999.99, 99999.99, 99.99, 9999.99, 8462.57, 0 UNION ALL
SELECT 2587.4, 98.54, 847.55, 18.42, 34.55, 365.74, 0, 0
SELECT * FROM @test2
--Expected result for values #2
SELECT 9999.99, 99.99, 999.99, 99999.99, 99.99, 9999.99, 0, 0 UNION ALL
SELECT 2587.4, 98.54, 847.55, 18.42, 34.55, 365.74, 0, 0 UNION ALL
SELECT 0, 0, 0, 0, 0, 0, 8462.57, 0
Does anyone have an idea on how I could tackle this problem?
October 13, 2009 at 8:35 pm
:blink: :blink: :blink: Uhh... Huh? :ermm: :ermm: :ermm:
In either case, the solution is to add a new record and copy the rest of the values in there. For instance, if I must write 1500 in c30 and the limit is 999.99, I'll have a record with 999.99 and a second record with 500.01.
That. Makes sense.
Everything else about the post is extremely confusing. Your rules don't correlate to your test data in any way that I can see (you state c30 has a max of 999.99 but your actual number is 9999.99).
Did you perhaps forget to include the "rules" for each given column in this example? Cuz I don't see anything indicating the maximum per column.
October 13, 2009 at 8:57 pm
OK, I think I see part of it. The 8462.57 value is creating its own line because it is the 7th value (zeroes not counted apparently) in the row and you only allow 6. Correct?
What in the world do you need something like this for? Before people here spend a bunch of time attempting to do something that should never be done in SQL, I think it'd be good to convince people they're not just wasting their time.
October 13, 2009 at 10:18 pm
Other than the confusion - I can't help but think we've got a normalization disaster going on which is making this problem much harder to handle. Where's the relationship between the values you're breaking into smaller chunks and said smaller chunks (so when one changes, you can adjust, etc.)? Why do we have the same piece of data repeated 8 times in a single row (a violation of normalization)?
More information as to what you're ACTUALLY doing (vs. this particular implementation) might help get you better ideas (and ones designed to work well with SQL server).
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
October 14, 2009 at 6:51 am
The query is designed to be used by the front-end application to generate a data file. This part is the final portion, where I take the data I've compiled so far and apply the rules to it before returning it.
The columns (c30, c32, c34, c35, etc) are simply different values of a record I need to send to the application. I have a record per employee. The example values are for a single employee, in the end I'll have to handle multiple employees, but I figure if I can get something working with a single record, it's gonna be easy to do so for multiple records.
Also, the columns are NOT coming straight from tables - they're computed in a temporary table from a lot of tables using some more logic. The result is a set of values (c30, c32, c34, c35, etc) per employee, and I must apply the last two rules to them before sending them back to the application.
As for the example I gave for the limit on c30, I'm aware it doesn't match the sample data I posted. The limit portion is already handled; it's explained only to show the case where a value has already been split into two records, without confusing everyone (something that, I fear, I've failed at).
Sorry for the confusion. I hope it's a bit better now.
October 14, 2009 at 7:26 am
Jonathan Melo (10/14/2009)
The example values are for a single employee, in the end I'll have to handle multiple employees, but I figure if I can get something working with a single record, it's gonna be easy to do so for multiple records.
While that is not necessarily a false statement, by not including the details on things like that, and not giving us the part of the solution you've already written, you've more or less ensured this thing will be massive RBAR. We don't know what you've already done (to see if we can combine steps) and we don't know how this data is keyed per employee to possibly do something like this at a set level as opposed to trying to make it work for one person. So while you will likely be able to throw the whole thing in a cursor, you're not likely to get a performant method.
That said, that objection may be mostly academic in nature. I'll think about this for a bit and see what I can come up with.
October 15, 2009 at 4:03 pm
So I thought about it for a while, this is what I came up with. I'll state up front that I can already see half a dozen ways to improve this / clean it up / make it better, but I'll leave it here for now. I'll post this in the body for people who want to read it, and as an attachment so that the formatting isn't completely boned.
One thing of note is that this does not use any logic to determine what the best value to move down a row is when there are more than 6 values in a row. That was not stated in the original problem so I didn't add it in. Without this, you aren't efficiently utilizing all of your columns and could fail to move rows around.
If anyone wants to take this a bit further and optimize sections or get rid of the cursor, please post back the final script when you're done.
/*
=============================================================================================
CREATE DATE: 10/15/2009
LAST MODIFIED: 10/15/2009
CREATED BY:    SETH PHELABAUM
PURPOSE:       Do crazy things for reasons I don't understand
NOTES:         This assumes a hard-coded limit of $5000 for all columns (can obviously be expanded upon).
               The unpivot could be made dynamic.
               There's way too much hardcoded. Things like Max Values Per Row should be a variable set once at the top.
ISSUES:        Uses a cursor and GOTOs for flow control. This could likely be improved.
               The cursor and high number of updates will likely cause this to scale very poorly.
Revision History:
Date      By   Change Made
--------  ---  -------------------------------------
=============================================================================================
*/
SET ANSI_WARNINGS OFF
SET NOCOUNT ON
------------------ DECLARE VARIABLES --------------------
DECLARE @Seq        int,
        @Ident      int,
        @ColName    varchar(10),
        @ColNum     int,
        @Amount     money,
        @ColCount   int,
        @OverFlow   int,
        @ChangeMade bit
--------------------- Test Data Setup ---------------------
DECLARE @test2 TABLE(Ident int, c30 money, c32 money, c33 money, c34 money, c40 money, c71 money, c84 money, c85 money)
INSERT @test2
SELECT 1, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT 1, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT 1, 6000, 0, 0, 0, 0, 0, 0, 0 UNION ALL
SELECT 2, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL
SELECT 2, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT 2, 6000, 0, 0, 0, 0, 0, 0, 0
SELECT * FROM @test2
-----------------------------------------------------------
-------- Add Sequencing Column and Sequence Rows ---------- (Would use ROW_NUMBER() in 2K5+)
SELECT CAST(0 AS int) seq, *
INTO #Test
FROM @test2
CREATE CLUSTERED INDEX Quirk ON #Test (Ident)
SET @Seq = 0
UPDATE #Test
SET @Seq = Seq = CASE WHEN @Ident = Ident THEN @Seq + 1 ELSE 1 END,
@Ident = Ident
FROM #Test OPTION (MAXDOP 1)
--SELECT * FROM #Test
-----------------------------------------------------------
----------------- Add Additional Rows ---------------------
INSERT INTO #Test
SELECT T2.N, T2.Ident, c30, c32, c33, c34, c40, c71, c84, c85
FROM (SELECT Ident, N
FROM (SELECT DISTINCT Ident FROM #Test) T, Tally T1
WHERE T1.N <= 6) T2
LEFT JOIN #Test T ON T.Ident = T2.Ident AND T.Seq = T2.N
WHERE T.Ident IS NULL
--SELECT * FROM #test
-----------------------------------------------------------
----------------- Unpivot the Data ------------------------
CREATE TABLE #UPTest(
UPIdentity int IDENTITY(1,1),
Ident      int,
Seq        int,
ColName    varchar(10),
ColNum     int,
Amount     money,
ColCount   int,
OverFlow   int)
INSERT INTO #UPTest(Ident, Seq, ColName, ColNum, Amount, ColCount, Overflow)
SELECT Ident, Seq, 'C30', 1, C30, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C32', 2, C32, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C33', 3, C33, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C34', 4, C34, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C40', 5, C40, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C71', 6, C71, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C84', 7, C84, 0, 0 FROM #Test UNION ALL
SELECT Ident, Seq, 'C85', 8, C85, 0, 0 FROM #Test
CREATE CLUSTERED INDEX CI_UpdTest ON #UPTest (Ident, Seq, ColName, Amount)
-----------------------------------------------------------
----------------- Identify Problems -----------------------
UPDATE #UPTest
SET ColCount = UP2.CC,
OverFlow = CASE WHEN Amount > 5000 THEN 1 ELSE 0 END
FROM #UPTest
INNER JOIN (SELECT ident, Seq, COUNT(ColName) CC
FROM #UPTest
WHERE ISNULL(Amount,0.00) <> 0.00
GROUP BY Ident, Seq) UP2
ON #UPTest.Ident = UP2.Ident AND #UPTest.Seq = UP2.Seq
--SELECT * FROM #UPTest
-----------------------------------------------------------
---- If there are any issues, fix, otherwise Finalize -----
IF EXISTS (SELECT * FROM #UPTest WHERE OverFlow = 1 OR ColCount > 6)
BEGIN
SET @ChangeMade = 1
GOTO ProblemFix
END
ELSE
GOTO Finalize
---------------- Fix Problems ----------------------------- (Crazy RBAR :: Focus here for improvement)
--(I'm not convinced this can't be done in a set based manner, but unfortunately, I'm out of time to spend on this.)
ProblemFix:
IF @ChangeMade = 0
GOTO Finalize
ELSE
SET @ChangeMade = 0
BEGIN
PRINT 'Cursor Starting'
DECLARE IDCursor CURSOR LOCAL FORWARD_ONLY STATIC READ_ONLY
FOR
SELECT Ident, Seq, ColName, ColNum, Amount, ColCount, OverFlow
FROM #UPTest
WHERE (ColCount > 6 AND ColNum > 6)
OR OverFlow = 1
ORDER BY Ident, Seq, OverFlow DESC, ColCount
OPEN IDCursor
FETCH NEXT FROM IDCursor INTO @Ident, @Seq, @ColName, @ColNum, @Amount, @ColCount, @Overflow
WHILE @@FETCH_STATUS = 0
BEGIN
--------------- Too Many Column Fixes ---------------------
IF @ColCount > 6 AND @ColNum > 6
AND EXISTS (SELECT * FROM #UPTEST
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
BEGIN
UPDATE #UPTest
SET Amount = 0.00, -- Whole value moves to a later row
ColCount = ColCount - 1,
OverFlow = 0
WHERE ColName = @ColName AND Ident = @Ident AND Seq = @Seq
UPDATE #UPTest
SET Amount = @Amount,
OverFlow = CASE WHEN @Amount > 5000 THEN 1 ELSE 0 END
FROM #UPTest
WHERE ColName = @ColName
AND Ident = @Ident
AND Seq = (SELECT MIN(Seq)
FROM #UPTest
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
SET @ChangeMade = 1 -- Additional Flow Control
SET @OverFlow = 0 -- Need to do this one on another run through
END
-----------------------------------------------------------
--------------- Number Too Big Fixes ----------------------
IF @OverFlow = 1
AND EXISTS (SELECT * FROM #UPTEST
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
BEGIN
UPDATE #UPTest
SET Amount = 5000.00, --Max Allowed Value
OverFlow = 0
WHERE ColName = @ColName AND Ident = @Ident AND Seq = @Seq
UPDATE #UPTest
SET Amount = @Amount - 5000.00,
OverFlow = CASE WHEN @Amount - 5000.00 > 5000.00 THEN 1 ELSE 0 END
FROM #UPTest
WHERE ColName = @ColName
AND Ident = @Ident
AND Seq = (SELECT MIN(Seq)
FROM #UPTest
WHERE ColName = @ColName AND Ident = @Ident AND Seq > @Seq AND ISNULL(Amount,0.00) = 0.00)
SET @ChangeMade = 1 -- Additional Flow Control
END
-----------------------------------------------------------
FETCH NEXT FROM IDCursor INTO @Ident, @Seq, @ColName, @ColNum, @Amount, @ColCount, @Overflow
END
CLOSE IDCursor
DEALLOCATE IDCursor
IF EXISTS (SELECT * FROM #UPTest WHERE OverFlow = 1 OR ColCount > 6)
GOTO ProblemFix
ELSE
GOTO Finalize
END
Finalize: -- Repivot
SELECT Ident, C30, C32, C33, C34, C40, C71, C84, C85
FROM (
SELECT Ident, Seq,
SUM(CASE WHEN ColName = 'C30' THEN Amount ELSE 0 END) C30,
SUM(CASE WHEN ColName = 'C32' THEN Amount ELSE 0 END) C32,
SUM(CASE WHEN ColName = 'C33' THEN Amount ELSE 0 END) C33,
SUM(CASE WHEN ColName = 'C34' THEN Amount ELSE 0 END) C34,
SUM(CASE WHEN ColName = 'C40' THEN Amount ELSE 0 END) C40,
SUM(CASE WHEN ColName = 'C71' THEN Amount ELSE 0 END) C71,
SUM(CASE WHEN ColName = 'C84' THEN Amount ELSE 0 END) C84,
SUM(CASE WHEN ColName = 'C85' THEN Amount ELSE 0 END) C85,
SUM(Amount) RowSum
FROM #UPTest
GROUP BY Ident, Seq) A
WHERE RowSum > 0
ORDER BY Ident, Seq
DROP TABLE #UPTest
DROP TABLE #Test
October 16, 2009 at 7:09 am
First, Garadin, I'd like to thank you a lot for the effort you're putting in this problem. It looks like you went through a lot of trouble to make that script, and I appreciate it!
Now, to the problem itself... the "crazy things for a reason you don't understand" do have a valid reason behind them 😀 This procedure is supposed to gather and format data to send back to the front-end application, so it can generate a file that will be sent to a third party. The file being XML, it has its own rules that I must follow, and for design reasons, the application should do the bare minimum (which is: take the recordset and fill the XML file with it, with no extra data manipulation).
I do understand that SQL isn't the perfect choice to handle this kind of limitation, but I know it's feasible, and the design of the solution flows a lot better if it's made that way.
As for the code so far, I've been working on it on my side since I posted. I came up with something that works quite well, but doesn't handle every situation it needs to (I'm still having problems when I get over 2 rows generated; the values just won't follow). However, it doesn't make use of any cursor (alright, I cheated just a little, since I loop to execute some code once per column), no GOTO (this time, no cheating), and as far as I know no RBAR.
I've made a cutdown example (using your numbers and limits as test data) that also includes my solution to handle the limits on the values:
/*
Note: Sys_Sequence is the tally table. It starts at 0 and goes up. Here's a small script to create it if need be. DON'T FORGET TO CLEAN UP!
CREATE TABLE Sys_Sequence(Number int PRIMARY KEY)
DECLARE @i int
SELECT @i = 0
WHILE @i <= 200
BEGIN
INSERT Sys_Sequence
SELECT @i
SELECT @i = @i + 1
END
*/
-- Cautious pre-run cleanup
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
-- Test data table
CREATE TABLE #Temp_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
Case_30 money,
Case_30_MaxValue money DEFAULT(5000), -- Determines the maximum value for the equivalent column
Case_30_SlipCount AS Floor((Case_30-.01) / Case_30_MaxValue), -- Number of EXTRA slips needed for that value (0 = fits on one slip)
Case_32 money,
Case_32_MaxValue money DEFAULT(5000),
Case_32_SlipCount AS Floor((Case_32-.01) / Case_32_MaxValue),
Case_33 money,
Case_33_MaxValue money DEFAULT(5000),
Case_33_SlipCount AS Floor((Case_33-.01) / Case_33_MaxValue),
Case_34 money,
Case_34_MaxValue money DEFAULT(5000),
Case_34_SlipCount AS Floor((Case_34-.01) / Case_34_MaxValue),
Case_40 money,
Case_40_MaxValue money DEFAULT(5000),
Case_40_SlipCount AS Floor((Case_40-.01) / Case_40_MaxValue),
Case_71 money,
Case_71_MaxValue money DEFAULT(5000),
Case_71_SlipCount AS Floor((Case_71-.01) / Case_71_MaxValue),
Case_84 money,
Case_84_MaxValue money DEFAULT(5000),
Case_84_SlipCount AS Floor((Case_84-.01) / Case_84_MaxValue),
Case_85 money,
Case_85_MaxValue money DEFAULT(5000),
Case_85_SlipCount AS Floor((Case_85-.01) / Case_85_MaxValue),
PRIMARY KEY(FederalEmployer, PeopleId)
)
-- Final data table
CREATE TABLE #Multiplied_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
MultiSlipCounter int,
OtherCount int DEFAULT(0),
Case_30 money DEFAULT(0),
Case_32 money DEFAULT(0),
Case_33 money DEFAULT(0),
Case_34 money DEFAULT(0),
Case_40 money DEFAULT(0),
Case_71 money DEFAULT(0),
Case_84 money DEFAULT(0),
Case_85 money DEFAULT(0),
PRIMARY KEY(FederalEmployer, PeopleId, MultiSlipCounter)
)
-- Insert test data
INSERT #Temp_Values
(
FederalEmployer,
PeopleId,
Case_30,
Case_32,
Case_33,
Case_34,
Case_40,
Case_71,
Case_84,
Case_85
)
SELECT '1', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT '1', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '1', 5, 6000,0,0,0,0,0,0,0 UNION ALL
SELECT '2', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL
SELECT '2', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '2', 5, 6000,0,0,0,0,0,0,0
SELECT * FROM #Temp_Values
-- Multiply the records to respect the max value per column
INSERT #Multiplied_Values
SELECT
FederalEmployer,
PeopleId,
Sys_Sequence.Number,
0,
-- If we're past the number of slips needed to write the value, write 0
-- If we're exactly on the last slip, write the value minus (the maximum value * the number of slips already written)
-- If we're before the last slip, write the maximum value
CASE WHEN Number > Case_30_SlipCount THEN 0 WHEN Number = Case_30_SlipCount THEN Case_30 - (Case_30_MaxValue * Number) ELSE Case_30_MaxValue END,
CASE WHEN Number > Case_32_SlipCount THEN 0 WHEN Number = Case_32_SlipCount THEN Case_32 - (Case_32_MaxValue * Number) ELSE Case_32_MaxValue END,
CASE WHEN Number > Case_33_SlipCount THEN 0 WHEN Number = Case_33_SlipCount THEN Case_33 - (Case_33_MaxValue * Number) ELSE Case_33_MaxValue END,
CASE WHEN Number > Case_34_SlipCount THEN 0 WHEN Number = Case_34_SlipCount THEN Case_34 - (Case_34_MaxValue * Number) ELSE Case_34_MaxValue END,
CASE WHEN Number > Case_40_SlipCount THEN 0 WHEN Number = Case_40_SlipCount THEN Case_40 - (Case_40_MaxValue * Number) ELSE Case_40_MaxValue END,
CASE WHEN Number > Case_71_SlipCount THEN 0 WHEN Number = Case_71_SlipCount THEN Case_71 - (Case_71_MaxValue * Number) ELSE Case_71_MaxValue END,
CASE WHEN Number > Case_84_SlipCount THEN 0 WHEN Number = Case_84_SlipCount THEN Case_84 - (Case_84_MaxValue * Number) ELSE Case_84_MaxValue END,
CASE WHEN Number > Case_85_SlipCount THEN 0 WHEN Number = Case_85_SlipCount THEN Case_85 - (Case_85_MaxValue * Number) ELSE Case_85_MaxValue END
FROM
#Temp_Values AS Temp_Values
INNER JOIN Sys_Sequence
ON Sys_Sequence.Number <= Case_30_SlipCount
OR Sys_Sequence.Number <= Case_32_SlipCount
OR Sys_Sequence.Number <= Case_33_SlipCount
OR Sys_Sequence.Number <= Case_34_SlipCount
OR Sys_Sequence.Number <= Case_40_SlipCount
OR Sys_Sequence.Number <= Case_71_SlipCount
OR Sys_Sequence.Number <= Case_84_SlipCount
OR Sys_Sequence.Number <= Case_85_SlipCount
-- Declare the variable for the next part
DECLARE @MultipliedCounter int
SELECT @MultipliedCounter = 0
-- For each of the 8 columns, validate if we need to move the info to another record (max 6 info per record)
WHILE @MultipliedCounter < 8
BEGIN
-- Update the count of info found per record so far
UPDATE #Multiplied_Values
SET
OtherCount = CASE WHEN @MultipliedCounter >= 0 AND Case_30 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 1 AND Case_32 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 2 AND Case_33 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 3 AND Case_34 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 4 AND Case_40 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 5 AND Case_71 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 6 AND Case_84 <> 0 THEN 1 ELSE 0 END+
CASE WHEN @MultipliedCounter >= 7 AND Case_85 <> 0 THEN 1 ELSE 0 END
-- Insert an empty record when we detect we'll need one
INSERT #Multiplied_Values(FederalEmployer, PeopleId, MultiSlipCounter)
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
Multi.MultiSlipCounter+1
FROM
#Multiplied_Values AS Multi
LEFT JOIN #Multiplied_Values AS Next
ON Next.FederalEmployer = Multi.FederalEmployer AND
Next.PeopleId = Multi.PeopleId AND
Next.MultiSlipCounter = Multi.MultiSlipCounter + 1
WHERE
-- Only if there is no next record
Next.PeopleId IS NULL AND
-- And the current record is not only full (at 6), but also has another value to add (7 and greater)
Multi.OtherCount > 6
-- The "lower records" will pull the data from the record above them if they still have room and the "top record" is full
-- At the same time, if a "top record" is full and there's a value, we clear it
UPDATE Multi
SET
Case_30 = CASE @MultipliedCounter WHEN 0 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_30 ELSE 0 END ELSE Multi.Case_30 END,
Case_32 = CASE @MultipliedCounter WHEN 1 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_32 ELSE 0 END ELSE Multi.Case_32 END,
Case_33 = CASE @MultipliedCounter WHEN 2 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_33 ELSE 0 END ELSE Multi.Case_33 END,
Case_34 = CASE @MultipliedCounter WHEN 3 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_34 ELSE 0 END ELSE Multi.Case_34 END,
Case_40 = CASE @MultipliedCounter WHEN 4 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_40 ELSE 0 END ELSE Multi.Case_40 END,
Case_71 = CASE @MultipliedCounter WHEN 5 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_71 ELSE 0 END ELSE Multi.Case_71 END,
Case_84 = CASE @MultipliedCounter WHEN 6 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_84 ELSE 0 END ELSE Multi.Case_84 END,
Case_85 = CASE @MultipliedCounter WHEN 7 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_85 ELSE 0 END ELSE Multi.Case_85 END
FROM
#Multiplied_Values AS Multi
LEFT JOIN #Multiplied_Values AS Prev
ON Prev.FederalEmployer = Multi.FederalEmployer AND
Prev.PeopleId = Multi.PeopleId AND
Prev.MultiSlipCounter + 1 = Multi.MultiSlipCounter AND
CASE @MultipliedCounter
WHEN 0 THEN Prev.Case_30
WHEN 1 THEN Prev.Case_32
WHEN 2 THEN Prev.Case_33
WHEN 3 THEN Prev.Case_34
WHEN 4 THEN Prev.Case_40
WHEN 5 THEN Prev.Case_71
WHEN 6 THEN Prev.Case_84
WHEN 7 THEN Prev.Case_85
END <> 0
WHERE
-- Only if the current record is empty (to import the data) or full (to clear the data)
NOT (Multi.OtherCount BETWEEN 1 AND 6) OR
-- Else, there must be no data present if we want to import something in this line
CASE @MultipliedCounter
WHEN 0 THEN Multi.Case_30
WHEN 1 THEN Multi.Case_32
WHEN 2 THEN Multi.Case_33
WHEN 3 THEN Multi.Case_34
WHEN 4 THEN Multi.Case_40
WHEN 5 THEN Multi.Case_71
WHEN 6 THEN Multi.Case_84
WHEN 7 THEN Multi.Case_85
END = 0
SELECT @MultipliedCounter = @MultipliedCounter + 1
END
SELECT * FROM #Multiplied_Values
-- Final cleanup
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
As a side note, your script has problems splitting values twice over the limit. It just lets them slip by, it seems ;-).
October 16, 2009 at 8:00 am
Jonathan Melo (10/16/2009)
First, Garadin, I'd like to thank you a lot for the effort you're putting in this problem. It looks like you went through a lot of trouble to make that script, and I appreciate it!
It was a fun challenge. At some point, if I have a bit more time, I'll look at removing the cursor and cleaning this up a lot (mainly for my own peace of mind; I understand you are already working on an alternate solution). This was a very rough draft, but I wanted to throw it out there in case you were still struggling.
Jonathan Melo (10/16/2009)
Now, to the problem itself... the "crazy things for a reason you don't understand" do have a valid reason behind them 😀 This procedure is supposed to gather and format data to send back to the front-end application, so it can generate a file that will be sent to a third party. The file being in XML, it has it's own rules that I must follow, and for design reasons, the application should do the bare minimum (which is, take the recordset and fill the XML file with it, no extra data manipulation).
I assumed there *were* reasons behind it, I was just noting that I didn't know what they were :hehe:. I tend to be a little frivolous in my script headers, especially if I'm not happy with where it's at in terms of performance or design currently. I have a fairly large hatred of cursors, so this one falls firmly into that category =).
Jonathan Melo (10/16/2009)
As for the code so far, I've been working on it on my side since I've posted. I came up with something that works quite well, but doesn't handle every situation it needs to (still having problems when I get over 2 rows generated, the values just won't follow). However, it doesn't make use of any cursor (alright, I cheated juste a little, since I loop to execute some code once per column), no GOTO (this time, no cheating), and as far as I know no RBAR.
I'd definitely be interested to see how they match up on a larger dataset. The tally join in yours is pretty hefty and seems to me to be massive RBAR. Hidden RBAR in code that looks set based is often more dangerous than straight-out loops, so be careful there =). I can't really test these side by side yet, as yours still returns different results than mine, but yours is actually a bit slower on these 6 rows of test data. Being slower on 6 rows doesn't mean yours won't blow mine away on a larger dataset, though, which is the normal problem with cursors. Out of curiosity (I haven't had a chance to examine yours in full detail yet), are you attempting to handle moving the best piece of data in any given row so that you don't overflow your 6 allotted rows?
Jonathan Melo (10/16/2009)
As a side note, your script has problems splitting values twice over the limit. It just lets them slip by, it seems ;-).
It seems to work properly for me. It does stop at 6 rows though and won't error, it just stops processing. For example, changing c30 to 12999.99 instead of 9999.99 generates 2 5000 rows and 1 2999.99 row in my tests. What test data did you use on mine that is coming out incorrectly?
October 16, 2009 at 9:00 am
Garadin (10/16/2009)
I assumed there *were* reasons behind it, I was just noting that I didn't know what they were :hehe:. I tend to be a little frivolous in my script headers, especially if I'm not happy with where it's at in terms of performance or design currently. I have a fairly large hatred of cursors, so this one falls firmly into that category =).
I despise cursors myself. When I'm really, really stuck to go row by row, I use a WHILE construct with a key instead. Somewhere, in my head, there's a voice telling me that it's safer and possibly faster to do it this way. Go figure.
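For what it's worth, the keyed WHILE pattern I'm referring to looks roughly like this (the table and key names are made up for illustration):

```sql
-- Sketch only: walk rows by key without declaring a cursor
DECLARE @Id int
SELECT @Id = MIN(Ident) FROM #Test

WHILE @Id IS NOT NULL
BEGIN
    -- ... do the per-row work for @Id here ...
    SELECT @Id = MIN(Ident) FROM #Test WHERE Ident > @Id  -- advance to the next key
END
```

It's still a row-by-row loop, of course; it just avoids the cursor declare/open/fetch/deallocate machinery.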
Garadin (10/16/2009)
I'd definitely be interested to see how they matched up on a larger dataset. The tally join in yours is pretty hefty and seems to me to be massive RBAR. Hidden RBAR in code that looks set based is often more dangerous than just straight out loops, so be careful there =). I can't really test these side by side yet as yours still returns different results than mine, but yours is actually a bit slower on these 6 rows of test data. Being slower on 6 rows doesn't mean that yours won't blow it away on a larger dataset though, which is the normal problem with cursors.
I was under the impression that joining on the tally table this way was the trick to avoid RBAR, since it only joins one time when needed, but will join twice or more times when I need to duplicate rows. However, in the second part of the script (where I try to apply the "max 6 values per record" rule), I perform a WHILE based on columns, and I increment, only affecting the column I want every time. I suppose, then, that this is actually a switched RBAR, or a CBAC (Column-by-Agonizing-Column) of some sort.
As for the different results, my script uses a FederalEmployer and a PeopleId as a key. I used your test data, but I assigned the keys differently to fit my situation.
Garadin (10/16/2009)
Out of curiosity, (I haven't had a chance to examine yours in full detail yet), but are you attempting to handle moving the best piece of data in any given row so that you don't overflow your 6 allotted rows?
Not quite sure what you mean exactly, but my script will move data and fill every record to the max before creating a new one, yes. The only problem is, it has trouble handling a multi-record value correctly. When running the script, you'll see the behavior if you look at FederalEmployer 1 and PeopleId 3, column Case_84. The value is correctly split by the code into multiple rows, but since it has to shift onto a new record and is composed of multiple rows, there's a 5000 that gets lost. This is where I'm currently stuck.
Garadin (10/16/2009)
It seems to work properly for me. It does stop at 6 rows though and won't error, it just stops processing. For example, changing c30 to 12999.99 instead of 9999.99 generates 2 5000 rows and 1 2999.99 row in my tests. What test data did you use on mine that is coming out incorrectly?
Try putting 16999.99 instead, you'll see a 5000, 5000 and a 6999.99 trailing at the end, while it should be 5000, 5000, 5000, 1999.99. 🙂
October 16, 2009 at 10:29 am
Jonathan Melo (10/16/2009)
I despise cursors myself. When I'm really, really stuck to go row by row, I use a WHILE construct with a key instead. Somewhere, in my head, there's a voice telling me that it's safer and possibly faster to do it this way. Go figure.
Eh, for the most part, a loop is a loop IMO. Some RBAR is actually much worse than cursors/while loops would ever be, and it looks completely innocuous. Triangular joins, correlated subqueries, cross joins and certain placements of UDFs come to mind as very common culprits that are often overlooked until your database asplodes.
Jonathan Melo (10/16/2009)
I was under the impression that joining on the tally table this way was the trick to avoid RBAR, since it only joins one time when needed, but will join twice or more times when I need to duplicate rows. However, in the second part of the script (where I try to apply the "max 6 values per record" rule), I perform a WHILE based on columns, and I increment, only affecting the column I want every time. I suppose, then, that this is actually a switched RBAR, or a CBAC (Column-by-Agonizing-Column) of some sort.
Tally tables *can* be used to avoid loops, but they don't automatically make the query set-based. In my tests, that tally table section is 70% of the total cost. I worry whenever I see <, > or 'OR' in a join. In most cases it may be an unfounded concern, though. In this particular case, I think the issue is that you're not stating a hard value limiter for Number, so it's doing a table scan. Adding a WHERE Number < 50 and putting the rest of your join criteria in parens speeds the entire thing up drastically.
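(Something like this shape, to illustrate; column names taken from the script in this thread:)

```sql
-- Hard limiter in the WHERE, then the OR'd join criteria in parens:
SELECT Temp_Values.*, Sys_Sequence.Number
FROM #Temp_Values AS Temp_Values
INNER JOIN Sys_Sequence
    ON (   Sys_Sequence.Number <= Case_30_SlipCount
        OR Sys_Sequence.Number <= Case_32_SlipCount
        -- ...and so on for the other SlipCount columns...
       )
WHERE Sys_Sequence.Number < 50   -- hard upper bound lets the optimizer avoid a full scan
```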
Jonathan Melo (10/16/2009)
As for the different results, my script uses a FederalEmployer and a PeopleId as a key. I used your test data, but I assigned the keys differently to fit my situation.
I guess I'm confused on your output. I see where the issue is with chopping off 5000, but where are the other values like 847.55 going?
Jonathan Melo (10/16/2009)
Not quite sure what you mean exactly, but yes, my script will move data and fill every record to the max before creating a new one. The only problem is that it has trouble handling the move of a multi-record value correctly. When running the script, you'll see the behavior if you look at FederalEmployer 1 and PeopleId 3, column Case_84. The code correctly splits the value into multiple rows, but since this one has to shift onto a new record and is composed of multiple rows, a 5000 gets lost. This is where I'm currently stuck.
What I mean is this. Say for this example you have a limit of 3 fields per row, a limit of 3 rows and a limit of 20 per field.
If the program was really intelligent, it would handle a situation like this:
Start Data
10 10 10 30 <-- 4 values; the 4th column has too high a value. Basically need 2 rows to
 0  0  5  5 <-- resolve this currently. One to move the whole value to and 1 more to split it.
 0  0 10  0
My Version
10 10 10  0
 0  0  5  5
 0  0 10 30 <-- Moves the fourth value here because the first row already has 3; the 30 cannot be split because we are out of rows.
Smarter Version
10  0 10 20 <-- Moves the 10 in col2 down 1 instead of moving col4, splits col4 in place.
 0 10  5  5
 0  0 10 10
That seems like it complicates this task severely though.
Jonathan Melo (10/16/2009)
Try putting 16999.99 instead, you'll see a 5000, 5000 and a 6999.99 trailing at the end, while it should be 5000, 5000, 5000, 1999.99. 🙂
It only does this if you leave 6000 in the third record as well, at which point it cannot resolve that last split because it would put it at 7 rows. If you remove the 6000, it processes properly because it has enough rows to do so.
October 20, 2009 at 5:38 pm
October 21, 2009 at 7:53 am
Yeah, I've been thinking about it over the weekend, and had a flash Monday evening in the shower that led me to my current version.
As of now, it works. It will split any value larger than the specified limit into multiple records, and will make sure there are never more than 6 values on the same record by moving the extra values to another record (creating it if need be). It also meets my other goals: no cursors, no hard limits (besides the hardcoded 20, which I'll patch up to be dynamic in the near future), it's somewhat easy to maintain (e.g. if I have to add a new column, which is the most probable case), and it optimizes the split to create the fewest possible records.
For the purpose of the stripped-down version I'll show as an example, I again used the values you put into your first script, but I group them by FederalEmployer and PeopleId, which matches my way of handling the data. The results vary a bit from your example, but if you put a unique identifier on every record of your test data (from 1 to 6), they match up (except my script has optimized the split and uses 2 records where yours uses 3).
/*
Note : Sys_Sequence is the tally table. It starts at 0 and goes up. Here's a small script to create it if need be. DON'T FORGET TO CLEAN UP!
CREATE TABLE Sys_Sequence(Number int PRIMARY KEY)
DECLARE @i int
SELECT @i = 0
WHILE @i <= 200
BEGIN
INSERT Sys_Sequence
SELECT @i
SELECT @i = @i + 1
END
*/
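(Aside: if you prefer to skip the WHILE when building Sys_Sequence, a loop-free fill works too. Just a sketch, assuming the table is already created as above:)

```sql
-- Set-based fill for 0..200: three small digit sets cross-joined.
INSERT Sys_Sequence(Number)
SELECT Hundreds.i * 100 + Tens.i * 10 + Units.i
FROM (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS Units
CROSS JOIN (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4
      UNION ALL SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9) AS Tens
CROSS JOIN (SELECT 0 AS i UNION ALL SELECT 1 UNION ALL SELECT 2) AS Hundreds
WHERE Hundreds.i * 100 + Tens.i * 10 + Units.i <= 200
```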
-- Cautious pre-run cleanup
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
IF OBJECT_ID('TempDB..#Columns') IS NOT NULL
DROP TABLE #Columns
-- Test data table
CREATE TABLE #Temp_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
/*Value*/ /*Maximum value for the column*/ /*Number of slips we'll need to write the current value*/
Case_30 money, Case_30_MaxValue money DEFAULT(5000), Case_30_SlipCount AS Floor((Case_30-.01) / Case_30_MaxValue),
Case_32 money, Case_32_MaxValue money DEFAULT(5000), Case_32_SlipCount AS Floor((Case_32-.01) / Case_32_MaxValue),
Case_33 money, Case_33_MaxValue money DEFAULT(5000), Case_33_SlipCount AS Floor((Case_33-.01) / Case_33_MaxValue),
Case_34 money, Case_34_MaxValue money DEFAULT(5000), Case_34_SlipCount AS Floor((Case_34-.01) / Case_34_MaxValue),
Case_40 money, Case_40_MaxValue money DEFAULT(5000), Case_40_SlipCount AS Floor((Case_40-.01) / Case_40_MaxValue),
Case_71 money, Case_71_MaxValue money DEFAULT(5000), Case_71_SlipCount AS Floor((Case_71-.01) / Case_71_MaxValue),
Case_84 money, Case_84_MaxValue money DEFAULT(5000), Case_84_SlipCount AS Floor((Case_84-.01) / Case_84_MaxValue),
Case_85 money, Case_85_MaxValue money DEFAULT(5000), Case_85_SlipCount AS Floor((Case_85-.01) / Case_85_MaxValue),
PRIMARY KEY(FederalEmployer, PeopleId)
)
-- Final data table
CREATE TABLE #Multiplied_Values
(
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
MultiSlipCounter int,
OtherCount int DEFAULT(0),
Case_30 money DEFAULT(0),
Case_32 money DEFAULT(0),
Case_33 money DEFAULT(0),
Case_34 money DEFAULT(0),
Case_40 money DEFAULT(0),
Case_71 money DEFAULT(0),
Case_84 money DEFAULT(0),
Case_85 money DEFAULT(0),
PRIMARY KEY(FederalEmployer, PeopleId, MultiSlipCounter)
)
-- Helper table, to handle each column by People/FederalEmployer, from the column having the highest amount to the lowest
CREATE TABLE #Columns
(
MyKey int IDENTITY(1, 1),
FederalEmployer varchar(1) COLLATE Latin1_General_CI_AI,
PeopleId int,
KeyByPeople int,
Col int,
SlipCount int,
PRIMARY KEY (FederalEmployer, PeopleId, MyKey)
)
-- Index to help performance on larger datasets
CREATE INDEX IX_Temp_Columns_ColByPeople ON #Columns(KeyByPeople, Col)
-- Insert test data
INSERT #Temp_Values
(
FederalEmployer,
PeopleId,
Case_30,
Case_32,
Case_33,
Case_34,
Case_40,
Case_71,
Case_84,
Case_85
)
SELECT '1', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 0 UNION ALL
SELECT '1', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '1', 5, 6000, 0, 0, 0, 0, 0, 0, 0 UNION ALL
SELECT '2', 3, 9999.99, 98.54, 847.55, 9999.99, 34.55, 365.74, 8462.57, 500 UNION ALL
SELECT '2', 4, 2587.4, 0, 0, 18.42, 0, 0, 0, 0 UNION ALL
SELECT '2', 5, 6000, 0, 0, 0, 0, 0, 0, 0
SELECT * FROM #Temp_Values
-- Multiply the records to respect the max value per column
INSERT #Multiplied_Values
SELECT
FederalEmployer,
PeopleId,
Sys_Sequence.Number,
0,
-- If we're past the number of slips we need to write the value, write 0
-- If we're exactly on that number of slips, write the value minus the maximum value multiplied by the number of slips already done so far
-- If we're before that number of slips, write the maximum value
CASE WHEN Number > Case_30_SlipCount THEN 0 WHEN Number = Case_30_SlipCount THEN Case_30 - (Case_30_MaxValue * Number) ELSE Case_30_MaxValue END,
CASE WHEN Number > Case_32_SlipCount THEN 0 WHEN Number = Case_32_SlipCount THEN Case_32 - (Case_32_MaxValue * Number) ELSE Case_32_MaxValue END,
CASE WHEN Number > Case_33_SlipCount THEN 0 WHEN Number = Case_33_SlipCount THEN Case_33 - (Case_33_MaxValue * Number) ELSE Case_33_MaxValue END,
CASE WHEN Number > Case_34_SlipCount THEN 0 WHEN Number = Case_34_SlipCount THEN Case_34 - (Case_34_MaxValue * Number) ELSE Case_34_MaxValue END,
CASE WHEN Number > Case_40_SlipCount THEN 0 WHEN Number = Case_40_SlipCount THEN Case_40 - (Case_40_MaxValue * Number) ELSE Case_40_MaxValue END,
CASE WHEN Number > Case_71_SlipCount THEN 0 WHEN Number = Case_71_SlipCount THEN Case_71 - (Case_71_MaxValue * Number) ELSE Case_71_MaxValue END,
CASE WHEN Number > Case_84_SlipCount THEN 0 WHEN Number = Case_84_SlipCount THEN Case_84 - (Case_84_MaxValue * Number) ELSE Case_84_MaxValue END,
CASE WHEN Number > Case_85_SlipCount THEN 0 WHEN Number = Case_85_SlipCount THEN Case_85 - (Case_85_MaxValue * Number) ELSE Case_85_MaxValue END
FROM
#Temp_Values AS Temp_Values
INNER JOIN Sys_Sequence
ON Sys_Sequence.Number <= Case_30_SlipCount
OR Sys_Sequence.Number <= Case_32_SlipCount
OR Sys_Sequence.Number <= Case_33_SlipCount
OR Sys_Sequence.Number <= Case_34_SlipCount
OR Sys_Sequence.Number <= Case_40_SlipCount
OR Sys_Sequence.Number <= Case_71_SlipCount
OR Sys_Sequence.Number <= Case_84_SlipCount
OR Sys_Sequence.Number <= Case_85_SlipCount
WHERE
-- Currently limiting to 20 slips maximum, to enable a better execution plan. Change to the real calculated max in the future to remove the limitation.
Sys_Sequence.Number < 20
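(To make the CASE branches concrete with one value from the test data: Case_84 = 8462.57 with a 5000 cap gives a SlipCount of 1, so slip 0 gets the maximum and slip 1 gets the remainder. A standalone check:)

```sql
-- Standalone check of the split for Case_84 = 8462.57 (cap 5000):
DECLARE @Case_84 money, @MaxValue money
SELECT @Case_84 = 8462.57, @MaxValue = 5000

SELECT Number,
       CASE WHEN Number > FLOOR((@Case_84 - .01) / @MaxValue) THEN 0
            WHEN Number = FLOOR((@Case_84 - .01) / @MaxValue) THEN @Case_84 - (@MaxValue * Number)
            ELSE @MaxValue
       END AS SlipValue
FROM Sys_Sequence
WHERE Number < 3
-- Number 0 -> 5000.00, Number 1 -> 3462.57, Number 2 -> 0
```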
-- Load every People/FederalEmployer, and determine the order in which the columns will be handled (highest amount to lowest amount)
INSERT #Columns(FederalEmployer, PeopleId, Col, SlipCount)
SELECT
FederalEmployer,
PeopleId,
ColKey,
SlipCount
FROM
(
SELECT FederalEmployer, PeopleId, 0 AS ColKey, Case_30_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_30_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 1 AS ColKey, Case_32_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_32_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 2 AS ColKey, Case_33_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_33_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 3 AS ColKey, Case_34_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_34_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 4 AS ColKey, Case_40_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_40_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 5 AS ColKey, Case_71_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_71_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 6 AS ColKey, Case_84_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_84_SlipCount UNION ALL
SELECT FederalEmployer, PeopleId, 7 AS ColKey, Case_85_SlipCount AS SlipCount FROM #Temp_Values GROUP BY FederalEmployer, PeopleId, Case_85_SlipCount
) AS Val
WHERE
SlipCount >= 0 -- Quickly remove any combo where no work needs to be done
ORDER BY
FederalEmployer,
PeopleId,
SlipCount DESC,
ColKey
-- Give an index (from 0 to 7) to all columns in the combo, per People/FederalEmployer
UPDATE #Columns
SET
KeyByPeople = MyKey - MinByPeople.MinKey
FROM
#Columns AS Columns
INNER JOIN (SELECT FederalEmployer, PeopleId, Min(MyKey) AS MinKey FROM #Columns GROUP BY FederalEmployer, PeopleId) AS MinByPeople
ON Columns.FederalEmployer = MinByPeople.FederalEmployer AND
Columns.PeopleId = MinByPeople.PeopleId
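(Side note: on SQL Server 2005 and later, the MIN() self-join above could be collapsed into the initial insert with ROW_NUMBER(). Just a sketch, not what the script does:)

```sql
-- ROW_NUMBER() computes the 0-based per-People index directly,
-- using the same ordering as the INSERT into #Columns:
SELECT FederalEmployer,
       PeopleId,
       ROW_NUMBER() OVER (PARTITION BY FederalEmployer, PeopleId
                          ORDER BY SlipCount DESC, Col) - 1 AS KeyByPeople,
       Col,
       SlipCount
FROM #Columns
```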
-- Declare the variables for the next part
DECLARE @ColumnCounter int, @MaxColumnCounter int
SELECT @ColumnCounter = Min(KeyByPeople), @MaxColumnCounter = Max(KeyByPeople) FROM #Columns
-- For each of the columns, check whether we need to move the value to another record (max 6 values per record)
WHILE @ColumnCounter <= @MaxColumnCounter
BEGIN
-- Update the count of values found per record so far
UPDATE #Multiplied_Values
SET
OtherCount = UpdatedCount.CountSoFar
FROM
#Multiplied_Values AS Multi
INNER JOIN
(
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
Multi.MultiSlipCounter,
Sum(CASE WHEN Columns.Col = 0 AND Case_30 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 1 AND Case_32 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 2 AND Case_33 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 3 AND Case_34 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 4 AND Case_40 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 5 AND Case_71 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 6 AND Case_84 <> 0 THEN 1 ELSE 0 END+
CASE WHEN Columns.Col = 7 AND Case_85 <> 0 THEN 1 ELSE 0 END) AS CountSoFar
FROM
#Multiplied_Values AS Multi
INNER JOIN #Columns AS Columns
ON Multi.FederalEmployer = Columns.FederalEmployer AND
Multi.PeopleId = Columns.PeopleId AND
Columns.KeyByPeople <= @ColumnCounter
GROUP BY
Multi.FederalEmployer,
Multi.PeopleId,
Multi.MultiSlipCounter
) AS UpdatedCount
ON Multi.FederalEmployer = UpdatedCount.FederalEmployer AND
Multi.PeopleId = UpdatedCount.PeopleId AND
Multi.MultiSlipCounter = UpdatedCount.MultiSlipCounter
-- Insert an empty record when we detect we'll need one
INSERT #Multiplied_Values(FederalEmployer, PeopleId, MultiSlipCounter)
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
-- Adds one line either for half a value expressed on multiple lines, or multiple lines at once, without violating the primary key
NextMultiSlip.NextMultiSlipCounter+Columns.SlipCount-Multi.MultiSlipCounter
FROM
#Multiplied_Values AS Multi
INNER JOIN (SELECT FederalEmployer, PeopleId, Max(MultiSlipCounter) + 1 AS NextMultiSlipCounter FROM #Multiplied_Values GROUP BY FederalEmployer, PeopleId) AS NextMultiSlip
ON Multi.FederalEmployer = NextMultiSlip.FederalEmployer AND
Multi.PeopleId = NextMultiSlip.PeopleId
INNER JOIN #Columns AS Columns
ON Multi.FederalEmployer = Columns.FederalEmployer AND
Multi.PeopleId = Columns.PeopleId AND
Columns.KeyByPeople = @ColumnCounter
LEFT JOIN #Multiplied_Values AS Next
ON Next.FederalEmployer = Multi.FederalEmployer AND
Next.PeopleId = Multi.PeopleId AND
-- Next is after Multi
Next.MultiSlipCounter > Multi.MultiSlipCounter AND
-- Multi and Next are "synchronized" to handle values expressed on multiple lines
((Next.MultiSlipCounter - Multi.MultiSlipCounter) % (Columns.SlipCount+1) = 0) AND
-- Multi must "export" data
Multi.OtherCount > 6 AND
-- Next can "get" data
Next.OtherCount < 6
WHERE
-- Only if there is no next record
Next.PeopleId IS NULL AND
-- And the current record is not only full (at 6), but also has another value to add (7 and greater)
Multi.OtherCount > 6
-- The "lower records" will pull the data from the record above them if they still have room and the "top record" is full
-- At the same time, if a "top record" is full and there's a value, we clear it
UPDATE Multi
SET
Case_30 = CASE DataJumpInfo.Col WHEN 0 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_30 ELSE 0 END ELSE Multi.Case_30 END,
Case_32 = CASE DataJumpInfo.Col WHEN 1 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_32 ELSE 0 END ELSE Multi.Case_32 END,
Case_33 = CASE DataJumpInfo.Col WHEN 2 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_33 ELSE 0 END ELSE Multi.Case_33 END,
Case_34 = CASE DataJumpInfo.Col WHEN 3 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_34 ELSE 0 END ELSE Multi.Case_34 END,
Case_40 = CASE DataJumpInfo.Col WHEN 4 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_40 ELSE 0 END ELSE Multi.Case_40 END,
Case_71 = CASE DataJumpInfo.Col WHEN 5 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_71 ELSE 0 END ELSE Multi.Case_71 END,
Case_84 = CASE DataJumpInfo.Col WHEN 6 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_84 ELSE 0 END ELSE Multi.Case_84 END,
Case_85 = CASE DataJumpInfo.Col WHEN 7 THEN CASE WHEN Multi.OtherCount < 6 AND Prev.OtherCount > 6 THEN Prev.Case_85 ELSE 0 END ELSE Multi.Case_85 END
FROM
(-- Find the smallest difference to a valid "export" target
SELECT
Multi.FederalEmployer,
Multi.PeopleId,
Columns.Col,
Min(Multi.MultiSlipCounter - Prev.MultiSlipCounter) AS MultiSlipDifference
FROM
#Multiplied_Values AS Multi
INNER JOIN #Columns AS Columns
ON Multi.FederalEmployer = Columns.FederalEmployer AND
Multi.PeopleId = Columns.PeopleId AND
Columns.KeyByPeople = @ColumnCounter
LEFT JOIN #Multiplied_Values AS Prev
ON Prev.FederalEmployer = Multi.FederalEmployer AND
Prev.PeopleId = Multi.PeopleId AND
-- Prev is before Multi
Prev.MultiSlipCounter < Multi.MultiSlipCounter AND
-- Multi and Prev are "synchronized" to handle values expressed on multiple lines
((Multi.MultiSlipCounter - Prev.MultiSlipCounter) % (Columns.SlipCount+1) = 0) AND
-- Prev must "export" data
Prev.OtherCount > 6 AND
-- Multi can "get" data
Multi.OtherCount < 6
WHERE
-- Only if we are already full and a value now needs to be "cleared"
Multi.OtherCount > 6 OR
-- Else, there must be data to import AND there must be no data present if we want to import something in this line
(Prev.OtherCount > 6 AND
CASE Columns.Col
WHEN 0 THEN Multi.Case_30
WHEN 1 THEN Multi.Case_32
WHEN 2 THEN Multi.Case_33
WHEN 3 THEN Multi.Case_34
WHEN 4 THEN Multi.Case_40
WHEN 5 THEN Multi.Case_71
WHEN 6 THEN Multi.Case_84
WHEN 7 THEN Multi.Case_85
END = 0)
GROUP BY
Multi.FederalEmployer,
Multi.PeopleId,
Columns.Col
) AS DataJumpInfo
INNER JOIN #Multiplied_Values AS Multi
ON Multi.FederalEmployer = DataJumpInfo.FederalEmployer AND
Multi.PeopleId = DataJumpInfo.PeopleId
LEFT JOIN #Multiplied_Values AS Prev
ON Prev.FederalEmployer = Multi.FederalEmployer AND
Prev.PeopleId = Multi.PeopleId AND
-- Make sure we move the data only to the first place available
Multi.MultiSlipCounter - Prev.MultiSlipCounter = DataJumpInfo.MultiSlipDifference
SELECT @ColumnCounter = @ColumnCounter + 1
END
SELECT * FROM #Multiplied_Values
-- Final cleanup
IF OBJECT_ID('TempDB..#Columns') IS NOT NULL
DROP TABLE #Columns
IF OBJECT_ID('TempDB..#Multiplied_Values') IS NOT NULL
DROP TABLE #Multiplied_Values
IF OBJECT_ID('TempDB..#Temp_Values') IS NOT NULL
DROP TABLE #Temp_Values
As an aside, what made me end up with the final solution was a lot of pieces I already saw in possible solutions, but the driving force was the helper table to optimize the split (which I was thinking about ever since you asked if I was optimizing the split). Once this was in place, the rest was much easier to do :-).
Thanks a lot for your help, Garadin!