Distributing the greatest value by rank in SQL

Question

Distributing the greatest value by rank in SQL

lmeinke

SSCommitted

Points: 1956
More actions
December 7, 2012 at 9:28 am

#267890

This seems as though it would be terribly simple...but I am stuck.
I need to distribute as "average sales" value among a group that is categorized by employee size, industry and a category.
A sample of the data:
Category Avg_Sales Emp_Size Industry SalesRank Emp_Rank DesiredOutcome
01-Widgets $576 1 - 4 Accommodations 3 1 $576
01-Widgets $832 5 - 9 Accommodations 1 2 $832
01-Widgets $672 10 - 19 Accommodations 2 3 $832
01-Widgets $451 20 - 49 Accommodations 5 4 $832
01-Widgets$459 50 - 99 Accommodations 4 5 $832
What I need to do is distribute the Avg_Sales value according to the Emp_Rank BUT ensure that the greaterEmp_Rank (i.e. larger company)
never has a smaller Avg_Sales value than a smaller Emp_Rank within the same Category and Industry.
I started this by adding the SalesRank and Emp_Rank columns thinking I could complete this with some case statements and some simple math.....I've hit a wall and any and all help is appreciated.
Thank you in Advance.
Lonnie M

Viewing 15 posts - 1 through 15 (of 40 total)

You must be logged in to reply to this topic. Login to reply

lmeinke SSCommitted Points: 1956 More actions · Answer 1

I apologize for the format in which this question posted. Not only did the spacing between columns not follow when I "posted" my color for the desired column looks like html. Not sure how to correct this.

sgmunson SSC Guru Points: 110640 More actions · Answer 2

I wouldn't worry about the format right away. I need a better explanation of what you actually mean by "distribute". It's often best when posting problems to provide a clear, but detailed description of what data you have, what you want to do with it, AND a good picture of the expected results for a given sample set of data, with specific examples for things that might be exceptions to a given rule. Most importantly, ALL the "rules" that you need to operate with need to be specified.

That said, you appear to want to allocate sales volume across various product and industry categories, but exactly how isn't clear, other than that you need to have your listed rule apply to that allocation. Am I on the right track, or off on some unknown page?

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

lmeinke SSCommitted Points: 1956 More actions · Answer 3

I think you are on the mark with what I am after. The "formatting" thing was supposed to be the clear example you speak of.

The "Desired Column" is the far right column.

I need to distribute the "maximum" average sales amount among the Categories/Industry and Employee sizes. The variable here is employee size. As you can see all employee size's have an "Avg_Sales" value unique to them, I have to distribute these values so that the larger employee size never has a lesser Avg_Sales amount than the previous employee size. (Employee size 5-9 can never have a lesser average sales amount than the employee size 1-4).

Hope this helps.

sgmunson SSC Guru Points: 110640 More actions · Answer 4

And now you've added "maximum average sales amount" to the mix, which you have not defined. My problem is that I don't really know what the average sales numbers you have actually represent, nor do I know how or why they get applied to the various categories, employee-sizes, and industries. This is, no doubt, clear as a bell for you, but as I can't know what the numbers are supposed to mean, you have to explain that in order to have a shot at me knowing what, exactly, you're trying to do.

I know it can be both painful and difficult to have to explain every detail, but without being able to have a clear, specified, defined meaning for the words average, maximum, and distributed, as they apply to your particular situation, the rest might as well be gibberish. One persons maximum might be across a particular set of data that they haven't specifically mentioned, or another's might be across a specfic category of records, and be something that only someone working for the company they work for would know. All too often, the requests for help here contain very vague statements that even if computers could speak English, they might still easily be thrown for the proverbial loop...

Does this help clarify the level of detail needed? Sometimes, two people just clash on communication style...

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

lmeinke SSCommitted Points: 1956 More actions · Answer 5

Thank you again for your response. I'm painfully aware how difficult it can be to communicate, but I appreciate your effort!

I believe a clearly illustrated matrix/spreadsheet would explain it best but I cannot provide this as this software seems to remove all white spaces.

1) Each row of data contains an industry/category/employee size and an "actual" average sales amount.

2) I need to distribute a (what I referred to as "maximum") sales amount to each row of data.

3) The amount I need to distribute may be the row's actual average sales amount or the average sales amount from the smaller employee size (whichever is greater)

4) What cannot happen is that a larger employee size (such as 50-99 employees) would have a smaller average sales amount than a smaller employee size (such as 1-4 employees). If that were to happen (which it does in reality) the larger employee size would assume the average sales amount of the smaller employee size.

5) Example (I'll use .... as placeholders for space to help with readability)

Industry.............Category.....Employees.....Avg_Sales....."DESIRED OUTPUT"

Accomodations....Widgets........1-4..............100..................100

Accomodations....Widgets........5-9..............75...................100

Accomodations....Widgets........10-19...........175.................175

Accomodations....Widgets........20-49...........225.................225

Accomodations....Widgets........50-99...........200.................225

Lowell SSC Guru Points: 323518 More actions · Answer 6

based on the first post, here's the table, but i think that's already been created by an existing select statement;

we need data like this to generate a working query to generate the desired output.

With SomeSampleData (Category,Avg_Sales,Emp_Size,Industry,SalesRank,Emp_Rank )

AS

(

SELECT '01-Widgets',' $576',' 1 - 4',' Accommodations',' 3',' 1' UNION ALL

SELECT '01-Widgets',' $832',' 5 - 9',' Accommodations',' 1',' 2' UNION ALL

SELECT '01-Widgets',' $672',' 10 - 19',' Accommodations',' 2',' 3' UNION ALL

SELECT '01-Widgets',' $451',' 20 - 49',' Accommodations',' 5',' 4' UNION ALL

SELECT '01-Widgets',' $459',' 50 - 99',' Accommodations',' 4',' 5'

)

select * from SomeSampleData

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

lmeinke SSCommitted Points: 1956 More actions · Answer 7

Thank you for taking the time to put into T-SQL. I should have done this to begin with. Hopefully with this and the above post (2 prior) the situation will be understandable.

Jeff Moden SSC Guru Points: 1004356 More actions · Answer 8

lmeinke (12/7/2012)
Thank you for taking the time to put into T-SQL. I should have done this to begin with. Hopefully with this and the above post (2 prior) the situation will be understandable.

This requires ordered data. The only thing I can see that relects the correct order is the combination of Industry, Category, and Employees with the caveat that Employees must be converted to something sortable. For most code, that's going to require a very expensive triangular join to even come close to doing this in a set-based fashion (and, no... Recursive CTEs are not set-based and are frequently slower than a cursor in situations such as this).

My question to you is, can the table be modified to accept one or two persisted calculated columns to make this query lightning quick? If not, then I need to know how many rows the table will have in it so I can test an alternate method for performance.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

lmeinke SSCommitted Points: 1956 More actions · Answer 9

Jeff,

Thank you for your response. Yes, the table can be modified at this point. I started down that road by adding numeric columns representing employee size and avg_sales but am simply at a loss how to "use" them.

lmeinke SSCommitted Points: 1956 More actions · Answer 10

Jeff,

If you have any more insight into this issue I would sure appreciate it.

Thank you,

Lonnie

sgmunson SSC Guru Points: 110640 More actions · Answer 11

With some thought going into how to add some dummy records to the scenario, you might be able to do something like this, and avoid the triangular join, as you appear to already have the employee size ranked.

;WITH SampleData (Category,Avg_Sales,Emp_Size,Industry,SalesRank,Emp_Rank) AS (

SELECT '01-Widgets', 0, '0-0', 'Accommodations',' 0', 0 UNION ALL

SELECT '01-Widgets', 100, '1-4', 'Accommodations',' 3', 1 UNION ALL

SELECT '01-Widgets', 75, '5-9', 'Accommodations',' 1', 2 UNION ALL

SELECT '01-Widgets', 175, '10-19', 'Accommodations',' 2', 3 UNION ALL

SELECT '01-Widgets', 225, '20-49', 'Accommodations',' 5', 4 UNION ALL

SELECT '01-Widgets', 200, '50-99', 'Accommodations',' 4', 5

)

SELECT SD.Industry, SD.Category, SD.Emp_Size,

CASE

WHEN SD0.Avg_Sales > SD.Avg_Sales THEN SD0.Avg_Sales

ELSE SD.Avg_Sales

END AS Avg_Sales

FROM SampleData AS SD

INNER JOIN SampleData AS SD0

ON SD.Industry = SD0.Industry

AND SD.Category = SD0.Category

AND SD.Emp_Rank = SD0.Emp_Rank + 1

WHERE SD.Emp_Rank > 0

Let me know if you think this might work, or if you don't see the concept behind the "zero dummy record".

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)

Dwain Camps SSC Guru Points: 86948 More actions · Answer 12

I'm not sure I really get what you're after but perhaps a Quirky Update will do the job:

CREATE TABLE #SalesByEmployeeCategory

(Category VARCHAR(100)

,Avg_Sales MONEY

,Emp_Size VARCHAR(100)

,Industry VARCHAR(100)

,DesiredOutput MONEY

,EmpRank AS (CAST(LEFT(Emp_Size, CHARINDEX(' ', Emp_Size)) AS INT)) PERSISTED

PRIMARY KEY CLUSTERED (Industry, Category, EmpRank))

INSERT INTO #SalesByEmployeeCategory

(Category, Avg_Sales, Emp_Size, Industry)

SELECT '01-Widgets',' $100','1 - 4','Accommodations' UNION ALL

SELECT '01-Widgets',' $75','5 - 9','Accommodations' UNION ALL

SELECT '01-Widgets',' $175','10 - 19','Accommodations' UNION ALL

SELECT '01-Widgets',' $225','20 - 49','Accommodations' UNION ALL

SELECT '01-Widgets',' $200','50 - 99','Accommodations' UNION ALL

SELECT '02-Furniture',' $100','1 - 4','Manufacturing' UNION ALL

SELECT '02-Furniture',' $75','5 - 9','Manufacturing' UNION ALL

SELECT '02-Furniture',' $175','10 - 19','Manufacturing' UNION ALL

SELECT '02-Furniture',' $225','20 - 49','Manufacturing' UNION ALL

SELECT '02-Furniture',' $200','50 - 99','Manufacturing'

DECLARE @IndCat VARCHAR(200) = ''

,@LastSales MONEY = $0

UPDATE #SalesByEmployeeCategory WITH(TABLOCKX)

SET DesiredOutput=CASE WHEN @LastSales > Avg_Sales AND @IndCat = Industry+Category

THEN @LastSales ELSE Avg_Sales END

,@LastSales=CASE WHEN @IndCat <> Industry+Category THEN 0

WHEN @LastSales > Avg_Sales THEN @LastSales ELSE Avg_Sales END

,@IndCat=Industry+Category

OPTION (MAXDOP 1)

SELECT *

FROM #SalesByEmployeeCategory

DROP TABLE #SalesByEmployeeCategory

If this seems to work for your case and you're not familiar with the QU, I suggest you read about the various rules (which I think I applied right) in this article by Jeff:

http://www.sqlservercentral.com/articles/T-SQL/68467/

My mantra: No loops! No CURSORs! No RBAR! Hoo-uh![/I]
My thought question: Have you ever been told that your query runs too fast?

My advice:
INDEXing a poor-performing query is like putting sugar on cat food. Yeah, it probably tastes better but are you sure you want to eat it?
The path of least resistance can be a slippery slope. Take care that fixing your fixes of fixes doesn't snowball and end up costing you more than fixing the root cause would have in the first place.

Need to UNPIVOT? Why not CROSS APPLY VALUES instead?[/url]
Since random numbers are too important to be left to chance, let's generate some![/url]
Learn to understand recursive CTEs by example.[/url]
[url url=http://www.sqlservercentral.com/articles/St

lmeinke SSCommitted Points: 1956 More actions · Answer 13

Thank you for the code and suggestion. I did run the code and it works for all but the very first entry in an industry/category group. I will look into the code a bit more but wanted to thank you.

Lonnie M

sgmunson SSC Guru Points: 110640 More actions · Answer 14

lmeinke (12/14/2012)
Thank you for the code and suggestion. I did run the code and it works for all but the very first entry in an industry/category group. I will look into the code a bit more but wanted to thank you.
Lonnie M

If you were replying to me, it's because you'd have to have a "dummy" record for each Industry and Category combination, with a 0 value for Emp_Rank and a 0 value for the Avg_Sales. You could create this set of records and then UNION ALL it with the rest before applying my query methodology. Creating it would be a simple GROUP BY with static values for Avg_Sales and Emp_Rank.

Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)