July 26, 2011 at 10:31 am
I have a summary table that I need to extract some additional summarized/grouped data out of.
The table contains the following columns being used:
YYYYMM,
P_TYPE, varchar(50)
P_NBR, varchar(30)
P_DESC, varchar(50)
P_USERS, numeric(9,0)
P_CONTACTS, numeric(9,0)
P_CONTACTS_USER, numeric(5,2) -->calculated value of P_CONTACTS/P_USERS
Within the data, there are multiple entries with the same P_DESC value but different P_NBR values. These need to be grouped in the query results by the P_DESC, not the P_NBR. The P_NBR is a unique number that may share a P_DESC with other records. For example:
YYYYMMP_TYPEP_NBRP_DESCP_USERSP_CONTACTSP_CONTACTS_USER
201106ind fruita1234apples102.20
201106ind fruita4321apples581.60
201106bag fruitb9876oranges74.57
201106ind fruitb7890apples2012.60
In the example above, all the data for the P_DESC of 'apples' would need grouped and summarized as such:
YYYYMMP_TYPEP_DESCP_USERSP_CONTACTSP_CONTACTS_USER
201106ind fruitapples3522.63
I can get it to the point the data gets summarized, but I need to pull in ONLY THE TOP 10 RECORDS PER MONTH based on the P_CONTACTS_USER value (I house 13 months of data so in the end, there will be 130 records returned) AND the summed total number of P_USERS must be at least 25,000 to be included.
When I try to add in row numbers it screws up and starts splitting the data out by P_DESC instead of keeping it grouped by P_DESC.
Here is the query that is working to get it grouped/summed:
SELECT
YYYYMM,
P_TYPE,
P_DESC,
SUM(P_USERS) AS P_USERS,
SUM(P_CONTACTS) AS P_CONTACTS,
CAST(((SUM(P_CONTACTS))/(SUM(P_USERS))) AS NUMERIC(5,2)) AS P_CONTACTS_USER
FROM
t_contacts_by_fruit
GROUP BY
YYYYMM,
P_TYPE,
P_DESC
HAVING
YYYYMM='201106'
and SUM(P_USERS) >=25000
AND P_TYPE IS NOT NULL
ORDER BY
P_CONTACTS_USER DESC
Any and all help will be greatly appreciated. I'm sure there's a better way to do it initially than what I have listed above too.
Thanks very much!
July 26, 2011 at 12:48 pm
Have a look at the RANK() function. Should make it very easy to get the top 10 per month.
Lots of examples and explanation:
http://weblogs.sqlteam.com/jeffs/archive/2007/03/28/60146.aspx
July 26, 2011 at 12:52 pm
Something like this:
;
WITH CTE
AS (SELECT YYYYMM,
P_TYPE,
P_DESC,
SUM(P_USERS) AS P_USERS,
SUM(P_CONTACTS) AS P_CONTACTS,
CAST(((SUM(P_CONTACTS)) / (SUM(P_USERS))) AS NUMERIC(5,
2)) AS P_CONTACTS_USER
FROM t_contacts_by_fruit
GROUP BY YYYYMM,
P_TYPE,
P_DESC
HAVING YYYYMM = '201106'
AND SUM(P_USERS) >= 25000
AND P_TYPE IS NOT NULL),
CTE2
AS (SELECT YYYYMM,
P_TYPE,
P_DESC,
P_USERS,
P_CONTACTS,
P_CONTACTS_USER,
DENSE_RANK() OVER (PARTITION BY YYYYMM ORDER BY P_CONTACTS_USER DESC) AS Row
FROM CTE)
SELECT YYYYMM,
P_TYPE,
P_DESC,
P_USERS,
P_CONTACTS,
P_CONTACTS_USER
FROM CTE2
WHERE Row >= 10 ;
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 26, 2011 at 1:05 pm
GSquared...
That code unfortunately pulls back more than the top 10. I had to change it to the row number <=10 (perhaps I put that wrong in my initial code sample, I'll have to go back and look).
However, even after changing that, the query returned 12 rows because two records had the same P_CONTACTS_USER value (2 had .50 and 2 had .33).
I think that is the problem with ranking versus sorting by the P_CONTACTS_USER value, then partitioning the results by YYYYMM. When I do this with the query I've listed below here, I get 10 records/month for a total of 130 records, they are ranked in order by the P_CONTACTS_USER, BUT...I lose my grouping on the P_DESC and thus my summary numbers based on the P_DESC:
SELECT
YYYYMM,
P_TYPE,
P_DESC,
SUM(P_USERS) AS P_USERS,
SUM(P_CONTACTS) AS P_CONTACTS,
CAST(((SUM(P_CONTACTS))/(SUM(P_USERS))) AS NUMERIC(5,2)) AS P_CONTACTS_USER
FROM
(SELECT *, row_number() OVER (PARTITION BY YYYYMM ORDER BY (CASE when P_USERS >=25000 then 1 else 99999 end),
P_CONTACTS_USER DESC) AS RowNum
FROM t_contacts_by_fruit) AS TEMP
WHERE P_USERS>=25000 AND
RowNum <=10
ORDER BY YYYYMM DESC, P_CONTACTS_USER DESC, P_CONTACTS DESC
July 26, 2011 at 9:39 pm
With my query, yes, I got the comparison operator backwards.
If you want to ignore ties, use Row_Number instead of Dense_Rank. Will cut off otherwise valid records if you have ties, especially in the final slots, but will just give you 10.
If you need it to be deterministic, you'll need to add more criteria to the Row_Number function, so that you get the same ones each time in case of a tie in the final slot.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
July 27, 2011 at 6:34 am
That's where my problem is. I have the ROW_NUMBER function setup to look at the qty being >=25K, and I have it listing in descending order by the P_CONTACTS_USER, so my results are giving me the top 10 rows per month which is good, they're in descending order by the P_CONTACTS_USER which is good, but they're not grouping and summing based on the P_DESC value.
I can't figure out what I'm missing that's causing that loss of grouping/summing.
I do need to use ROW_NUMBER because I don't want to ignore ties.
July 27, 2011 at 7:03 am
Why not just use GSquared's query with RANK instead of DENSE_RANK. That should give you just the 10 rows you want.
What do you mean by you're losing the sum on p_desc?
July 27, 2011 at 7:46 am
Here's what I mean on things getting ungrouped/summed:
YYYYMMP_TYPEP_DESCP_USERSP_CONTACTSP_CONTACTS_USER
201106Bagapples26383684272.59
201106Indoranges42661228330.54
201106Indpears27083145890.54
201106Bagapples14271007552710.53
201106Indlemons3211581617610.5
201106Baggrapes26162115970.44
201106Bagapples14413055874630.41
201106Indoranges42082166600.4
201106Bagapples6465532514450.39
201106Bagapples38655146360.38
That is a sample of the results I'm getting back. See how the apples and oranges are getting split out into multiple rows? They should all be grouped/summed on one row (one row for apples, one for oranges) as shown below (with the exception that 5 additional rows would be included since I need the top 10 rows):
YYYYMMP_TYPEP_DESCP_USERSP_CONTACTSP_CONTACTS_USER
201106Indpears27083145890.54
201106Indlemons3211581617610.50
201106Indoranges4320632128510.49
201106Bagapples357999616772420.47
201106Baggrapes26162115970.44
See how all the 5 rows of apples and two rows of oranges would be combined into 2 summarized rows (1 each) which then changes the overall ranking because the combined P_CONTACTS_USER value drops in these cases.
This is what is failing for me.
July 27, 2011 at 8:18 am
So, working with GSquared's query as a basis, shouldn't this give you what you want?
;WITH CTE
AS (SELECT YYYYMM,
P_TYPE,
P_DESC,
SUM(P_USERS) AS P_USERS,
SUM(P_CONTACTS) AS P_CONTACTS,
CAST(((SUM(P_CONTACTS)) / (SUM(P_USERS))) AS NUMERIC(5,
2)) AS P_CONTACTS_USER
FROM t_contacts_by_fruit
GROUP BY YYYYMM,
P_TYPE,
P_DESC
HAVING
SUM(P_USERS) >= 25000
AND P_TYPE IS NOT NULL),
CTE2
AS (SELECT YYYYMM,
P_TYPE,
P_DESC,
P_USERS,
P_CONTACTS,
P_CONTACTS_USER,
RANK() OVER (PARTITION BY YYYYMM ORDER BY P_CONTACTS_USER DESC) AS Row
FROM CTE)
SELECT YYYYMM,
P_TYPE,
P_DESC,
P_USERS,
P_CONTACTS,
P_CONTACTS_USER
FROM CTE2
WHERE Row <= 10
ORDER BY YYYYMM desc
July 27, 2011 at 8:18 am
Okay, now I tried using RANK with GSquared's query and removed my YYYYMM limitation of just 201106. It looked good, but I got 131 rows back...10 for every month except 201102 which returned 11 rows for some reason - I couldn't see anything obvious in the results that caused it to do that.
So I tried using GSquared's query and replaced the DENSE_RANK with ROW_NUMBER and it returned a total of 130 rows, including just 10 for 201102.
These results look promising, but I'm going to need to review them closer to make sure they're correct.
Any reason using ROW_NUMBER in place of DENSE_RANK would not work with this query?
July 27, 2011 at 8:25 am
No. Since you definitely only want 10 rows, regardless of ties, I'd go with ROW_NUMBER.
July 27, 2011 at 9:22 am
Cool, thank you both, this looks like it's working!
July 28, 2011 at 6:21 am
You're welcome.
- Gus "GSquared", RSVP, OODA, MAP, NMVP, FAQ, SAT, SQL, DNA, RNA, UOI, IOU, AM, PM, AD, BC, BCE, USA, UN, CF, ROFL, LOL, ETC
Property of The Thread
"Nobody knows the age of the human race, but everyone agrees it's old enough to know better." - Anon
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply