Advnaced Grouping Help Needed

Question

Post reply

Advnaced Grouping Help Needed

Jason Tontz

SSCommitted

Points: 1957
More actions
December 6, 2011 at 4:40 pm

#261073

I have a query where I need to return every 10th row of a particular group upto a certian amount which is different for each group. Example:
RowID, Color, Salesman
1,blue,bob
2,red,bob
3,purple,bob
4,purple,jane
5,blue,jane
6,red,jane
What I need is every 2nd row of all of the rows but only 1 for bob & 2 for jane. So the result will look like:
2,red,bob
4,purple,jane
6,red,jane
I got as far as the row count (every 2nd), but am stuck at how to limit the amount differently for each group. I appreciate any help.
Thank You In Advance.
JT

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply

tfifield SSCrazy Eights Points: 9655 More actions · Answer 1

Jason,

Is there a business reason for this?

Todd Fifield

Jason Tontz SSCommitted Points: 1957 More actions · Answer 2

I know... My original question back to the requestor, but yes. What we are doing is an extract of sample data to be used for testing and they want so many of one group to test and more or less of others to test.

Cadavre SSC-Forever Points: 41690 More actions · Answer 3

Not sure I understand what you're after. Try these two queries to see if they help.

BEGIN TRAN

--Sample Data

SELECT RowID, Color, Salesman

INTO #sales

FROM (SELECT 1, 'blue', 'bob'

UNION ALL SELECT 2, 'red', 'bob'

UNION ALL SELECT 3, 'purple', 'bob'

UNION ALL SELECT 4, 'purple', 'jane'

UNION ALL SELECT 5, 'blue', 'jane'

UNION ALL SELECT 6, 'red', 'jane') a(RowID, Color, Salesman)

--First go

SELECT RowID, Color, Salesman

FROM #sales

WHERE RowID % 2 = 0

/*--------The above returns--------*RowID Color Salesman

----------- ------ --------

2 red bob

4 purple jane

6 red jane

\*---------------------------------*/

--But you wanted to ensure that there is only ever 1 row for

--bob and 2 for jane

SELECT RowID, Color, Salesman

FROM (SELECT RowID, Color, Salesman,

ROW_NUMBER() OVER (PARTITION BY Salesman ORDER BY RowID DESC) AS rn

FROM (SELECT RowID, Color, Salesman

FROM #sales

WHERE RowID % 2 = 0) innerQ ) outerQ

WHERE rn <= 2

/*--------The above returns--------*RowID Color Salesman

----------- ------ --------

2 red bob

4 purple jane

6 red jane

\*---------------------------------*/

ROLLBACK

Forever trying to learn
My blog - http://www.cadavre.co.uk/
For better, quicker answers on T-SQL questions, click on the following...http://www.sqlservercentral.com/articles/Best+Practices/61537/
For better, quicker answers on SQL Server performance related questions, click on the following...http://www.sqlservercentral.com/articles/SQLServerCentral/66909/

matzer SSCrazy Eights Points: 9727 More actions · Answer 4

Maybe this one will do?

WITH SampleData AS

(

SELECT * FROM (VALUES

(0, 'blue', 'bob'),

(1, 'red', 'bob'),

(3, 'purple', 'bob'),

(4, 'purple', 'jane'),

(5, 'blue', 'jane'),

(7, 'red', 'jane'),

(8, 'green', 'bob'),

(9, 'yellow', 'bob'),

(10, 'black', 'jane'),

(11, 'white', 'jane')

) D(ID, Color, FirstName)

),

NumberedSampleData AS

(

SELECT

ID, Color, FirstName,

ROW_NUMBER() OVER (ORDER BY ID) RowNum

FROM

SampleData

),

FilteredSampleData AS

(

SELECT

ID, Color, FirstName, RowNum,

ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY RowNum) FirstNameRowNum

FROM

NumberedSampleData

WHERE

RowNum % 2 = 0

)

SELECT

*

FROM

FilteredSampleData

WHERE

FirstNameRowNum <= CASE FirstName WHEN 'Bob' THEN 1 WHEN 'Jane' THEN 2 ELSE FirstNameRowNum END

ORDER BY

ID

Assumptions:

- The n-th row is determined by the order of ID's

- ID's do not form a consecutive sequence

EDIT: Oops, I missed the column headers in the original sample data, so I used my own one. If the second assumption is not valid, the numbering of rows in NumberedSampleData can be eliminated. Also, if there are a lot of groups to filter on, maybe it's better to put these groups in a separate table together with the number of rows to filter on, rather than hard-coding the number of rows in the WHERE clause.

ChrisM@Work SSC Guru Points: 186127 More actions · Answer 5

If you're looking for a fixed percentage of random rows from each partition (salesperson), have you considered using NTILE?

SELECT

*

FROM (

SELECT *, NTILE(10) OVER(PARTITION BY Salesman ORDER BY NEWID()) AS [Tenth part]

FROM MyTable

) d

WHERE [Tenth part] = 1

^{“Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw}

For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

Jason Tontz SSCommitted Points: 1957 More actions · Answer 6

Thank you everyone for the input. I greatly appreciate your time and effort. Each one of the postings gave me a different way to run with this. for the N-th row part of the query I am already using the ROW_NUMBER () OVER function and is working great. But I am going to try performing "UNIONS" with the different criteria. Luckily there is only about 9 groups, but I am dealing with millions of rows and therefore the N-th row must be duplicated for each group. I will post more when I have a solid solution in place.

Thanks again for the help.

JT