December 6, 2011 at 4:40 pm
I have a query where I need to return every 10th row of a particular group upto a certian amount which is different for each group. Example:
RowID, Color, Salesman
1,blue,bob
2,red,bob
3,purple,bob
4,purple,jane
5,blue,jane
6,red,jane
What I need is every 2nd row of all of the rows but only 1 for bob & 2 for jane. So the result will look like:
2,red,bob
4,purple,jane
6,red,jane
I got as far as the row count (every 2nd), but am stuck at how to limit the amount differently for each group. I appreciate any help.
Thank You In Advance.
JT
December 6, 2011 at 5:09 pm
Jason,
Is there a business reason for this?
Todd Fifield
December 6, 2011 at 5:11 pm
I know... My original question back to the requestor, but yes. What we are doing is an extract of sample data to be used for testing and they want so many of one group to test and more or less of others to test.
December 7, 2011 at 1:44 am
Not sure I understand what you're after. Try these two queries to see if they help.
BEGIN TRAN
--Sample Data
SELECT RowID, Color, Salesman
INTO #sales
FROM (SELECT 1, 'blue', 'bob'
UNION ALL SELECT 2, 'red', 'bob'
UNION ALL SELECT 3, 'purple', 'bob'
UNION ALL SELECT 4, 'purple', 'jane'
UNION ALL SELECT 5, 'blue', 'jane'
UNION ALL SELECT 6, 'red', 'jane') a(RowID, Color, Salesman)
--First go
SELECT RowID, Color, Salesman
FROM #sales
WHERE RowID % 2 = 0
/*--------The above returns--------*RowID Color Salesman
----------- ------ --------
2 red bob
4 purple jane
6 red jane
\*---------------------------------*/
--But you wanted to ensure that there is only ever 1 row for
--bob and 2 for jane
SELECT RowID, Color, Salesman
FROM (SELECT RowID, Color, Salesman,
ROW_NUMBER() OVER (PARTITION BY Salesman ORDER BY RowID DESC) AS rn
FROM (SELECT RowID, Color, Salesman
FROM #sales
WHERE RowID % 2 = 0) innerQ ) outerQ
WHERE rn <= 2
/*--------The above returns--------*RowID Color Salesman
----------- ------ --------
2 red bob
4 purple jane
6 red jane
\*---------------------------------*/
ROLLBACK
December 7, 2011 at 1:53 am
Maybe this one will do?
WITH SampleData AS
(
SELECT * FROM (VALUES
(0, 'blue', 'bob'),
(1, 'red', 'bob'),
(3, 'purple', 'bob'),
(4, 'purple', 'jane'),
(5, 'blue', 'jane'),
(7, 'red', 'jane'),
(8, 'green', 'bob'),
(9, 'yellow', 'bob'),
(10, 'black', 'jane'),
(11, 'white', 'jane')
) D(ID, Color, FirstName)
),
NumberedSampleData AS
(
SELECT
ID, Color, FirstName,
ROW_NUMBER() OVER (ORDER BY ID) RowNum
FROM
SampleData
),
FilteredSampleData AS
(
SELECT
ID, Color, FirstName, RowNum,
ROW_NUMBER() OVER (PARTITION BY FirstName ORDER BY RowNum) FirstNameRowNum
FROM
NumberedSampleData
WHERE
RowNum % 2 = 0
)
SELECT
*
FROM
FilteredSampleData
WHERE
FirstNameRowNum <= CASE FirstName WHEN 'Bob' THEN 1 WHEN 'Jane' THEN 2 ELSE FirstNameRowNum END
ORDER BY
ID
Assumptions:
- The n-th row is determined by the order of ID's
- ID's do not form a consecutive sequence
EDIT: Oops, I missed the column headers in the original sample data, so I used my own one. If the second assumption is not valid, the numbering of rows in NumberedSampleData can be eliminated. Also, if there are a lot of groups to filter on, maybe it's better to put these groups in a separate table together with the number of rows to filter on, rather than hard-coding the number of rows in the WHERE clause.
December 7, 2011 at 2:37 am
If you're looking for a fixed percentage of random rows from each partition (salesperson), have you considered using NTILE?
SELECT
*
FROM (
SELECT *, NTILE(10) OVER(PARTITION BY Salesman ORDER BY NEWID()) AS [Tenth part]
FROM MyTable
) d
WHERE [Tenth part] = 1
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 7, 2011 at 9:01 am
Thank you everyone for the input. I greatly appreciate your time and effort. Each one of the postings gave me a different way to run with this. for the N-th row part of the query I am already using the ROW_NUMBER () OVER function and is working great. But I am going to try performing "UNIONS" with the different criteria. Luckily there is only about 9 groups, but I am dealing with millions of rows and therefore the N-th row must be duplicated for each group. I will post more when I have a solid solution in place.
Thanks again for the help.
JT
Viewing 7 posts - 1 through 6 (of 6 total)
You must be logged in to reply to this topic. Login to reply