How to select top 10 records from 2 distinct groups

Question

How to select top 10 records from 2 distinct groups

IBeDatMan

SSC-Addicted

Points: 487
More actions
March 2, 2016 at 8:07 am

#308957

Hey there - I am looking to create a query that will help me pull the top 10 records from two distinct groups in a column.
I have a column called CUSTOMER_ID that has values like 'XYZ-1234' and 'AAA-9423' or 'BBB-1212'. Each of these CUSTOMER_IDs can have tens or hundreds of records. For my purposes, I am only interested in a sample of each CUSTOMER_ID and in my case, I only want the first 10 records for a particular CUSTOMER_ID.
What I would like to do is when I select a particular set of CUSTOMER_IDs, to return the first 10 records for EACH of the ones I have selected. I am not sure how to do this code-wise.
Here is the code I put together but it is not working as expected because the first CUSTOMER_ID is showing up 20 times since it has more records than the other customer_ID.
SELECT TOP 10 *
WHERE CUSTOMER_ID IS IN ('XYZ-1234', 'AAA-9423')
FROM DO.CUSTOMER_tbl
My desired output would have customer 'XYZ-1234' and the top 10 records associated with it and customer 'AAA-9423' and the top 10 records associated with it, for a combined output of 20 records.

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply

Phil Parkin SSC Guru Points: 246993 More actions · Answer 1

IBeDatMan (3/2/2016)
Hey there - I am looking to create a query that will help me pull the top 10 records from two distinct groups in a column.
I have a column called CUSTOMER_ID that has values like 'XYZ-1234' and 'AAA-9423' or 'BBB-1212'. Each of these CUSTOMER_IDs can have tens or hundreds of records. For my purposes, I am only interested in a sample of each CUSTOMER_ID and in my case, I only want the first 10 records for a particular CUSTOMER_ID.
What I would like to do is when I select a particular set of CUSTOMER_IDs, to return the first 10 records for EACH of the ones I have selected. I am not sure how to do this code-wise.
Here is the code I put together but it is not working as expected because the first CUSTOMER_ID is showing up 20 times since it has more records than the other customer_ID.
SELECT TOP 10 *
WHERE CUSTOMER_ID IS IN ('XYZ-1234', 'AAA-9423')
FROM DO.CUSTOMER_tbl
My desired output would have customer 'XYZ-1234' and the top 10 records associated with it and customer 'AAA-9423' and the top 10 records associated with it, for a combined output of 20 records.

For each customer, how are you ordering the results? Is there some sort of date or Id column involved?

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

IBeDatMan SSC-Addicted Points: 487 More actions · Answer 2

Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.

Phil Parkin SSC Guru Points: 246993 More actions · Answer 3

IBeDatMan (3/2/2016)
Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.

So you just want any 10 rows, per customer?

The absence of evidence is not evidence of absence.
Martin Rees

You can lead a horse to water, but a pencil must be lead.
Stan Laurel

pietlinden SSC Guru Points: 63517 More actions · Answer 4

Shot in the dark, since we don't have answers to Phil's questions, but here's a guess. I built this using AdventureWorks2012

SELECT [Name] AS TerritoryName

, t.TerritoryID

, tc.CustomerID

, tc.AccountNumber

FROM Sales.SalesTerritory T

CROSS APPLY (SELECT TOP 3 *

FROM Sales.Customer C

WHERE C.TerritoryID = T.TerritoryID) tc;

You need the TOP VALUES in the CROSS APPLY to get the Top N per group. And the aliases are so you can include columns from the top values query in your outer (Territory) query. Which records get returned by the cross applied query will depend on the ORDER BY clause. If you want totally random records for each territory, you'd do this:

SELECT [Name] AS TerritoryName

, t.TerritoryID

, tc.CustomerID

, tc.AccountNumber

FROM Sales.SalesTerritory T

CROSS APPLY (SELECT TOP 3 *

FROM Sales.Customer C

WHERE C.TerritoryID = T.TerritoryID

ORDER BY NEWID()) tc;

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 5

IBeDatMan (3/2/2016)
Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.

I don't think Phil is asking about the order of the source data. (Since it's in a relational table, it does not have an order anyway - relational tables are sets, which are unordered by definition).

What I think Phil asks about, and what I would need to know before I can help you with your problem, is the ordering to apply when determining the top 10 rows for each customer. Top 10 by what criterium?

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

IBeDatMan SSC-Addicted Points: 487 More actions · Answer 6

Thanks, Hugo.

For the 10 records, I am not looking for these to be ordered, so no critieria would apply. I only need 10 records for the customer regardless of the order. I only need a sample.

pietlinden SSC Guru Points: 63517 More actions · Answer 7

That's what the second query I posted does. NEWID() returns a random number so ordering by it and then doing a TOP returns a random set of records each time you run it.

Gail Shaw SSC Guru Points: 1004504 More actions · Answer 8

Is there another table that has a list of distinct CUSTOMER_ID?

Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

We walk in the dark places no others will enter
We stand on the bridge and no one may pass

ZZartin SSC-Dedicated Points: 30921 More actions · Answer 9

If those are the only two customer id's you care about and you don't care about the order.

SELECT TOP 10 * FROM DO.CUSTOMER_tbl WHERE CUSTOMER_ID IS = 'XYZ-1234'

UNION ALL

SELECT TOP 10 * FROM DO.CUSTOMER_tbl WHERE CUSTOMER_ID IS = 'AAA-9423'

IBeDatMan SSC-Addicted Points: 487 More actions · Answer 10

pietlinden (3/2/2016)
Shot in the dark, since we don't have answers to Phil's questions, but here's a guess. I built this using AdventureWorks2012
SELECT [Name] AS TerritoryName
, t.TerritoryID
, tc.CustomerID
, tc.AccountNumber
FROM Sales.SalesTerritory T
CROSS APPLY (SELECT TOP 3 *
FROM Sales.Customer C
WHERE C.TerritoryID = T.TerritoryID) tc;
You need the TOP VALUES in the CROSS APPLY to get the Top N per group. And the aliases are so you can include columns from the top values query in your outer (Territory) query. Which records get returned by the cross applied query will depend on the ORDER BY clause. If you want totally random records for each territory, you'd do this:
SELECT [Name] AS TerritoryName
, t.TerritoryID
, tc.CustomerID
, tc.AccountNumber
FROM Sales.SalesTerritory T
CROSS APPLY (SELECT TOP 3 *
FROM Sales.Customer C
WHERE C.TerritoryID = T.TerritoryID
ORDER BY NEWID()) tc;

hi pietlinden - I would like to use this code for my purposes but it's unclear how to make it work to my custom needs.

I won't need any specific fields, so in place of selecting certain fields I would use *. Secondly, I need to select specific customers instead of taking the top 3 records for each customer. How could I adapt your code to include that requirement?

Hugo Kornelis SSC Guru Points: 64790 More actions · Answer 11

IBeDatMan (3/2/2016)
I won't need any specific fields, so in place of selecting certain fields I would use *.

No, please don't. That wastes bandwidth between the server and the client, it cripples SQL Server's options to optimize the execution of the query, and it is likely to break the application if anyone ever changes the table structure.

Never, ever, ever, user SELECT * in production. (Except in an EXISTS subquery, and sometimes in a CTE or subquery)

Secondly, I need to select specific customers instead of taking the top 3 records for each customer. How could I adapt your code to include that requirement?

Add a WHERE clause to the outer query.

Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
Visit my SQL Server blog: https://sqlserverfast.com/blog/
SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

pietlinden SSC Guru Points: 63517 More actions · Answer 12

I can't answer your question without some CREATE TABLE scripts. What table are the records related to Customer coming from?

I posted a basic pattern based on AdventureWorks because it's readily available for download, so you could play with the query and compare the structure of it to your database (although there are some times when AW is incredibly confusing... at least to me!)

and Hugo's smarter than I am... best that you should follow his advice...

DesNorton SSC-Insane Points: 24424 More actions · Answer 13

The following code is a basic template that will get you what you are looking for.

DECLARE @SampleSize INT = 10;

SELECT *

FROM (

SELECT CUSTOMER_ID

--, Field1, Field2, Field3 ...

, num = ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY NEWID())

FROM DO.CUSTOMER_tbl

WHERE CUSTOMER_ID IN ('XYZ-1234', 'AAA-9423')

) AS src

WHERE src.num <= @SampleSize

ORDER BY CUSTOMER_ID

How to post data/code on a forum to get the best help.

IBeDatMan SSC-Addicted Points: 487 More actions · Answer 14

DesNorton (3/3/2016)
The following code is a basic template that will get you what you are looking for.
DECLARE @SampleSize INT = 10;
SELECT *
FROM (
SELECT CUSTOMER_ID
--, Field1, Field2, Field3 ...
, num = ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY NEWID())
FROM DO.CUSTOMER_tbl
WHERE CUSTOMER_ID IN ('XYZ-1234', 'AAA-9423')
) AS src
WHERE src.num <= @SampleSize
ORDER BY CUSTOMER_ID

Thanks for taking a stab at this. I modified your code for my purposes but I get an error: Syntax error, expected something like a name or a Unicode delimited identifier between the word 'num' and '='. Here is my code:

DECLARE @SampleSize INT = 10;

SELECT *

FROM (

SELECT

DXCD1,

CHNL_SRC_CD,

SPECLTY_CLM_DATA_SRC,

num = ROW_NUMBER() OVER (PARTITION BY DXCD1 ORDER BY DXCD1())

FROM CCW_VIEW_PRD.MED_PD_CLM_TRNSMSSN

WHERE DXCD1 IN (IS NULL, '7140')

) AS src

WHERE src.num <= @SampleSize

ORDER BY DXCD1