How to select top 10 records from 2 distinct groups

  • Hey there - I am looking to create a query that will help me pull the top 10 records from two distinct groups in a column.

    I have a column called CUSTOMER_ID that has values like 'XYZ-1234' and 'AAA-9423' or 'BBB-1212'. Each of these CUSTOMER_IDs can have tens or hundreds of records. For my purposes, I am only interested in a sample of each CUSTOMER_ID and in my case, I only want the first 10 records for a particular CUSTOMER_ID.

    What I would like to do is when I select a particular set of CUSTOMER_IDs, to return the first 10 records for EACH of the ones I have selected. I am not sure how to do this code-wise.

    Here is the code I put together but it is not working as expected because the first CUSTOMER_ID is showing up 20 times since it has more records than the other customer_ID.

    SELECT TOP 10 *

    WHERE CUSTOMER_ID IS IN ('XYZ-1234', 'AAA-9423')

    FROM DO.CUSTOMER_tbl

    My desired output would have customer 'XYZ-1234' and the top 10 records associated with it and customer 'AAA-9423' and the top 10 records associated with it, for a combined output of 20 records.

  • IBeDatMan (3/2/2016)


    Hey there - I am looking to create a query that will help me pull the top 10 records from two distinct groups in a column.

    I have a column called CUSTOMER_ID that has values like 'XYZ-1234' and 'AAA-9423' or 'BBB-1212'. Each of these CUSTOMER_IDs can have tens or hundreds of records. For my purposes, I am only interested in a sample of each CUSTOMER_ID and in my case, I only want the first 10 records for a particular CUSTOMER_ID.

    What I would like to do is when I select a particular set of CUSTOMER_IDs, to return the first 10 records for EACH of the ones I have selected. I am not sure how to do this code-wise.

    Here is the code I put together but it is not working as expected because the first CUSTOMER_ID is showing up 20 times since it has more records than the other customer_ID.

    SELECT TOP 10 *

    WHERE CUSTOMER_ID IS IN ('XYZ-1234', 'AAA-9423')

    FROM DO.CUSTOMER_tbl

    My desired output would have customer 'XYZ-1234' and the top 10 records associated with it and customer 'AAA-9423' and the top 10 records associated with it, for a combined output of 20 records.

    For each customer, how are you ordering the results? Is there some sort of date or Id column involved?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.

  • IBeDatMan (3/2/2016)


    Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.

    So you just want any 10 rows, per customer?

    The absence of evidence is not evidence of absence.
    Martin Rees

    You can lead a horse to water, but a pencil must be lead.
    Stan Laurel

  • Shot in the dark, since we don't have answers to Phil's questions, but here's a guess. I built this using AdventureWorks2012

    SELECT [Name] AS TerritoryName

    , t.TerritoryID

    , tc.CustomerID

    , tc.AccountNumber

    FROM Sales.SalesTerritory T

    CROSS APPLY (SELECT TOP 3 *

    FROM Sales.Customer C

    WHERE C.TerritoryID = T.TerritoryID) tc;

    You need the TOP VALUES in the CROSS APPLY to get the Top N per group. And the aliases are so you can include columns from the top values query in your outer (Territory) query. Which records get returned by the cross applied query will depend on the ORDER BY clause. If you want totally random records for each territory, you'd do this:

    SELECT [Name] AS TerritoryName

    , t.TerritoryID

    , tc.CustomerID

    , tc.AccountNumber

    FROM Sales.SalesTerritory T

    CROSS APPLY (SELECT TOP 3 *

    FROM Sales.Customer C

    WHERE C.TerritoryID = T.TerritoryID

    ORDER BY NEWID()) tc;

  • IBeDatMan (3/2/2016)


    Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.

    I don't think Phil is asking about the order of the source data. (Since it's in a relational table, it does not have an order anyway - relational tables are sets, which are unordered by definition).

    What I think Phil asks about, and what I would need to know before I can help you with your problem, is the ordering to apply when determining the top 10 rows for each customer. Top 10 by what criterium?


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • Thanks, Hugo.

    For the 10 records, I am not looking for these to be ordered, so no critieria would apply. I only need 10 records for the customer regardless of the order. I only need a sample.

  • That's what the second query I posted does. NEWID() returns a random number so ordering by it and then doing a TOP returns a random set of records each time you run it.

  • Is there another table that has a list of distinct CUSTOMER_ID?

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • If those are the only two customer id's you care about and you don't care about the order.

    SELECT TOP 10 * FROM DO.CUSTOMER_tbl WHERE CUSTOMER_ID IS = 'XYZ-1234'

    UNION ALL

    SELECT TOP 10 * FROM DO.CUSTOMER_tbl WHERE CUSTOMER_ID IS = 'AAA-9423'

  • pietlinden (3/2/2016)


    Shot in the dark, since we don't have answers to Phil's questions, but here's a guess. I built this using AdventureWorks2012

    SELECT [Name] AS TerritoryName

    , t.TerritoryID

    , tc.CustomerID

    , tc.AccountNumber

    FROM Sales.SalesTerritory T

    CROSS APPLY (SELECT TOP 3 *

    FROM Sales.Customer C

    WHERE C.TerritoryID = T.TerritoryID) tc;

    You need the TOP VALUES in the CROSS APPLY to get the Top N per group. And the aliases are so you can include columns from the top values query in your outer (Territory) query. Which records get returned by the cross applied query will depend on the ORDER BY clause. If you want totally random records for each territory, you'd do this:

    SELECT [Name] AS TerritoryName

    , t.TerritoryID

    , tc.CustomerID

    , tc.AccountNumber

    FROM Sales.SalesTerritory T

    CROSS APPLY (SELECT TOP 3 *

    FROM Sales.Customer C

    WHERE C.TerritoryID = T.TerritoryID

    ORDER BY NEWID()) tc;

    hi pietlinden - I would like to use this code for my purposes but it's unclear how to make it work to my custom needs.

    I won't need any specific fields, so in place of selecting certain fields I would use *. Secondly, I need to select specific customers instead of taking the top 3 records for each customer. How could I adapt your code to include that requirement?

  • IBeDatMan (3/2/2016)


    I won't need any specific fields, so in place of selecting certain fields I would use *.

    No, please don't. That wastes bandwidth between the server and the client, it cripples SQL Server's options to optimize the execution of the query, and it is likely to break the application if anyone ever changes the table structure.

    Never, ever, ever, user SELECT * in production. (Except in an EXISTS subquery, and sometimes in a CTE or subquery)

    Secondly, I need to select specific customers instead of taking the top 3 records for each customer. How could I adapt your code to include that requirement?

    Add a WHERE clause to the outer query.


    Hugo Kornelis, SQL Server/Data Platform MVP (2006-2016)
    Visit my SQL Server blog: https://sqlserverfast.com/blog/
    SQL Server Execution Plan Reference: https://sqlserverfast.com/epr/

  • I can't answer your question without some CREATE TABLE scripts. What table are the records related to Customer coming from?

    I posted a basic pattern based on AdventureWorks because it's readily available for download, so you could play with the query and compare the structure of it to your database (although there are some times when AW is incredibly confusing... at least to me!)

    and Hugo's smarter than I am... best that you should follow his advice...

  • The following code is a basic template that will get you what you are looking for.

    DECLARE @SampleSize INT = 10;

    SELECT *

    FROM (

    SELECT CUSTOMER_ID

    --, Field1, Field2, Field3 ...

    , num = ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY NEWID())

    FROM DO.CUSTOMER_tbl

    WHERE CUSTOMER_ID IN ('XYZ-1234', 'AAA-9423')

    ) AS src

    WHERE src.num <= @SampleSize

    ORDER BY CUSTOMER_ID

  • DesNorton (3/3/2016)


    The following code is a basic template that will get you what you are looking for.

    DECLARE @SampleSize INT = 10;

    SELECT *

    FROM (

    SELECT CUSTOMER_ID

    --, Field1, Field2, Field3 ...

    , num = ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY NEWID())

    FROM DO.CUSTOMER_tbl

    WHERE CUSTOMER_ID IN ('XYZ-1234', 'AAA-9423')

    ) AS src

    WHERE src.num <= @SampleSize

    ORDER BY CUSTOMER_ID

    Thanks for taking a stab at this. I modified your code for my purposes but I get an error: Syntax error, expected something like a name or a Unicode delimited identifier between the word 'num' and '='. Here is my code:

    DECLARE @SampleSize INT = 10;

    SELECT *

    FROM (

    SELECT

    DXCD1,

    CHNL_SRC_CD,

    SPECLTY_CLM_DATA_SRC,

    num = ROW_NUMBER() OVER (PARTITION BY DXCD1 ORDER BY DXCD1())

    FROM CCW_VIEW_PRD.MED_PD_CLM_TRNSMSSN

    WHERE DXCD1 IN (IS NULL, '7140')

    ) AS src

    WHERE src.num <= @SampleSize

    ORDER BY DXCD1

Viewing 15 posts - 1 through 15 (of 17 total)

You must be logged in to reply to this topic. Login to reply