March 2, 2016 at 8:07 am
Hey there - I am looking to create a query that will help me pull the top 10 records from two distinct groups in a column.
I have a column called CUSTOMER_ID that has values like 'XYZ-1234' and 'AAA-9423' or 'BBB-1212'. Each of these CUSTOMER_IDs can have tens or hundreds of records. For my purposes, I am only interested in a sample of each CUSTOMER_ID and in my case, I only want the first 10 records for a particular CUSTOMER_ID.
What I would like to do is when I select a particular set of CUSTOMER_IDs, to return the first 10 records for EACH of the ones I have selected. I am not sure how to do this code-wise.
Here is the code I put together but it is not working as expected because the first CUSTOMER_ID is showing up 20 times since it has more records than the other customer_ID.
SELECT TOP 10 *
WHERE CUSTOMER_ID IS IN ('XYZ-1234', 'AAA-9423')
FROM DO.CUSTOMER_tbl
My desired output would have customer 'XYZ-1234' and the top 10 records associated with it and customer 'AAA-9423' and the top 10 records associated with it, for a combined output of 20 records.
March 2, 2016 at 8:18 am
IBeDatMan (3/2/2016)
Hey there - I am looking to create a query that will help me pull the top 10 records from two distinct groups in a column.I have a column called CUSTOMER_ID that has values like 'XYZ-1234' and 'AAA-9423' or 'BBB-1212'. Each of these CUSTOMER_IDs can have tens or hundreds of records. For my purposes, I am only interested in a sample of each CUSTOMER_ID and in my case, I only want the first 10 records for a particular CUSTOMER_ID.
What I would like to do is when I select a particular set of CUSTOMER_IDs, to return the first 10 records for EACH of the ones I have selected. I am not sure how to do this code-wise.
Here is the code I put together but it is not working as expected because the first CUSTOMER_ID is showing up 20 times since it has more records than the other customer_ID.
SELECT TOP 10 *
WHERE CUSTOMER_ID IS IN ('XYZ-1234', 'AAA-9423')
FROM DO.CUSTOMER_tbl
My desired output would have customer 'XYZ-1234' and the top 10 records associated with it and customer 'AAA-9423' and the top 10 records associated with it, for a combined output of 20 records.
For each customer, how are you ordering the results? Is there some sort of date or Id column involved?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
March 2, 2016 at 8:51 am
Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.
March 2, 2016 at 8:52 am
IBeDatMan (3/2/2016)
Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.
So you just want any 10 rows, per customer?
The absence of evidence is not evidence of absence.
Martin Rees
You can lead a horse to water, but a pencil must be lead.
Stan Laurel
March 2, 2016 at 8:52 am
Shot in the dark, since we don't have answers to Phil's questions, but here's a guess. I built this using AdventureWorks2012
SELECT [Name] AS TerritoryName
, t.TerritoryID
, tc.CustomerID
, tc.AccountNumber
FROM Sales.SalesTerritory T
CROSS APPLY (SELECT TOP 3 *
FROM Sales.Customer C
WHERE C.TerritoryID = T.TerritoryID) tc;
You need the TOP VALUES in the CROSS APPLY to get the Top N per group. And the aliases are so you can include columns from the top values query in your outer (Territory) query. Which records get returned by the cross applied query will depend on the ORDER BY clause. If you want totally random records for each territory, you'd do this:
SELECT [Name] AS TerritoryName
, t.TerritoryID
, tc.CustomerID
, tc.AccountNumber
FROM Sales.SalesTerritory T
CROSS APPLY (SELECT TOP 3 *
FROM Sales.Customer C
WHERE C.TerritoryID = T.TerritoryID
ORDER BY NEWID()) tc;
March 2, 2016 at 8:53 am
IBeDatMan (3/2/2016)
Hey Phil - the source data doesn't seem to be in any particular order, just random...although it might be on date but it is not arranged by customer.
I don't think Phil is asking about the order of the source data. (Since it's in a relational table, it does not have an order anyway - relational tables are sets, which are unordered by definition).
What I think Phil asks about, and what I would need to know before I can help you with your problem, is the ordering to apply when determining the top 10 rows for each customer. Top 10 by what criterium?
March 2, 2016 at 9:00 am
Thanks, Hugo.
For the 10 records, I am not looking for these to be ordered, so no critieria would apply. I only need 10 records for the customer regardless of the order. I only need a sample.
March 2, 2016 at 9:06 am
That's what the second query I posted does. NEWID() returns a random number so ordering by it and then doing a TOP returns a random set of records each time you run it.
March 2, 2016 at 9:06 am
Is there another table that has a list of distinct CUSTOMER_ID?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
March 2, 2016 at 9:58 am
If those are the only two customer id's you care about and you don't care about the order.
SELECT TOP 10 * FROM DO.CUSTOMER_tbl WHERE CUSTOMER_ID IS = 'XYZ-1234'
UNION ALL
SELECT TOP 10 * FROM DO.CUSTOMER_tbl WHERE CUSTOMER_ID IS = 'AAA-9423'
March 2, 2016 at 12:37 pm
pietlinden (3/2/2016)
Shot in the dark, since we don't have answers to Phil's questions, but here's a guess. I built this using AdventureWorks2012
SELECT [Name] AS TerritoryName
, t.TerritoryID
, tc.CustomerID
, tc.AccountNumber
FROM Sales.SalesTerritory T
CROSS APPLY (SELECT TOP 3 *
FROM Sales.Customer C
WHERE C.TerritoryID = T.TerritoryID) tc;
You need the TOP VALUES in the CROSS APPLY to get the Top N per group. And the aliases are so you can include columns from the top values query in your outer (Territory) query. Which records get returned by the cross applied query will depend on the ORDER BY clause. If you want totally random records for each territory, you'd do this:
SELECT [Name] AS TerritoryName
, t.TerritoryID
, tc.CustomerID
, tc.AccountNumber
FROM Sales.SalesTerritory T
CROSS APPLY (SELECT TOP 3 *
FROM Sales.Customer C
WHERE C.TerritoryID = T.TerritoryID
ORDER BY NEWID()) tc;
hi pietlinden - I would like to use this code for my purposes but it's unclear how to make it work to my custom needs.
I won't need any specific fields, so in place of selecting certain fields I would use *. Secondly, I need to select specific customers instead of taking the top 3 records for each customer. How could I adapt your code to include that requirement?
March 2, 2016 at 12:44 pm
IBeDatMan (3/2/2016)
I won't need any specific fields, so in place of selecting certain fields I would use *.
No, please don't. That wastes bandwidth between the server and the client, it cripples SQL Server's options to optimize the execution of the query, and it is likely to break the application if anyone ever changes the table structure.
Never, ever, ever, user SELECT * in production. (Except in an EXISTS subquery, and sometimes in a CTE or subquery)
Secondly, I need to select specific customers instead of taking the top 3 records for each customer. How could I adapt your code to include that requirement?
Add a WHERE clause to the outer query.
March 2, 2016 at 12:48 pm
I can't answer your question without some CREATE TABLE scripts. What table are the records related to Customer coming from?
I posted a basic pattern based on AdventureWorks because it's readily available for download, so you could play with the query and compare the structure of it to your database (although there are some times when AW is incredibly confusing... at least to me!)
and Hugo's smarter than I am... best that you should follow his advice...
March 3, 2016 at 1:54 am
The following code is a basic template that will get you what you are looking for.
DECLARE @SampleSize INT = 10;
SELECT *
FROM (
SELECT CUSTOMER_ID
--, Field1, Field2, Field3 ...
, num = ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY NEWID())
FROM DO.CUSTOMER_tbl
WHERE CUSTOMER_ID IN ('XYZ-1234', 'AAA-9423')
) AS src
WHERE src.num <= @SampleSize
ORDER BY CUSTOMER_ID
March 3, 2016 at 8:26 am
DesNorton (3/3/2016)
The following code is a basic template that will get you what you are looking for.
DECLARE @SampleSize INT = 10;
SELECT *
FROM (
SELECT CUSTOMER_ID
--, Field1, Field2, Field3 ...
, num = ROW_NUMBER() OVER (PARTITION BY CUSTOMER_ID ORDER BY NEWID())
FROM DO.CUSTOMER_tbl
WHERE CUSTOMER_ID IN ('XYZ-1234', 'AAA-9423')
) AS src
WHERE src.num <= @SampleSize
ORDER BY CUSTOMER_ID
Thanks for taking a stab at this. I modified your code for my purposes but I get an error: Syntax error, expected something like a name or a Unicode delimited identifier between the word 'num' and '='. Here is my code:
DECLARE @SampleSize INT = 10;
SELECT *
FROM (
SELECT
DXCD1,
CHNL_SRC_CD,
SPECLTY_CLM_DATA_SRC,
num = ROW_NUMBER() OVER (PARTITION BY DXCD1 ORDER BY DXCD1())
FROM CCW_VIEW_PRD.MED_PD_CLM_TRNSMSSN
WHERE DXCD1 IN (IS NULL, '7140')
) AS src
WHERE src.num <= @SampleSize
ORDER BY DXCD1
Viewing 15 posts - 1 through 15 (of 17 total)
You must be logged in to reply to this topic. Login to reply