March 21, 2014 at 9:01 am
I tried searching for this but I must be using the wrong search terms. What I need to do it select the top 80 percent of records per group based on the group total. To be clear I am not trying to just grab the top x percent of rows. My example below might be better than me trying to explain 🙂
Table 1:
DealerID RepairID
1 1
1 2
1 3
1 4
1 5
1 6
1 7
1 8
1 9
1 10
2 11
2 12
2 13
Table 2:
RepairID Tire
1 225/65R17 102T
2 225/65R17 102T
3 225/65R17 102T
4 235/60R18 102V
5 235/60R18 102V
6 235/60R18 102V
7 235/60R18 102V
8 205/55R16 89H
9 205/70R15 89H
13 225/65R17 102T
Table 1 has the total number of repair orders per dealer. This can be obtained by simply grouping on DealerID and counting the number of RepairIDs.
Table 2 has information on some of the repair orders and it is needed to select the top 80% of tire sizes. This table will be joined to Table 1 and grouped by DealerID and Tire.
Desired Output:
DealerIDTire RepairsOfThisTireRepairCount PercentOfTotalRepairOrders
1 235/60R18 102V4 10 40
1 225/65R17 102T3 10 30
1 205/55R16 89H1 10 10
2 225/65R17 102T1 3 33
The equation I am given to calculate the top tires per dealer is as follows: Total # of tires for the size / Total # of repair orders per dealer.
Here is what I have so far though I think I might have to rewrite all of it.
SELECT
DealerID ,
COUNT(RepairID)
INTO #TotalDealerRepairOrders
FROM
dbo.Table1
GROUP BY
DealerID
SELECT
DealerID,
Tire,
COUNT(RepairID) AS 'Tire Count'
INTO
#tempCounts
FROM
dbo.Table1 INNER JOIN
dbo.Table2 ON Table1.RepairID = Table2.RepairID
GROUP BY
Table1.DealerID, Tire
SELECT
tc.DealerId ,
tc.Tire,
CONVERT(DECIMAL(18,4), tc.[Tire Count]) / tro.NumberROs * 100 AS 'Percent'
FROM
#tempCounts tc INNER JOIN
#TotalDealerRepairOrders tro ON tro.DealerID = tc.DealerId
GROUP BY
tc.DealerID, tc.Tire, tc.[Tire Count], tro.NumberROs
HAVING
SUM(tc.[Tire Count]) <= .8 * tro.NumberROs
ORDER BY
tc.[Tire Count] DESC
DROP TABLE #TotalDealerRepairOrders
DROP TABLE #tempCounts
I feel like I should be using the Partition function. Thanks for your help.
March 21, 2014 at 9:31 am
What do you mean by "Total # of tires for the size"?
Here's your data set up in a consumable format:
DROP TABLE #Table1
CREATE TABLE #Table1 (DealerID INT, RepairID INT)
INSERT INTO #Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13)
DROP TABLE #Table2
CREATE TABLE #Table2 (RepairID INT, Tire VARCHAR(15))
INSERT INTO #Table2 VALUES (1,'225/65R17 102T'),
(2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),
(8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T')
Here's a query built upon your data which may go some way towards what you're trying to do:
SELECT DealerID, Tire, RepairsOfThisTyre, RepairCount = SUM(RepairsOfThisTyre) OVER(PARTITION BY DealerID)
FROM (
SELECT DealerID, Tire, RepairsOfThisTyre = COUNT(*)
FROM #Table1 d
INNER JOIN #Table2 r ON r.RepairID = d.RepairID
GROUP BY DealerID, Tire
) d
ORDER BY DealerID, RepairsOfThisTyre DESC
How close is this to your requested result?
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 21, 2014 at 10:05 am
It is close to what I wrote in my example.
Your query returns this:
DealerIDTire RepairsOfThisTyreRepairCount
1 235/60R18 102V 4 9
1 225/65R17 102T 3 9
1 205/55R16 89H 1 9
1 205/70R15 89H 1 9
2 225/65R17 102T 1 1
I need it to return this:
DealerIDTire RepairsOfThisTyreRepairCount PercentOfTotalRepairOrders
1 235/60R18 102V4 10 40
1 225/65R17 102T3 10 30
1 205/55R16 89H1 10 10
2 225/65R17 102T1 3 33
The RepairCount is the total number of repair orders in Table 1. The record with DealerID 1 and Tire 205/70R15 89H is not included since the 3 records above it make up the top 80%.
March 21, 2014 at 10:46 am
Gotcha - think you're looking for this:
;WITH
Preagg AS (
SELECT DealerID, Tire, RepairsOfThisTyre,
RepairsForThisDealer = SUM(RepairsOfThisTyre) OVER(PARTITION BY DealerID)
FROM (
SELECT DealerID, Tire,
RepairsOfThisTyre = COUNT(*)
FROM #Table1 d
INNER JOIN #Table2 r ON r.RepairID = d.RepairID
GROUP BY DealerID, Tire
) d
),
FinalAgg AS (
SELECT DealerID, Tire, RepairsOfThisTyre, RepairsForThisDealer,
x.PCT,
q = SUM(x.PCT) OVER(PARTITION BY DealerID ORDER BY x.PCT DESC, Tire),
rn = ROW_NUMBER() OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTyre DESC)
FROM Preagg
CROSS APPLY (SELECT PCT = (RepairsOfThisTyre*100.00)/RepairsForThisDealer) x
)
SELECT f.*
FROM FinalAgg f
CROSS APPLY (
SELECT TOP 1 *
FROM FinalAgg fi
WHERE fi.DealerID = f.DealerID
AND q >= 80
ORDER BY rn
) x
WHERE f.rn <= x.rn
ORDER BY f.DealerID, f.RepairsOfThisTyre DESC
Give it a shot. It's highly likely that it can be improved.
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
March 21, 2014 at 11:43 am
The Preagg query is just counting the total number or records per dealer in table 2.
I switched the inner join to a left outer join and got some crazy results. The only thing I can think of to correct it is to do this:
SELECT
DealerId,
COUNT(RepairID) AS 'RepairsForThisDealer'
INTO
#temp
FROM
#Table1
GROUP BY
DealerId
;WITH
Preagg AS (
SELECT
d.DealerID,
Tire,
COUNT(r.RepairID) AS RepairsOfThisTyre,
RepairsForThisDealer
FROM
#Table1 d
INNER JOIN #Table2 r ON r.RepairID = d.RepairID
INNER JOIN #temp t ON t.DealerId = d.DealerId
GROUP BY d.DealerID, Tire, RepairsForThisDealer
),
FinalAgg AS (
SELECT DealerID, Tire, RepairsOfThisTyre, RepairsForThisDealer,
x.PCT,
q = SUM(x.PCT) OVER(PARTITION BY DealerID ORDER BY x.PCT DESC, Tire),
rn = ROW_NUMBER() OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTyre DESC)
FROM Preagg
CROSS APPLY (SELECT PCT = (RepairsOfThisTyre*100.00)/RepairsForThisDealer) x
)
SELECT f.*
FROM FinalAgg f
CROSS APPLY (
SELECT TOP 1 *
FROM FinalAgg fi
WHERE fi.DealerID = f.DealerID
AND q >= 80
ORDER BY rn
) x
WHERE f.rn <= x.rn
ORDER BY f.DealerID, f.RepairsOfThisTyre DESC
I get the correct percents, but now it doesn't show dealer 2.
March 21, 2014 at 3:02 pm
I also posted this question in StackOverflow. Here is the solution they came up with:
select DealerId, Tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.Tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from table1 t1 left join
table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, Tire
order by 1, 2;
March 21, 2014 at 3:11 pm
Just an FYI, you shouldn't order by ordinal values. You really should specify the column names you are ordering by. If I remember correctly ordering by ordinal values is being deprecated.
March 21, 2014 at 3:13 pm
Thanks for the tip. I will make sure to change it in my query.
March 21, 2014 at 3:21 pm
Curious, is this what you actually want? I added a third dealer with only one order for one tire. This dealer does not show up in the results set.
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);
INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14);
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));
INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),
(2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),
(8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V');
select * from dbo.Table1;
select * from dbo.Table2;
select DealerId, tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from dbo.Table1 t1 left join
dbo.Table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, tire
order by DealerId, tire, RepairsOfThisTire desc;
go
DROP TABLE dbo.Table1;
DROP TABLE dbo.Table2;
March 21, 2014 at 3:27 pm
Added two more orders for the same tire and Dealer 3 shows up but only reports the tire as 66% of the total when in actuality it is 100%.
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);
INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16);
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));
INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),
(2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),
(8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');
select * from dbo.Table1;
select * from dbo.Table2;
select DealerId, tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from dbo.Table1 t1 left join
dbo.Table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, tire
order by DealerId, tire, RepairsOfThisTire desc;
go
DROP TABLE dbo.Table1;
DROP TABLE dbo.Table2;
March 21, 2014 at 3:44 pm
Lynn Pettis (3/21/2014)
Added two more orders for the same tire and Dealer 3 shows up but only reports the tire as 66% of the total when in actuality it is 100%.
Hmm...Interesting. I think you stumbled on a test case that we did not think of. Namely when all the orders are the same tire size. It is not likely going to ever happen but it is good to cover all bases.
The query is supposed to go through Table 1 grouping by dealerID and counting up all a dealers orders. This number will be the denominator in the calculation. It then looks at table 2 grouping by dealerID and by tire and dividing the count by the total orders in table 1 to get the % of the total orders. Once I have a % calculated I then need to only select the tires that make up 80% of the dealers total count, everything else is left out.
March 21, 2014 at 3:51 pm
Added another dealer with 10 orders, 9 for one timre and 1 for a different tire. The new dealer shows up, but again the count of the Repairs for the tire shows as 8 and 80% when actually it is 9 and 90%.
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);
INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16),
(4,17),(4,18),(4,19),(4,20),(4,21),(4,22),(4,23),(4,24),(4,25),(4,26);
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));
INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),
(2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),
(8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),
(17,'235/60R18 102V'),(18,'235/60R18 102V'),(19,'235/60R18 102V'),
(20,'235/60R18 102V'),(21,'235/60R18 102V'),(22,'235/60R18 102V'),
(23,'235/60R18 102V'),(24,'235/60R18 102V'),(25,'235/60R18 102V'),
(26,'225/65R17 102T');
select * from dbo.Table1;
select * from dbo.Table2;
select DealerId, tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from dbo.Table1 t1 left join
dbo.Table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, tire
order by DealerId, tire, RepairsOfThisTire desc;
go
DROP TABLE dbo.Table1;
DROP TABLE dbo.Table2;
March 21, 2014 at 4:03 pm
Modified the order by so that it listed output in descending order by RepairsOfThisTire by DealerId, which what you original indicated what you also wanted.
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);
INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16),
(4,17),(4,18),(4,19),(4,20),(4,21),(4,22),(4,23),(4,24),(4,25),(4,26);
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));
INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),
(2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),
(8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V'),
(17,'235/60R18 102V'),(18,'235/60R18 102V'),(19,'235/60R18 102V'),
(20,'235/60R18 102V'),(21,'235/60R18 102V'),(22,'235/60R18 102V'),
(23,'235/60R18 102V'),(24,'235/60R18 102V'),(25,'235/60R18 102V'),
(26,'225/65R17 102T');
select * from dbo.Table1;
select * from dbo.Table2;
select DealerId, tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from dbo.Table1 t1 left join
dbo.Table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, tire
order by DealerId, RepairsOfThisTire desc, tire;
go
DROP TABLE dbo.Table1;
DROP TABLE dbo.Table2;
March 22, 2014 at 3:33 am
tfendt (3/21/2014)
The Preagg query is just counting the total number or records per dealer in table 2....
It also counts the total number of repairs per dealer.
;WITH
Preagg AS (
SELECT DealerID, Tire, RepairsOfThisTyre,
RepairsForThisDealer = SUM(RepairsOfThisTyre) OVER(PARTITION BY DealerID)
FROM (
SELECT DealerID, Tire,
RepairsOfThisTyre = COUNT(*)
FROM Table1 d
INNER JOIN Table2 r ON r.RepairID = d.RepairID
GROUP BY DealerID, Tire
) d
) SELECT * FROM Preagg
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
March 23, 2014 at 6:50 am
tfendt (3/21/2014)
I also posted this question in StackOverflow. Here is the solution:
select DealerId, Tire, count(*) as RepairsOfThisTire,
max(cnt) as TotalRepairs,
count(*) * 1.0 / max(cnt) as PercentOfTotal
from (select t1.*, t2.Tire,
row_number() over (partition by t1.DealerId
order by (case when t2.tire is null then 1 else 0 end),
t1.RepairId
) as seqnum,
count(*) over (partition by t1.DealerId) as cnt
from table1 t1 left join
table2 t2
on t1.RepairId = t2.RepairId
) t
where seqnum <= 0.8*cnt and tire is not null
group by DealerId, Tire
order by 1, 2;
Thank you for you time.
This query works only because of the sequence of RepairId in table2, which already orders most-used tyres ahead of least-used tyres, probably by accident. If table2 really is artificially ordered in this way, you're ok, otherwise you will get incorrect results. Here's a copy of Lyn's dataset slightly adjusted to show this:
DROP TABLE dbo.Table1
CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);
INSERT INTO dbo.Table1 VALUES
(1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),
(1,8),(1,9),(1,10),(2,11),(2,12),(2,13),
(3,14),(3,15),(3,16);
DROP TABLE dbo.Table2
CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));
INSERT INTO dbo.Table2 VALUES (9,'225/65R17 102T'),
(7,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),
(5,'235/60R18 102V'),(6,'235/60R18 102V'),(8,'235/60R18 102V'),
(2,'205/55R16 89H'),(1,'205/70R15 89H'),(13,'225/65R17 102T'),
(14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');
and a query which works:
SELECT DealerID, tire, RepairsOfThisTire, TotalRepairs,
PercentOfTotal = (RepairsOfThisTire*100.00)/TotalRepairs
FROM (
SELECT DealerID, TotalRepairs, tire, RepairsOfThisTire,
Sector = NTILE(5) OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTire DESC, tire DESC)
FROM (
SELECT
t1.DealerID, t2.tire,
TotalRepairs = COUNT(*) OVER (PARTITION BY t1.DealerID),
RepairsOfThisTire = COUNT(*) OVER(PARTITION BY t1.DealerID, t2.tire)
FROM dbo.Table1 t1
LEFT JOIN dbo.Table2 t2 ON t2.RepairId = t1.RepairId
) d
WHERE tire IS NOT NULL
) e
WHERE Sector < 5
GROUP BY DealerID, TotalRepairs, tire, RepairsOfThisTire--, PercentOfTotal
ORDER BY DealerID, RepairsOfThisTire DESC, tire;
For better assistance in answering your questions, please read this[/url].
Hidden RBAR: Triangular Joins[/url] / The "Numbers" or "Tally" Table: What it is and how it replaces a loop[/url] Jeff Moden[/url]
Viewing 15 posts - 1 through 15 (of 25 total)
You must be logged in to reply to this topic. Login to reply