Select Top x% From Group Based on Total

  • might be this will be handy and more readable

    with cte as (

    SELECT DISTINCT t1.DealerID, t2.tire,

    TotalRepairs = SUM(1) OVER (PARTITION BY t1.DealerID),

    RepairsOfThisTire = SUM(1) OVER(PARTITION BY t1.DealerID, t2.tire),

    PercentOfTotalRepairOrders = ROUND((SUM(1.0) OVER(PARTITION BY t1.DealerID, t2.tire)/SUM(1.0) OVER (PARTITION BY t1.DealerID)) *100,0)

    FROM dbo.Table1 t1

    LEFT JOIN dbo.Table2 t2

    ON t2.RepairId = t1.RepairId

    )

    SELECT * FROM cte

    WHERE tire IS NOT NULL

    ORDER BY PercentOfTotalRepairOrders DESC

    since you are using Sql 2012 you can use Lead or Lag function to calculate Total percent by Dealer Group and you can get the desire result

    Every rule in a world of bits and bytes, can be bend or eventually be broken
    MyBlog About Common dialog control
    A Visualizer for viewing SqlCommand object script [/url]

  • ChrisM@home (3/23/2014)

    This query works only because of the sequence of RepairId in table2, which already orders most-used tyres ahead of least-used tyres, probably by accident. If table2 really is artificially ordered in this way, you're ok, otherwise you will get incorrect results. Here's a copy of Lyn's dataset slightly adjusted to show this:

    DROP TABLE dbo.Table1

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14),(3,15),(3,16);

    DROP TABLE dbo.Table2

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (9,'225/65R17 102T'),

    (7,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(8,'235/60R18 102V'),

    (2,'205/55R16 89H'),(1,'205/70R15 89H'),(13,'225/65R17 102T'),

    (14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');

    and a query which works:

    SELECT DealerID, tire, RepairsOfThisTire, TotalRepairs,

    PercentOfTotal = (RepairsOfThisTire*100.00)/TotalRepairs

    FROM (

    SELECT DealerID, TotalRepairs, tire, RepairsOfThisTire,

    Sector = NTILE(5) OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTire DESC, tire DESC)

    FROM (

    SELECT

    t1.DealerID, t2.tire,

    TotalRepairs = COUNT(*) OVER (PARTITION BY t1.DealerID),

    RepairsOfThisTire = COUNT(*) OVER(PARTITION BY t1.DealerID, t2.tire)

    FROM dbo.Table1 t1

    LEFT JOIN dbo.Table2 t2 ON t2.RepairId = t1.RepairId

    ) d

    WHERE tire IS NOT NULL

    ) e

    WHERE Sector < 5

    GROUP BY DealerID, TotalRepairs, tire, RepairsOfThisTire--, PercentOfTotal

    ORDER BY DealerID, RepairsOfThisTire DESC, tire;

    The problem with this query is it returns data for dealer id 3. The business rule I was given was that they only want to see the top 80%. In the case of dealer id 3 the query is showing the top 100%. In this example the top 80% cannot be calculated since all the orders are the same tire. The main issue I see and need to get clarification from the business on is what happens if the top tire is > 80%?

  • tfendt (3/24/2014)


    ...

    The problem with this query is it returns data for dealer id 3. The business rule I was given was that they only want to see the top 80%. In the case of dealer id 3 the query is showing the top 100%. In this example the top 80% cannot be calculated since all the orders are the same tire. The main issue I see and need to get clarification from the business on is what happens if the top tire is > 80%?

    Exactly. What do they want if the number of rows is not divisible by 5. Modification of the query to accommodate the decision should be straightforward.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • tfendt (3/24/2014)


    ChrisM@home (3/23/2014)

    This query works only because of the sequence of RepairId in table2, which already orders most-used tyres ahead of least-used tyres, probably by accident. If table2 really is artificially ordered in this way, you're ok, otherwise you will get incorrect results. Here's a copy of Lyn's dataset slightly adjusted to show this:

    DROP TABLE dbo.Table1

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14),(3,15),(3,16);

    DROP TABLE dbo.Table2

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (9,'225/65R17 102T'),

    (7,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(8,'235/60R18 102V'),

    (2,'205/55R16 89H'),(1,'205/70R15 89H'),(13,'225/65R17 102T'),

    (14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V');

    and a query which works:

    SELECT DealerID, tire, RepairsOfThisTire, TotalRepairs,

    PercentOfTotal = (RepairsOfThisTire*100.00)/TotalRepairs

    FROM (

    SELECT DealerID, TotalRepairs, tire, RepairsOfThisTire,

    Sector = NTILE(5) OVER(PARTITION BY DealerID ORDER BY RepairsOfThisTire DESC, tire DESC)

    FROM (

    SELECT

    t1.DealerID, t2.tire,

    TotalRepairs = COUNT(*) OVER (PARTITION BY t1.DealerID),

    RepairsOfThisTire = COUNT(*) OVER(PARTITION BY t1.DealerID, t2.tire)

    FROM dbo.Table1 t1

    LEFT JOIN dbo.Table2 t2 ON t2.RepairId = t1.RepairId

    ) d

    WHERE tire IS NOT NULL

    ) e

    WHERE Sector < 5

    GROUP BY DealerID, TotalRepairs, tire, RepairsOfThisTire--, PercentOfTotal

    ORDER BY DealerID, RepairsOfThisTire DESC, tire;

    The problem with this query is it returns data for dealer id 3. The business rule I was given was that they only want to see the top 80%. In the case of dealer id 3 the query is showing the top 100%. In this example the top 80% cannot be calculated since all the orders are the same tire. The main issue I see and need to get clarification from the business on is what happens if the top tire is > 80%?

    I do like this query however. I am not 100% sure on how it works though. How does it know to select the top 80%? Why NTILE(5)?

  • Have a look at the NTILE function in Books Online (SQL Server help).

    NTILE(n) splits the data into n equal-sized partitions numbered 1 to n in a user-defined order and with defined rules for when the data set is not divisible by n.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (3/24/2014)


    Have a look at the NTILE function in Books Online (SQL Server help).

    NTILE(n) splits the data into n equal-sized partitions numbered 1 to n in a user-defined order and with defined rules for when the data set is not divisible by n.

    I was aware it split the data into 5 groups I was just unaware how it knows to select 80%. So 5 groups 20% in each group?

    So from what I understand is the data is first partitioned by DealerID and then split into 5 groups within each DealerID based on RepairsOfThisTire. Why 5 is my question.

    When I add another record to table 1 (3,18) and table 2 (18, '205/70R15 89H') for dealer 3 it then shows two records one at 75% and another at 25%.

  • tfendt (3/24/2014)


    ChrisM@Work (3/24/2014)


    Have a look at the NTILE function in Books Online (SQL Server help).

    NTILE(n) splits the data into n equal-sized partitions numbered 1 to n in a user-defined order and with defined rules for when the data set is not divisible by n.

    I was aware it split the data into 5 groups I was just unaware how it knows to select 80%. So 5 groups 20% in each group?

    So from what I understand is the data is first partitioned by DealerID and then split into 5 groups within each DealerID based on RepairsOfThisTire. Why 5 is my question.

    When I add another record to table 1 (3,18) and table 2 (18, '205/70R15 89H') for dealer 3 it then shows two records one at 75% and another at 25%.

    80% is 4/5, hence NTILE(5) and discard for n=5. As you've mentioned above, if the number of qualifying rows is not divisible by 5 then you have a choice to make. The query is written the simplest way which might meet your business requirement, but is adjustable if the requirement is different.

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • ChrisM@Work (3/24/2014)


    tfendt (3/24/2014)


    ChrisM@Work (3/24/2014)


    Have a look at the NTILE function in Books Online (SQL Server help).

    NTILE(n) splits the data into n equal-sized partitions numbered 1 to n in a user-defined order and with defined rules for when the data set is not divisible by n.

    I was aware it split the data into 5 groups I was just unaware how it knows to select 80%. So 5 groups 20% in each group?

    So from what I understand is the data is first partitioned by DealerID and then split into 5 groups within each DealerID based on RepairsOfThisTire. Why 5 is my question.

    When I add another record to table 1 (3,18) and table 2 (18, '205/70R15 89H') for dealer 3 it then shows two records one at 75% and another at 25%.

    80% is 4/5, hence NTILE(5) and discard for n=5. As you've mentioned above, if the number of qualifying rows is not divisible by 5 then you have a choice to make. The query is written the simplest way which might meet your business requirement, but is adjustable if the requirement is different.

    :pinch: Makes sense now. Not sure why I was trying to make it more complex than it really was.

  • tfendt (3/24/2014)


    ChrisM@Work (3/24/2014)


    tfendt (3/24/2014)


    ChrisM@Work (3/24/2014)


    Have a look at the NTILE function in Books Online (SQL Server help).

    NTILE(n) splits the data into n equal-sized partitions numbered 1 to n in a user-defined order and with defined rules for when the data set is not divisible by n.

    I was aware it split the data into 5 groups I was just unaware how it knows to select 80%. So 5 groups 20% in each group?

    So from what I understand is the data is first partitioned by DealerID and then split into 5 groups within each DealerID based on RepairsOfThisTire. Why 5 is my question.

    When I add another record to table 1 (3,18) and table 2 (18, '205/70R15 89H') for dealer 3 it then shows two records one at 75% and another at 25%.

    80% is 4/5, hence NTILE(5) and discard for n=5. As you've mentioned above, if the number of qualifying rows is not divisible by 5 then you have a choice to make. The query is written the simplest way which might meet your business requirement, but is adjustable if the requirement is different.

    :pinch: Makes sense now. Not sure why I was trying to make it more complex than it really was.

    80% of software is more complex than it needs to be for this same reason 😉

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Not sure of the performance of this solution, but it is worth a try.

    CREATE TABLE dbo.Table1 (DealerId INT, RepairId INT);

    INSERT INTO dbo.Table1 VALUES

    (1,1),(1,2),(1,3),(1,4),(1,5),(1,6),(1,7),

    (1,8),(1,9),(1,10),(2,11),(2,12),(2,13),

    (3,14),(3,15),(3,16),

    (4,17),(4,18),(4,19),(4,20),(4,21),(4,22),(4,23),(4,24),(4,25),(4,26);

    CREATE TABLE dbo.Table2 (RepairId INT, tire VARCHAR(15));

    INSERT INTO dbo.Table2 VALUES (1,'225/65R17 102T'),

    (2,'225/65R17 102T'),(3,'225/65R17 102T'),(4,'235/60R18 102V'),

    (5,'235/60R18 102V'),(6,'235/60R18 102V'),(7,'235/60R18 102V'),

    (8,'205/55R16 89H'),(9,'205/70R15 89H'),(13,'225/65R17 102T'),

    (14,'235/60R18 102V'),(15,'235/60R18 102V'),(16,'235/60R18 102V'),

    (17,'235/60R18 102V'),(18,'235/60R18 102V'),(19,'235/60R18 102V'),

    (20,'235/60R18 102V'),(21,'235/60R18 102V'),(22,'235/60R18 102V'),

    (23,'235/60R18 102V'),(24,'235/60R18 102V'),(25,'235/60R18 102V'),

    (26,'225/65R17 102T');

    go

    with BaseData as (

    select

    t1.DealerId,

    t1.RepairId,

    t2.tire,

    tirecnt = count(t2.tire) over (partition by t1.DealerId, t2.tire),

    repaircnt = count(t1.RepairId) over (partition by t1.DealerId)

    from

    dbo.Table1 t1 left join

    dbo.Table2 t2

    on t1.RepairId = t2.RepairId

    ), Level1 as (

    select

    bd.DealerId,

    bd.RepairId,

    bd.tire,

    bd.repaircnt,

    bd.tirecnt,

    PercentOrders = (bd.tirecnt * 100.0 / bd.repaircnt),

    cnt = (row_number() over (partition by bd.DealerId

    order by

    case when bd.tire is null then 1 else 0 end,

    bd.tirecnt desc,

    bd.tire,

    bd.RepairId) * 100.0) / bd.repaircnt

    from

    BaseData bd

    ), Level2 as (

    select *, rn = row_number() over (partition by DealerId, tire order by cnt desc)

    from Level1

    where cnt <= 80.0)

    select DealerId, tire, tirecnt as RepairsThisTire, repaircnt as RepairCnt, PercentOrders as PercentOfTotal from Level2 where rn = 1 and tire is not null

    order by DealerId, tirecnt desc;

    go

    DROP TABLE dbo.Table1;

    DROP TABLE dbo.Table2;

  • ChrisM@Work (3/24/2014)


    tfendt (3/24/2014)


    ChrisM@Work (3/24/2014)


    Have a look at the NTILE function in Books Online (SQL Server help).

    NTILE(n) splits the data into n equal-sized partitions numbered 1 to n in a user-defined order and with defined rules for when the data set is not divisible by n.

    I was aware it split the data into 5 groups I was just unaware how it knows to select 80%. So 5 groups 20% in each group?

    So from what I understand is the data is first partitioned by DealerID and then split into 5 groups within each DealerID based on RepairsOfThisTire. Why 5 is my question.

    When I add another record to table 1 (3,18) and table 2 (18, '205/70R15 89H') for dealer 3 it then shows two records one at 75% and another at 25%.

    80% is 4/5, hence NTILE(5) and discard for n=5. As you've mentioned above, if the number of qualifying rows is not divisible by 5 then you have a choice to make. The query is written the simplest way which might meet your business requirement, but is adjustable if the requirement is different.

    Just got confirmation from the business. When the number of rows is not divisible by 5 we will display all the results. I will modify the code to use your new query.

Viewing 11 posts - 16 through 25 (of 25 total)

You must be logged in to reply to this topic. Login to reply