Find Averages ignoring high/low values

  • Hi. I need some help on a query I am writing. Users book time to a process against a Manufacturing Order (i.e. Process = Test, Assembly, Design). The Manufacturing Order relates to a Product.

    So over time multiple Manufacturing Orders will be completed for each Product.

    Now each completed Manufacturing Order will have values for quantity of time booked against a particular Process (quantity booked in minutes). I am trying to find the average times for each process for each product built (ignoring high low values).

    Now as I have asked for your time it is only fair that I put in some effort. I have taken some time to write a SQL script that creates all the necessary using TempDB then inserts some sample data (see attached 'Forum Post Script.txt'). The script includes my select query that does not yield the desired results.

    Required Output:

    Column 1

    Product Id

    Column 2

    Qty Manufactured (SUM of Manufacturing Orders for each unique Product)

    x

    Column 3

    ProcessName (unique)

    Column 4

    Quantity (Average)

    Column 5

    Quantity (Average removing High/Low values)

    Many Thanks,

    Phil.

    -------------------------------------------------------------------------------------
    A neutron walks into a bar. "I'd like a beer" he says. The bartender promptly serves up a beer. "How much will that be?" asks the neutron. "For you?" replies the bartender, "no charge."

    Two hydrogen atoms walk into a bar. One says, 'I think I've lost an electron.' The other says 'Are you sure?' The first says, 'Yes, I'm positive... '

    Tommy Cooper

  • Hi. Thanks to those that have viewed. I have had a go at the required query.

    WITH CompletedMO AS

    (

    SELECT

    ManufacturingOrders.ManufacturingOrderId,

    Products.ProductId,

    LabourRates.LabourRateId AS ProcessName,

    LabourTransactions.Quantity / 60 AS HoursBooked

    FROM

    LabourTransactions

    INNER JOIN ManufacturingOrders ON LabourTransactions.ManufacturingOrder = ManufacturingOrders.ManufacturingOrder

    INNER JOIN ActivityCentres ON LabourTransactions.ActivityCentre = ActivityCentres.ActivityCentre

    INNER JOIN LabourRates ON LabourRates.LabourRate = ActivityCentres.LabourRate

    INNER JOIN Processes ON LabourTransactions.Process = Processes.Process

    INNER JOIN Products ON ManufacturingOrders.Product = Products.Product

    )

    SELECT mo.ProductId,mo.ProcessName, SUM (mo.HoursBooked) AS HoursBooked, AVG(mo.HoursBooked) AS AvgHoursBooked

    FROM CompletedMO AS mo

    GROUP BY mo.ProductId,mo.ProcessName

    ORDER BY mo.ProductId ASC

    Not quite the desired output but a start.

    Phil.

    -------------------------------------------------------------------------------------
    A neutron walks into a bar. "I'd like a beer" he says. The bartender promptly serves up a beer. "How much will that be?" asks the neutron. "For you?" replies the bartender, "no charge."

    Two hydrogen atoms walk into a bar. One says, 'I think I've lost an electron.' The other says 'Are you sure?' The first says, 'Yes, I'm positive... '

    Tommy Cooper

  • Phil

    What should the output based on the current test data look like

    EG

    ManufacturingOrderIdSomeColumnProductIdProcessNameQuantity

    MO01508111-FAU440AASSEMBLY60.00000

    MO01508121-FAU440AASSEMBLY75.00000

    MO01508111-FAU440AASSEMBLY255.00000

    MO01508111-FAU440AASSEMBLY330.00000

    MO01508111-FAU440AINSPECTION30.00000

    MO01508111-FAU440ATEST60.00000

    MO02114311-FAU440AASSEMBLY30.00000

    MO02114321-FAU440AASSEMBLY45.00000

    MO02114311-FAU440AASSEMBLY105.00000

    MO02114311-FAU440AASSEMBLY360.00000

    MO02114311-FAU440AASSEMBLY495.00000

    MO02114311-FAU440AINSPECTION45.00000

    MO02114311-FAU440AINSPECTION105.00000

    MO02114311-FAU440ASTORES/DISPATCH240.00000

    MO02114321-FAU440ATEST60.00000

    MO02114311-FAU440ATEST195.00000

    MO02114311-FAU440ATEST300.00000

    MO02261621-FAU440AASSEMBLY30.00000

    MO02261611-FAU440AASSEMBLY60.00000

    MO02261611-FAU440AASSEMBLY240.00000

    MO02261611-FAU440AASSEMBLY360.00000

    MO02261611-FAU440AASSEMBLY495.00000

    MO02261611-FAU440ASTORES/DISPATCH360.00000

    MO02261611-FAU440ATEST30.00000

    MO02261611-FAU440ATEST435.00000

    MO02429711-FAU440AASSEMBLY120.00000

    MO02429711-FAU440AASSEMBLY255.00000

    MO02429711-FAU440AASSEMBLY405.00000

    MO02429711-FAU440AINSPECTION45.00000

    MO02429711-FAU440ASTORES/DISPATCH330.00000

    MO02429711-FAU440ATEST15.00000

    MO02429721-FAU440ATEST30.00000

    MO02429721-FAU440ATEST180.00000

    MO02429711-FAU440ATEST255.00000

    MO02429711-FAU440ATEST270.00000

    MO02483911-FAU440BINSPECTION15.00000

    MO02483911-FAU440BTEST180.00000

    MO02483911-FAU440BTEST195.00000

    MO02484611-FAU440BASSEMBLY90.00000

    MO02484611-FAU440BINSPECTION15.00000

    MO02484611-FAU440BTEST120.00000

    MO02484611-FAU440BTEST180.00000

    MO02507011-FAU440BASSEMBLY15.00000

    MO02507011-FAU440BASSEMBLY60.00000

    MO02507011-FAU440BINSPECTION30.00000

    MO02507011-FAU440BTEST30.00000

    MO02507011-FAU440BTEST120.00000

    MO02507011-FAU440BTEST240.00000

    MO02520621-FAU440BASSEMBLY30.00000

    MO02520611-FAU440BINSPECTION15.00000

    MO02520611-FAU440BTEST45.00000

    MO02520611-FAU440BTEST60.00000

    MO02520611-FAU440BTEST105.00000

    MO02520611-FAU440BTEST120.00000

    MO02520611-FAU440BTEST240.00000

    MO02520611-FAU440BTEST270.00000

    MO02520611-FAU440BTEST375.00000

  • Hi. So current output as per your post. I would like to look as follows:

    Product..........Manufactured Qty...........Process..................Total Time......Avg Time Booked (hours)

    1-FAU440A..............4........................ASSEMBLY...............62.................3.64

    ......................................................INSPECTION.............3.75..............0.93

    ......................................................STORES/DISPATCH....15.5..............5.16

    ......................................................TEST......................30.5..............2.77

    1-FAU440B..............4........................ASSEMBLY..............3.25...............0.81

    ......................................................INSPECTION............1.25...............0.31

    ......................................................TEST.....................38..................2.71

    Strictly speaking that's not all of it as I have not stripped out the high / low values.

    Looking at the output 1-FAU440A had 17 time bookings for assembly. The highest was 495, the lowest 30. Although 495 was booked more than once.......

    Based on that I would have another column Avg Time (High/Low Removed)

    The end user wants obvious high / low values removed that would skew the average.

    I hope that makes sense.

    Phil.

    -------------------------------------------------------------------------------------
    A neutron walks into a bar. "I'd like a beer" he says. The bartender promptly serves up a beer. "How much will that be?" asks the neutron. "For you?" replies the bartender, "no charge."

    Two hydrogen atoms walk into a bar. One says, 'I think I've lost an electron.' The other says 'Are you sure?' The first says, 'Yes, I'm positive... '

    Tommy Cooper

  • 2Tall (7/13/2012)


    Hi. So current output as per your post. I would like to look as follows:

    Product..........Manufactured Qty...........Process..................Total Time......Avg Time Booked (hours)

    1-FAU440A..............4........................ASSEMBLY...............62.................3.64

    ......................................................INSPECTION.............3.75..............0.93

    ......................................................STORES/DISPATCH....15.5..............5.16

    ......................................................TEST......................30.5..............2.77

    1-FAU440B..............4........................ASSEMBLY..............3.25...............0.81

    ......................................................INSPECTION............1.25...............0.31

    ......................................................TEST.....................38..................2.71

    Strictly speaking that's not all of it as I have not stripped out the high / low values.

    Looking at the output 1-FAU440A had 17 time bookings for assembly. The highest was 495, the lowest 30. Although 495 was booked more than once.......

    Based on that I would have another column Avg Time (High/Low Removed)

    The end user wants obvious high / low values removed that would skew the average.

    I hope that makes sense.

    Phil.

    Your big issue is the squishy nature of the requirement you were just given. You really need to clarify what they mean by "obviously high/low". In other words - they're not asking for an average that blindly removes the high and low values, they're looking for something like outliers.

    Either way - even if you use the sometimes accepted 6Sigma approach of looking for items that are more than 3 sigma away from the mean, you end up not removing anything.

    for the analysis:

    with bob as (

    SELECT ManufacturingOrders.ManufacturingOrderId, Products.ProductId, LabourRates.LabourRateId AS ProcessName, LabourTransactions.Quantity

    FROM LabourTransactions INNER JOIN ManufacturingOrders ON LabourTransactions.ManufacturingOrder = ManufacturingOrders.ManufacturingOrder

    INNER JOIN ActivityCentres ON LabourTransactions.ActivityCentre = ActivityCentres.ActivityCentre

    INNER JOIN LabourRates ON LabourRates.LabourRate = ActivityCentres.LabourRate

    INNER JOIN Processes ON LabourTransactions.Process = Processes.Process

    INNER JOIN Products ON ManufacturingOrders.Product = Products.Product

    ),

    normalizedCTE as (

    select *, avg(quantity) over (partition by productID,ProcessName) prodavg,stdev(quantity) over (partition by productID,ProcessName) prodstdev

    from bob)

    select *,abs((quantity-prodavg)/prodstdev) Sigmas from normalizedCTE

    order by Sigmas

    So - nothing is over or under the product/activity average by more that 3 standard deviations, so usually would NOT be considered outliers.

    Arbitrarily removing values in itself skews results - be sure you understand what the customer really wants to do. You can't really code "obvious".

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Hi Matt (and to all those that have taken time to view). Great response. The end user (who you can loosly term as a customer 🙂 ) referred to a manual process where he looks at times recorded then ignores obvious high or low values. For in stance, take one of the products in question. To build the product (one single manufacturing order) he would expect to see 180 minutes booked for process assembly (routing value on structure). Over time you might see a slight deviation on this. Every now and then a user will make a typo on data input and book 1800 or 1. These are the values he would look to exclude in calculating the average.

    I will run your code first chance I get.

    Many Thanks,

    Phil.

    -------------------------------------------------------------------------------------
    A neutron walks into a bar. "I'd like a beer" he says. The bartender promptly serves up a beer. "How much will that be?" asks the neutron. "For you?" replies the bartender, "no charge."

    Two hydrogen atoms walk into a bar. One says, 'I think I've lost an electron.' The other says 'Are you sure?' The first says, 'Yes, I'm positive... '

    Tommy Cooper

  • Hi. I have run code (see attached output). Using MO024297 (process test) as and example the average time is returned by the query is 150 (i have highlighted in red). If you calculate this manually the average is 137.14

    30,30,180,180,15,255,270 / 7

    Maybe I misunderstand (its early + not much sleep often equals mistake)

    Phil.

    -------------------------------------------------------------------------------------
    A neutron walks into a bar. "I'd like a beer" he says. The bartender promptly serves up a beer. "How much will that be?" asks the neutron. "For you?" replies the bartender, "no charge."

    Two hydrogen atoms walk into a bar. One says, 'I think I've lost an electron.' The other says 'Are you sure?' The first says, 'Yes, I'm positive... '

    Tommy Cooper

  • OK coffee has helped. I can see the average value returned 150 is calculated by using unique numbers in the array.

    Kind Regards,

    Phil.

    -------------------------------------------------------------------------------------
    A neutron walks into a bar. "I'd like a beer" he says. The bartender promptly serves up a beer. "How much will that be?" asks the neutron. "For you?" replies the bartender, "no charge."

    Two hydrogen atoms walk into a bar. One says, 'I think I've lost an electron.' The other says 'Are you sure?' The first says, 'Yes, I'm positive... '

    Tommy Cooper

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply