Min Max Quartile

  • I have a table with thousands of sales history records. Some of the records have skewed data i.e products being sold for £250k or £0.00 (company fudges etc). I want to exclude these from my min max.

    Table format

    product

    qty

    value

    date

    So I want to select by product the min (qty/value) and the max(qty/value) over the last 12 months. I would also like the average and median if possible. The min also must be greater then £0.00.

    Please note that I have used the Quartile example here that return LowerRange and UpperRange and it returns the £250k price. I can't see what I'm doing wrong.

  • Kelvin Phayre (12/9/2008)


    I have a table with thousands of sales history records. Some of the records have skewed data i.e products being sold for £250k or £0.00 (company fudges etc). I want to exclude these from my min max.

    Table format

    product

    qty

    value

    date

    So I want to select by product the min (qty/value) and the max(qty/value) over the last 12 months. I would also like the average and median if possible. The min also must be greater then £0.00.

    Please note that I have used the Quartile example here that return LowerRange and UpperRange and it returns the £250k price. I can't see what I'm doing wrong.

    And we can't see what your'e doing at all. Please provide sample data / table schema and if you'd like to know why your query doesn't work, post that too. If you just want a solution to your problem, sample data / table schema and expected results should be enough. Please read the post in my signature for an example of *how* to post proper sample data here.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • CREATE TABLE [dbo].[Products] (

    [ProductId]char(20) NOT NULL,

    [UnitPrice]float NULL

    )

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.65)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.65)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.65)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.65)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.78)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.55)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.55)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.78)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 1.1)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 1.02)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.9)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.9)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 1.85)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.78)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.78)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 236170.0)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.0)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.0)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.0)

    GO

    INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])

    VALUES('214700 ', 0.0)

    GO

    Then I copied the Quartile example below and get Lowerrange=0 and UpperRange=236170

    SELECT MIN(UnitPrice) AS LowerRange , MAX(UnitPrice) AS UpperRange

    FROM (

    SELECT D1.ProductId

    FROM

    ( SELECT TOP 25 PERCENT ProductId

    FROM Products

    ORDER BY UnitPrice ASC

    ) AS D1

    UNION

    SELECT D2.ProductId

    FROM

    ( SELECT TOP 25 PERCENT ProductId

    FROM Products

    ORDER BY UnitPrice DESC

    ) AS D2

    ) AS DT RIGHT JOIN Products ON DT.ProductId = Products.ProductId

  • What are you trying to do with the TOP 25 Percent's?

    I'm not quite getting what you want returned here, can you provide your expected results based on that sample data?

    If all you want is the min/max without the outliers, you can just do something like this:

    SELECT MIN(UnitPrice) AS LowerRange , MAX(UnitPrice) AS UpperRange

    FROM Products

    WHERE UnitPrice BETWEEN 0 AND 10000 -- Hard Coded value, this could be attained based on a top % of the table

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I'm trying to achieve what this article says Quartile will do.

    http://www.sqlservercentral.com/articles/Miscellaneous/sqlserverstatisticalfunctions/1255/

    Specifically and I quote from the article.

    "Another way of looking at extremes is to use something called the inter-quartile range

    The inter-quartile range is worked out by ordering the values that you want to evaluate and splitting them into 4 equal parts. The inter-quartile range is the range of values represented by the middle two parts."

    My problem is that the example appears to be returning the outer quartile and not the inner.

  • Having looked closer at the example I have now realised I have miss read what it is doing. So to make things really simple. If I have the following sales prices for a product.

    P1,0

    P1,5

    P1,10

    P1,20

    Where P1 is my product code I want min to return 5 and max to return 10.

    Is this possible. Obviously there will be more than 4 records for P1, in my case I have thousands.

  • Ok, I thought that might be what you were attempting, but 25% chunks seemed a bit large, so I wanted to verify ;). David's examples come out of the Northwind database. In that database, ProductID refers to an actual IDENTITY primary key. In your sample table, ProductID is not unique, and thus the theory behind this code does not work. This will allow you to use that code:

    [font="Courier New"]     ALTER TABLE Products

    ADD ProductIdent INT IDENTITY(1,1)

         SELECT MIN(UnitPrice) AS LowerRange , MAX(UnitPrice) AS UpperRange

    FROM (

        SELECT D1.ProductIdent

        FROM

            ( SELECT TOP 25 PERCENT ProductIdent

            FROM Products

            ORDER BY UnitPrice ASC

            ) AS D1

        UNION

        SELECT D2.ProductIdent

        FROM

            ( SELECT TOP 25 PERCENT ProductIdent

            FROM Products

            ORDER BY UnitPrice DESC

            ) AS D2

        ) AS DT RIGHT JOIN Products ON DT.ProductIdent = Products.ProductIdent

        WHERE DT.ProductIdent IS NULL[/font]

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I think this example is returning the min and max price across all products from the Products table. I want the min max for each product but I also want to exclude the extremes.

  • I think this will be more along the lines of what you're looking for to get all your products. Please note, that this may be very slow, but I don't know of a good way to do something like this in SQL2K.

    [font="Courier New"]DECLARE @Products TABLE(

    ProductIdent   INT IDENTITY(1,1),

    ProductID      VARCHAR(10),

    UnitPrice      money)

    INSERT INTO @Products(ProductID, UnitPrice)

    SELECT 'P1',0 UNION ALL

    SELECT 'P1',5 UNION ALL

    SELECT 'P1',10 UNION ALL

    SELECT 'P1',20 UNION ALL

    SELECT 'P2',0 UNION ALL

    SELECT 'P2',50 UNION ALL

    SELECT 'P2',100 UNION ALL

    SELECT 'P2',200

      

    SELECT ProductID,

           MIN(UnitPrice),

           MAX(UnitPrice)

    FROM @Products P

    WHERE ProductIDent NOT IN (

       (SELECT TOP 25 PERCENT ProductIdent

           FROM @Products WHERE ProductID = P.ProductID

           ORDER BY UnitPrice))

       AND

           ProductIDent NOT IN (      

       (SELECT TOP 25 PERCENT ProductIdent

                   FROM @Products WHERE ProductID = P.ProductID

                   ORDER BY UnitPrice DESC))

    GROUP BY ProductID[/font]

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.

    I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.

    If you come up with a better way of discarding records from a sample then please send in the solution.

    Thanks for your help.

  • Kelvin Phayre (12/9/2008)


    I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.

    I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.

    If you come up with a better way of discarding records from a sample then please send in the solution.

    Thanks for your help.

    This is a slight variation on Garadin's solution which is probably more performant (Thanks Seth for providing the excellent core code)

    SELECT ProductID,

    MIN(UnitPrice),

    MAX(UnitPrice)

    FROM @Products P

    LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID

    FROM @Products

    ORDER BY UnitPrice) h ON h.ProductIdent = P.ProductIdent AND h.ProductID = P.ProductID

    LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID

    FROM @Products

    ORDER BY UnitPrice DESC) l ON l.ProductIdent = P.ProductIdent AND l.ProductID = P.ProductID

    WHERE h.ProductIdent IS NULL AND l.ProductIdent IS NULL

    GROUP BY ProductID

    “Write the query the simplest way. If through testing it becomes clear that the performance is inadequate, consider alternative query forms.” - Gail Shaw

    For fast, accurate and documented assistance in answering your questions, please read this article.
    Understanding and using APPLY, (I) and (II) Paul White
    Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden

  • Kelvin Phayre (12/9/2008)


    I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.

    I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.

    If you come up with a better way of discarding records from a sample then please send in the solution.

    Thanks for your help.

    Heh, yeah, I was worried about that.

    When you say you deleted the top and bottom 25% of your table, was that the top 25% of the records for each ProductID, or just the top 25% of the table. Because they are drastically different things.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Chris Morris (12/9/2008)


    Kelvin Phayre (12/9/2008)


    I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.

    I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.

    If you come up with a better way of discarding records from a sample then please send in the solution.

    Thanks for your help.

    This is a slight variation on Garadin's solution which is probably more performant (Thanks Seth for providing the excellent core code)

    SELECT ProductID,

    MIN(UnitPrice),

    MAX(UnitPrice)

    FROM @Products P

    LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID

    FROM @Products

    ORDER BY UnitPrice) h ON h.ProductIdent = P.ProductIdent AND h.ProductID = P.ProductID

    LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID

    FROM @Products

    ORDER BY UnitPrice DESC) l ON l.ProductIdent = P.ProductIdent AND l.ProductID = P.ProductID

    WHERE h.ProductIdent IS NULL AND l.ProductIdent IS NULL

    GROUP BY ProductID

    Chris,

    Although it does seem like it might be more efficient due to the derived tables, the execution plans are virtually identical (once I add the aliases to your outer ProductID's). I actually had it this way and then went the other route. Check out this post from last week, particularly Jeff's post towards the bottom evaluating the 3 methods of doing this. (Left Join/NULL, Exists and IN)

    http://www.sqlservercentral.com/Forums/Topic611416-338-1.aspx

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • Garadin (12/9/2008)


    Chris,

    Although it does seem like it might be more efficient due to the derived tables, the execution plans are virtually identical (once I add the aliases to your outer ProductID's). I actually had it this way and then went the other route. Check out this post from last week, particularly Jeff's post towards the bottom evaluating the 3 methods of doing this. (Left Join/NULL, Exists and IN)

    http://www.sqlservercentral.com/Forums/Topic611416-338-1.aspx

    Which would have probably been true if I didn't have outer references in my IN's. :crazy: After testing, Chris is absolutely right, try his out, with my tests on my Dev box, his version completes in 34 seconds for 1.4M rows.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data[/url] :: Running Totals[/url] :: Tally Table[/url] :: Cross Tabs/Pivots[/url] :: String Concatenation[/url]

  • I've run Chris' code and yes it runs real fast. Problem is that I have 944 unique product codes in the 1.5m records at the start but when I run the code it does not return a min max for 365 ProductID's. Please can you check your data and see if all products are accounted for?

    Analyzing one of the missing products UnitPrice I have 731 records in my prices table(SalesProducts).

    select UnitPrice, count(*) from SalesProducts where ProductID='123456'

    group by UnitPrice

    UnitPrice column2

    ------------ ----------

    0.11 29

    2.1 16

    0.9 686

Viewing 15 posts - 1 through 15 (of 21 total)

You must be logged in to reply to this topic. Login to reply