December 9, 2008 at 5:03 am
I have a table with thousands of sales history records. Some of the records have skewed data i.e products being sold for £250k or £0.00 (company fudges etc). I want to exclude these from my min max.
Table format
product
qty
value
date
So I want to select by product the min (qty/value) and the max(qty/value) over the last 12 months. I would also like the average and median if possible. The min also must be greater then £0.00.
Please note that I have used the Quartile example here that return LowerRange and UpperRange and it returns the £250k price. I can't see what I'm doing wrong.
December 9, 2008 at 6:19 am
Kelvin Phayre (12/9/2008)
I have a table with thousands of sales history records. Some of the records have skewed data i.e products being sold for £250k or £0.00 (company fudges etc). I want to exclude these from my min max.Table format
product
qty
value
date
So I want to select by product the min (qty/value) and the max(qty/value) over the last 12 months. I would also like the average and median if possible. The min also must be greater then £0.00.
Please note that I have used the Quartile example here that return LowerRange and UpperRange and it returns the £250k price. I can't see what I'm doing wrong.
And we can't see what your'e doing at all. Please provide sample data / table schema and if you'd like to know why your query doesn't work, post that too. If you just want a solution to your problem, sample data / table schema and expected results should be enough. Please read the post in my signature for an example of *how* to post proper sample data here.
December 9, 2008 at 7:45 am
CREATE TABLE [dbo].[Products] (
[ProductId]char(20) NOT NULL,
[UnitPrice]float NULL
)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.65)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.65)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.65)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.65)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.78)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.55)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.55)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.78)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 1.1)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 1.02)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.9)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.9)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 1.85)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.78)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.78)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 236170.0)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.0)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.0)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.0)
GO
INSERT INTO [dbo].[Products]([ProductId], [UnitPrice])
VALUES('214700 ', 0.0)
GO
Then I copied the Quartile example below and get Lowerrange=0 and UpperRange=236170
SELECT MIN(UnitPrice) AS LowerRange , MAX(UnitPrice) AS UpperRange
FROM (
SELECT D1.ProductId
FROM
( SELECT TOP 25 PERCENT ProductId
FROM Products
ORDER BY UnitPrice ASC
) AS D1
UNION
SELECT D2.ProductId
FROM
( SELECT TOP 25 PERCENT ProductId
FROM Products
ORDER BY UnitPrice DESC
) AS D2
) AS DT RIGHT JOIN Products ON DT.ProductId = Products.ProductId
December 9, 2008 at 8:14 am
What are you trying to do with the TOP 25 Percent's?
I'm not quite getting what you want returned here, can you provide your expected results based on that sample data?
If all you want is the min/max without the outliers, you can just do something like this:
SELECT MIN(UnitPrice) AS LowerRange , MAX(UnitPrice) AS UpperRange
FROM Products
WHERE UnitPrice BETWEEN 0 AND 10000 -- Hard Coded value, this could be attained based on a top % of the table
December 9, 2008 at 8:20 am
I'm trying to achieve what this article says Quartile will do.
http://www.sqlservercentral.com/articles/Miscellaneous/sqlserverstatisticalfunctions/1255/
Specifically and I quote from the article.
"Another way of looking at extremes is to use something called the inter-quartile range
The inter-quartile range is worked out by ordering the values that you want to evaluate and splitting them into 4 equal parts. The inter-quartile range is the range of values represented by the middle two parts."
My problem is that the example appears to be returning the outer quartile and not the inner.
December 9, 2008 at 8:47 am
Having looked closer at the example I have now realised I have miss read what it is doing. So to make things really simple. If I have the following sales prices for a product.
P1,0
P1,5
P1,10
P1,20
Where P1 is my product code I want min to return 5 and max to return 10.
Is this possible. Obviously there will be more than 4 records for P1, in my case I have thousands.
December 9, 2008 at 8:48 am
Ok, I thought that might be what you were attempting, but 25% chunks seemed a bit large, so I wanted to verify ;). David's examples come out of the Northwind database. In that database, ProductID refers to an actual IDENTITY primary key. In your sample table, ProductID is not unique, and thus the theory behind this code does not work. This will allow you to use that code:
[font="Courier New"] ALTER TABLE Products
ADD ProductIdent INT IDENTITY(1,1)
SELECT MIN(UnitPrice) AS LowerRange , MAX(UnitPrice) AS UpperRange
FROM (
SELECT D1.ProductIdent
FROM
( SELECT TOP 25 PERCENT ProductIdent
FROM Products
ORDER BY UnitPrice ASC
) AS D1
UNION
SELECT D2.ProductIdent
FROM
( SELECT TOP 25 PERCENT ProductIdent
FROM Products
ORDER BY UnitPrice DESC
) AS D2
) AS DT RIGHT JOIN Products ON DT.ProductIdent = Products.ProductIdent
WHERE DT.ProductIdent IS NULL[/font]
December 9, 2008 at 9:05 am
I think this example is returning the min and max price across all products from the Products table. I want the min max for each product but I also want to exclude the extremes.
December 9, 2008 at 9:05 am
I think this will be more along the lines of what you're looking for to get all your products. Please note, that this may be very slow, but I don't know of a good way to do something like this in SQL2K.
[font="Courier New"]DECLARE @Products TABLE(
ProductIdent INT IDENTITY(1,1),
ProductID VARCHAR(10),
UnitPrice money)
INSERT INTO @Products(ProductID, UnitPrice)
SELECT 'P1',0 UNION ALL
SELECT 'P1',5 UNION ALL
SELECT 'P1',10 UNION ALL
SELECT 'P1',20 UNION ALL
SELECT 'P2',0 UNION ALL
SELECT 'P2',50 UNION ALL
SELECT 'P2',100 UNION ALL
SELECT 'P2',200
SELECT ProductID,
MIN(UnitPrice),
MAX(UnitPrice)
FROM @Products P
WHERE ProductIDent NOT IN (
(SELECT TOP 25 PERCENT ProductIdent
FROM @Products WHERE ProductID = P.ProductID
ORDER BY UnitPrice))
AND
ProductIDent NOT IN (
(SELECT TOP 25 PERCENT ProductIdent
FROM @Products WHERE ProductID = P.ProductID
ORDER BY UnitPrice DESC))
GROUP BY ProductID[/font]
December 9, 2008 at 11:01 am
I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.
I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.
If you come up with a better way of discarding records from a sample then please send in the solution.
Thanks for your help.
December 9, 2008 at 11:21 am
Kelvin Phayre (12/9/2008)
I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.
If you come up with a better way of discarding records from a sample then please send in the solution.
Thanks for your help.
This is a slight variation on Garadin's solution which is probably more performant (Thanks Seth for providing the excellent core code)
SELECT ProductID,
MIN(UnitPrice),
MAX(UnitPrice)
FROM @Products P
LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID
FROM @Products
ORDER BY UnitPrice) h ON h.ProductIdent = P.ProductIdent AND h.ProductID = P.ProductID
LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID
FROM @Products
ORDER BY UnitPrice DESC) l ON l.ProductIdent = P.ProductIdent AND l.ProductID = P.ProductID
WHERE h.ProductIdent IS NULL AND l.ProductIdent IS NULL
GROUP BY ProductID
For fast, accurate and documented assistance in answering your questions, please read this article.
Understanding and using APPLY, (I) and (II) Paul White
Hidden RBAR: Triangular Joins / The "Numbers" or "Tally" Table: What it is and how it replaces a loop Jeff Moden
December 9, 2008 at 11:46 am
Kelvin Phayre (12/9/2008)
I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.
If you come up with a better way of discarding records from a sample then please send in the solution.
Thanks for your help.
Heh, yeah, I was worried about that.
When you say you deleted the top and bottom 25% of your table, was that the top 25% of the records for each ProductID, or just the top 25% of the table. Because they are drastically different things.
December 9, 2008 at 11:51 am
Chris Morris (12/9/2008)
Kelvin Phayre (12/9/2008)
I put 1.5m rows into a table and then ran your query. I then stopped the query before it crashed the server.I then took a different tac. I deleted from my SalesProduct table the top 25% and bottom 25% for all products and then just did a min max on what was left. This returned the correct result. Also was quite fast.
If you come up with a better way of discarding records from a sample then please send in the solution.
Thanks for your help.
This is a slight variation on Garadin's solution which is probably more performant (Thanks Seth for providing the excellent core code)
SELECT ProductID,
MIN(UnitPrice),
MAX(UnitPrice)
FROM @Products P
LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID
FROM @Products
ORDER BY UnitPrice) h ON h.ProductIdent = P.ProductIdent AND h.ProductID = P.ProductID
LEFT JOIN (SELECT TOP 25 PERCENT ProductIdent, ProductID
FROM @Products
ORDER BY UnitPrice DESC) l ON l.ProductIdent = P.ProductIdent AND l.ProductID = P.ProductID
WHERE h.ProductIdent IS NULL AND l.ProductIdent IS NULL
GROUP BY ProductID
Chris,
Although it does seem like it might be more efficient due to the derived tables, the execution plans are virtually identical (once I add the aliases to your outer ProductID's). I actually had it this way and then went the other route. Check out this post from last week, particularly Jeff's post towards the bottom evaluating the 3 methods of doing this. (Left Join/NULL, Exists and IN)
http://www.sqlservercentral.com/Forums/Topic611416-338-1.aspx
December 9, 2008 at 4:21 pm
Garadin (12/9/2008)
Chris,Although it does seem like it might be more efficient due to the derived tables, the execution plans are virtually identical (once I add the aliases to your outer ProductID's). I actually had it this way and then went the other route. Check out this post from last week, particularly Jeff's post towards the bottom evaluating the 3 methods of doing this. (Left Join/NULL, Exists and IN)
http://www.sqlservercentral.com/Forums/Topic611416-338-1.aspx
Which would have probably been true if I didn't have outer references in my IN's. :crazy: After testing, Chris is absolutely right, try his out, with my tests on my Dev box, his version completes in 34 seconds for 1.4M rows.
December 10, 2008 at 8:37 am
I've run Chris' code and yes it runs real fast. Problem is that I have 944 unique product codes in the 1.5m records at the start but when I run the code it does not return a min max for 365 ProductID's. Please can you check your data and see if all products are accounted for?
Analyzing one of the missing products UnitPrice I have 731 records in my prices table(SalesProducts).
select UnitPrice, count(*) from SalesProducts where ProductID='123456'
group by UnitPrice
UnitPrice column2
------------ ----------
0.11 29
2.1 16
0.9 686
Viewing 15 posts - 1 through 15 (of 21 total)
You must be logged in to reply to this topic. Login to reply