June 8, 2009 at 10:40 am
Sample data and the query syntax:
use tempdb
go
create table Test (Product nchar(20), Actual_Rate decimal(14,2), Quantity int)
insert into dbo.Test
select 'A-1015-7658',1,35.95
union all
select 'A-1015-7658',1,35.95
union all
select 'A-1015-7658',3,120.19
union all
select 'A-1015-7658',2,80.13
union all
select 'A-1015-7658',1,45.39
union all
select 'A-1015-7658',1,45.39
select Product,
case when Quantity > 0 then (Actual_Rate / Quantity)
else Actual_Rate
end as Actual_Rate,
row_number() over(partition by Product order by Actual_Rate desc) as RowNum,
count(*) over(partition by Product) as Cnt
from dbo.Test
drop table dbo.Test
When I run this it does not sort all of the records correctly, I expected the values to be in descending order. Can anyone help with this?
Thanks
June 8, 2009 at 10:44 am
There's no ORDER BY on that query so you can make absolutely no assumptions about the order the data will be returned in. If you want a specific order, add the ORDER BY clause. If that's not there the data will be returned in whatever order it's left in after the query processor finished.
The order by in the ROW_NUMBER just affects the assignment of the row numbers. It may or may not affect the final order of the data.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 8, 2009 at 10:47 am
Care to elaborate on the problem a bit? What would really help is and explaination of your code (why the count() over, for one). Also, what is the expected results based on the sample data.
June 8, 2009 at 1:33 pm
Sorry, here's the full code:
use tempdb
go
create table Test (Product nchar(20), Actual_Rate decimal(14,2), Quantity int)
insert into dbo.Test
select 'A-1015-7658',35.95,1
union all
select 'A-1015-7658',35.95,1
union all
select 'A-1015-7658',120.19,3
union all
select 'A-1015-7658',80.13,2
union all
select 'A-1015-7658',45.39,1
union all
select 'A-1015-7658',45.39,1
;with MedianTest as
(
select Product,
case when Quantity > 0 then (Actual_Rate / Quantity)
else Actual_Rate
end as Actual_Rate,
row_number() over(partition by Product order by Actual_Rate desc) as RowNum,
count(*) over(partition by Product) as Cnt
from dbo.Test
)
select Product, avg(Actual_Rate) as Median
from MedianTest
where RowNum IN((Cnt + 1) / 2, (Cnt + 2) / 2)
group by Product;
drop table dbo.Test
Basically I'm trying to calculate the Median of a set of values. At the moment the sort order is causing problems because the values are in the wrong order.
June 8, 2009 at 1:45 pm
Still waiting for an explanation as well as what the sample data SHOULD look like if it was sorted correctly.
June 8, 2009 at 2:08 pm
David (6/8/2009)
Basically I'm trying to calculate the Median of a set of values. At the moment the sort order is causing problems because the values are in the wrong order.
If you're talking about the final sort order of the returned result set, there's no defined order because you have no ORDER BY. If you mean some other order, please can you give more details?
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
June 8, 2009 at 4:40 pm
Am I missing something here, this query gives the median?
June 9, 2009 at 12:40 am
To find the median of a result set you have to first order the result set and then depending on the number of items in the set either return the average of the two middle items or return the middle item. So for a dataset containing an even number of items the result is the average of the two middle items, for a dataset containing an odd number of items the result is the middle item.
Running the first part of the query, the statement within the WITH block, returns the following dataset:
A-1015-7658 40.063333333333316
A-1015-7658 40.065000000000026
A-1015-7658 45.390000000000036
A-1015-7658 45.390000000000046
A-1015-7658 35.950000000000056
A-1015-7658 35.950000000000066
Now to find the median the dataset needs to be sorted:
A-1015-7658 35.950000000000056
A-1015-7658 35.950000000000066
A-1015-7658 40.063333333333316
A-1015-7658 40.065000000000026
A-1015-7658 45.390000000000036
A-1015-7658 45.390000000000046
However, ordering the dataset like this causes problems with the RowNum. The RowNum is used to locate either the two middle items or the middle item as defined above.
The answer using my sample data is 40.06. Because my sample data contains an even number of items the result is the average of the two middle items.
June 9, 2009 at 1:57 am
Apologies, missed that entirely.
If you calculate actual rate before you assign the row number the method works correctly.
I haven't investigated the performance hit on this, but if you do the following CTE:
;with MedianTest as
(
select a.*,
row_number() over(partition by Product order by Actual_Rate desc) as RowNum,
count(*) over(partition by Product) as Cnt
from
(
select Product,
case when Quantity > 0 then (Actual_Rate / Quantity)
else Actual_Rate
end as Actual_Rate
from dbo.Test
) a
)
June 9, 2009 at 2:43 am
Perfect! Thanks Allister, that works.
June 9, 2009 at 2:58 am
No bother, btw I really like this solution for finding the median, pretty neat. Will have a play around with large result sets when I get a chanace.
June 9, 2009 at 6:02 am
Allister Reid (6/9/2009)
No bother, btw I really like this solution for finding the median, pretty neat. Will have a play around with large result sets when I get a chanace.
As much as I'd like to take all the credit, I actually found the following article really helpful http://www.sqlmag.com/Article/ArticleID/49827/sql_server_49827.html and it is where the majority of the code was from 😉
June 9, 2009 at 6:08 am
Cheers, thanks for the link!
Viewing 13 posts - 1 through 12 (of 12 total)
You must be logged in to reply to this topic. Login to reply