This is the fourth article in a short series of SQL Server query challenges to test and expand your SQL Server querying skills. While none of these short articles is of overwhelming complexity, they will hopefully either teach you something new or help you to reactivate potentially dormant SQL skills in preparation for your next real-world challenge. You can see the rest of the articles on the Query Answers with SQL Server page.
In this article we will once again imagine that you are working for a small classic car retailer, called Prestige Cars Ltd, using the data that is in the company database.
The Challenge
The CEO has a solid background in statistics and has just asked for a report that shows the difference between the median discount and the actual discount for all sales per customer. Your challenge is to come up with the SQL to satisfy her request. Here is one possible solution:
SELECT DISTINCT CU.CustomerName ,TotalSalePrice ,TotalSalePrice- PERCENTILE_CONT(0.5) WITHIN GROUP(ORDER BY TotalSalePrice) OVER(PARTITION BY CustomerName) AS SaleToMedianDelta FROM Data.Sales AS SA INNER JOIN Data.Customer CU ON SA.CustomerID = CU.CustomerID
Running this query gives the results that you can see in Figure 1.
Figure 1. Displaying median values with the PERCENTILE_CONT() function
How It Works
There are times when you need to get a purely statistical answer from your data. Finding a median (rather than an average) value and comparing this to sales prices is one example of this. However, SQL does not have a median function in the same way that it has an AVG() function. Nonetheless, you can calculate a median using the built-in PERCENTILE_CONT() function. Finding a median value requires you to use the following SQL elements:
PERCENTILE_CONT(0.5)
The PERCENTILE_CONT() function finds a percentile in a dataset. Because you are looking for the median value, you need to indicate this by entering 0.5 inside the parentheses.
WITHIN GROUP
This lets you specify the numeric field that you want to extract the median value from. You also have to include the ORDER BY clause.
OVER(PARTITION BY)
As with the other windowing functions that you have seen in this chapter, PERCENTILE_CONT() allows you to segment the dataset into subgroups so that you can find the median value for each of the fields (or combinations of fields) that you specify after the PARTITION BY clause.
In this example, the query joins the Sales and Customer tables since they contain the fields that you need. Once the CustomerName and SalePrice fields have been selected, you add the clause required to extract the median value per customer and then subtract this from the sale price for each vehicle sold. In this case, this means specifying the SalePrice field for the WITHIN GROUP argument and partitioning by the CustomerName in the OVER clause.
To help you understand the database that underlies this SQL here is the Prestige Cars ERD:
That is it – you have seen a simple example of how to display median values from a Dataset. Keep watching SQL Server Central.com ready for the next article in this series.
This article is adapted from the book “Query Answers with SQL Server Volume II: In-Depth Querying” by Adam and Karine Aspin. Available as both a paperback and an eBook from all good resellers including Amazon, Kobo and iBooks, as well as all good bookstores.
The sample data is available either in the accompanying material for this article, or on the Tetras Publishing website at: www.tetraspublishing.com/sqlserver.
The Series
There are a number of articles in this series. You can see them all on the Query Answers page.