Blog Post

T-SQL Tuesday #016: Aggregates and Windowing Functions and Ranking! Yum!

,

T-SQL Tuesday #016Let’s get this out of the way right now: I am SO going to take advantage of this T-SQL Tuesday on aggregates to talk about OVER() and PARTITION BY. Aggregates and OVER go together like french fries and catsup…sure, you can have them apart, but together they’re just magical.

Note: If you’re not at all familiar with OVER (PARTITION BY) at all, take a quick look at my brief intro, Tip: Over and Partition By.

In my T-SQL Brush-up session (also known as the Forgotten SQL session), I walk my way through fairly random examples set in the AdventureWorks database, and miraculously end up somewhere sensible. 

(Tons more good talk and code after the jump…)

Select With Aggregates

Let’s start with a simple SELECT statement, chock full of aggregates:

/*****************************************************************************

------ Aggregates ------

This example gets the count, min/max/avg cost, and category of Products.

****************************************************************************/

SELECT

    COUNT(*) ItemCount ,

    MIN(P.ListPrice) MinCost ,

    MAX(P.ListPrice) MaxCost ,

    AVG(P.ListPrice) AvgCost ,

    SC.Name [Category]

FROM Production.Product P

LEFT OUTER JOIN Production.ProductSubCategory SC

    ON P.ProductSubCategoryID = SC.ProductSubCategoryID

WHERE P.ProductSubCategoryID IS NOT NULL

    AND P.Color IS NOT NULL

GROUP BY SC.Name

ORDER BY SC.Name;

That looks easy enough! We’re just getting some item COUNT, MIN/MAX/AVG cost action from our products table. 

Rows and Aggregations, Side by Side

 But we need a query for a report on individual items, not a summary. We want all the individual data alongside the aggregate data for each group. Enter the OVER (PARTITION BY) clause!

/*******************************************************************************

** OVER allows you to add aggregate columns within a query, without a GROUP BY. **

*******************************************************************************/

SELECT

    P.ListPrice ,

    COUNT(*) OVER ( PARTITION BY SC.Name ) [Item Count Per Category] , -- How many items IN EACH GROUP (SC.Name is category)

    MIN(P.ListPrice) OVER ( PARTITION BY SC.Name ) [Min Cost Per Category] ,    -- Minimum price IN EACH GROUP

    MAX(P.ListPrice) OVER ( PARTITION BY SC.Name) [Max Cost Per Category],    -- Maximum price IN EACH GROUP

    AVG(P.ListPrice) OVER ( ) [Average Cost Overall],    -- Average price over the whole rowset

    Count(P.ListPrice) OVER ( ) [Count Overall] ,  -- Count of items over the whole rowset

    SC.Name [Category] ,

    P.ProductNumber

FROM Production.Product P

LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID

WHERE P.ProductSubCategoryID IS NOT NULL

    AND P.Color IS NOT NULL;

A partial resultset looks like this:

ListPriceItem Count Per CategoryMin Cost Per CategoryMax Cost Per CategoryAverage Cost OverallCount OverallCategoryProductNumber
89.99389.9989.99885.3284245Bib-ShortsSB-M891-S
89.99389.9989.99885.3284245Bib-ShortsSB-M891-M
89.99389.9989.99885.3284245Bib-ShortsSB-M891-L
106.52106.5106.5885.3284245BrakesRB-9231
106.52106.5106.5885.3284245BrakesFB-9873

Notice that this query has no GROUP BY, because we want to see all the individual prodcts. This is, of course, very cool and convenient.

Ranking with Row_Number

Now, let’s melt the cheddar cheesy goodness of Row_Number over the top of this query…we’ll rank each product based on price, within its subcategory.  So for example,  all the gloves will be ranked from cheapest to most expensive, 1 to however many gloves we sell:

/*****************************************************************************

------ Ranking ------

******************************************************************************/

SELECT

     ROW_NUMBER() OVER ( PARTITION BY SC.Name ORDER BY P.ListPrice ) RankNum ,

    COUNT(*) OVER ( PARTITION BY SC.Name ) [Count],

    P.ProductNumber,

    P.ListPrice ,

    SC.Name [Category]

FROM Production.Product P

JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID

WHERE P.ProductSubCategoryID IS NOT NULL

    AND P.Color IS NOT NULL;

This gives us a rowset like this (partial):

RankNumCountProductNumberListPriceCategory
13CS-4759175.49Cranksets
23CS-6583256.49Cranksets
33CS-9183404.99Cranksets
12FD-234291.49Derailleurs
22RD-2308121.46Derailleurs
16GL-H102-S24.49Gloves
26GL-H102-M24.49Gloves

 Let’s take a quick step back. What we have here is a list of all products, grouped by category and ranked from cheapest to most expensive (within category). What could we do with that kind of information? Well, how about we put together a package deal for the salespeople to use….say, a “Frugal Shopper Special” of the least expensive item from each category we sell!

CTE FTW

There’s no way to ferret out this information with a basic SELECT-FROM-WHERE query. You can’t say “WHERE Row_Number() = 1″ (because windowed functions can only appear in the SELECT or ORDER BY), or “WHERE RankNum = 1″ (because column level aliases can only appear in the ORDER BY clause).  We have to make the SQL engine pretend that this rowset is a persisted table.  While, yes, you could load this data into a temporary table or table variable, we know (in our particular circumstance) that we won’t need to access that data again. Why not use a CTE?

Note: For an intro to CTEs, see N Things Worth Knowing About CTEs.

We stick the entire query from above into a CTE, and just select all columns WHERE RankNum = 1 (that is, for the cheapest item in each category):

WITH RankedProducts ( RankNum, Name, ProductNumber, Color, ListPrice, Category, StandardCost )

AS ( SELECT ROW_NUMBER() OVER ( PARTITION BY SC.Name ORDER BY P.ListPrice ) RankNum ,

    P.ProductNumber ,

    P.ListPrice ,

    SC.Name [Category],

    StandardCost

FROM Production.Product P

LEFT OUTER JOIN Production.ProductSubCategory SC ON P.ProductSubCategoryID = SC.ProductSubCategoryID

WHERE P.ProductSubCategoryID IS NOT NULL

    AND P.Color IS NOT NULL

)

SELECT ProductNumber ,

    ListPrice ,

    Category ,

    SUM(StandardCost) OVER () TotalStandardCost,

    SUM(ListPrice) OVER ( ) TotalCost

FROM RankedProducts

WHERE RankNum = 1

ORDER BY Category;

And you’ll notice there at the end, we added the sum of standard cost and of list price for the entire set…those numbers mean “sum of what it costs to make these items” and “sum of what we normally sell these items for”. Full result set:

ProductNumberListPriceCategoryTotalStandardCostTotalCost
SB-M891-S89.99Bib-Shorts2120.39683833.625
RB-9231106.5Brakes2120.39683833.625
CA-10988.99Caps2120.39683833.625
CH-023420.24Chains2120.39683833.625
CS-4759175.49Cranksets2120.39683833.625
FD-234291.49Derailleurs2120.39683833.625
GL-H102-S24.49Gloves2120.39683833.625
HL-U509-R34.99Helmets2120.39683833.625
HY-1023-7054.99Hydration Packs2120.39683833.625
LJ-0192-S49.99Jerseys2120.39683833.625
BK-M18B-40539.99Mountain Bikes2120.39683833.625
FR-M21B-40249.79Mountain Frames2120.39683833.625
PA-T100125Panniers2120.39683833.625
PD-M28240.49Pedals2120.39683833.625
BK-R19B-44539.99Road Bikes2120.39683833.625
FR-R38B-58337.22Road Frames2120.39683833.625
SH-M897-M59.99Shorts2120.39683833.625
SO-R809-M8.99Socks2120.39683833.625
TG-W091-S74.99Tights2120.39683833.625
BK-T18U-54742.35Touring Bikes2120.39683833.625
FR-T67U-50333.42Touring Frames2120.39683833.625
VE-C304-S63.5Vests2120.39683833.625
FW-M42360.745Wheels2120.39683833.625

Now the sales folk have which items to put in the package, and the current profit margin (so they can play around with discounts).  It is, in short, a delicious heap of steaming potato-, catsup-, and cheesy-goodness.  Okay, now I want fries… 

For more on these particular examples, see the recording of T-SQL Brush-up (requires PASS login), given at 24 Hours of PASS Fall 2010 (and download the demo code here).

Happy Tuesday!

Jen McCown

http://www.MidnightDBA.com/Jen

Rate

You rated this post out of 5. Change rating

Share

Share

Rate

You rated this post out of 5. Change rating