Another post for me that is simple and hopefully serves as an example for people trying to get blogging as #SQLNewBloggers.
I’ve been writing about window functions, which I find very handy. In this post, I want to build a running total using the framing of the window. This continues from my last post on aggregates.
I’ve been looking at baseball data. Imagine that someone wants to know how many total home runs a player had at each stage of his career. In this case, if a player hit 1, then 4, then 5 home runs, we’d expect a running total to show:
- Year 1: 1
- Year 2: 5
- Year 3: 10
This is a cumbersome query without window functions, and inefficient, as I really need a subtotal query for each of the main rows. It’s difficult to write, read, and it’s slow. With a window function, however, I can use this query. You can see the framing with the ROWS section in the OVER() clause.
SELECT
yearid
, hr
, SUM (b.hr) OVER (ORDER BY b.yearID
ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS TotalHR
FROM dbo.batting AS b
WHERE b.playerID = 'griffke02'
ORDER BY b.yearID;
This gives me results that look like this:
Simple and easy to see and read.
This works as the window is scanned and totaled using the OVER() clause. In this case, the partition is the entire window, meaning all rows for this player. I’ve set the ordering to be the year, so as we move through the years, the SUM() is calculated using the part of the window that goes from the beginning, with the UNBOUNDED PRECEDING marker, until the current row.
You can think of this as we scan from year 1989 first. In this case, the entire preceding section is nothing, and the current row is 1989. Therefore the sum is 16.
Next we look at year 1990. There is a preceding row (1989) and this current row. We sum those to get 16+22=38. We repeat this with each row, always going back to the first row.
For the ROWS clause, we can use the between to determine the start and end portion of the partition that we scan. This means we can use:
- unbounded preceding
- unbounded following
- current row
- an integer
We can combine these 4 choices to get what we need. If we were looking only for a best 3 year time frame of home runs, we could get a sum like this:
SELECT
yearid
, hr
, SUM (b.hr) OVER (ORDER BY b.yearID
ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING) AS TotalHR
FROM dbo.batting AS b
WHERE b.playerID = 'griffke02'
ORDER BY b.yearID;
This gives me a grouping of the SUM for the current 1, as well as 1 before and after, for each row. My results are shown below. For the first year, there is no preceding row, so we sum the current and next row, 16+22 for 38. For the second row, we have 16 preceding, 22 current, and 22 next, which sum to 60. You can check the math for others.
There’s more to this, but for now, a quick running total is using SUM() and then the ROWS BETWEEN UNBOUNDED PRECEDING and CURRENT ROW clause.
SQLNewBlogger
This was a quick 15 minutes to write. I took part of what I’d done in the last post, changed the query quickly, and then started to explain part of a clause. I’ll keep this around and use it to expand on some other places where the framing can be useful and affect how I work with data.
A good chance for you to also show how you might build a running total, or even running count, with your own data.