Given a customer table:
CREATE TABLE Customers
(CustId INT, OrderDate DATE, OrderAmount MONEY) ;
INSERT Customers VALUES
(1, ’2011-12-03′, 12500)
, (1, ’2011-10-07′, 15000)
, (1, ’2011-09-12′, 16000)
, (2, ’2011-07-11′, 900)
, (2, ’2011-09-13′, 18000)
, (2, ’2011-11-03′, 7000)
, (3, ’2011-01-29′, 19000)
, (3, ’2011-04-19′, 1000)
, (3, ’2011-07-09′, 12000) ;
Let’s say, we want to retrieve all records along with a column showing the customer’s date of first purchase.
We could start by computing the date of first purchase for each customer.
SELECT CustId, MIN(OrderDate) FirstPurchaseDate
FROM Customers
GROUP BY
CustId
Result set has one row per each customer.
Joining this to the original table will return the expected results.
SELECT c.*, firstD.FirstPurchaseDate
FROM Customers c
INNER JOIN
(
SELECT CustId, MIN(OrderDate) FirstPurchaseDate
FROM Customers
GROUP BY
CustId
)firstD
ON firstD.CustId = c.CustId
Result set:
The query plan uses a nested join because of the join we were using the query above.
We can achieve the same results using the FIRST_VALUE() function.
SELECT
c.CustId, c.OrderDate, c.OrderAmount
, FIRST_VALUE(OrderDate) OVER(PARTITION BY CustId ORDER BY OrderDate) CustomerSince
FROM
Customers c
[PARTITON BY CustId], creates a partiton for each customer. [ORDER BY OrderDate] sorts each partition by the OrderDate. Now, from this ordered result set, FIRST_VALUE() returns the first value for each partition.
The query is easier to read and a join is not needed. Result: the nested loop is eliminated in the query plan. This could improve performance.
I’ll conclude with a question for you to understand LAST_VALUE(). From the same fictitious customer data, let’s say we want to compute the recent purchase date (as opposed to first purchase date from above example) of every customer. What would be the query? I welcome your response in the comments below.