September 28, 2017 at 8:38 am
Hi,
I am working with Window Functions and am trying to understand their behaviour a bit better. The database I am using is the TSQL2012 database from Itzik Ben-Gans book "Inside Microsoft SQL Server 2008: T-SQL Querying". I am running SQL Server 2017.
By the way......the book is awesome! 🙂
Sorry the code isn't copyable. My browser keeps crashing when I try to copy and paste code from SSMS.
This is my table:
create table sales.orders (
orderid bigint,
custid int
)
The table is filled with data whereby each customer has several orderids against their unique custid.
I want to find the highest order number for each custid. This query produces the required results:
The result set:
So far so good. Now I want to rewrite the query to take advantage of a Window Function:
To me the query looks good but I get the following result set:
I can't understand why I am getting a different result set from this query.
Can someone explain what I am doing wrong?
Regards,
Kev
September 28, 2017 at 8:45 am
Kev
The second query has no WHERE clause, so you'll get one row in your result set for every row in the table. The first query filters out rows so that you are only left with the highest orderid per customer.
John
September 28, 2017 at 8:51 am
The first query is simulating an aggregate function. It's not returning all rows, just one for each customer. The second query will return one row per order.
You don't need group by when using window functions. But you would need to correct the group by and remove the OVER clause to get the correct result.
September 28, 2017 at 9:07 am
I'm now sure why you want to use a Window Function for this task. Would a GROUP BY not achieve what you're looking for instead?SELECT MAX(o.orderid) AS orderid,
o.custid
FROM Sales.Orders o
GROUP BY o.custid
ORDER BY o.custid ASC;
Using a Window Function, as you saw, will bring back lots of rows for a single customer because they have many orders; the Window function does not eliminate/group these, it simply let's you know the MAX order number for that customer on every row.
Edit:"Windows" Function, heh. :hehe:
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
September 28, 2017 at 9:09 am
Quick thought, in the case of the aggregated results, there is no benefit of using Window functions.
😎
September 28, 2017 at 9:23 am
Thom A - Thursday, September 28, 2017 9:07 AMI'm now sure why you want to use a Window Function for this task. Would a GROUP BY not achieve what you're looking for instead?SELECT MAX(o.orderid) AS orderid,
o.custid
FROM Sales.Orders o
GROUP BY o.custid
ORDER BY o.custid ASC;
Using a Window Function, as you saw, will bring back lots of rows for a single customer because they have many orders; the Window function does not eliminate/group these, it simply let's you know the MAX order number for that customer on every row.Edit:"Windows" Function, heh. :hehe:
September 28, 2017 at 11:51 am
Much simpler query:SELECT MAX(o.orderid) OVER(PARTITION BY o.custid ORDER BY o.orderid) AS LastOrderId,
o.custid
FROM Sales.Orders AS o
Steve (aka sgmunson) 🙂 🙂 🙂
Rent Servers for Income (picks and shovels strategy)
September 29, 2017 at 4:22 am
So, this is how I solved it first of all....
selectdistinct(max(o.orderid) over (partition by o.custid order by o.custid)) as LastOrderId
,o.custid
from Sales.Orders o
order by o.custid asc
;
It works but when I look at the execution plan.....
I know that there are many of you out there that cringed when they saw the distinct but I wanted to see the difference between the Window Function and a normal GROUP BY:
SELECT
MAX(o.orderid) AS orderid,
o.custid
FROM Sales.Orders o
GROUP BY o.custid
ORDER BY o.custid ASC;
Although it is still a scan and not a seek, the improvement is clear and this was what I was looking for.
@Eirikur: The majority of examples I have seen have used aggregate functions and that is something I wanted to understand a bit more. I know that there are some pretty elegant Window Functions available (LEAD, LAG and so on) but if it can be used then I don't make the assumption that it should be used or that there isn't a better solution. That is the reason why I wanted to get a better look at this particular problem. But now my question: If there is no benefit to generating aggregates using Window Functions then where do the real benefits lie?
@sgmunson: This was my first solution (although I put a GROUP BY in it) but it didn't produce the required results; the highest orderid that a single custid has (see my very first post). The post resulted from my confusion about why it didn't produce the required results.
September 29, 2017 at 4:42 am
Although it is still a scan and not a seek, the improvement is clear and this was what I was looking for.
That's to be expected. There's no filtering on your query, so SQL Server has to scan the entire index to get all the results. On the other hand, if you add a WHERE then it changes to a seek. For example:USE Sandbox;
GO
CREATE TABLE orders (orderid int PRIMARY KEY CLUSTERED,
custid int);
GO
CREATE INDEX idx_custid ON orders (custid ASC);
GO
INSERT INTO orders
VALUES (12,324),
(17,324),
(123,324),
(435,324),
(546,123),
(153,123),
(654,123),
(14,22),
(234,22),
(54,65),
(6573,6),
(489,6),
(694,6),
(21,567),
(23,567);
GO
SELECT *
FROM orders;
SELECT *
FROM orders
WHERE orderid = 23;
SELECT MAX(orderid) AS orderid, custid
FROM orders
GROUP BY custid;
SELECT MAX(orderid) AS orderid, custid
FROM orders
WHERE custid > 100
GROUP BY custid;
GO
DROP TABLE orders;
GO
(BE CAREFUL, I have a drop clause in there).
Notice, the first query (just the Select *) uses a Index Scan on the Clustered Index. The second, a Seek on the Clustered Index. The latter two use a Scan and Seek respectively as well on the NonClustered Index. This is due to the 2 queries with a Seek having a predicate on one of the columns. Without the predicate in both queries SQL Server has to get the result for every row; hence the Scan.
Thom~
Excuse my typos and sometimes awful grammar. My fingers work faster than my brain does.
Larnu.uk
September 29, 2017 at 5:02 am
kevaburg - Friday, September 29, 2017 4:22 AMAlthough it is still a scan and not a seek, the improvement is clear and this was what I was looking for.
That's to be expected. There's no filtering on your query, so SQL Server has to scan the entire index to get all the results. On the other hand, if you add a WHERE then it changes to a seek. For example:USE Sandbox;
GO
CREATE TABLE orders (orderid int PRIMARY KEY CLUSTERED,
custid int);
GO
CREATE INDEX idx_custid ON orders (custid ASC);
GO
INSERT INTO orders
VALUES (12,324),
(17,324),
(123,324),
(435,324),
(546,123),
(153,123),
(654,123),
(14,22),
(234,22),
(54,65),
(6573,6),
(489,6),
(694,6),
(21,567),
(23,567);
GO
SELECT *
FROM orders;
SELECT *
FROM orders
WHERE orderid = 23;
SELECT MAX(orderid) AS orderid, custid
FROM orders
GROUP BY custid;
SELECT MAX(orderid) AS orderid, custid
FROM orders
WHERE custid > 100
GROUP BY custid;
GO
DROP TABLE orders;
GO
(BE CAREFUL, I have a drop clause in there).
Notice, the first query (just the Select *) uses a Index Scan on the Clustered Index. The second, a Seek on the Clustered Index. The latter two use a Scan and Seek respectively as well on the NonClustered Index. This is due to the 2 queries with a Seek having a predicate on one of the columns. Without the predicate in both queries SQL Server has to get the result for every row; hence the Scan.
OK, I understand that. With the production tables I have worked with it was rarely the case that we retrieved the entire table in a query so in producing a subset of data with a sensible index on it I would see a seek rather than a scan. Certainly something to think about when considering how much data is actually required.
September 29, 2017 at 12:03 pm
kevaburg - Thursday, September 28, 2017 8:38 AMI can't understand why I am getting a different result set from this query.
Can someone explain what I am doing wrong?
Regards,
Kev
It may help you to see what you are doing wrong by including o.orderid in your columns selected. The query you tried essentialy says "for every unique combination of custid & orderid, get me the most recent orderid". Each unique combination is looking into a "window" for the most recent orderid for the customer.
Wes
(A solid design is always preferable to a creative workaround)
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply