November 8, 2008 at 9:20 am
We all spend a lot of time and effort eradicating RBAR and optimizing queries. This thread gives you a chance to do the opposite. We'll start with a very simple query from the Northwind database, and each poster can change something about it to make it less efficient. There are only a few rules:
1. You can only use the four Northwind tables (Customers, Products, Orders and Order Details) and the system tables.
2. You cannot do anything that serves absolutely no purpose or is purely time-wasting... for example, you can't create a loop that generates random numbers until it hits a particular number before continuing.
3. Explain what you added, why it's less efficient than it was before, and why you shouldn't do it in this scenario.
Beyond these rules, any other T-SQL commands are allowed... but the code should still work after you're done and return the correct result set. This thread may be fun and amusing, or it may go completely unread and die out quickly. Guess we'll find out. :hehe: Here is the base query; the two commented-out lines offer the first chances at bad query design.
[font="Courier New"]
SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) * Quantity TotalPrice
-- Count of Number of Unique Products Per Order Per Customer
-- Average Cost of item on the order (Total Price of Order / Total of All products * their quantities)
FROM Orders O
INNER JOIN [Order Details] OD ON O.OrderID = OD.OrderID
INNER JOIN Customers C ON O.CustomerID = C.CustomerID
INNER JOIN Products P ON OD.ProductID = P.ProductID
GROUP BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity[/font]
For those who want to participate and do not have the Northwind database installed on their systems, I've scripted the table DDL into the attachment below. I didn't include all the data, because it'd probably be easier just to install the DB at that point.
November 8, 2008 at 3:20 pm
why not....
Hint hell:
SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) * Quantity TotalPrice
FROM Orders O with (INDEX = CustomerID)
INNER LOOP JOIN [Order Details] OD WITH (INDEX = OrderID) ON O.OrderID = OD.OrderID
INNER MERGE JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID
INNER MERGE JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID
GROUP BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity
ORDER BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity
OPTION (HASH GROUP)
duration doubles
Why you shouldn't do this: The optimiser usually knows best. Hints should only be applied if there's no other way to get the query to perform, if you know exactly what the hint is going to do, exactly why you're applying it and you are 120% sure you know better than the optimiser.
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 8, 2008 at 9:17 pm
This sounds like fun.
I'll see your Hints Hell, and I'll raise you a Windowed Aggregate function with a misapplied distinct:
select distinct * from
(
SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) over (partition by CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity) * Quantity TotalPrice
FROM Orders O with (INDEX = CustomerID)
INNER LOOP JOIN [Order Details] OD WITH (INDEX = OrderID) ON O.OrderID = OD.OrderID
INNER MERGE JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID
INNER MERGE JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID
) s
ORDER BY CompanyName, OrderDate, ProductName, UnitPrice, Quantity
OPTION (HASH GROUP)
Double the time from Gail's (4x the original), and worktable logical reads are through the roof.
Why not to do this:
While windowed aggregate functions let you derive information similar to a "regular" aggregate function at first glance, they perform the calculation FOR EACH ROW within the candidate result set. So unless you need to correlate the detail row to the grouped aggregate (or compare multiple groupings to each other), windowed aggregates can be a huge drag. Also, in general, the newer aggregate constructs (PIVOT and the windowed aggregates) all seem to be heavy abusers of worktables, so they will jack up your IO stats.
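For contrast, here's a minimal sketch (assuming the standard Northwind schema) of the "regular" aggregate form, which computes one row per group in a single pass instead of repeating the aggregate for every detail row and then collapsing the duplicates:

```sql
-- One row per order in one pass over the table: no per-row repetition of the
-- aggregate, and no DISTINCT needed afterwards to collapse duplicated rows.
SELECT  OD.OrderID,
        SUM(OD.UnitPrice * OD.Quantity) AS OrderTotal
FROM    [Order Details] OD
GROUP BY OD.OrderID;
```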
----------------------------------------------------------------------------------
Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?
November 10, 2008 at 8:15 am
Gail, your index choices and join choices were 'too good'! 🙂 Try this one:
SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) over (partition by CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity) * Quantity TotalPrice
FROM Orders O with (INDEX = EmployeeID)
INNER LOOP JOIN [Order Details] OD WITH (INDEX = ProductID) ON O.OrderID = OD.OrderID
INNER LOOP JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID
INNER LOOP JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID
on my box, original duration was 0.2sec with 37 IOs
That version is 1.2 sec (6X) and 53458 IOs (1444X)
Usually it is best to let the optimizer pick both the join types AND the indexing. If you are going to force Indexes, pick ones that are useful for either the join or the where clause.
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 10, 2008 at 9:15 am
TheSQLGuru (11/10/2008)
Gail, your index choices and join choices were 'too good'! 🙂
I was trying to see how many sorts I could get in a single query. Answer, quite a few...:D
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 10, 2008 at 9:53 am
Added the two extra calculations that were only comments in my original query, as correlated subqueries... one as a CASEd correlated subquery.
Temporarily removed the Windowed function and the hash group, as they wouldn't compile on my 2K box.
[font="Courier New"]SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity,
SUM(OD.UnitPrice) S, --over (partition by CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity) * Quantity TotalPrice,
-- Count of Number of Unique Products Per Order Per Customer
(SELECT COUNT(DISTINCT ProductID) FROM [Order Details] ODI WHERE ODI.OrderID = O.OrderID) DistinctProductCount,
-- Average Cost of item on the order (Total Price of Order / Total of All products * their quantities)
AvgCost = CASE WHEN
ABS((SELECT SUM(Quantity) FROM [Order Details] ODI2 WHERE ODI2.OrderID = O.OrderID)) > 0
THEN
(SELECT SUM(UnitPrice * Quantity) / SUM(Quantity) FROM [Order Details] ODI3 WHERE ODI3.OrderID = O.OrderID)
ELSE
0.00
END
FROM Orders O WITH (INDEX = EmployeeID)
INNER LOOP JOIN [Order Details] OD WITH (INDEX = ProductID) ON O.OrderID = OD.OrderID
INNER MERGE JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID
INNER MERGE JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID
GROUP BY CompanyName, O.OrderID, OrderDate, ProductName, OD.UnitPrice, Quantity
ORDER BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity
--OPTION (HASH GROUP)[/font]
Correlated subqueries CAN be run row by row (I believe the optimizer sometimes converts them to joins / derived tables for you), which is very slow. The same query could be written with derived tables that are only read once. Also, the CASEd subquery reads the table twice when it should read it only once; the CASE could easily be handled inside a single subquery that returns 0 when the summed Quantity is 0.
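Just as a sketch of that alternative (not part of the challenge, since it's the *efficient* form): the correlated subqueries above could collapse into one derived table that reads [Order Details] a single time:

```sql
-- Derived table computed once, then joined back: one scan of Order Details
-- instead of repeated correlated probes, and the CASE folds into the same pass.
SELECT  O.OrderID, T.DistinctProductCount, T.AvgCost
FROM    Orders O
INNER JOIN
       (SELECT OrderID,
               COUNT(DISTINCT ProductID) AS DistinctProductCount,
               CASE WHEN SUM(Quantity) > 0
                    THEN SUM(UnitPrice * Quantity) / SUM(Quantity)
                    ELSE 0.00
               END AS AvgCost
        FROM   [Order Details]
        GROUP BY OrderID) T
    ON T.OrderID = O.OrderID;
```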
Definitely need to see a cursor in this thing somehow before we're done!
November 10, 2008 at 10:22 am
If you want a 'cursor', simply create a UDF that takes an OrderID and calculates the SUM in the output. I leave it to the inquisitive reader to implement this and provide read/duration analysis. 😎
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 10, 2008 at 10:28 am
Garadin (11/10/2008)
Correlated subqueries CAN (I believe the optimizer sometimes converts them to joins / derived tables for you) be run Row by Row, which is very slow.
More correctly, the optimiser converts them most of the time. The only time I've managed to get a correlated subquery to be executed row by row is when there's an inequality in the expression that relates it to the outer query.
On my 2008 server, this query does not run either of the subqueries row by row. It's easy to see: the number of executes on the scans of Order Details is 1 (2 if I don't add MAXDOP 1, but that's a parallelism thing). If it were truly being executed multiple times, the scan count of Order Details would be higher (the number of rows in the outer result set, which is 2155), or there would be an index spool / table spool operator in place to facilitate the multiple executions.
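A hypothetical illustration of that inequality case (a "triangular join" style running count, not taken from this thread's query):

```sql
-- Correlating on <= rather than = is the pattern that tends to defeat
-- unnesting: expect a spool and one execution of the subquery per outer row.
SELECT  O.OrderID,
       (SELECT COUNT(*)
        FROM   Orders O2
        WHERE  O2.OrderDate <= O.OrderDate) AS RunningOrderCount
FROM    Orders O;
```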
Gail Shaw
Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability
November 10, 2008 at 10:31 am
are we allowed to use temp tables for looping? If the loop has a purpose?
----------------------------------------------
Try to learn something about everything and everything about something. - Thomas Henry Huxley
:w00t:
Posting Best Practices
Numbers / Tally Tables
November 10, 2008 at 10:40 am
GilaMonster (11/10/2008)
Garadin (11/10/2008)
Correlated subqueries CAN (I believe the optimizer sometimes converts them to joins / derived tables for you) be run Row by Row, which is very slow.
More correctly, the optimiser converts them most of the time. The only time I've managed to get a correlated subquery to be executed row by row is when there's an inequality in the expression that relates it to the outer query.
On my 2008 server, this query does not run either of the subqueries row by row. Easy to see, the number of executes on the scans of Order Details is 1. (2 if I don't add a maxdop 1, but that's a parallelism thing). If it was truely been executed multiple times, the scan count of order details would be higher (the number of rows in the outer result set, which is 2155), or there would be an index spool/table spool operator in place to facilitate the multiple executions.
Good to know. So long as we're on the topic, two more questions.
Is the CASE statement as written actually any different than if it were written with a single SELECT?
If you use a derived table for the aggregate with no criteria, e.g. JOIN (SELECT OrderID, MAX(UnitPrice) FROM Orders GROUP BY OrderID) ON ..., would it attempt to resolve the MAX for all orders in the table, or only for the ones returned by your outer query? Should you add the same criteria from your outer query into the derived table?
Christopher Stobbs (11/10/2008)
are we allowed to use temp tables for looping? If the loop has a purpose?
Absolutely.
November 10, 2008 at 1:17 pm
NOW we are starting to get somewhere! :w00t:
create function dbo.fn_totalprice (@orderid varchar(max))
returns varchar(max)
as
BEGIN
declare @result money
select @result = sum(UnitPrice * Quantity) from [Order Details] with (index = productid) where orderid = @orderid
return cast(@result as varchar(max))
end
GO
SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, cast(dbo.fn_TotalPrice(o.orderid) as money) as TotalPrice
FROM Orders O with (INDEX = EmployeeID)
INNER LOOP JOIN [Order Details] OD WITH (INDEX = ProductID) ON O.OrderID = OD.OrderID
INNER LOOP JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID
INNER LOOP JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID
order by companyname, orderdate, productname, od.unitprice, od.quantity, dbo.fn_totalprice(o.orderid)
Bad stuff:
1) An unnecessary ORDER BY (which also calls the UDF again in the sort)
2) A scalar UDF with ancillary data access
3) inappropriate datatypes (varchar(max) was a good bit worse than char(7950), which took 221K reads)
This set has 391K IOs (10567X or FOUR orders of magnitude higher than baseline), 28.5 seconds runtime (142.5X) and an astounding 10775 WRITES (ZERO for original query)! Query cost is up to 11.1 (61X increase). Not previously reported was the CPU usage: 15 for original query and 10359 now (690X).
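For contrast, a sketch of the same UDF with sane types (hypothetical name; an int parameter and money return match the underlying columns, so no implicit conversion on every call):

```sql
CREATE FUNCTION dbo.fn_TotalPrice_Typed (@OrderID int)
RETURNS money
AS
BEGIN
    -- Still has ancillary data access (a bad idea per row regardless), but
    -- at least the types line up with OrderID and the price/quantity columns.
    DECLARE @result money;
    SELECT @result = SUM(UnitPrice * Quantity)
    FROM   [Order Details]
    WHERE  OrderID = @OrderID;
    RETURN @result;
END
```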
Best,
Kevin G. Boles
SQL Server Consultant
SQL MVP 2007-2012
TheSQLGuru on googles mail service
November 10, 2008 at 5:57 pm
Heh... remove the ANSI joins and use old-fashioned comma-separated FROMs with no WHERE clause to do the joining. Then add a DISTINCT or a GROUP BY to resolve the duplication.
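A sketch of what that looks like against the base tables (assuming the standard Northwind schema): drop any one of the WHERE predicates below and the cross product explodes, leaving the DISTINCT to grind the duplicates back out:

```sql
SELECT DISTINCT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity
FROM   Orders O, [Order Details] OD, Customers C, Products P
WHERE  O.OrderID    = OD.OrderID      -- old-style join predicates in WHERE:
  AND  O.CustomerID = C.CustomerID    -- omit one and every row multiplies by
  AND  OD.ProductID = P.ProductID;    -- the unmatched table's row count
```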
--Jeff Moden
Change is inevitable... Change for the better is not.