The Reverse Optimization Challenge #1

  • We all spend a lot of time and effort eradicating RBAR and optimizing queries. This thread gives you a chance to do the opposite. We'll start with a very simple query out of the Northwind database, and each poster can change something about it to make it increasingly less efficient. There are only a few rules:

    1. You can only use the 4 Northwind tables (Customers, Products, Orders and Order Details) and the system tables.

    2. You cannot do anything that serves absolutely no purpose or is completely time-wasting... for example, you can't create a loop to generate random numbers until you get a particular number before continuing.

    3. Explain what you added, and why it's less efficient than it was before / why you shouldn't do it in this scenario.

    Beyond these rules, any other T-SQL commands are allowed... but the code should still work after you're done and return the correct result set. This post may be fun and amusing, or it may go completely unread / die out quickly. Guess we'll find out. :hehe: Here is the base query; the two commented-out lines allow for the first chances at bad query design.

    [font="Courier New"]

    SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) * Quantity TotalPrice

       -- Count of Number of Unique Products Per Order Per Customer

       -- Average Cost of item on the order (Total Price of Order / Total of All products * their quantities)

    FROM Orders O

       INNER JOIN [Order Details] OD ON O.OrderID = OD.OrderID

       INNER JOIN Customers C ON O.CustomerID = C.CustomerID

       INNER JOIN Products P ON OD.ProductID = P.ProductID

    GROUP BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity

    For those who want to participate and do not have the Northwind database installed on their systems, I've scripted the table DDL into the attachment below. I didn't include all the data, because it'd probably be easier just to install the DB at that point.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • why not....

    Hint hell:

    SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) * Quantity TotalPrice

    FROM Orders O with (INDEX = CustomerID)

    INNER LOOP JOIN [Order Details] OD WITH (INDEX = OrderID) ON O.OrderID = OD.OrderID

    INNER MERGE JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID

    INNER MERGE JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID

    GROUP BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity

    ORDER BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity

    OPTION (HASH GROUP)

    duration doubles

    Why you shouldn't do this: The optimiser usually knows best. Hints should only be applied if there's no other way to get the query to perform, you know exactly what the hint is going to do and exactly why you're applying it, and you're 120% sure you know better than the optimiser.
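
    For anyone playing along at home, the numbers above are easy to check yourself; a minimal sketch (the exact figures will of course differ from box to box):

    SET STATISTICS TIME ON;
    SET STATISTICS IO ON;

    -- run the unhinted base query, then the hinted version above,
    -- and compare elapsed time and logical reads in the Messages tab

    SET STATISTICS TIME OFF;
    SET STATISTICS IO OFF;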

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • This sounds like fun.

    I'll see your Hints Hell, and I'll raise you a Windowed Aggregate function with a misapplied distinct:

    select distinct * from

    (

    SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) over (partition by CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity) * Quantity TotalPrice

    FROM Orders O with (INDEX = CustomerID)

    INNER LOOP JOIN [Order Details] OD WITH (INDEX = OrderID) ON O.OrderID = OD.OrderID

    INNER MERGE JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID

    INNER MERGE JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID

    ) s

    ORDER BY CompanyName, OrderDate, ProductName, UnitPrice, Quantity

    OPTION (HASH GROUP)

    Double the time from Gail's (4x the original), and worktable logical reads are through the roof.

    Why not to do this:

    While windowed aggregate functions appear at first glance to derive the same information as a "regular" aggregate function, they perform the action FOR EACH ROW within the candidate result set. So - unless you need to derive info correlating the detail row to the grouped aggregate (or comparing multiple groupings to each other), windowed aggregates can be a huge drag. Also - in general, the new aggregate functions (PIVOT and the windowed aggregates) all seem to be heavy abusers of worktables, so they will jack up your IO stats.
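
    To make the contrast concrete, here's a stripped-down sketch against just [Order Details] (assumes SQL 2005 or later for the OVER clause): the grouped version collapses to one row per order in a single pass, while the windowed version repeats the total on every detail row (usually via a worktable) and then needs DISTINCT to throw the duplicates away.

    -- Grouped aggregate: one output row per order
    SELECT OD.OrderID, SUM(OD.UnitPrice * OD.Quantity) AS OrderTotal
    FROM [Order Details] OD
    GROUP BY OD.OrderID;

    -- Windowed aggregate: the same total stamped onto every detail row,
    -- typically spooled through a worktable, with DISTINCT cleaning up afterwards
    SELECT DISTINCT OD.OrderID,
           SUM(OD.UnitPrice * OD.Quantity) OVER (PARTITION BY OD.OrderID) AS OrderTotal
    FROM [Order Details] OD;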

    ----------------------------------------------------------------------------------
    Your lack of planning does not constitute an emergency on my part...unless you're my manager...or a director and above...or a really loud-spoken end-user..All right - what was my emergency again?

  • Gail, your index choices and join choices were 'too good'! 🙂 Try this one:

    SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, SUM(OD.UnitPrice) over (partition by CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity) * Quantity TotalPrice

    FROM Orders O with (INDEX = EmployeeID)

    INNER LOOP JOIN [Order Details] OD WITH (INDEX = ProductID) ON O.OrderID = OD.OrderID

    INNER LOOP JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID

    INNER LOOP JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID

    On my box, the original duration was 0.2 sec with 37 IOs.

    That version is 1.2 sec (6X) and 53,458 IOs (1,444X).

    Usually it is best to let the optimizer pick both the join types AND the indexing. If you are going to force indexes, pick ones that are useful for either the join or the WHERE clause.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • TheSQLGuru (11/10/2008)


    Gail, your index choices and join choices were 'too good'! 🙂

    I was trying to see how many sorts I could get into a single query. Answer: quite a few... :D

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • Added the two extra calculations I only had comments for in my original query, as correlated subqueries... one as a CASEd correlated subquery.

    Temporarily removed the Windowed function and the hash group, as they wouldn't compile on my 2K box.

    [font="Courier New"]SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity,

       SUM(OD.UnitPrice) S, --over (partition by CompanyName,  OrderDate, ProductName, OD.UnitPrice, Quantity) * Quantity TotalPrice,

       -- Count of Number of Unique Products Per Order Per Customer

       (SELECT COUNT(DISTINCT ProductID) FROM [Order Details] ODI WHERE ODI.OrderID = O.OrderID) DistinctProductCount,

       -- Average Cost of item on the order (Total Price of Order / Total of All products * their quantities)

       AvgCost = CASE WHEN

           ABS((SELECT SUM(Quantity) FROM [Order Details] ODI2 WHERE ODI2.OrderID = O.OrderID)) > 0

               THEN

           (SELECT SUM(UnitPrice * Quantity) / SUM(Quantity) FROM [Order Details] ODI3 WHERE ODI3.OrderID = O.OrderID)

               ELSE

           0.00

               END

              

    FROM Orders O WITH (INDEX = EmployeeID)

            INNER LOOP JOIN [Order Details] OD WITH  (INDEX = ProductID) ON O.OrderID = OD.OrderID

            INNER MERGE JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID

            INNER MERGE JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID

    GROUP BY CompanyName, O.OrderID, OrderDate, ProductName, OD.UnitPrice, Quantity

    ORDER BY CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity

    --OPTION (HASH GROUP)

    Correlated subqueries CAN (I believe the optimizer sometimes converts them to joins / derived tables for you) be run Row by Row, which is very slow. The same query could be written with derived tables that are only read once. Also, the CASEd subquery reads the table twice when it should read it only once; the CASE could easily live in a single subquery that returns 0 when the total Quantity is 0.
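
    For reference, a rough sketch of the derived-table shape I mean (just the two extra columns, not the full query): the [Order Details] aggregates are computed once per OrderID, and the CASE collapses into a single read of the same derived table.

    SELECT O.OrderID,
           ODA.DistinctProductCount,
           -- single pass: return 0 when the order has no quantity instead of re-reading the table
           CASE WHEN ODA.TotalQty > 0 THEN ODA.TotalCost / ODA.TotalQty ELSE 0.00 END AS AvgCost
    FROM Orders O
         INNER JOIN (SELECT OrderID,
                            COUNT(DISTINCT ProductID) AS DistinctProductCount,
                            SUM(Quantity)             AS TotalQty,
                            SUM(UnitPrice * Quantity) AS TotalCost
                     FROM [Order Details]
                     GROUP BY OrderID) ODA ON ODA.OrderID = O.OrderID;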

    Definitely need to see a cursor in this thing somehow before we're done!

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • If you want a 'cursor', simply create a UDF that takes an OrderID and calculates the SUM in the output. I leave it to the inquisitive reader to implement this and provide read/duration analysis. 😎

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Garadin (11/10/2008)


    Correlated subqueries CAN (I believe the optimizer sometimes converts them to joins / derived tables for you) be run Row by Row, which is very slow.

    More correctly, the optimiser converts them most of the time. The only time I've managed to get a correlated subquery to be executed row by row is when there's an inequality in the expression that relates it to the outer query.

    On my 2008 server, this query does not run either of the subqueries row by row. Easy to see: the number of executes on the scans of Order Details is 1 (2 if I don't add a MAXDOP 1 hint, but that's a parallelism thing). If it were truly being executed multiple times, the scan count of Order Details would be higher (the number of rows in the outer result set, which is 2155), or there would be an index spool/table spool operator in place to facilitate the multiple executions.
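
    For the curious, a sketch of the sort of inequality correlation I mean (a running-total-style subquery); this is the pattern that tends to show up with an index/table spool and one execution per outer row:

    -- The inequality (ODI.OrderID <= OD.OrderID) means the inner aggregate can't simply
    -- be collapsed into a join, so it's typically re-evaluated for each outer row
    SELECT OD.OrderID, OD.ProductID,
           (SELECT SUM(ODI.Quantity)
            FROM [Order Details] ODI
            WHERE ODI.ProductID = OD.ProductID
              AND ODI.OrderID <= OD.OrderID) AS RunningQuantity
    FROM [Order Details] OD;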

    Gail Shaw
    Microsoft Certified Master: SQL Server, MVP, M.Sc (Comp Sci)
    SQL In The Wild: Discussions on DB performance with occasional diversions into recoverability

    We walk in the dark places no others will enter
    We stand on the bridge and no one may pass
  • are we allowed to use temp tables for looping? If the loop has a purpose?

    ----------------------------------------------
    Try to learn something about everything and everything about something. - Thomas Henry Huxley

    :w00t:
    Posting Best Practices[/url]
    Numbers / Tally Tables[/url]

    SQL-4-Life
  • GilaMonster (11/10/2008)


    Garadin (11/10/2008)


    Correlated subqueries CAN (I believe the optimizer sometimes converts them to joins / derived tables for you) be run Row by Row, which is very slow.

    More correctly, the optimiser converts them most of the time. The only time I've managed to get a correlated subquery to be executed row by row is when there's an inequality in the expression that relates it to the outer query.

    On my 2008 server, this query does not run either of the subqueries row by row. Easy to see: the number of executes on the scans of Order Details is 1 (2 if I don't add a MAXDOP 1 hint, but that's a parallelism thing). If it were truly being executed multiple times, the scan count of Order Details would be higher (the number of rows in the outer result set, which is 2155), or there would be an index spool/table spool operator in place to facilitate the multiple executions.

    Good to know. So long as we're on the topic, two more questions:

    Does the CASE statement as written actually behave any differently than if it were written as a single SELECT?

    If you use a derived table for the aggregate without criteria, e.g. (SELECT OrderID, MAX(UnitPrice) FROM [Order Details] GROUP BY OrderID) joined ON OrderID, would it attempt to resolve a MAX for all orders in the table, or only for the ones returned by your outer query? Should you add the same criteria from your outer query into the derived table?
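
    The shape I have in mind is roughly this (a sketch only, with an arbitrary customer filter purely for illustration):

    -- The derived table has no criteria of its own: logically it aggregates every order
    -- in the table; whether the optimizer pushes the outer restriction down into it
    -- is up to the plan it picks
    SELECT O.OrderID, X.MaxPrice
    FROM Orders O
         INNER JOIN (SELECT OrderID, MAX(UnitPrice) AS MaxPrice
                     FROM [Order Details]
                     GROUP BY OrderID) X ON X.OrderID = O.OrderID
    WHERE O.CustomerID = 'ALFKI';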

    Christopher Stobbs (11/10/2008)


    are we allowed to use temp tables for looping? If the loop has a purpose?

    Absolutely.

    Seth Phelabaum


    Consistency is only a virtue if you're not a screwup. 😉

    Links: How to Post Sample Data :: Running Totals :: Tally Table :: Cross Tabs/Pivots :: String Concatenation

  • NOW we are starting to get somewhere! :w00t:

    create function dbo.fn_totalprice (@orderid varchar(max))

    returns varchar(max)

    as

    BEGIN

    declare @result money

    select @result = sum(UnitPrice * Quantity) from [Order Details] with (index = productid) where orderid = @orderid

    return cast(@result as varchar(max))

    end

    GO

    SELECT CompanyName, OrderDate, ProductName, OD.UnitPrice, Quantity, cast(dbo.fn_TotalPrice(o.orderid) as money) as TotalPrice

    FROM Orders O with (INDEX = EmployeeID)

    INNER LOOP JOIN [Order Details] OD WITH (INDEX = ProductID) ON O.OrderID = OD.OrderID

    INNER LOOP JOIN Customers C WITH (INDEX = Region) ON O.CustomerID = C.CustomerID

    INNER LOOP JOIN Products P WITH (INDEX = CategoryID) ON OD.ProductID = P.ProductID

    order by companyname, orderdate, productname, od.unitprice, od.quantity, dbo.fn_totalprice(o.orderid)

    Bad stuff:

    1) Use of order by that is unnecessary

    2) UDF function with ancillary data access

    3) inappropriate datatypes (varchar(max) was a good bit worse than char(7950), which took 221K reads)

    This set has 391K IOs (10567X or FOUR orders of magnitude higher than baseline), 28.5 seconds runtime (142.5X) and an astounding 10775 WRITES (ZERO for original query)! Query cost is up to 11.1 (61X increase). Not previously reported was the CPU usage: 15 for original query and 10359 now (690X).
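
    For contrast with point 2, a set-based sketch (one possible rewrite, not necessarily the fastest) that returns the same TotalPrice without a scalar UDF re-reading [Order Details] for every output row:

    -- TotalPrice computed once per order in a derived table instead of per-row in a UDF
    SELECT C.CompanyName, O.OrderDate, P.ProductName, OD.UnitPrice, OD.Quantity, T.TotalPrice
    FROM Orders O
         INNER JOIN [Order Details] OD ON O.OrderID = OD.OrderID
         INNER JOIN Customers C ON O.CustomerID = C.CustomerID
         INNER JOIN Products P ON OD.ProductID = P.ProductID
         INNER JOIN (SELECT OrderID, SUM(UnitPrice * Quantity) AS TotalPrice
                     FROM [Order Details]
                     GROUP BY OrderID) T ON T.OrderID = O.OrderID;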

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Heh... remove the ANSI joins and use old-fashioned comma-separated tables in the FROM clause with no WHERE clause to do the join. Then add a DISTINCT or a GROUP BY to resolve the duplication.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)
