Speed Tips For Select Statements

You can code a SQL select statement in a number of ways to get

the same results, but some versions of the same query may perform better than others. In this

article we will look at ways to exploit this.

Here is a query that I cut-and-paste straight out of SQL 7

Books Online. The query runs in the Northwind database and is designed to pull

out the maximum unit price for each order in the database.

SELECT Ord.OrderID, Ord.OrderDate,
(    SELECT MAX(OrdDet.UnitPrice)
     FROM   Northwind.dbo.[Order Details] AS OrdDet
     WHERE Ord.OrderID = OrdDet.OrderID) AS MaxUnitPrice
FROM Northwind.dbo.Orders AS Ord

This type of query is called a Correllated Subquery – you

can see that there are two Select statements, the ‘main’ one (SELECT

Ord.OrderID, Ord.OrderDate ) which selects the order ID and date from the orders

table, and then the ‘sub’ query (in red), which

selects the maximum unit price for each order.

The ‘sub’ query is actually ran once for each row that the

main query returns, and this repeated access to the [order details] table can be

fairly inefficient.

Books Online goes on to say that queries like the one above

can usually be re-written as a straightforward join – the example below uses an

inner join between the Orders and [Order Details] table in association with the

MAX() function to produce exactly the same data, but more efficiently.

SELECT Ord.OrderID, Ord.OrderDate,
        MAX(OrdDet.UnitPrice) as maxUnitPrice
FROM    Northwind.dbo.[Order Details] AS OrdDet
inner join
        Northwind.dbo.Orders AS Ord
on      Ord.OrderID = OrdDet.OrderID
group   by Ord.OrderID, Ord.OrderDate

Although the same data is returned by both queries, Query

Analyzer indicated that the second

version took around 40% less SQL Server resources to run than the first, so no

prizes for guessing which is the preferred option. However, in some cases we can

use a third method to gain an even greater performance improvement.

Derived tables

Using a derived table is in effect like using a temporary

table, without the hassle of specifically creating and referring to one. I have

re-coded the BoL query to use derived tables in the example below:

SELECT Ord.OrderID,
       Ord.OrderDate,
       maxUnitPrice
FROM   Northwind.dbo.Orders AS Ord inner join
(
    SELECT orderID,
           MAX(UnitPrice) as maxUnitPrice
    FROM       Northwind.dbo.[Order Details]
    group      by OrderID
) AS   OrdDet
on     ordDet.orderID = Ord.orderID
order by Ord.OrderID desc,
       Ord.OrderDate,
       maxUnitPrice

The code in red causes the SQL Server optimiser to generate a

notional (or derived) table called OrdDet for the duration of the query. The

derived table notionally takes up much less space than the original [order

details] table, because it contains only two columns and only one detail row for

each order. Because of this, my ‘derived table’ version of the query should

run even faster than the Join version. When I checked the

execution plan for the derived table version against that of the

"join" version to see what sort of improvement I got, the results came

out....... exactly the same!

Heck!

I got no improvement there at all! Both queries generate the

same execution plan, and use the same amount of SQL Server’s resources to

return the data. So is my theory all blown to hell? Not quite….

The key to understanding why the derived table technique may

or may not produce a more efficient result is understanding the query optimiser.

The query optimiser looks at all SQL queries and works out the

most efficient way of accessing the tables used in the query, primarily by using

index statistics. While by re-coding my query I have given SQL Server a

different set of instructions for how to get at the data I want, SQL Server has

decided in both cases that the same method - or execution plan – is the

optimal one. This is always not the case though.

Following exactly the same principle, here is a Group By and Derived table query that produce

different execution plans to return exactly the same data. Again, both are for

the Northwind database:

select companyName,
       max(orderDate)
from   orders o inner join customers c
on     o.customerID = c.customerID
group by companyName
select companyName,
       max(orderDate)
from   customers c inner join (
       select customerID,

max(orderDate)as orderDate
       from   orders
       group by customerID) as o
on     o.customerID = c.customerID
group by companyName

This time the optimiser chose to use different execution

plans for the two queries, and the derived table version of the query comes up

with roughly 30% improvement in terms of resources used to run the query.

Wrapping up.

You can see from the examples that the Query Optimiser

sometimes needs a little help in picking the most efficient way to execute a

query. It’s worth coding up a couple of versions of critical queries and

comparing their performance characteristics to find the most efficient way of

doing things.

About the author

Neil Boyle is an independant SQL Server consultant working out of London,

England. Neil's free SQL Server guide is available on-line at

http://www.impetus-sql.co.uk

Speed Tips For Select Statements