It’s Friday and I’m ready for the weekend as I’m sure everyone else is. This weekend I’m looking forward to getting yard work done and browsing through the whoisactive SQL Ops Studio extension code.
I plan on writing a few posts on creating extensions; but, before we get to those I’d like to share a tip from one my my Co-workers at DBBest.
Recently we were doing a project that heavily focused on query tuning and many tables had various outer joins. My co-worker pointed out that many of these could be converted to an apply rather than a join.
Apply gives you both CROSS and OUTER. Think of CROSS APPLY like an INNER JOIN and OUTER APPLY like an OUTER JOIN.
Let’s compare some code to see how APPLY stacks up.
For this we’ll be comparing a derived table, CTE, and APPLY. We want to get the Amount for the last transaction for a certain range of customers.
First create the demo code:
create table #Users(UserID int not null primary key clustered,
UserName varchar(100),
Password varchar(100),
Fullname varchar(100))
create table #Transactions(TransactionID int identity not null primary key clustered,
UserID int,
Amount int)
;with Numbers as
(select top(100000) ROW_NUMBER() over(order by (select 1)) rn
from master..syscomments a
cross apply master..syscomments b
)
insert into #Users(UserID, UserName, Password, Fullname)
select rn, NEWID(), NEWID(), NEWID()
from Numbers
;with Numbers as
(select top(1000) ROW_NUMBER() over(order by (select 1)) rn
from master..syscomments a
cross apply master..syscomments b
)
insert into #Transactions(UserID, Amount)
select UserID, rn*40
from (select top 90 percent UserID
from #Users) u
cross apply (select top(cast(rand(checksum(newid()))*100 + 1 as int)) rn
from Numbers
where UserID > 0) n
Next, we want to run the queries with STATISTICS IO ON.
select u.UserID, t1.Amount
from #Users u
inner join (select UserID, max(TransactionID) LastiD
from #Transactions
group by UserID) t on t.UserID = u.UserID
inner join #Transactions t1 on TransactionID = LastiD
where u.UserID between 10 and 40
;with Trans as
(select u.UserID, t.Amount, row_number() over(partition by u.UserID order by t.TransactionID desc) rn
from #Users u
inner join #Transactions t on t.UserID = u.UserID
where u.UserID between 10 and 40
)
select UserID, Amount
from Trans
where rn = 1
select u.UserID, t.Amount
from #Users u
cross apply (select top 1 Amount
from #Transactions t
where t.UserID = u.UserID
order by t.TransactionID desc) t
where UserID between 10 and 40
The results are quite surprising indeed.
The query plan shows a much lower cost for the APPLY and the IO reports slightly fewer as well.
What if we look at “real” data. I mean demo data is fine and dandy but it’s often not representative of the real world. I could fill a database with realistic data using dbForge Data Generator but for this scenario I’ll use AdventureWorks2012. HAHA! Yes, I know I said “real-world data” but let’s look at AdventureWorks anyway.
Here’s the code:
*note that some of you may point out that a join isn’t needed at all since CustomerID is in Sales.SalesOrderHeader. I typed this up and then thought about it and let’s say instead you want to use u.AccountNumber. The query results are the same and now you have reason to use the Sales.Customer table.
select u.CustomerID, t1.TotalDue
from Sales.Customer u
inner join (select CustomerID, max(SalesOrderID) as LastSalesOrderID
from Sales.SalesOrderHeader
group by CustomerID) t on t.CustomerID = u.CustomerID
inner join Sales.SalesOrderHeader t1 on t1.SalesOrderID = t.LastSalesOrderID
WHERE u.CustomerID between 11019 and 11174
;with Trans as
(select u.CustomerID, t.TotalDue, row_number() over(partition by u.CustomerID order by t.SalesOrderID desc) as rn
from Sales.Customer u
inner join Sales.SalesOrderHeader t on t.CustomerID = u.CustomerID
WHERE u.CustomerID between 11019 and 11174
)
select CustomerID, TotalDue
from Trans
where rn = 1
select u.CustomerID, t.TotalDue
from Sales.Customer u
cross apply (select top 1 TotalDue
from Sales.SalesOrderHeader t
where t.CustomerID = u.CustomerID
order by t.SalesOrderID desc) t
WHERE u.CustomerID between 11019 and 11174
We’re still looking at a specific group of customers and getting a list of their totaldue for the last transaction. How does it turn out?
This shows that the CTE is actually the lowest cost; but, what if we look at the CPU and IO used with SET STATISTICS?
What we find is that the APPLY uses 1 fewer IO and elapsed time is consistently 0 ms. The CTE has 1 more IO and is consistently greater than 0 ms and the derived table has significantly greater IO.
What does this mean? Two things:
- Don’t always trust cost in a query plan
- APPLY can be used to enhance some queries. Note that your mileage will vary and you should always test before blindly making changes.
A good example of where you probably shouldn’t use apply is when we change the queries to look between certain dates. Date is on the joined table not the customer and this makes a big difference.
select u.CustomerID, t1.TotalDue
from Sales.Customer u
inner join (select CustomerID, max(SalesOrderID) as LastSalesOrderID
from Sales.SalesOrderHeader
where OrderDate between ‘2008-01-01’ and ‘2008-03-01’
group by CustomerID) t on t.CustomerID = u.CustomerID
inner join Sales.SalesOrderHeader t1 on t1.SalesOrderID = t.LastSalesOrderID
;with Trans as
(select u.CustomerID, t.TotalDue, row_number() over(partition by u.CustomerID order by t.SalesOrderID desc) as rn
from Sales.Customer u
inner join Sales.SalesOrderHeader t on t.CustomerID = u.CustomerID
where t.OrderDate between ‘2008-01-01’ and ‘2008-03-01’
)
select CustomerID, TotalDue
from Trans
where rn = 1
select u.CustomerID, t.TotalDue
from Sales.Customer u
cross apply (select top 1 TotalDue
from Sales.SalesOrderHeader t
where t.CustomerID = u.CustomerID
and t.OrderDate between ‘2008-01-01’ and ‘2008-03-01’
order by t.SalesOrderID desc) t
Note that the APPLY in this scenario is dreadful; so, be sure test!
I hope you found this article useful and it adds to your query tuning kit. Have a great weekend!