Over the past few weeks, I’ve been training developers in what they really need to know about SQL Server to produce efficient and robust SQL code. One of the areas we looked at was the “Index Lookup” aka “Bookmark Lookup”. An index lookup occurs when a non-clustered index seek is used to find the required rows, but to fully resolve the query data that is not held in the index is needed, and so a “Lookup” of the row in the table data is instigated. If the table has a clustered index then the clustered key is used to drive the lookup, else a rowid is used.
If you execute this :
select LastName,BusinessEntityID
from Person.Person
where LastName = 'Krane'
Then the execution plan will be
The query has been fully resolved using the non-clustered index “IX_Person_LastName_FirstName_MiddleName”. BusinessEntityId is the clustering key, this means that it will be implicitly included within any non-clustered index on Person.Person.
If we add “FirstName” to our selected columns , the plan will be
Exactly the same as before, the FirstName column has been returned by using the data within the index. However, if we add the column ‘RowGuid’ into our query:
select LastName,Firstname,BusinessEntityID,rowguid
from Person.Person
where LastName = 'Krane'
We have an additional Key Lookup Operator added. The data for RowGuid is not contained within the index itself and so the engine has had to find and pull it from the base table data.
The question that I was asked about this is “Why is the Lookup so slow ?”. The answer to that is, “Its not, you just have to appreciate the amount of work that is required”. The point is that by looking up in the non-clustered index, you may well find that all the required data is on a single 8k page. Let us assume that you have 100 rows of data on this one single page. Now, for each of those rows the engine will have to instigate a lookup. So, as there are 100 rows thats 100 lookups, the engine will now have to find and read 100 pages of table data. So, ‘slow’ it is not, doing a lot more work it is.
The way to mitigate this lookup overhead is to create a “covering” index. This means adding the required data, in the case above rowguid, explicitly into the index and so avoiding the lookup. In 2005, the INCLUDE clause was added to SQLServer, so that although the data is held within the index, it does not form part of the key and so the overhead of maintaining the column with the key itself is avoided.
There are some scenarios where you may be prevented from doing this for operational reason, DML performance or schema ownership issues , are just two but , if the existing indexes are ‘aligned’ then we can still avoid the overhead of reading the base data. We will still have to lookup *some* data, but as the clustered index is the widest set of data we can still improve performance by lookup up in a different non-clustered index.
Lets first build some test data and define some indexes.
Create Table Lookup
(
Id integer identity Primary Key,
Status integer not null,
ChildId integer not null,
ReadData integer not null,
Filler char(512) not null
)
go
with cteData
as(
Select top(3000000)
ROW_NUMBER() over (Order by (select null)) as Rown
from sys.columns a
cross join sys.columns b
cross join sys.columns c
)
insert into Lookup(Status,ChildId,ReadData,Filler)
select Rown%100000,
abs(CHECKSUM(newid())),
Rown%10000,
'djdjdjdjdjd'
from cteData
go
create index idxLookupStatus on Lookup(status) include (childid)
go
create index idxChild on Lookup(childid) include (ReadData)
go
select ReadData
from Lookup
where Status = 10000
select KeyLookup.ReadData
from Lookup
join Lookup KeyLookup
on Lookup.Id = KeyLookup.Id
where Lookup.Status = 10000
produces the plan of
select KeyLookup.ReadData
from Lookup
join Lookup KeyLookup
on Lookup.Id = KeyLookup.Id
and Lookup.ChildId = KeyLookup.ChildId
where Lookup.Status = 10000
103 Logical IO’s a nice drop of 25%.