Index Strategy

Question

Index Strategy

BSavoie

SSCertifiable

Points: 6309
More actions
March 23, 2012 at 9:25 pm

#270723

I have a big table (4 million records or so). It has FirstName, LastName among other things. The front end guys created a UI that lets the user type in a partial full name: "bob sm" which they expect to find bob smith, bob smothers, bob smart, etc., and in the sproc that handles the search, this translates to where FirstName + ' ' + LastName like 'bob sm%'. We have an index on FirstName, LastName.
In an effort to improve performance, I also created a computed column on FirstName + LastName and indexed that column thinking this would help, but according to the execution plan, it doesn't think much of my FirstName + LastName column/index because it ignores that and instead does an index scan FirstName, LastName column.
This is not performing so well, particularly on the first try with clean cache/buffers. Must be a better way???
If it would help, I can produce some sample code to reproduce this scenario.
Thanks!
.

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply

BSavoie SSCertifiable Points: 6309 More actions · Answer 1

I decided it would be easier to post an example. This demonsrates my problem exactly. Notice the second and final query does an index seek, and the first query does an index scan. I created the FullName computed column and indexed it thinking the optimizer would choose that for the first query and do an index seek instead of the index scan, but that's not what's happening.

--* Create the table

IF EXISTS (SELECT *

FROM sys.objects

WHERE object_id = OBJECT_ID(N'[dbo].[TestNames]') AND type in (N'U'))

DROP TABLE [TestNames]

GO

CREATE TABLE TestNames (

TestNameId int not null primary key identity(1,1)

,FirstName nvarchar(15) not null

,LastName nvarchar(25) not null

,InsertTime DateTime not null DEFAULT(GETDATE())

-- Hoping this will help the WHERE FirstName + ' ' + LastName search. But it doesn't

,FullName AS FirstName + ' ' + LastName

)

GO

--* Create the indexes

IF EXISTS (SELECT *

FROM sys.indexes

WHERE object_id = OBJECT_ID(N'[dbo].[TestNames]') AND name = N'IX_FirstName_LastName')

DROP INDEX [IX_FirstName_LastName] ON [dbo].[TestNames] WITH ( ONLINE = OFF )

GO

CREATE NONCLUSTERED INDEX [IX_FirstName_LastName] ON [dbo].[TestNames]

(

[FirstName] ASC,

[LastName] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,

ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[TestNames]') AND name = N'IX_FullName')

DROP INDEX [IX_FullName] ON [dbo].[TestNames] WITH ( ONLINE = OFF )

GO

CREATE NONCLUSTERED INDEX [IX_FullName] ON [dbo].[TestNames]

(

[FullName] ASC

)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,

ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

--* On to the test

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smith')

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smothers')

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smart')

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'West')

--* Trying to get the optimizer to use my FullName index and seek instead of scan

SELECT FirstName, LastName, FullName

FROM TestNames

WHERE FirstName + ' ' + LastName LIKE 'Bob Sm%'

--* The optimizer uses the correct index here and does a seek

SELECT FirstName, LastName, FullName

FROM TestNames

WHERE FirstName = 'Bob' AND LastName LIKE 'sm%'

.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 2

Lynn Pettis

SSC Guru

Points: 442467

March 24, 2012 at 12:48 am

#1463893

How many rows in the test table on your server?

BSavoie SSCertifiable Points: 6309 More actions · Answer 3

BSavoie

SSCertifiable

Points: 6309

March 24, 2012 at 4:52 am

#1463902

Like I said. I'm expecting about 4 million.

.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 4

BSavoie (3/24/2012)
Like I said. I'm expecting about 4 million.

Didn't ask you what you were expecting, I asked how many rows of data in the table you are testing against.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 5

Here is why I ask. I worked with a developer you asked why in a test environment his procedure wasn't using the index we had just elevated. Doing some checking I found that his procedure was basically returning anout 95% of the data in the table. SQL Server ignored the index because it was cheaper to scan the table (clustered index scan in this case).

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 6

Head to desk. I just saw the problem.

Your code:

SELECT FirstName, LastName, FullName

FROM TestNames

WHERE FirstName + ' ' + LastName LIKE 'Bob Sm%'

What you needed:

SELECT FirstName, LastName, FullName

FROM TestNames

WHERE FullName LIKE 'Bob Sm%'

BSavoie SSCertifiable Points: 6309 More actions · Answer 7

Thanks Lynn, you are exactly right. That was a mistake on my part. I did not make that mistake in the real system. I corrected it, but it still ignores my FullName index. I think you are exactly right though. Not enough rows. I'll get more rows in the table and try again. Appreciate the feedback.

.

BSavoie SSCertifiable Points: 6309 More actions · Answer 8

Now my table has well over 500,000 rows. Still ignoring my index though. I've attached my query plan if that helps.

.

Lynn Pettis SSC Guru Points: 442467 More actions · Answer 9

Here is what I see from looking at the execution plans. I would appreciate others thoughts in this regard as well.

First, if you look at both plans they are using the same index, IX_FirstName_LastName. One is doing an index seek, the other an index scan.

The first query is looking for FirstName = 'Bob' and LastName like 'sm%'. Seeking on 'Bob' then scanning LastName for rows where LastName starts with 'Sm'. Makes sense. In addition, SQL Server knows that FullName is the concatenation of FirstName and LastName and this index has both, so it doesn't need to go to the table for this information.

Looking at the second query where we are looking for FullName like 'Bob sm%', SQL Server can again look to the same index. It can concatenate FirstName and LastName from the data in the index, and look for the rows the start with 'bob sm'. If SQL Server used the index IX_FullName, it could find rows it needs, but would then need to do a bookmark lookup in the data table itself to return FirstName and LastName.

I may be wrong, but it appears to me that SQL Server has determined that for these two queries (even though one is a seek and the other a scan) that using the index IX_FirstName_LastName is more cost effective than using the index IX_FullName with bookmark lookups to return FirstName and LastName as well as FullName.

BSavoie SSCertifiable Points: 6309 More actions · Answer 10

That makes great sense. Thank you Lynn.

I had not considered that SQL Server might find the FirstName_LastName index to be more efficient even when searching for a full name. I did not know that it was smart enough to deduce that FirstName + ' ' + LastName could be found in the FirstName_LastName index. I assumed that an expression like that would be a problem for the optimizer.

I'm not sure I understand why FirstName_LastName is more efficient than FullName though.

Thanks again!

.

Jeff Moden SSC Guru Points: 1004343 More actions · Answer 11

BSavoie (3/23/2012)
--* The optimizer uses the correct index here and does a seek SELECT FirstName, LastName, FullName FROM TestNames
WHERE FirstName = 'Bob' AND LastName LIKE 'sm%'

I guess my suggestion is that you've correctly identified the symptoms and have found a solution. The only problem would be with implimentation.

If the GUI is passing a single string like 'bob sm%' to the lookup proc and you're having problems using it as it is against a "full name" column, then maybe the full name column isn't the correct solution. Maybe it would be better to use a "splitter" to split the input parameter at the space and use the code above to do the seeks.

As a side bar, if you do elect to split the input parameter and change the query just a tiny bit, then you can enhance the flexibility of the code so that someone could pass 'rob% sm%' and have it find Rob Smith, Robert Smithe, Roberta Small, etc, etc.

The changes to the lookup that would use the split parameter would look like the following.

--* The optimizer uses the correct index here and does a seek

SELECT FirstName, LastName, FullName

FROM TestNames

WHERE FirstName LIKE 'Bob%' AND LastName LIKE 'sm%'

If someone used 'Bob Smith' as the passed in parameter, the code would end up looking like this...

--* The optimizer uses the correct index here and does a seek

SELECT FirstName, LastName, FullName

FROM TestNames

WHERE FirstName LIKE 'Bob%' AND LastName LIKE 'sm%'

... which is essentially identical to ...

--* The optimizer uses the correct index here and does a seek

SELECT FirstName, LastName, FullName

FROM TestNames

WHERE FirstName = 'Bob' AND LastName = 'Smith'

... because the LIKEs will be treated as equalities by SQL Server.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

Jeff Moden SSC Guru Points: 1004343 More actions · Answer 12

PS. It will also allow for parameters like '% sm%' to find anyone with a last name starting with 'sm' and 'Bob %' to find anyone with a first name of 'Bob'. I believe that second would probably use index seeks but maybe not the first one. A "reverse" index may be needed for that.

--Jeff Moden

RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
First step towards the paradigm shift of writing Set Based code:
________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

Change is inevitable... Change for the better is not.

Helpful Links:
How to post code problems
How to Post Performance Problems
Create a Tally Function (fnTally)

BSavoie SSCertifiable Points: 6309 More actions · Answer 13

GMTA. I have had that solution exactly in the back of my mind. I was just trying to meet the challenge of optimizing what they came up with, before suggesting a change. IMHO, that is a better solution Jeff.

.

HowardW One Orange Chip Points: 29109 More actions · Answer 14

Lynn Pettis (3/24/2012)
I may be wrong, but it appears to me that SQL Server has determined that for these two queries (even though one is a seek and the other a scan) that using the index IX_FirstName_LastName is more cost effective than using the index IX_FullName with bookmark lookups to return FirstName and LastName as well as FullName.

Yep, the important point is that of the two queries that produced the scan and seek, one is covering and one is not. You can't return the First Name and Last Name fields from the FullName index as it's a concatenation.

With the test data given, SQL Server knows that 75% of the rows match the filter criteria and there are only 4 rows in the table, so will clearly pick a scan.

It's easy to shift the data so that it uses the index, just add a big chunk of data that's not going to match that criteria (e.g. make it more selective):

--* On to the test

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smith')

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smothers')

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smart')

INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'West')

INSERT INTO TestNames(FirstName, LastName)

SELECT TOP 10000 REPLICATE(CHAR(CAST(RAND(CHECKSUM(NEWID()))*26 AS INT) +65),10),REPLICATE(CHAR(CAST(RAND(CHECKSUM(NEWID()))*26 AS INT) +65),10)

FROM MASTER.SYS.columns SC, MASTER.SYS.COLUMNS SC1

UPDATE STATISTICS TESTNAMES

Bookmark lookups are quite expensive - Gail wrote a good article[/url] that looks at the tipping points (although they'll vary massively depending on the actual data).

Of course, you could also make your index covering, which puts it on an even playing field with the other index:

CREATE NONCLUSTERED INDEX [IX_FullName] ON [dbo].[TestNames]

(

[FullName] ASC

) INCLUDE(FirstName, LastName) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,

ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

GO

If going with a splitter approach, make sure it works ok with double barrelled names (they don't have to be hyphenated)

FYI - there's no need to reference the FullName column explicitly in your query, its clever enough to work out that it can use the index if the definition of the computed column is identical