Index Strategy

  • I have a big table (4 million records or so). It has FirstName, LastName among other things. The front end guys created a UI that lets the user type in a partial full name: "bob sm" which they expect to find bob smith, bob smothers, bob smart, etc., and in the sproc that handles the search, this translates to where FirstName + ' ' + LastName like 'bob sm%'. We have an index on FirstName, LastName.

    In an effort to improve performance, I also created a computed column on FirstName + LastName and indexed that column thinking this would help, but according to the execution plan, it doesn't think much of my FirstName + LastName column/index because it ignores that and instead does an index scan FirstName, LastName column.

    This is not performing so well, particularly on the first try with clean cache/buffers. Must be a better way???

    If it would help, I can produce some sample code to reproduce this scenario.

    Thanks!

    .

  • I decided it would be easier to post an example. This demonsrates my problem exactly. Notice the second and final query does an index seek, and the first query does an index scan. I created the FullName computed column and indexed it thinking the optimizer would choose that for the first query and do an index seek instead of the index scan, but that's not what's happening.

    --* Create the table

    IF EXISTS (SELECT *

    FROM sys.objects

    WHERE object_id = OBJECT_ID(N'[dbo].[TestNames]') AND type in (N'U'))

    DROP TABLE [TestNames]

    GO

    CREATE TABLE TestNames (

    TestNameId int not null primary key identity(1,1)

    ,FirstName nvarchar(15) not null

    ,LastName nvarchar(25) not null

    ,InsertTime DateTime not null DEFAULT(GETDATE())

    -- Hoping this will help the WHERE FirstName + ' ' + LastName search. But it doesn't

    ,FullName AS FirstName + ' ' + LastName

    )

    GO

    --* Create the indexes

    IF EXISTS (SELECT *

    FROM sys.indexes

    WHERE object_id = OBJECT_ID(N'[dbo].[TestNames]') AND name = N'IX_FirstName_LastName')

    DROP INDEX [IX_FirstName_LastName] ON [dbo].[TestNames] WITH ( ONLINE = OFF )

    GO

    CREATE NONCLUSTERED INDEX [IX_FirstName_LastName] ON [dbo].[TestNames]

    (

    [FirstName] ASC,

    [LastName] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

    IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF, ALLOW_ROW_LOCKS = ON,

    ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    IF EXISTS (SELECT * FROM sys.indexes WHERE object_id = OBJECT_ID(N'[dbo].[TestNames]') AND name = N'IX_FullName')

    DROP INDEX [IX_FullName] ON [dbo].[TestNames] WITH ( ONLINE = OFF )

    GO

    CREATE NONCLUSTERED INDEX [IX_FullName] ON [dbo].[TestNames]

    (

    [FullName] ASC

    )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

    IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    --* On to the test

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smith')

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smothers')

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smart')

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'West')

    --* Trying to get the optimizer to use my FullName index and seek instead of scan

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FirstName + ' ' + LastName LIKE 'Bob Sm%'

    --* The optimizer uses the correct index here and does a seek

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FirstName = 'Bob' AND LastName LIKE 'sm%'

    .

  • How many rows in the test table on your server?

  • Like I said. I'm expecting about 4 million.

    .

  • BSavoie (3/24/2012)


    Like I said. I'm expecting about 4 million.

    Didn't ask you what you were expecting, I asked how many rows of data in the table you are testing against.

  • Here is why I ask. I worked with a developer you asked why in a test environment his procedure wasn't using the index we had just elevated. Doing some checking I found that his procedure was basically returning anout 95% of the data in the table. SQL Server ignored the index because it was cheaper to scan the table (clustered index scan in this case).

  • Head to desk. I just saw the problem.

    Your code:

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FirstName + ' ' + LastName LIKE 'Bob Sm%'

    What you needed:

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FullName LIKE 'Bob Sm%'

  • Thanks Lynn, you are exactly right. That was a mistake on my part. I did not make that mistake in the real system. I corrected it, but it still ignores my FullName index. I think you are exactly right though. Not enough rows. I'll get more rows in the table and try again. Appreciate the feedback.

    .

  • Now my table has well over 500,000 rows. Still ignoring my index though. I've attached my query plan if that helps.

    .

  • Here is what I see from looking at the execution plans. I would appreciate others thoughts in this regard as well.

    First, if you look at both plans they are using the same index, IX_FirstName_LastName. One is doing an index seek, the other an index scan.

    The first query is looking for FirstName = 'Bob' and LastName like 'sm%'. Seeking on 'Bob' then scanning LastName for rows where LastName starts with 'Sm'. Makes sense. In addition, SQL Server knows that FullName is the concatenation of FirstName and LastName and this index has both, so it doesn't need to go to the table for this information.

    Looking at the second query where we are looking for FullName like 'Bob sm%', SQL Server can again look to the same index. It can concatenate FirstName and LastName from the data in the index, and look for the rows the start with 'bob sm'. If SQL Server used the index IX_FullName, it could find rows it needs, but would then need to do a bookmark lookup in the data table itself to return FirstName and LastName.

    I may be wrong, but it appears to me that SQL Server has determined that for these two queries (even though one is a seek and the other a scan) that using the index IX_FirstName_LastName is more cost effective than using the index IX_FullName with bookmark lookups to return FirstName and LastName as well as FullName.

  • That makes great sense. Thank you Lynn.

    I had not considered that SQL Server might find the FirstName_LastName index to be more efficient even when searching for a full name. I did not know that it was smart enough to deduce that FirstName + ' ' + LastName could be found in the FirstName_LastName index. I assumed that an expression like that would be a problem for the optimizer.

    I'm not sure I understand why FirstName_LastName is more efficient than FullName though.

    Thanks again!

    .

  • BSavoie (3/23/2012)


    --* The optimizer uses the correct index here and does a seek

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FirstName = 'Bob' AND LastName LIKE 'sm%'

    I guess my suggestion is that you've correctly identified the symptoms and have found a solution. The only problem would be with implimentation.

    If the GUI is passing a single string like 'bob sm%' to the lookup proc and you're having problems using it as it is against a "full name" column, then maybe the full name column isn't the correct solution. Maybe it would be better to use a "splitter" to split the input parameter at the space and use the code above to do the seeks.

    As a side bar, if you do elect to split the input parameter and change the query just a tiny bit, then you can enhance the flexibility of the code so that someone could pass 'rob% sm%' and have it find Rob Smith, Robert Smithe, Roberta Small, etc, etc.

    The changes to the lookup that would use the split parameter would look like the following.

    --* The optimizer uses the correct index here and does a seek

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FirstName LIKE 'Bob%' AND LastName LIKE 'sm%'

    If someone used 'Bob Smith' as the passed in parameter, the code would end up looking like this...

    --* The optimizer uses the correct index here and does a seek

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FirstName LIKE 'Bob%' AND LastName LIKE 'sm%'

    ... which is essentially identical to ...

    --* The optimizer uses the correct index here and does a seek

    SELECT FirstName, LastName, FullName

    FROM TestNames

    WHERE FirstName = 'Bob' AND LastName = 'Smith'

    ... because the LIKEs will be treated as equalities by SQL Server.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • PS. It will also allow for parameters like '% sm%' to find anyone with a last name starting with 'sm' and 'Bob %' to find anyone with a first name of 'Bob'. I believe that second would probably use index seeks but maybe not the first one. A "reverse" index may be needed for that.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • GMTA. I have had that solution exactly in the back of my mind. I was just trying to meet the challenge of optimizing what they came up with, before suggesting a change. IMHO, that is a better solution Jeff.

    .

  • Lynn Pettis (3/24/2012)


    I may be wrong, but it appears to me that SQL Server has determined that for these two queries (even though one is a seek and the other a scan) that using the index IX_FirstName_LastName is more cost effective than using the index IX_FullName with bookmark lookups to return FirstName and LastName as well as FullName.

    Yep, the important point is that of the two queries that produced the scan and seek, one is covering and one is not. You can't return the First Name and Last Name fields from the FullName index as it's a concatenation.

    With the test data given, SQL Server knows that 75% of the rows match the filter criteria and there are only 4 rows in the table, so will clearly pick a scan.

    It's easy to shift the data so that it uses the index, just add a big chunk of data that's not going to match that criteria (e.g. make it more selective):

    --* On to the test

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smith')

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smothers')

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'Smart')

    INSERT INTO TestNames(FirstName, LastName) VALUES('Bob', 'West')

    INSERT INTO TestNames(FirstName, LastName)

    SELECT TOP 10000 REPLICATE(CHAR(CAST(RAND(CHECKSUM(NEWID()))*26 AS INT) +65),10),REPLICATE(CHAR(CAST(RAND(CHECKSUM(NEWID()))*26 AS INT) +65),10)

    FROM MASTER.SYS.columns SC, MASTER.SYS.COLUMNS SC1

    UPDATE STATISTICS TESTNAMES

    Bookmark lookups are quite expensive - Gail wrote a good article[/url] that looks at the tipping points (although they'll vary massively depending on the actual data).

    Of course, you could also make your index covering, which puts it on an even playing field with the other index:

    CREATE NONCLUSTERED INDEX [IX_FullName] ON [dbo].[TestNames]

    (

    [FullName] ASC

    ) INCLUDE(FirstName, LastName) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, SORT_IN_TEMPDB = OFF,

    IGNORE_DUP_KEY = OFF, DROP_EXISTING = OFF, ONLINE = OFF,

    ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]

    GO

    If going with a splitter approach, make sure it works ok with double barrelled names (they don't have to be hyphenated)

    FYI - there's no need to reference the FullName column explicitly in your query, its clever enough to work out that it can use the index if the definition of the computed column is identical

Viewing 15 posts - 1 through 15 (of 18 total)

You must be logged in to reply to this topic. Login to reply