January 12, 2010 at 9:27 pm
Comments posted to this topic are about the item Optimising “Ends With” searches with REVERSE
January 12, 2010 at 11:52 pm
Nice article.
I think you need to correct this query from:
SELECT NAME FROM TEST_TABLE WHERE NAME LIKE 'DI%'
to:
SELECT NAME FROM TEST_TABLE WHERE NAME_REVERSED LIKE 'DI%'
Best,
Varun.C
January 13, 2010 at 12:26 am
Good article, and it makes a lot of sense... provided that the table doesn't have a lot of inserts, where every index counts AGAINST inserts. Most people don't know that INSERTs on heavily indexed tables are one of the primary causes of very high reads. For example, using the given example table, if I run the following code...
--drop table test_table
SELECT object_id, name, system_type_id INTO test_table FROM master.sys.all_parameters
CREATE NONCLUSTERED INDEX ix_name ON test_table (name ASC)
ALTER TABLE test_table ADD name_REVERSED AS REVERSE(name)
PRINT '========== Insert with no index =========='
SET STATISTICS IO ON
SET STATISTICS TIME ON
INSERT INTO test_table (object_id, name, system_type_id) SELECT 1, 'Dodah', 2
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
GO
CREATE NONCLUSTERED INDEX IX_REVERSED ON TEST_TABLE (NAME_REVERSED ASC) INCLUDE (NAME)
GO
PRINT '========== Insert with index =========='
SET STATISTICS IO ON
SET STATISTICS TIME ON
INSERT INTO test_table (object_id, name, system_type_id) SELECT 1, 'Dodah', 2
SET STATISTICS TIME OFF
SET STATISTICS IO OFF
... then we can see that a single insert after the index is created now takes 11 reads instead of just 3.
(6756 row(s) affected)
========== Insert with no index ==========
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'test_table'. Scan count 0, logical reads 3, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
========== Insert with index ==========
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
SQL Server parse and compile time:
CPU time = 0 ms, elapsed time = 0 ms.
Table 'test_table'. Scan count 0, logical reads 11, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
SQL Server Execution Times:
CPU time = 0 ms, elapsed time = 1 ms.
(1 row(s) affected)
If you have an I/O-bound system, you need to be really careful about adding any indexes to tables that have a high insertion rate. Like everything else, "It Depends", and only a bit of "complete" testing will show you things that you may not have considered.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 13, 2010 at 3:10 am
Jeff Moden (1/13/2010)
Good article, and it makes a lot of sense... provided that the table doesn't have a lot of inserts, where every index counts AGAINST inserts. Most people don't know that INSERTs on heavily indexed tables are one of the primary causes of very high reads. For example, using the given example table, if I run the following code... then we can see that a single insert after the index is created now takes 11 reads instead of just 3.
On my 2008 system, the logical reads change from 3 to 5 (there are 7090 rows).
Jeff Moden (1/13/2010)
If you have an I/O-bound system, you need to be really careful about adding any indexes to tables that have a high insertion rate. Like everything else, "It Depends", and only a bit of "complete" testing will show you things that you may not have considered.
It is important to consider the impact of index maintenance, for sure, but there's nothing unusual about that. Every new index has to justify itself in terms of the trade-off between increased maintenance overhead and the benefits produced by having the index. Generally, one would expect most useful indexes to produce a net saving in reads; the reads saved by queries that benefit from the index should outweigh the reads added by index maintenance.
Don't get me started on what a poor metric the number of 'logical reads' is 😉
All that aside, I found this to be a good article, and a fine approach for any system that finds itself requiring a lot of 'ends with' searches. It is also well presented and clear - so top marks for style.
Paul
January 13, 2010 at 3:24 am
Hi Varun.C,
Well spotted!
I have made the correction.
Many thanks,
Ben
January 13, 2010 at 6:56 am
I was just about to post the same. Good to know there are people testing these solutions/examples.
Jeff - that is very good advice; I did not know that.
January 13, 2010 at 7:49 am
When I saw the title of this article, I immediately envisioned replacing:
...WHERE Name LIKE '%son'
with:
WHERE LEFT(REVERSE(Name),3)='nos'
Because to answer the introductory question, "How often do I need to do a search like...?": not nearly often enough to justify creating a new indexed column solely for that purpose. A pure-SQL solution would be interesting, though.
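For what it's worth, here is a rough side-by-side sketch of the two approaches, reusing the test_table and name_REVERSED names from the code elsewhere in this thread (the actual plans you get will depend on your own data and indexes):
-- Pure-SQL form: no extra column needed, but REVERSE(name) has to be
-- evaluated for every row, so an index on name cannot be seeked and the
-- table (or index) is scanned.
SELECT name
FROM test_table
WHERE LEFT(REVERSE(name), 3) = 'nos';
-- Indexed computed-column form from the article: the predicate is a prefix
-- match on the stored reversed value, so an index on name_REVERSED can be seeked.
SELECT name
FROM test_table
WHERE name_REVERSED LIKE 'nos%';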
January 13, 2010 at 8:17 am
Note that indexes can only have a limited width (300 bytes or so, I believe).
Because of this, it makes sense to reverse only the last N characters and to limit the length of the search text to match. If your strings vary in length between 3 and 250 characters and a 12-character search string is selective enough, add a few characters of headroom: put the index on the reverse of the rightmost 16 characters and truncate the search string to 16 characters as well.
This saves you a ton of otherwise unneeded duplicate data and keeps the number of page reads to a minimum.
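A minimal sketch of that refinement, assuming a 16-character cap and search strings no longer than 16 characters (the name_tail_reversed column and index names here are made up purely for illustration):
-- Index only the reversed tail of the string to keep the index key narrow.
ALTER TABLE test_table ADD name_tail_reversed AS REVERSE(RIGHT(name, 16));
CREATE NONCLUSTERED INDEX ix_name_tail_reversed
    ON test_table (name_tail_reversed) INCLUDE (name);
-- Truncate the reversed search string to the same 16 characters before comparing.
DECLARE @search nvarchar(250) = N'some ending text';
SELECT name
FROM test_table
WHERE name_tail_reversed LIKE LEFT(REVERSE(@search), 16) + '%';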
January 13, 2010 at 9:04 am
This is great, and it would be a good replacement for Full-Text Indexing on smaller text columns. My biggest gripe about FTI is the blocking caused by inserting large amounts of data when change tracking is set to AUTO.
January 13, 2010 at 9:06 am
peter-757102 (1/13/2010)
Note that indexes can only have a limited width (300 bytes or so, I believe). Because of this, it makes sense to do a reverse of only the last N characters and limit the length of the search text to match. If your strings have varying lengths between 3 and 250 characters and your search strings are strict enough with 12 characters, add a few extra and use the reverse of the rightmost 16 characters to put the index on and truncate the search string to 16 characters as well.
This saves you a ton of otherwise unneeded duplicate data and keeps the number of page reads to a minimum.
That's a great idea and a good refinement. I suppose you could also search on the last n characters, which would potentially return a list of primary keys from the table, and then do a normal 'forwards' search on the resulting rows to narrow those down to the specific results you want. The max index key size is 900 bytes, BTW (see the documentation for details).
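Folded into a single query, and reusing the illustrative name_tail_reversed column from the previous reply, that two-stage idea might look something like this (the optimizer would typically seek the narrow reversed index first and apply the 'forwards' check only to the candidate rows):
DECLARE @search nvarchar(250) = N'a search string longer than sixteen characters';
SELECT name
FROM test_table
WHERE name_tail_reversed LIKE LEFT(REVERSE(@search), 16) + '%'  -- narrow index seek for candidate rows
  AND name LIKE '%' + @search;                                  -- forwards check removes any false positives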
Regards,
Ben
January 13, 2010 at 9:09 am
Paul (1/13/2010)
Don't get me started on what a poor metric the number of 'logical reads' is
Heh... now you know why I couched it all in "It Depends". :hehe: Between that and the crud that shows up in the execution plan....
--Jeff Moden
Change is inevitable... Change for the better is not.
January 13, 2010 at 9:10 am
peter-757102 (1/13/2010)
Note that indexes can only have a limited width (300 bytes or so, I believe). Because of this, it makes sense to do a reverse of only the last N characters and limit the length of the search text to match. If your strings have varying lengths between 3 and 250 characters and your search strings are strict enough with 12 characters, add a few extra and use the reverse of the rightmost 16 characters to put the index on and truncate the search string to 16 characters as well.
This saves you a ton of otherwise unneeded duplicate data and keeps the number of page reads to a minimum.
It's actually 900 bytes for an index key, but very good point, especially if you need to do it often.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 13, 2010 at 9:13 am
Rome1981 (1/13/2010)
I was just about to post the same. Good to know there are people testing these solutions/examples.Jeff- That is very good advice, I did not know that.
Thanks for the feedback. The key is that you have to look, though. Like Paul says, "Logical Reads" isn't the best or most reliable metric, and execution plans can lie like a rug.
--Jeff Moden
Change is inevitable... Change for the better is not.
January 13, 2010 at 9:18 am
Thanks for this, an excellent tip! I discovered that even if you skip the extra column and just run the reverse query, i.e., "where reverse(fieldname) like 'cba%'" instead of "where fieldname like '%abc'", the response time improves by 50%.
January 13, 2010 at 9:28 am
What great timing for this article!
I'm just finishing up the design for an application that involves a "call module". This module is a place for users to record various data about phone calls that come into our agency. One of the features in the application is that the users need to be able to look up (search) records by the caller's phone number.
For various reasons, when entering the phone number data, users will have to enter the full phone number, including an area code. Ex: (541)444-5555.
However, the VAST majority of phone numbers that our users will enter or see will have the same area code. So, I've been playing with the idea of adding a feature where staff only have to enter the last 7 digits when doing a search.
I've been reluctant to implement such a feature for a couple of reasons, one of which is precisely that I didn't want to have a WHERE clause such as:
WHERE phone like '%444-5555'
Yuck.
This article has given me a technique that makes that feature more palatable. I understand Jeff's caution. Even so, I'm excited about this idea because I think it might be perfect for the situation in my application. Thanks.
(The reason I explained my situation in detail is because I thought it might be helpful to provide a real-world example of when this need might come into play. If I didn't have such a need myself right now, I would not have been able to think of a realistic example of why anyone would really want to provide a "search the end" feature for users.)
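If it helps, here is a minimal sketch of how the article's technique might be applied to that phone-number search (the table, column, and index names are hypothetical):
-- Hypothetical call log table; phone numbers are stored formatted, e.g. '(541)444-5555'.
CREATE TABLE dbo.CallLog
(
    CallID        int IDENTITY PRIMARY KEY,
    Phone         varchar(20) NOT NULL,
    PhoneReversed AS REVERSE(Phone)            -- computed column for "ends with" searches
);
CREATE NONCLUSTERED INDEX IX_CallLog_PhoneReversed
    ON dbo.CallLog (PhoneReversed) INCLUDE (Phone);
-- Staff enter just the last seven digits, e.g. '444-5555'; reversing the input
-- lets the index be seeked instead of scanning for '%444-5555'.
DECLARE @suffix varchar(8) = '444-5555';
SELECT CallID, Phone
FROM dbo.CallLog
WHERE PhoneReversed LIKE REVERSE(@suffix) + '%';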