March 5, 2009 at 2:50 am
Hello,
I need to get very very quickly the records from a table, whoose column [name] contains or begins with a given string, which is a phrase prefix. The table has about 250000 records currently. The [name] column contains names of places, typically 1-4 words long, but there are longer names too. I was expecting that this is a perfect candidate for using Full-text search, but could not see any performance difference between
WHERE [name] LIKE '% blabla%'
and
WHERE CONTAINS([name], '"blabla*"')
Is this normal?
March 5, 2009 at 9:18 am
The answer to these kinds of questions is usually "It depends." Two significant variables here are the size of the name column (average of four words) and the number of rows that must be scanned when doing a LIKE test (250000). You don't mention whether or not there are other search criteria that limit the number of rows that must be tested by the LIKE. If there are, it may be that you are only having to run the LIKE test against a fraction of the table. Looking at your actual execution plan would show what is happening.
If the LIKE test BEGINS with a certain string, for example LIKE 'New%' instead of LIKE '%New%', then a simple index over the Name column will almost certainly run fastest.
You may find that the time difference is minimal at your current volumes, but might be significantly different at 10 million rows, or with a significantly larger amount of text in the name column, say a full page instead of 4 words. So, all you can do is test at your anticipated volumes and see what the results are. It all depends.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 5, 2009 at 10:11 am
Hi,
Thank you for your suggestions. There is no other search criteria and the LIKE is two-sided. The search criteria is exactly as I posted it.
The problem is that both solutions are not fast enough with the current volumes. Maybe with growing volumes the performance difference between the two of them will grow, but each of them will be slower too... It seems that I have overestimated the power of full-text search. Now I'm considering a third solution - build a special "index" table so that I can do
[index table].[namepart] LIKE 'blabla%'
instead of
[original table].[name] LIKE '% blabla%'
with an index on the [namepart] column and get the same results. It has to be repopulated periodically - the problem permits such a solution. This is very very ugly, but it is really fast and that's more important. If anyone can give me a better idea I will gladly consider it.
March 5, 2009 at 12:58 pm
I've seen an article in here recently where someone had implemented a similar scheme. Essentially, at the time the name field was inserted, they parsed out all the words individually, and built a separate table consisting of two columns: one for the individual words and one containing the key to the primary table. If I can find it again, I'll post the link here.
I didn't ask earlier. Are the names going to be repeated?
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 5, 2009 at 2:04 pm
As a matter of fact I've seen this article here too some time ago but now I can't find it. 🙂 it is very similar indeed but there are differences too - I need phrases, not keywords. For example the name "Empire State Building" would be broken to 3 records:
Empire State Building
State Building
Building
with a foreign key to the original table.
If there are other names containing "Building" there will be duplication, but I think this is not a big deal.
The functionality itself is Autocomplete for a text field. For example "Empire State Building" would be matched if user enters "Empi" or "Empire St" or "Stat" Or "State Buil"
I see now that maybe I can use triggers in order to keep the original and index tables in sync and avoid the periodic repopulation.
March 5, 2009 at 2:13 pm
Empire State Building
State Building
Building
Since you will be writing the parsing to build the strings, doing that shouldn't be a problem.
How volatile is the data in the name column? Does it get updated often? If so, remember to delete all the old rows before building new phrase entries from the name string. Otherwise you will wind up with false pointers.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
March 5, 2009 at 2:28 pm
They will be not changed very often. The typical opartaion will be INSERT...
of course on UPDATE I wiill delete the current index records and insert new ones...
What I don't understand is why in this scenario I can't use the full-text search but rather have to create some custom schemes myself... i thought the full-text search is doing something similar to this but much more complex, so I can use it for free.
March 5, 2009 at 2:43 pm
I'm afraid I don't have an answer for you. We haven't implemented full-text search in production. Someone suggested using it to replace some existing search functionality that I had implemented, but they also found out it was slower. In that case it didn't surprise me, since I had covering indexes to specifically support the stored procedure that did the search. But it seems like full-text search would perform better for the scenario you described.
__________________________________________________
Against stupidity the gods themselves contend in vain. -- Friedrich Schiller
Stop, children, what's that sound? Everybody look what's going down. -- Stephen Stills
July 30, 2010 at 5:10 am
Thinking of the same requirement.,i am using URLS to search so my base URL will be something like http:\\www.google.com . I have another table which contains the keywords or like urls to be excluded from the base table .this is the exclusion table .. I have all this exclusion urls starting with % and ending with %..I am planning to use FullText search and indexes .. I have this topic in discussion at this thread
http://www.sqlservercentral.com/Forums/Topic960587-338-2.aspx?Update=1
Thanks
Raj Deep
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply