Ways to Optimize Large numbers of Wild Card Searches

  • Currently I am working on a database table that has a very large text field (we will call it ExtractedText). The query has to look for wildcard searches through the whole text, and determine how many records each term hits on.

    Currently the basis of the query looks like this:

    select ID FROM TABLE WHERE ExtractedText like '%BILL%' UNION ALL

    select ID FROM TABLE WHERE ExtractedText like '%Tommy%' UNION ALL

    select ID FROM TABLE WHERE ExtractedText like '%Willy%'

    NOTE: Each term has to be run on each record, because I need the number of records each term hits on.

    We are doing this for over 300 different terms on a large database, so days of running are not uncommon.

    I am looking for ways to speed this up.

    I have recently heard of Full Text indexes which may be a solution to the problem, but the results would have to be exactly the same as if I ran the above query. Would that be the case with a Full Text Index search?

    Are there other techniques I can use to speed this up?

  • huston.dunlap (10/30/2009)


    Currently I am working on a database table that has a very large text field (we will call it ExtractedText). The query has to look for wildcard searches through the whole text, and determine how many records each term hits on.

    Currently the basis of the query looks like this:

    select ID FROM TABLE WHERE ExtractedText like '%BILL%' UNION ALL

    select ID FROM TABLE WHERE ExtractedText like '%Tommy%' UNION ALL

    select ID FROM TABLE WHERE ExtractedText like '%Willy%'

    NOTE: Each term has to be run on each record, because I need the number of records each term hits on.

    We are doing this for over 300 different terms on a large database, so days of running are not uncommon.

    I am looking for ways to speed this up.

    I have recently heard of Full Text indexes which may be a solution to the problem, but the results would have to be exactly the same as if I ran the above query. Would that be the case with a Full Text Index search?

    Are there other techniques I can use to speed this up?

    1) Your query has no possible way of telling you how many records each term hit on. You will simply get a single collection of IDs that are NOT correlated with the term each hit on.

    you can fix this (AND eliminate 2 table/index scans) in the above example with something such as this:

    select sum(case when ExtractedText like '%BILL%' then 1 else 0 end) as billcount,

    sum(case when ExtractedText like '%Tommy%' then 1 else 0 end) as Tommycount,

    sum(case when ExtractedText like '%Willy%' then 1 else 0 end) as Willycount

    FROM TABLE

    WHERE ExtractedText like '%Tommy%'

    OR ExtractedText like '%Willy%'

    OR ExtractedText like '%BILL%'

    2) if you index the big column you can at least do an index scan instead of a table scan. This could save a good bit of time (at the expense of a huge index) if the table has lots of other columns

    3) full text indexing should be able to help with this query and you should set up a quick proof of concept

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • Index scans will never scale, performance will always be directly related to table size, and there is no way I have found not to scan when using forward and backwards wild card searching. If there is anyway you can only wild card on the backside, you have a chance to seek the index. Many times this is not possible per requirements.

    Again look at full text indexing, I don't feel there is a wealth of information out there, but there is enough of it to get you where you are doing. If you have a MS support contract the FTS group was very helpful to us with many questions we had. Especially look at 2008, where FTS is now not a separate component but integrated into the engine itself. You will get really good performance, but you in fuzzy search land which is an odd land to be in.

    Here is a good one:

    http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/

    Mr. Cotter always has some great insight on this topic.

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply