The Challenge
When you make the decision to use full text indexing (FTI) in SQL Server, you
need to be very careful with the design of the user interface and the impact on
the end users. Why do you ask? well is actually due to FTI itself and its
range of clauses that enhance the search.
For example, if you type in "Electrical Training" in your application
search text box, and passed this directly to FTI using the CONTAINS clause like
this:
select orgname_name
from organisation_name
where contains(*, '"Electrical Training"')
we happen to get 8 rows returned
Well that's fine and is probably no better than what you were doing before
FTI with LIKE '%Electrical Training%'; but do you really want to get your users
to type in more FTI friendly/smart clauses to expand the result set? here
is a classic example related to word proximity:
select orgname_name
from organisation_name
where contains(*, '"Electrical" NEAR "Training"')
now we have 16 rows return, but required the user to wrapper double
quotes (perhaps) and enter the NEAR operator (yeh right!)
So do you smarten up your interface and give you users the endless
combinations of criteria to select from? or is it just business as usual and,
through a smart parsing solution, we "re-work" the users clause to expand on and
enrich the search experience without the need to train users or alter
front-ends.
This short paper discusses one possible solution for string parsing for FTI
searching.
One of Many Solutions
The following stored procedure is basically a string parser. It takes a
single string as input, the returns a FTI friendly string back primarily
designed for CONTAINS or CONTAINSTABLE only. The examples below are calls
to the FTI routine, showing the input and output strings. Be aware that
INFLECTIONAL statements are ignored if you use a neutral language on your FTI
indexed columns. Also, see documentation regarding support when used with
other clauses, namely contains, freetext etc.
Example 1 - Single Word
declare @parsed_string varchar(200) set @parsed_string = "mcdonalds" -- what the user entered exec FTI_FixString 'drive safe', 'and', @parsed_string OUTPUT print @parsed_string
("mcdonalds") OR ("mcdonalds*") OR (FORMSOF(INFLECTIONAL,"mcdonalds"))
Example 2 - Multiple Words
declare @parsed_string varchar(200) set @parsed_string = "drive safe" -- what the user entered exec FTI_FixString 'drive safe', 'near', @parsed_string OUTPUT print @parsed_string
"drive safe" OR ("drive" near "safe")
OR ("drive*" near "safe*") OR (FORMSOF(INFLECTIONAL,"drive") AND
FORMSOF(INFLECTIONAL,"safe"))
This particular solution is a good one from a majority of searches (in our
application). Please note that I stripped out the logging calls to an
audit table to track the incoming and outgoing parsed text for ongoing analysis
of user search criteria.
In the examples above, we look for an exact
match "drive safe", then a proximity match ("drive" near "safe") OR
("drive*" near "safe*") and finally, an exact match using the
inflectional clause to really get FTI working for us. The routine is very
each to customise to add/remove clauses as required.
The Stored Procedure
The routine is relatively simple, we take the incoming string, strip out
selected FTI syntax and some of the noise words. From here we strip the
sting into into individual words and wrap around our FTI search clauses to
enhance the users search criteria as shown above.
NOTE - With your testing, if a table has multiple indexed columns make
sure FTI is searching appropriately over them as values in one column, but not
in another can cause much heartache.
CREATE PROCEDURE FTI_FixString ( @keywords varchar (500) = null , @proximity varchar (10) = 'NEAR', -- and, or, near @parsedstring varchar(500) OUTPUT ) AS SET CONCAT_NULL_YIELDS_NULL OFF SET NOCOUNT ON BEGIN DECLARE @sql varchar(2000) DECLARE @where varchar(1000) DECLARE @i int -- old posn in string DECLARE @j int -- match posn in string DECLARE @k int -- new start posn DECLARE @len int -- string length DECLARE @search varchar(500) DECLARE @search2 varchar(500) DECLARE @fuzzy varchar(1000) DECLARE @keyword varchar(500) DECLARE @synonyms varchar(255) DECLARE @new_keywords varchar(1000) declare @found int -- Clean up the keywords string first, removing typical FTI join conditions, a majority of users dont type this in from long term analysis -- of their search clauses
set @keywords = LTRIM(RTRIM(@keywords)) set @keywords = REPLACE(@keywords,'*',' ') set @keywords = REPLACE(@keywords,' AND ',' ') set @keywords = REPLACE(@keywords,' OR ',' ') set @keywords = REPLACE(@keywords,'"',' ') set @keywords = REPLACE(@keywords,'(',' ') set @keywords = REPLACE(@keywords,')',' ') set @keywords = replace(@keywords, 'o''', '') -- mcdonald's ==> mcdonald''s, escape out the ' which is a valid piece of punctuation set @keywords = replace(@keywords, '''', '''''') -- fix é type characters set @found = charindex('é', @keywords) if @found > 0 if @found = len(@keywords) set @keywords = replace(@keywords, 'é', '') else set @keywords = replace(@keywords, 'é ', '* ') -- set proximity type if @proximity is null or @proximity = '' set @proximity = 'NEAR' -- Replace keywords with their synonyms if their exist -- THIS IS COMMENTED OUT FOR A GOOD REASON, WE DON'T USE A KEYWORD TABLE LOOKUP, UNCOMMENT AND -- HOOK IN YOUR OWN AS REQUIRED. /* SELECT @i = 1 -- start of the first string SELECT @j = 0 -- end of the first string unknown SELECT @len = LEN(@keywords) SELECT @new_keywords = '' WHILE (@j <= @len) BEGIN SELECT @k = @j+1 -- New start posn -- Detect the end of each substring SELECT @j = CHARINDEX(' ', @keywords, @k) -- Finish if no more found IF (@j <= 0) BEGIN SELECT @keyword = LTRIM(RTRIM(SUBSTRING(@keywords,@k,@len-@k+1))) SELECT @synonyms = synonyms FROM tbl_keyword_synonyms WHERE keyword = @keyword IF @@ROWCOUNT = 1 SELECT @new_keywords = @new_keywords + ' ' + @synonyms ELSE SELECT @new_keywords = @new_keywords + ' ' + @keyword BREAK END IF (@j < @len) BEGIN IF (SUBSTRING(@keywords,@j-1,1) <> ' ') BEGIN SELECT @keyword = LTRIM(RTRIM(SUBSTRING(@keywords,@i,@j-@i))) SELECT @synonyms = synonyms FROM tbl_keyword_synonyms WHERE keyword = @keyword IF @@ROWCOUNT = 1 SELECT @new_keywords = @new_keywords + ' ' + @synonyms ELSE SELECT @new_keywords = @new_keywords + ' ' + @keyword SELECT @i = @j + 1 END END END SELECT @keywords = @new_keywords */SELECT @i = 1 -- start of the first string SELECT @j = 0 -- end of the first string unknown SELECT @len = LEN(@keywords) SELECT @search = '' SELECT @fuzzy = '' WHILE (@j <= @len) BEGIN SELECT @k = @j+1 -- New start posn -- cut up the string into its individual words, split this into its inflectional part and standard FTI search portion SELECT @j = CHARINDEX(' ', @keywords, @k) -- Finish if no more found IF (@j <= 0) BEGIN -- Add the last part of the string SELECT @search = @search + '"' + LTRIM(RTRIM(SUBSTRING(@keywords,@i,@len-@i+1))) + '*"' SELECT @search2 = @search2 + '"' + LTRIM(RTRIM(SUBSTRING(@keywords,@i,@len-@i+1))) + '"' if not SUBSTRING(@keywords,@k,@len-@k+1) in ('of', 'and', 'at', 'a', 'the', '&', 'it', 'for') SELECT @fuzzy = @fuzzy + 'FORMSOF(INFLECTIONAL,"' + LTRIM(RTRIM(SUBSTRING(@keywords,@k,@len-@k+1))) + '")' BREAK END IF (@j < @len) BEGIN IF (SUBSTRING(@keywords,@j-1,1) <> ' ') BEGIN SELECT @search = @search + '"' + LTRIM(RTRIM(SUBSTRING(@keywords,@i,@j-@i))) + '*" ' + @proximity + ' ' SELECT @search2 = @search2 + '"' + LTRIM(RTRIM(SUBSTRING(@keywords,@i,@j-@i))) + '" ' + @proximity + ' ' if not SUBSTRING(@keywords,@i,@j-@i) in ('of', 'and', 'at', 'a', 'the', '&', 'it', 'for') SELECT @fuzzy = @fuzzy + 'FORMSOF(INFLECTIONAL,"' + LTRIM(RTRIM(SUBSTRING(@keywords,@i,@j-@i))) + '") AND ' SELECT @i = @j + 1 END END END if substring(@fuzzy, len(@fuzzy) - 2, 3) = ' OR' set @fuzzy = substring(@fuzzy, 1, len(@fuzzy) - 2) if substring(@fuzzy, len(@fuzzy) - 3, 4) = ' AND' set @fuzzy = substring(@fuzzy, 1, len(@fuzzy) - 3) -- Example: ("mcdonalds") OR ("mcdonalds*") OR (FORMSOF(INFLECTIONAL,"mcdonalds")) SELECT @search = '"' + @keywords + '" OR '+ '(' + @search2 + ') OR (' + @search + ') OR (' + @fuzzy + ')' set @parsedstring = @search SET CONCAT_NULL_YIELDS_NULL ON END GO
Example Call
<<some stored procedure>> SET NOCOUNT ONset @ftisenabled = 1 -- 1 is YES, we are using Full Text Indexing, essential if catalogs are down/unavailable if @ftisenabled = 1 begin set @ftop = 'CONTAINSTABLE' -- so we can quickly move between different FTI operations for testing if ltrim(rtrim(@p_searchstring)) <> '' begin set @p_searchstring= replace(@p_searchstring, '%', '') exec FTI_FixString @p_searchstring, 'near', @p_searchstring OUTPUT end else set @ftisenabled = 0 end<< do your search here using the altered @p_searchstring
in your CONTAINS clause >>
Conclusion
What does all this mean? well it highlights the importance of testing,
testing not only the result sets from FTI using its variety of clauses and
options, but understanding the impact on your end-users and what you, as a
DBA/Developer, need to take on board to make the road to FTI implementation a
simple and effective one. Have fun!
Credits
A big thanks to Lindsay Cocks for his early work in this area and
tireless testing.