Full Text Searching

  • I am looking for a way (or product) that will do a full text search as follows:

    ·       Search logic allows wild cards and reverse order names.  For example, name John Smith (i.e., %johnsmith%) returns:

    -       John Smith,

    -       John Smith, esq.

    -       Smith, John

    -       John Smith Bakery, Inc., etc. 

     

    I doesn't seem like a big problem until I found out that the users will be entering the name without spaces (i.e. johnsmith)

    Any ideas would be appreciated.

  • Sorry I know of no tool that can do that kind of fuzzy logic. Might be a cool tool but you might be able to do in the background a search for each individual letter as opposed to the word. Then you just have to determine how to deal with situations where all the letters appear but not the words they may expect. You would have to come up with a weighting based on closeness of characters and the fewer sets together in a word the less likely the should appear.

  • I'm not saying that this will run lightning fast, but I believe the following would work. Below is a SQL function definition that will take a string and create various combined mutexes. You can then perform a LIKE on the results within your where clause. If you want it to perform better you could store the results of this function as a new column in the table (update this new column whenever the source column(s) change), but it would still be a LIKE clause against a large varchar. Perhaps even a new table with a foreign key instead of a column (due to excessive row length and subsequent excessive page storage use). I only did 2 word mutexes here. To find "John B. Smith" from "Smith, John B." would take some more work but could be done. It will run even slower so only support it if there is a need to.

    EXAMPLE QUERY:

    SELECT * FROM MyTable WHERE dbo.NameMutate(ClientName) LIKE '%johnsmith%'

    EXAMPLE FUNCTION RESULTS:

    print dbo.NameMutate('John Smith')

    print dbo.NameMutate('John Smith, esq.')

    print dbo.NameMutate('Smith, John' )

    print dbo.NameMutate('John Smith Bakery, Inc., etc.')

    GO ----->

    JohnSmith,JohnSmith,SmithJohn

    JohnSmithesq,JohnSmith,Johnesq,SmithJohn,Smithesq,esqJohn,esqSmith

    SmithJohn,SmithJohn,JohnSmith

    JohnSmithBakeryIncetc,JohnSmith,JohnBakery,JohnInc,Johnetc,SmithJohn,SmithBakery,SmithInc,Smithetc,BakeryJohn,BakerySmith,BakeryInc,Bakeryetc,IncJohn,IncSmith,IncBakery,Incetc,etcJohn,etcSmith,etcBakery,etcInc

    ALTER FUNCTION NameMutate

     (@List  varchar(200))

    RETURNS varchar(8000)

    AS

    begin

     declare @IndexStart smallint

     declare @IndexEnd smallint

     declare @Length smallint

     declare @Delim  char(1)

     declare @Word  varchar(200)

     declare @Row  smallint

     declare @AllWords table ( AWord varchar(200), WordNumber smallint )

     declare @Results varchar(8000)

     set @Results = ''

     set @IndexStart = 1 

     set @IndexEnd = 0

     set @Delim = ','

     --First, convert all possible delimeters to single delimeter

     set @List = REPLACE( @List, ' ', @Delim )

     set @List = REPLACE( @List, '.', @Delim )

     -- Now compress out all consequtive instances of a delimeter.

     set @List = RTRIM( @List ) -- just in case @Delim becomes a space since LEN does an implicite RTRIM

     set @Length = LEN( @List )

     set @List = REPLACE( @List, @Delim + @Delim, @Delim )

     while @Length > LEN( @List )

          begin

      set @Length = LEN(@List)

      set @List = REPLACE( @List, @Delim + @Delim, @Delim )

          end

     --Make certain the raw input becomes a result

     set @Results = REPLACE( @List, @Delim, '' )

     --Now parse out each word into a table

     set @Row = 0

     while @IndexStart <= @Length

          begin

      set @Row = @Row + 1

      if @Row >= 999 return @Results -- hard limiter just in case

      

      set @IndexEnd = charindex(@Delim, @List, @IndexStart)

      

      if @IndexEnd = 0

       set @IndexEnd = @Length + 1

      

      set @Word = substring(@List, @IndexStart, @IndexEnd - @IndexStart)

      

      set @IndexStart = @IndexEnd + 1

      

      INSERT INTO @AllWords

       VALUES ( @Word, @Row )

          end

     

     --Now create a combinatorial mutation of every word pairing (hope there are few)

     set @IndexEnd = ( SELECT COUNT(*) FROM @AllWords )

     set @Row = 1

     while @Row <= @IndexEnd

          begin

      SELECT @Word = AWord

       FROM @AllWords

       WHERE WordNumber = @Row

      SELECT @Results = @Results + @Delim + @Word + AWord

       FROM @AllWords

       WHERE WordNumber != @Row

      set @Row = @Row + 1

          end

     return @Results

    end

Viewing 3 posts - 1 through 2 (of 2 total)

You must be logged in to reply to this topic. Login to reply