Full Text Search Using Soundex()

  • Is there any way to perform Sql Server 2000 full text search using Soundex() function? I've developed a customized search engine in ASP.NET using Sql Server 2000, that works fine with exact and prefix searaching using CONTAINS() predicate, but I don't know how to use CONTAINS() with SOUNDEX().Is there any expert who could help me. Thanks a lot.

  • CONTAINS is Microsoft proprietry so I don't think you can use SOUNDEX with it.  The code below may be what you can do with word sound.  Run a search for CONTAINS in SQL Server BOL (books online).  Hope this helps.

    USE Northwind

    GO

    SELECT ProductName

    FROM Products

    WHERE CONTAINS(ProductName, ' FORMSOF (INFLECTIONAL, dry) ')

    GO

    Kind regards,

    Gift Peddie

    Kind regards,
    Gift Peddie

  • Actually, CONTAINS is not proprietry to Microsoft as it is in the SQL/MM spec for Full Text Search. Is is how Microsoft SQL Server implments Full Text Search (FTS) within its RDBMS. No matter, SOUNDEX and FTS queries can be combined, for example:

    -- CONTAINS query that Searches for names that start with "Mich".

    select au_lname, au_fname FROM authors -- returns 0 rows

     where contains(au_fname, '"Mich*"')

    -- Combined SOUNDEX OR CONTAINS query that

    -- searches for names that sound like "Michael".

    select au_lname, au_fname FROM authors -- returns 2 rows

     where contains(au_fname, '"Mich*"') or SOUNDEX(au_fname) = 'M240'

    However, Full-text Search (FTS) does not use SOUNDEX directly, but

    it can be used in combination with SOUNDEX (see above example). Additionally, you may want to review the following links as well as the below TSQL examples of combining CONTAINS & SOUNDEX:

    You may want to look at some of the improved soundex algorithms as well as

    the Levenshtein Distance algorithm You should be able to search Google to

    find more code examples, for example: 'METAPHONE soundex "sql server" fuzzy

    name search' and I quickly found - "Double Metaphone Sounds Great" at

    http://www.winnetmag.com/Article/ArticleID/26094/26094.html You can freely

    download the code in a zip file that has several a user-defined function

    (UDF) that implement Double Metaphone.

    Below are some additional SOUNDEX links:

    http://www.merriampark.com/ld.htm

    http://www.bcs-mt.org.uk/nala_006.htm

    Hope that helps!

    John

    SQL Full Text Search Blog

    http://spaces.msn.com/members/jtkane/


    John T. Kane

  • I was not clear but CONTAINS as implemented in Microsoft FULL TEXT is proprietry.  And combing it with SOUNDEX string function is redundant because some version of CONTAINS gives the same result as SOUNDEX.  BTW when CONTAINS was implemented in SQL Server 7.0 in 1999 it was in the standard but not implemented in other RDBMS(relational database management systems). Hope this helps.

    Kind regards,

    Gift Peddie

    Kind regards,
    Gift Peddie

  • From a performance perspective, if you are running code like

    select au_lname, au_fname FROM authors -- returns 2 rows

     where contains(au_fname, '"Mich*"') or SOUNDEX(au_fname) = 'M240'

    Try to have a calculated column that stores the value of SOUNDEX(au_fname) rather than calculating it on the fly every time as SQL will need to evaluate the function for every row!  If you store the SOUNDEX (or metaphone) value in a colum in the table (you could use a trigger or populate during an insert in your Stored proc), then you can do a direct match with an index, making things MUCH faster.

    So you would do...

    select au_lname, au_fname FROM authors -- returns 2 rows

     where contains(au_fname, '"Mich*"') or au_fname_soundex = 'M240'

    with an index on the column au_fname_soundex.

    Cheers

  • wouldn't it be quicker/better to do a union between the 2?

    select au_lname, au_fname FROM dbo.authors where contains(au_fname, '"Mich*"') 
    UNION
    select au_lname, au_fname FROM dbo.authors where au_fname_soundex = 'M240'

    Assuming that it's actually going to be used to search through a lot of records (it's probably overkill for small DB's)

  • Gift,

    While Microsoft's implementation of CONTAINS may be proprietry, CONTAINS is actually part of the SQL/MM standard, specificlly:

    The SQL/MM spec for Full-Text Search is maintained under: "Information technology -- Database languages -- SQL multimedia and application packages -- Part 2: Full-Text" at http://www.iso.org/iso/en/CatalogueDetailPage.CatalogueDetail?CSNUMBER=31368&scopelist= (for purchase). However, I already have a pdf version of this spec and I've reviewed it and CONTAINS is part of this standard, but FREETEXT is unique to SQL Server and is proprietry to Microsoft. Each RDBMS vendor is free to implement this ISO standard in with its own methods and while MS was first to implment this is debatable as Oracle IntraMedia/Context/Text also has simalar Full-text Search features at this time. 

    Combining the SOUNDEX string function with CONTAINS is not redundant as the two SQL query examples above proves. Finally, I worked at Microsoft during the early development of SQL Server 7.0 with the Full-text Search Dev team and FTS was first incorporated in Beta3 of SQL 7.0 in mid-1998.

    Kind Regards,

    John

    SQL Full Text Search Blog

    http://spaces.msn.com/members/jtkane/


    John T. Kane

  • I am sorry I was off by a year 1998 and I did not say the other RDBMS(relational database management system ) vendors cannot implement CONTAINS but I know as of Oracle  10g release 1 CONTAINS is not in ORACLE.

    And I don't pay for ANSI SQL the US Goverment pays for ANSI SQL so their copy is always on the web free, got my first copy pre Google days in the mid 1990s, the current version is ANSI SQL 2003 and yes I have  a copy.

    Kind regards,

    Gift Peddie

    Kind regards,
    Gift Peddie

Viewing 8 posts - 1 through 7 (of 7 total)

You must be logged in to reply to this topic. Login to reply