Full Text Search to Compare Values within the Same Column

  • Hi, I am not new to SQL in general, but I new to Full Text Search. Here's my challenge. I have a table of 994 distinct project titles. But, in reality, many of these represent the same title. For example, I may have one row that says "NPYD Blue" and another row that says "N.Y.P.D. Blue: Episode 1". These are actually the same title in the real world. In order to assist someone from the business in normalizing all of these titles, I have been asked to do a comparison to see what titles are similar to make their job a little easier. I have explored the Full Text Search idea and am not quite sure how to implement it in this way. I have already tried this using a combination of the CHARINDEX and LEFT(RIGHT()) functions. It was somewhat successful, but I wanted to see if Full Text Search would yield better results.

    Thanks,

    Kris Traughber

  • You can refer this link http://www.simple-talk.com/sql/learn-sql-server/sql-server-full-text-search-language-features/

    -------Bhuvnesh----------
    I work only to learn Sql Server...though my company pays me for getting their stuff done;-)

  • Full-text is used to find things based upon search criteria.

    What full-text indexing would do would be to tokenize all of the terms (words) and then index them. But NYPD and N.Y.P.D. would be indexed differently.

    I.e.

    NYPD would be one term "NYPD"

    While N.Y.P.D. would be four (4) terms "N", "Y", "P' and "D"

    if you have removed all of the Stop (noise) words. By default, single letters and numbers are considered Stop (noise) words, and are not indexed.

    You have a data scrubbing / standardization problem that I do not believe full-text will help in the scrubbing effort. I worked at a company where we had a similar problem with the names of companies (businesses) being provided by numerous data providers. For that application we wrote a sophisticated program that addressed capitalization, spelling, abbreviations and utilized a thesaurus for term replacement.


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Thanks for both responses. We too have come up with our own algorithm that we are going to start testing to address the issue. But, I'm still going to play around with full text search and see if it can help, if only just a little.

Viewing 4 posts - 1 through 3 (of 3 total)

You must be logged in to reply to this topic. Login to reply