Word count using Full text search

  • Hi All,

    I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results.

    Is there a way where i can get the total number of words in that column.

    For eg:

    mytextfield (field name) ntext (datatype)

    Data is : "I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results. "

    Now I need to query which will return the count of words in that field for that row. Ideally i should get an answer : 22.

    How do i achieve that ? Is there a simple way using Full text search queries ?

    Awaiting your responses

    Thanks in advance

    krishna

  • i think that by using the REPLACE function for every space, and then adding one to it, you can get thecount of the words, because they are seperated by spaces;

    example:

    select 'I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results. ' As Article

    into #tmp

    union all

    select 'Now I need to query which will return the count of words in that field for that row. Ideally i should get an answer'

    select 1 + len(Article) - len(replace(Article,' ','')) AS WordCount

    from #tmp

    --Results: 22 and 24

    Lowell


    --help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

  • Hi Lowell,

    thank you for the quick response.

    Is there any better way to find the word count. I mean is there any keyword or option to get the word count if we are using Full Text Search ?

    One of my friend was suggesting that he had seen some where that there is an option to get the word count if we use FTS. He wasn sure of the correct method or syntax.

    I am trying to search if we have some better methods other than the replace option alone. The reason is the table will contain number of records and we need to find the total number of words in the Article field in that table.

    Thanks again for your reply.

    Anyone can you let me know if there is some better options.

    Warm Regards

    krishna

  • SQL Server's full-text subsystem cannot solve your problem for several reasons:

    1. The number of "words" would not be accurate as the full-text indexer tokenizes the text into "terms" based upon the word breaker for the language. White space and punctuation are generally used to tokenize Western European languages.

    Examples:

    The "word" SQLServerCentral.com is tokenized into two (2) terms: "SQLServerCentral" and "com".

    The word "full-text" is tokenized into two (2) terms: "full" and "text".

    2. Stop (noise) words are not indexed. A totally different topic as to use or not use stop (noise) words.

    3. Even if you could directly query the full-text index (tables in SQL Server 2008), which BTW, you can in Oracle, the data is an inverted index. I.e., the "terms" are indexed with a reference to the document. So you would have to perform an aggregation of all indexed terms for the particular indexed item (PK).


    [font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

  • Hi Mauve,

    Thank you so much. It saved a lot of my R & D time.

    Thanks again to all who checked my issue and for giving me a solution.

    Warm regards

    krishna

  • If this is new development, you should avoid NTEXT it is deprecated, and will be removed from SQL Server. If you need to store large Unicode strings (> 4,000 characters) use NVARCHAR(MAX).

    It is potentially more efficient to count the number of words when the row is inserted (i.e. once) and store the value in an extra column. Ideally, the application would do the counting, not SQL Server.

    Paul

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply