Word count using Full text search

Question

Word count using Full text search

krishna.kumar

Right there with Babe

Points: 775
More actions
March 1, 2010 at 6:52 am

#98548

Hi All,
I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results.
Is there a way where i can get the total number of words in that column.
For eg:
mytextfield (field name) ntext (datatype)
Data is : "I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results. "
Now I need to query which will return the count of words in that field for that row. Ideally i should get an answer : 22.
How do i achieve that ? Is there a simple way using Full text search queries ?
Awaiting your responses
Thanks in advance
krishna

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply

Lowell SSC Guru Points: 323518 More actions · Answer 1

i think that by using the REPLACE function for every space, and then adding one to it, you can get thecount of the words, because they are seperated by spaces;

example:

select 'I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results. ' As Article

into #tmp

union all

select 'Now I need to query which will return the count of words in that field for that row. Ideally i should get an answer'

select 1 + len(Article) - len(replace(Article,' ','')) AS WordCount

from #tmp

--Results: 22 and 24

Lowell

--help us help you! If you post a question, make sure you include a CREATE TABLE... statement and INSERT INTO... statement into that table to give the volunteers here representative data. with your description of the problem, we can provide a tested, verifiable solution to your question! asking the question the right way gets you a tested answer the fastest way possible!

krishna.kumar Right there with Babe Points: 775 More actions · Answer 2

Hi Lowell,

thank you for the quick response.

Is there any better way to find the word count. I mean is there any keyword or option to get the word count if we are using Full Text Search ?

One of my friend was suggesting that he had seen some where that there is an option to get the word count if we use FTS. He wasn sure of the correct method or syntax.

I am trying to search if we have some better methods other than the replace option alone. The reason is the table will contain number of records and we need to find the total number of words in the Article field in that table.

Thanks again for your reply.

Anyone can you let me know if there is some better options.

Warm Regards

krishna

Mauve SSChampion Points: 11316 More actions · Answer 3

SQL Server's full-text subsystem cannot solve your problem for several reasons:

1. The number of "words" would not be accurate as the full-text indexer tokenizes the text into "terms" based upon the word breaker for the language. White space and punctuation are generally used to tokenize Western European languages.

Examples:

The "word" SQLServerCentral.com is tokenized into two (2) terms: "SQLServerCentral" and "com".

The word "full-text" is tokenized into two (2) terms: "full" and "text".

2. Stop (noise) words are not indexed. A totally different topic as to use or not use stop (noise) words.

3. Even if you could directly query the full-text index (tables in SQL Server 2008), which BTW, you can in Oracle, the data is an inverted index. I.e., the "terms" are indexed with a reference to the document. So you would have to perform an aggregation of all indexed terms for the particular indexed item (PK).

[font="Arial Narrow"](PHB) I think we should build an SQL database. (Dilbert) What color do you want that database? (PHB) I think mauve has the most RAM.[/font]

krishna.kumar Right there with Babe Points: 775 More actions · Answer 4

Hi Mauve,

Thank you so much. It saved a lot of my R & D time.

Thanks again to all who checked my issue and for giving me a solution.

Warm regards

krishna

Paul White SSC Guru Points: 150467 More actions · Answer 5

If this is new development, you should avoid NTEXT it is deprecated, and will be removed from SQL Server. If you need to store large Unicode strings (> 4,000 characters) use NVARCHAR(MAX).

It is potentially more efficient to count the number of words when the row is inserted (i.e. once) and store the value in an extra column. Ideally, the application would do the counting, not SQL Server.

Paul

Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi