March 1, 2010 at 6:52 am
Hi All,
I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results.
Is there a way where i can get the total number of words in that column.
For eg:
mytextfield (field name) ntext (datatype)
Data is : "I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results. "
Now I need to query which will return the count of words in that field for that row. Ideally i should get an answer : 22.
How do i achieve that ? Is there a simple way using Full text search queries ?
Awaiting your responses
Thanks in advance
krishna
March 1, 2010 at 7:09 am
i think that by using the REPLACE function for every space, and then adding one to it, you can get thecount of the words, because they are seperated by spaces;
example:
select 'I am using ntext variable to store articles in my table. I am also using Full Text search for quicker search results. ' As Article
into #tmp
union all
select 'Now I need to query which will return the count of words in that field for that row. Ideally i should get an answer'
select 1 + len(Article) - len(replace(Article,' ','')) AS WordCount
from #tmp
--Results: 22 and 24
Lowell
March 2, 2010 at 4:39 am
Hi Lowell,
thank you for the quick response.
Is there any better way to find the word count. I mean is there any keyword or option to get the word count if we are using Full Text Search ?
One of my friend was suggesting that he had seen some where that there is an option to get the word count if we use FTS. He wasn sure of the correct method or syntax.
I am trying to search if we have some better methods other than the replace option alone. The reason is the table will contain number of records and we need to find the total number of words in the Article field in that table.
Thanks again for your reply.
Anyone can you let me know if there is some better options.
Warm Regards
krishna
March 3, 2010 at 8:17 am
SQL Server's full-text subsystem cannot solve your problem for several reasons:
1. The number of "words" would not be accurate as the full-text indexer tokenizes the text into "terms" based upon the word breaker for the language. White space and punctuation are generally used to tokenize Western European languages.
Examples:
The "word" SQLServerCentral.com is tokenized into two (2) terms: "SQLServerCentral" and "com".
The word "full-text" is tokenized into two (2) terms: "full" and "text".
2. Stop (noise) words are not indexed. A totally different topic as to use or not use stop (noise) words.
3. Even if you could directly query the full-text index (tables in SQL Server 2008), which BTW, you can in Oracle, the data is an inverted index. I.e., the "terms" are indexed with a reference to the document. So you would have to perform an aggregation of all indexed terms for the particular indexed item (PK).
March 3, 2010 at 10:32 am
Hi Mauve,
Thank you so much. It saved a lot of my R & D time.
Thanks again to all who checked my issue and for giving me a solution.
Warm regards
krishna
March 4, 2010 at 6:22 am
If this is new development, you should avoid NTEXT it is deprecated, and will be removed from SQL Server. If you need to store large Unicode strings (> 4,000 characters) use NVARCHAR(MAX).
It is potentially more efficient to count the number of words when the row is inserted (i.e. once) and store the value in an extra column. Ideally, the application would do the counting, not SQL Server.
Paul
Paul White
SQLPerformance.com
SQLkiwi blog
@SQL_Kiwi
Viewing 6 posts - 1 through 5 (of 5 total)
You must be logged in to reply to this topic. Login to reply