March 12, 2009 at 2:48 pm
hi all,
I use full text index (sql server 2005) to search several million rows on a column of TEXT type. I only search a very small set of keywords. So there is no need to index all the words. Is there anyway to build a custom index only on the few words that I'm actually going to search on?
I know that noise words can filter out the words that I'm not interested. But in my case, I kinda need the opposite of the noise words approach because most of the words in the rows will be noise words...
If I understand it correctly, IFilter is only for searching different format of the documents. So it won't help in my case...
Any suggestions?
thanks in advance.
Tony.
March 12, 2009 at 7:40 pm
I think you can do that by configuring a Therausus but I have not used it so check the links below to see if it is relevant to you.
http://msdn.microsoft.com/en-us/library/ms142491.aspx
http://www.simple-talk.com/sql/learn-sql-server/understanding-full-text-indexing-in-sql-server/
Kind regards,
Gift Peddie
March 13, 2009 at 10:44 am
Thank you very much for the reply.
I read the 2 articles. But Thesaurus doesn't look like it's what I'm looking for. Thesaurus defines synonyms expansion set or replacement set.
What I'm looking for is a way to build a full text index only on a very small set of selected keywords (couple of hundred) while filtering out most of the noise words (millions)...
Any other suggestions?
thanks in advance
Tony.
March 13, 2009 at 1:03 pm
Curious what the reason is for wanting to limit the words in the index? Do you think you will get faster index results (not sure you will), trying to save space on the full text index? I don't really have a solution at this point (previous poster already suggested what I thought of) based on your initial question, but wondering what the ultimate goal you are trying to achieve is. Maybe there is a better way than just eliminating words from the index if you have a specific goal.
March 13, 2009 at 2:16 pm
hi,
Thanks for the reply.
My #1 goal is to speed up full text search speed.
It will save lot of space too if we can build index only on the words that we are interested in, which is like few hundred words out of millions.
thanks
Tony.
March 14, 2009 at 10:56 am
How can you tell in advance that your users will perform searches using only certain words? Am I missing something?
March 14, 2009 at 11:30 pm
I have a similar question concerning SQL Server 2005. I have a table with about 300 million rows of data. One of the columns is a real value. Is there a way to index just the NULL values, for that column? The rest of the index is completely wasted as the only query I use the index for is to find the NULL values.
March 16, 2009 at 4:50 am
You can write a custom iFilter to limit the scope of Full Test search.
The job of the iFilter is to take the original document and strip away the documnt formatting to give an output file of just the words. There is nothing to prevent you writing a custom iFilter that just returns the words you want.
Having said that, SQL Server Full Text search is very fast, especially in SQL Server 2008. If you are not hapy about the performance of FTS, are you sure the dictionary is the core problem? Other aspects that can have a big impact on FTS speed are:
a) Isolate all FTS data on a disk or disks of it own, do not mix this with other data. This prevents I-O to the other data slowing down FTS.
b) Make sure you have enough memory to run FTS efficiently. This means a 64-bit server is essential.
c) Consider storing the iFilter output in a column of its own and definging the FT index on this column, not the original document. If you have to rebuild the FT index this approach will be much faster.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
March 16, 2009 at 10:45 am
Thank you very much for the IFilter suggestion. It seems to be a viable solution...
Thanks
Viewing 9 posts - 1 through 8 (of 8 total)
You must be logged in to reply to this topic. Login to reply