August 1, 2005 at 7:25 am
I've got a client who is proposing to build a web application for people to post and search large amounts of data. I think we're talking about 150 -200 fields, many of them large Varchar fields and possibly a few text fields.
He wants to really facilitate search functions on those fields and is asking me if we're going to need any 3rd party search software. Now I'm looking at Verity and some others and they're ridiculously expensive, and I'm thinking "I don't think these are really what we want or need," because they're mostly for indexing data that lies outside a reltational database, i.e. files on the server. Since all of our data is going to reside in SQL Server, I'm thinking the only tool we'll really need is going to be Full Text Indexing.
What is you opinion on the subject? The searches will need to be for keywords and phrases.
Thanks.
August 2, 2005 at 4:10 am
Microsoft SQL FTS is ideal for this - and its a free component.
--
Hilary Cotter
Looking for a SQL Server replication book?
http://www.nwsu.com/0974973602.html
Looking for a FAQ on Indexing Services/SQL FTS
August 2, 2005 at 4:32 am
I agree with Hilary, FTS is definately the way to go in this situation.
If you're storing documents inside SQL Server (PDF etc) you can also download iFilters for them which allow the search service to look inside and include them in the results too. You get the standard MS ones built in so it's just other formats you need to look for if you're going down that route.
August 2, 2005 at 8:08 am
Wow. Cool. I had no idea about the iFilters. That's amazing.
How does FTS deal with phrase searches? Does it index longer phrases, to speed up searches for, say "Four Score and Twenty Years ago", or does it just index the individual words and rely on the searching algorithm to find the result with the highest incidence of those words?
Thanks.
August 2, 2005 at 8:14 am
Hi Aaron,
SQL Server Full Text Search (FTS) indexes the words (or really tokens) based upon the specific column's "Language for Word Breaker" or if you want to use the Neutral word breaker, it will break the words into tokens based upon the white space between the words.
As for phrases (short or long), FTS treats them as any string of words within double quotes, so "word1" and "word2 word3" are both treated equally as search words or phrases. A good reference souce is SQL Server Books Online (BOL) for CONTAINS or FREETEXT.
Another factor is the OS-supplied wordbreaker that you have SQL Server installed on. Could you post the full output of SELECT @@version ?
Thanks,
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
John T. Kane
August 2, 2005 at 9:32 am
We don't have the server yet. It's still in the planning stages. I have other servers I could get ther version info for, but no point.
It will probably be a Windows Server 2003 machine, probably running SQL 2005 (when it's out).
Thanks.
Aaron
August 3, 2005 at 11:35 am
Does FTS index files (Word, PDF etc.) on the harddrive or just within SQL Server as Binary data types?
Aaron
August 4, 2005 at 3:11 am
The FTS for SQL only does files inside SQL Server, however, if you use sharepoint then i think you can set it up to do files in the OS.
I have seen situations where a DBA has managed to create linked server to the indexer service and used it from inside SQL but I don't have any more details on it other than it's possible.
John Kane and Hilary Cotter are both what I would call experts (and most others would no doubt agree) in the area of FTS, if I had a problem like this then I'd be looking to see if I could get them to help out with suggestions.
August 4, 2005 at 4:59 am
And I take it they're both on this forum?
Thanks for your help, Mike.
Aaron
August 4, 2005 at 5:16 am
I've seen them in this forum before (earlier in this thread for instance ).
If you want to attract their attention then you should probably think about posting into the microsoft.public.sqlserver.fulltext group.
Hope this helps,
Mike
August 4, 2005 at 6:24 am
It helps a lot. Thank you.
Aaron
August 4, 2005 at 8:04 am
Hi Arron,
Yes, I post many of the replies (and answers) in the public fulltext newsgroup, and I also monitor and post in this fourm as well as others as "forums" seem to be the new and up-coming method for asking questions and getting answers today in the 21st century!
Q. Does FTS index files (Word, PDF etc.) on the harddrive or just within SQL Server as Binary data types?
A. Short answer, Yes. Longer answer, the Indexing Service (IS) is the app to use to FT Index files (Word, PDF, etc.) on the hard drive and SQL Server Full-text Search (SQL FTS) is used to FT Index textual content in SQL Server tables. You can also use a Linked Server to the Indexing Service via the MSIDXS to query results from the IS. You will need to create the IS Linked Server via:
sp_addlinkedserver FTIndexWeb, 'Index Server', 'MSIDXS', 'Web'
(from SQL 2000 BOL title "OLE DB Provider for Microsoft Indexing Service")
Then you can use this Linked Server and join it with a SQL FTS query, such as:
There are two approaches to this that you can use:
1. Use the Indexing Service OLEDB Provider ('MSIDXS') and define a Linked
Server and then use OpenQuery to query the IS from SQL Server, for example:
EXEC sp_addlinkedserver
@server = 'lsIndexServer', -- Name
@srvproduct = 'Index Server', -- product name of the OLE DB data source
@provider = 'MSIDXS', -- Indexing Servics (IS) OLE DB Provider
@datasrc = 'IS_DDrive' -- IS Catalog
go
SELECT * FROM OPENQUERY( lsIndexServer,
'SELECT Path, Filename FROM IS_DDrive..SCOPE()
WHERE CONTAINS( ''john'' ) AND CONTAINS( ''kane'' )' ) AS jtk
Note, the above query may not perform to your expecations as others have reported poor peformance with this type of query to an external source of data for SQL Server.
Hope that helps!
John
SQL Full Text Search Blog
http://spaces.msn.com/members/jtkane/
John T. Kane
Viewing 12 posts - 1 through 11 (of 11 total)
You must be logged in to reply to this topic. Login to reply