Full-Text Indexing Limitations

  • Hello All,

    I am involved in a document management project.I am right now weighing the pros and cons of storing the documents on the database/filesystem.I have to implement full text indexing on my documents.We might be having a maximum of 50,000 documents and max document size is 2MB.We are storing all kinds of files(.doc,.txt,.html,.pdf etc).

    These are the 2 choices I have:

    1) Store everything in the database and implement FullText Indexing supported by builtin MS Search Service.By using this method I/O 's might take some time but I can issue population whenever I want.

    2)Store the Documents in FileSystem and path in Database.Indexings is implemented by Microsoft Indexing Service.Choosing this method will take less I/O time but I might not have full control on population.

    Any sugessions on how much I/O time the first method might take and how much control we have on population in second method would be greatly appreciated.

    Regards

  • There is more flexibility in the Index Service that ships with Windows 2000, but it probably won't give you the control you are looking for. Here's a Q article to force population immediately (which was the default in 2.0), but that's about the best you can do, so far as I am aware:

    http://support.microsoft.com/default.aspx?scid=kb;EN-US;q270056

    One of the things that stands out in what you've said is .pdf. Index Server by default doesn't filter .pdf, you've got to get a filter from Adobe:

    http://www.adobe.com/support/downloads/product.jsp?product=ac&platform=win

    My group at one point looked at DTSearch to replace Index Server for document management on our corporate intranet, we just never got around to fully testing it (mission critical apps pulled away developer and DBA resources). However, it does support searches against .PDF files for one of its engines. It's been highly rated by a ton of different orgs, too. Here's a link:

    http://www.dtsearch.com/

    Hope that helps.

    K. Brian Kelley

    bk@warpdrivedesign.org

    http://www.sqlservercentral.com/columnists/bkelley/

    K. Brian Kelley
    @kbriankelley

  • I dont think there is as much difference in overhead as you might think. Regardless of where you store it, it takes x reads to get a file from disk into memory. Depending on the usage pattern SQL could even be faster due to caching.

    Jon has a lot more experience with this that I do - he worked for a company that moved from file system to SQL. I remember him telling me that with everything in the file system if you clicked the folder by mistake in Explorer it would take an hour or so to paint (with 100k or some hugh number of files like that)!

    Jon?

    Andy

  • Just want to make sure we're talking about full-text indexing rather than index server. If I recall, the problems we were having were with index server, not full-text indexing in SQL.

    I don't have any specifics in regards to performance comparisons, but my experience using full-text indexing has been a good one. I haven't had any difficulties with maintaining it, despite its few query limitations like a restriction on the number of contains clauses or multiple column catalogs not always returning results that at first glance seemed intuitive.

    All in all, full-text indexing offers a great deal of flexibility. Inparticular, you can set it up so that it indexes real time. ('update index in background' option)

    I haven't had any experience accessing blobs even close to the 2 mb size, so I can't give any good advice on how it performs at that level. Typically, I have had no problems with performance.

    I have written an article that might answer a few questions on the subject: http://www.sqlservercentral.com/columnists/jwiner/20010422025723_1.asp

    Let me know any other details you might have questions on.

  • I am really thankfull to 3 of you in assisting me to opt for full text indexing of sql server.

    I am pretty new to databases and doing my best to get to know all the concepts involved.So I will be around for quite a while to complete this project succesfully.Is this the right Newsgroup to post these questions?

    I am right now looking into the best way to store my file into database image field.I am prefering ODBC method but I could not get detailed information on how to implement SQL_GetData and SQL_PutData.I have gone through msdn library but it was not that helpfull.Any piece of code using these functions or a link to nice article would be really helpfull.

    TIA.

  • Just try to put the question in the appropriate topic when you can, makes it easiers for others to find.

    Andy

  • We'll let you know if you miss post. No harm though.

    One other thing, I'm not sure the performance will matter either way as the control is pretty good with either service. One thing I'd suggest is to use physical drives to separate the content, perhaps using filegroups so you can place the BLOBs to be indexed somewhere on a fast array.

    Steve Jones

    steve@dkranch.net

Viewing 7 posts - 1 through 6 (of 6 total)

You must be logged in to reply to this topic. Login to reply