Store Huge Files in database

  • Dear Friends,

    I have to write a Document management software and I want to store my files into database. My files have different sizes, they could be between 5 MB to 10GB.

    What should I do?

    Which type is better?

    Has sql2008 any kind of new features to cover files or it is same as 2000?

    Thanks alot

    Ashkan

    Best Regards,
    Ashkan

  • Microsoft say that filestream performs better when storing files greater than 1MB in size.

    This is a SQL2008 and 2008R2 feature.

  • Any specific reason, you need to keep the file in DB?

    However, Filestream is an option.

  • Yes, I want sql to index my file content, because I want to do a content search on that...

    Best Regards,
    Ashkan

  • You want SQL Server to index the contents of your file?

    What type of file are you going to store?

    -------------------------------------------------
    Will C,
    MCITP 2008 Database Admin, Developer

  • ashkan siroos (6/11/2011)


    Yes, I want sql to index my file content, because I want to do a content search on that...

    Lookup FULL TEXT SEARCH and the files need to be converted to "plain text".

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Thanks dudes;)

    Best Regards,
    Ashkan

  • You can index some native type documents, like Word as well. I would assume PDFs would count, but you'd have to experiment

    http://www.codeproject.com/KB/architecture/sqlfulltextindexing.aspx

  • Thanks dude,

    You are wonderful!Sorry article was about SQLServer 2000 and 2005,So I want to know Is there any difference or additional feature about Full-text search in SQLServer 20008 by any chance?

    Best Regards,
    Ashkan

  • Yes, full text changed a lot in 2008. It became Integrated Full Text Search, with additional enhancements for multi-languages, better indexing of documents and more. Check Books Online for more of the enhancements.

  • ashkan siroos (6/6/2011)


    ...My files have different sizes, they could be between 5 MB to 10GB...

    For files larger than 2GB you'll be forced to use the FILESTREAM data type. Luckily full-text can operate on a FILESTREAM. Have a look at this article, there are some limitations: http://msdn.microsoft.com/en-us/library/bb895334.aspx#FullText

    Are you going to be rewriting the file data in the database via UPDATE statements? How often?

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

  • Dear Steave and OPC,

    Thanks for reply.

    And No I won't rewrite on file contents, because I have a versioning system and my client wants me to save his versions.

    Thank you very much

    Best Regards,
    Ashkan

  • You do NOT want to store large files in the database engine. Performance can be disastrously bad when those files get lifted up and force other more useful and important data out of the buffer pool. Fortunately based on your file size you are FORCED to use FILESTREAM. Another option is to simply store OS files and use a third-party indexing engine/device to index and search the files. That would actually be my choice here given the limited knowledge of actual system requirements.

    Best,
    Kevin G. Boles
    SQL Server Consultant
    SQL MVP 2007-2012
    TheSQLGuru on googles mail service

  • My initial thought was to recommend looking into the Search Server built into SharePoint. I have implemented it. It's a great way to expose an organizations disparate data down to the use of one search box. You can have it index network shares and it also supports custom filters so you can index things like PDFs and other third-party binary document formats.

    That said, with the proper hardware FILESTREAM + Full-text Indexing + the custom filters mentioned on this thread are a viable option.

    FILESTREAM does not affect data in the buffer pool according to this article http://msdn.microsoft.com/en-us/library/bb933993.aspx:

    FILESTREAM uses the NT system cache for caching file data. This helps reduce any effect that FILESTREAM data might have on Database Engine performance. The SQL Server buffer pool is not used; therefore, this memory is available for query processing.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato

Viewing 14 posts - 1 through 13 (of 13 total)

You must be logged in to reply to this topic. Login to reply