Binary data pros/cons

  • I am working on a system that keeps a store of small (~25KB) Office documents, as well as all sorts of customer/vendor/user data. I was considering storing the document data as binary blob (type Image?) in a record, along various other fields. I mentioned idea of storing document data in a record to someone who has more experience than myself with SQL; he was instantly and adamantly apposed to the idea. He said the database would be too large and slow, but didn’t ask about the database design, indexes, PKs, etc, just about how long the records will be kept in the store (6 months). His alternative design was to store a text path to a shared document, vs. the document proper. I was trying to avoid this scheme because software would then have to be created to maintain/sync each Office document (roaming clients, achieving, backups, etc). I was led to believe that MS SQL Server 2005 has been tuned over time (page sizes, dedicated pages for binary data, etc), and would be quite capable in this regard. So what am I missing? What are the pros and cons of storing documents as binary blobs?

  • there are a few reasons for not storing the data in the database, but they are mainly to do with manageability.

    the data stored in the binary field is not stored "in row" with the rest of the data, so performance, indexes etc are not really an issue - there are some performance issues based around performing "select * from ...." statements against these tables, but that's where you need to implement a best practice policy to prevent this.

    the biggest problem for the dba comes into play when backups, defrags and checkdbs are all performed - they'll take an age and any weekly or nightly maintenance on the system will take much longer.

    as a dba i'd say don't store it in the database - it's not what databases are for. they're for querying data, not to be used as a a file system. SQL server is designewd for set based operations such as the SELECT statement, not for running ado getchunk and appendchuck operations on single rows.

    as a developer it makes sense to store it in the database - no broken file links, back up the databse and you back up all the files etc...... but i'm afraid the benefit of these doesn't outweigh the drawbacks.

    although your dba has not explained himself properley he's actually right (in most instances)

    MVDBA

  • Just to be a little contrarian...

    Text and image data types can be performance concerns and you need to make sure your hardware is up to the task and your physical design is well-tuned.

    Storing the documents outside the database can also be a management nightmare when a recovery is necessary.  If you restore the database to some past point in time, how do you sync it with the documents stored somewhere else?  If you have to restore the document store, how do you re-sync it with the database?  These questions should be answered before the problem crops up. If everything is in the database, you may still lose some information, but at least the information you have will match. 

    I've seen it work both ways and the choice of the types of pain you sign up for usually ends up being driven by the business need. 


    And then again, I might be wrong ...
    David Webb

  • i agree david,

    but there is a way to keep both in sync - although it's expensive

    store the documents on a nas or san and use snapshots and the incremental change recording (i can't remember what it's called) so that you can do a PIT recovery on your file storage at the same time as PIT on your database

    david's comments about performance are also valid as i overlook that fact that on any system i create we alsways specify a raid 1+0 array for data not in row and move the binary filegroup to this array to lessen the impact

    MVDBA

  • I've done a lot of work with web CMS and doc management systems and many of those store binary objects in the DB.

    This is done so that those systems can hold different revisions of those objects and also so that those objects can be restricted via the application security.

    For example, if I have a confidential "Oh my God we are all going to die" document on Avian Flu I may want to restrict it's use and visibiliy within the CMS to those users who should have access to it.

    Another reason for storing binaries in the DB is to allow workflow operations. That is Author, Review, Approve, Publish activities.

    Yet another reason is when your delivery method is via a web farm. Your load balanced web servers have got to be able to serve up a required document. Getting both server A and server B to service the request for the document can be a nightmare.

    The instant you start talking about security in a mixed file/db storage you are talking about exponential complexity.

    When it comes to storing binaries in the DB you need to remember that there is a default of 64Kb for replication though this can be over-ridden using sp_configure.

    On the whole, if your requirements are simple then I use file based storage. If they are more complex then I would use DB storage.

  • The requirements are simple. But human resources are limited too. 1 guy, about 1 hr per day for the whole project: UI forms, DB schema & SPs, reporting, doc & training, setup/install, training. I am doing the work for free for a friend. I hope he is still my friend after he uses what I deliver. Performance will probably not be an issue as there are 2 or 3 users and DB server is PC is 3GHz/2GB/400GB/10000RPM. I took on the project because I wanted to force myself to learn SQL plus more, which I am doing. The system may be discarded in a yr or two when profits warrant buying a 3rd-party package that has all the wiz-bang features. I can’t help but wonder if the pros outweigh the cons in this case (despite the warnings).

  • We've got yet another reason to use the DB storage method.  We allow hundreds of customers to trade work between themselves.  This includes attachments that move from one customer to another around the nation.  At the local level, they create a word document or scan a signed Affidavit to a PDF, then attach those attachments to a job in our workflow management software.  We then allow them to upload those jobs and attachments (completely from within our VB EXE) to our SQL server using ADO.

    The "destination customer" then downloads the job and it's attachments again through ADO completely within our EXE.  No one has to use FTP, or click on a link to get the PDF, or anything else "outside" the main workflow software.  Our customers like the ease of use, and it hasn't bogged down the server yet.

    If they want to view the status of a job online, and view the attachment, it's drop dead simple to show the attachment in a web page (ASP) through ADO.

    I can't imagine trying to accomplish this task storing the attachments in a separate directory based system and keeping everyone in synch.

     


    Student of SQL and Golf, Master of Neither

  • And if you want some real life examples of storing documents in databases directly to refute your DBA, just look at MS SharePoint and MS Project Server.  I've seen an Hyland Software OnBase installation that had 1.5tb of documents online and available, using SQL Server and SAN.  You can even make the case that is exactly what MS Exchange and Lotus Domino do.  These (as well as dozens more we could name) all store documents in the database itself.



    Mark

  • I'm not sure if it’s true but I read or heard that MS implemented http://terraserver.microsoft.com/ using SQL 2000 to prove that SQL is up to the task of processing lots of binary data efficiently. Anyone else here this?

  • I'd say it's a bit more than a rumor.  From the About page at the very link you quoted:

    The current configuration is the scale-out configuration first deployed in November 2003. All TerraServer-USA images are managed by Microsoft SQL Server™ 2000, the award-winning database server.

     


    Student of SQL and Golf, Master of Neither

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply