March 2, 2005 at 4:46 pm
I'm interested if anyone has seen (or created) any real performance tests comparing storing binary data in the database versus using a file system. There are many other factors (security, backup, maintenance, access mechanisms) to consider but we've got a good handle on them. The types of files we are talking about would be all over in terms of size. Everything from a couple of KB to a couple hundred MB.
Also, if anyone has any thoughts on comparing binary objects to other binary objects I'd be interested in that. We could use that to save space by storing single copies of the data with pointers whenever it is referenced. My thought is that the SQL engine would cache this better as well since it would be the exact same row (assuming the BLOB is in it's own table).
We're using SQL 2000 Standard but are planning on rolling out SQL 2005 Standard or Enterprise.
March 3, 2005 at 3:24 am
My preference is to keep everything under transactional control. This normally means everything in a database, but also allows for MQ (etc) messaging.
If you have mixed flat file and database data you have 3 main problem areas:
1) Ensuring maintenance actions do not disrupt the synchronisation between the file store and the database. Every mixed system I have been involved with has had problems where (non-DBA) support staff have both moved files unintentionally (drag & drop is too easy to do) or planned or emergency file store maintenance has moved files to a different location from where the database points. However it happens you get orphan files and orphan rows.
2) Recovery. It needs very careful planning to recover the database and the file store to the same point in time. Restoring to different point in time gives a jagged edge of recovery, resulting in orphan files and rows.
3) With the advent of SOX and similar legislation, the whole integrity of your system can be called into question because of orphan data. How do you show that orphans are unintentional, deliberate, or malicious changes. How do you prove that a flat file contents match what was originally put there, given that file create and modify dates can be set to any value.
Keeping blob data in a database gives its own problems, but the risks of getting orphans are greatly reduced. Using Full Recovery model and tools such as Log Explorer help greatly in proving the data store contains what was originally put there.
Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.
When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
March 3, 2005 at 10:48 am
If you are 'rolling your own' so to speak, then I agree to a limited degree with the posted issues. If not, this it's called a DMS (Document Management System) and they've been around for quite a while (the legal professsion was my bacground in these). The SQL databases are smal and fast because they are just pointers and everything else (the documents or in this case executable/blobs) are on the OS system. There are utilities to 'sync' things and take care of 'orphans'. Of course good policy and procedures are essential as well (even for 'emergency maintenance').
RegardsRudy KomacsarSenior Database Administrator"Ave Caesar! - Morituri te salutamus."
March 4, 2005 at 3:16 pm
I have worked with document management and image management software for a few years and all of the systems that I have seen suffer from the issues raised by EdVassie. Some are marginally better than others, but ultimately they all have problems restoring their files and indexes to a consistent state unless the system is taken down during the backup. This, of course tends to preclude any kind of point in time recovery...
My recommendation is to keep it in the database if you are building your own system; if not, look for a vendor that does.
/*****************
If most people are not willing to see the difficulty, this is mainly because, consciously or unconsciously, they assume that it will be they who will settle these questions for the others, and because they are convinced of their own capacity to do this. -Friedrich August von Hayek
*****************/
March 6, 2005 at 9:17 am
Excellent points to chew on. What we are putting together is a hybrid of a content management system, a document management system and a photo gallery. There are small files and large files within the same application domain.
I never likened what we are doing to a document management system per se (thanks rudy komacsar) but it makes sense.
My real issue is I know I can handle the capacity with 1-2 file servers if I go that route. I want to store in the database, but if it is going to cost me 5-10 SQL Server licenses (rather than 1-2 using the file server), then I need to decide if I'm going to spend money on DB software or development labor. If we can get it done with a couple of server licenses, the decision is much easier.
As I see it, BLOB performance is going to be probably the most critical unknown in this decision.
Thanks for everyone's thoughts to date.
Viewing 5 posts - 1 through 4 (of 4 total)
You must be logged in to reply to this topic. Login to reply