August 25, 2010 at 9:41 am
Hi all,
We currently are developing a system that will store around 2 million documents. The initial thought by the previous DBA before me was to use FileStream to do this. However, the more I'm researching and looking into this, FileStream may not be the best choice for us. Of the 2.2 million files, 94% are under 256k, which according to MS wil perform the best when stored in a varbinary(max) field vs on the filesystem via FileStream. There are however a few large files, 151 larger than 100mb to be exact. The largest is 693MB. I'm looking for some opinions on what the best route to go would be here. My gut tells me to store it all in the DB and forget FileStream, however dealing with files as large as 693MB, although rare, is concerning to me. I'm also not exactly sure how this would work when we are inserting into the varbinary(max) field - would it have to store the entire 693MB file in memory as it is inserting it into the database? Any opinions would be greatly appreciated..let me know if you would like some more information
August 25, 2010 at 10:02 am
I would look at using both.
ie Filestream for files greater than 1MB and varbinary(MAX) for the rest.
I would also put the table with the varbinary(MAX) column into it's own file/filegroup.
August 25, 2010 at 10:19 am
The catch to using both is going to be the need to track the file size and somehow let the application know where to store or get each file based on that.
My question is how often these files will be accessed. If the documents/files are in the system for storage and indexing purposes, but will not be accessed often then you could store the files using either method. If they are accessed often then the hybrid approach may be best. Accessing large files via FILESTREAM is supposed to be more efficient and faster than storing them as varbinary(MAX).
August 25, 2010 at 10:22 am
The files will be accessed often - this application is basically a document repository. I'm beginning to think a hybrid apporach would be best as well.
August 25, 2010 at 10:31 am
Do they need to be stored in the database at all? Do you require that level of referential integrity for binary files?
If not, then I'd probably approach this by putting file locations in the database and just using the application layer to move them about on storage. Then you can also distribute the storage across multiple servers more easily.
Neither Filestream nor varbinary(max) are ideal for your situation - 700mb files is not ideal for varbinary(max). A combination of the two may be a better approach if you are going down this route.
August 25, 2010 at 10:34 am
Thanks for the reply. However, doesn't storing blobs under 256k in the database perform better than storing them on the filesystem as you're suggesting?
August 25, 2010 at 10:35 am
Also, these files will be continously checked out and edited - similar to a SharePoint repository.
August 25, 2010 at 10:38 am
I went with the storing of file paths in the database and then using the application to read the path and open the file. This was mostly because I needed document management in SQL 7.0 and before we had storage datatypes that were easy to use.
Now I wish I had been able to place all references to the file, including the file itself, directly into the database. It would have made many application development issues much easier. Especially in the attachement and management of the original file. We had some struggles keeping track of the original file and the copy placed in the repository folders. Especially when files had the same name.
Placing the files in the database would eliminate many of those issues.
However, like pointed out, it does provide a good measure of flexibility.
August 25, 2010 at 10:40 am
Another question raised by the frequent editing is whether you need to track the file versions as changes are made.
August 25, 2010 at 10:42 am
File versions are tracked, yes.
August 25, 2010 at 2:44 pm
With that being the case, I am not sure which is best. Now you are down to which is easiest to manage and how you are going to version the files.
If the files are stored in the database, then all versions of the same file can have the same name if you prefer. Storing them on the filesystem necessitates some kind of file naming system.
Viewing 11 posts - 1 through 10 (of 10 total)
You must be logged in to reply to this topic. Login to reply