December 23, 2002 at 2:31 pm
I have recently been working with some co-workers on a system that will store files our users upload to our web site. Right now, with very little effort, we can store the files as BLOBs in a database table. We have done so with other items and have not seen problems (20,000 records with an average BLOB size of 50 KB). The only real issue we had was with administration of the database (i.e., replication from our co-location rack). I was wondering if anyone has seen performance issues with databases that contain BLOB records.
December 23, 2002 at 2:45 pm
BLOB types are slow in general and by nature. BLOB data is stored separately from the data pages; only a pointer is stored on the data page.
It is a good idea to find ways around BLOBs. When dealing with files, it is always better to store only the file locations in the database (name and path) and keep the files themselves outside the DB. Really, I do not see a single disadvantage to this approach. It will make your life easier and give you time to spend on more amusing activities.
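A minimal sketch of that pointer approach, using SQLite as a stand-in for SQL Server: the table holds only the file's name and location, never the bytes. The table and column names are invented for illustration; the post only says "name and path".

```python
import sqlite3

# Hypothetical schema: metadata in the DB, bytes on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE user_file (
           file_id   INTEGER PRIMARY KEY,
           file_name TEXT NOT NULL,
           file_path TEXT NOT NULL,    -- where the bytes live on disk
           byte_size INTEGER NOT NULL
       )"""
)
conn.execute(
    "INSERT INTO user_file (file_name, file_path, byte_size) VALUES (?, ?, ?)",
    ("report.pdf", r"\\fileserver\uploads\1\report.pdf", 51200),
)
conn.commit()

# The web tier reads the pointer back and fetches the file itself from disk.
name, path = conn.execute(
    "SELECT file_name, file_path FROM user_file WHERE file_id = 1"
).fetchone()
```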
December 23, 2002 at 4:05 pm
If this is the case, then what process do you propose to keep a database table in sync with a file directory? I ask because on the surface it seems like an easy thing to do, but in practice you could end up with the two out of sync, and at that point there is not much more you can do than run a (more than likely very time- and CPU-consuming) synchronization. Also, keeping restrictions on file sizes and maintaining file permissions correctly might become very hard. I was mostly asking whether anyone has experienced any weird performance issues with databases that are about 1 GB in size.
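One way to frame the sync check being asked about: diff the set of paths the database knows about against what is actually on disk, and report rows without files and files without rows. The inputs are plain sets here purely for illustration; a real job would query the database and walk the upload folder.

```python
def reconcile(db_paths, disk_paths):
    """Compare DB-recorded paths with actual on-disk paths."""
    db_paths, disk_paths = set(db_paths), set(disk_paths)
    missing = sorted(db_paths - disk_paths)   # row in DB, no file on disk
    orphans = sorted(disk_paths - db_paths)   # file on disk, no row in DB
    return missing, orphans

missing, orphans = reconcile(
    {"uploads/1/a.doc", "uploads/2/b.doc"},
    {"uploads/1/a.doc", "uploads/3/c.doc"},
)
```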
Edited by - michaelp on 12/23/2002 4:09:24 PM
December 23, 2002 at 4:44 pm
I use BLOBs some, with no issues. We actually use them here on the site to store all the content, then we write the ASP with the content out to disk. That cuts server load, yet we can regenerate our content at any time, and it makes backup/restore/disaster recovery decent to handle.
Keeping stuff in the file system can get out of hand. My friend Jon has some articles on the site where he discusses full-text search. At one point he had more than 100K documents in a folder; he said if you clicked on it by mistake in Explorer it would take an hour or more to load.
Andy
December 23, 2002 at 5:26 pm
Well, design your file system so that you do not have 100K files in one folder; use subfolders instead.
A 1 GB database is not that big. A warehouse I worked on was 900 GB. Proper handling of a Very Large DataBase (VLDB) is a complicated issue where hardware, physical, and logical design are all involved.
If you need to keep database content and the file system in sync, do not give users access rights to the folders or to the tables in the DB. Make it all go through your interface only (stored procedures). Then it is all in your hands... Sorry if it did not help.
December 23, 2002 at 7:40 pm
I'll grant that subfolders make sense - except that you then have to spend more time coming up with a system for distributing the files. One folder per first letter? X MB per folder? Doable. Not sure which way would end up being less work, file system or DB.
We kinda have that on the site, separate folders per columnist. It's practical for us; we don't have to worry about file name collisions between columnists.
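One possible answer to the "which subfolder?" question, beyond first-letter or per-columnist schemes: hash the file name and use a short hex prefix, which spreads files across 256 subfolders no matter how the names themselves are distributed. This scheme is an illustration, not something from the thread.

```python
import hashlib

def shard_folder(file_name: str) -> str:
    """Pick a subfolder from a hash of the name, e.g. 'a3' -> uploads/a3/."""
    digest = hashlib.md5(file_name.encode("utf-8")).hexdigest()
    return digest[:2]   # two hex digits -> 256 possible subfolders
```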
Andy
December 24, 2002 at 9:27 am
We do the files on the file system for our intranet, but that's because we keep the files on the web server. We take the IDENTITY value generated by SQL Server and create a folder on the web server matching that number whenever anyone creates a folder or subfolder under our documents/forms area. The root path is handled within the ASP code.
The database holds the filename along with the folder ID. The ASP page puts these together to determine where the file is. The only thing a user has to remember is not to upload a file with a name identical to one already in the same folder.
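That lookup can be sketched as follows: the database supplies the folder ID (the IDENTITY value) and the filename, and the web tier joins them onto a configured root. The root path and file names below are invented; the duplicate-name rule mirrors the one stated in the post.

```python
from posixpath import join

DOC_ROOT = "/documents"   # hypothetical root, handled in the ASP code

def file_location(folder_id, file_name):
    """Join root + IDENTITY-derived folder + filename into a path."""
    return join(DOC_ROOT, str(folder_id), file_name)

def name_is_free(existing_names, file_name):
    # the one rule users must remember: no duplicate names in a folder
    return file_name not in existing_names
```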
K. Brian Kelley
http://www.truthsolutions.com/
Author: Start to Finish Guide to SQL Server Performance Monitoring
http://www.netimpress.com/shop/product.asp?ProductID=NI-SQL1
December 24, 2002 at 12:55 pm
This is the way I would handle it if we are forced to use this on our production site (although you said this was on your INTRANET??). There are a bunch of issues my IT person brings up with storing files on a NAS device, other than the obvious ones mentioned above. He was worried about performance with shares, and because we are dealing with secure data we have to have an in-between layer. We can't just create a virtual web share to the NAS device itself; otherwise we risk having the site scanned by hackers. There is also the issue of virus protection for the files themselves.
December 24, 2002 at 2:17 pm
When building our intranet, there were a couple of requirements that drove the design:
1) Users should be able to create folders and subfolders by using a web form.
2) Users should be able to post files, also through a web form.
3) Users would access the files through the web interface, not via shares or other means.
4) There was no NAS or SAN available in the environment (this has changed, but the files still go to the web server because it has plenty of space).
We ended up having users post files and used ASPUpload. Users would enter the folder, then choose to add a file. They'd give the name and description they wanted displayed on the web site (the name didn't have to match the file name), then browse to the file and add it using the standard <input type="file"> method. When the user clicks submit, the file gets added; the name, description, posting user, and actual filename get extracted and stored in the database. The file then gets moved from a holding folder for uploads to the proper folder based on the ID # of the folder (from the database).
When a user visits the respective folder, the ASP code pulls the information from the database and the user is presented with a simple link to get the file.
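The posting flow above can be sketched end to end with local folders standing in for the web server's holding and document folders. ASPUpload and SQL Server are out of scope here; a dict plays the role of the metadata table, and all names are invented.

```python
import os
import shutil
import tempfile

def post_file(holding_dir, doc_root, folder_id, actual_name, display_name, table):
    """Record metadata, then move the upload into the folder matching the DB ID."""
    dest_dir = os.path.join(doc_root, str(folder_id))
    os.makedirs(dest_dir, exist_ok=True)
    shutil.move(os.path.join(holding_dir, actual_name),
                os.path.join(dest_dir, actual_name))
    table[actual_name] = {"folder_id": folder_id, "display": display_name}

# Simulate an upload landing in the holding folder.
holding = tempfile.mkdtemp()
doc_root = tempfile.mkdtemp()
with open(os.path.join(holding, "q4.doc"), "w") as f:
    f.write("draft")

table = {}
post_file(holding, doc_root, 7, "q4.doc", "Q4 Report", table)
moved = os.path.exists(os.path.join(doc_root, "7", "q4.doc"))
```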
K. Brian Kelley
December 26, 2002 at 9:40 am
I use a clipart management app that I developed on my own that stores files and images as BLOBs in an MS SQL Server DB. For simple retrieval you can't beat the filesystem (as the DB also relies on the filesystem), but with lots of files that have lots of attributes you'll need to search on, using BLOBs should be a serious consideration.
The database is now around 3 GB in size, with the files and thumbnails stored compressed (by the client app) as BLOBs. That represents around 200,000 files, from small (5 KB or so) to large (5 MB and up).
One big advantage of using BLOBs is making the storage more efficient by eliminating cluster slack. The files in the DB take up around 5 GB uncompressed, but stored on a FAT32 partition with a 16 KB cluster size they would take up around 8 GB (as a lot of the files are 5 KB or less). Storing them on an NTFS volume with a 4 KB cluster size saves some space, but not much. The BLOB approach still stores the files most efficiently.
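The cluster-slack arithmetic above, made explicit: every file occupies a whole number of clusters, so a 5 KB file on a volume with 16 KB clusters still consumes a full 16 KB. The file and cluster sizes below are the ones from the post; the helper is just the rounding rule.

```python
import math

def on_disk_size(file_bytes, cluster_bytes):
    """Round a file's size up to a whole number of clusters."""
    return math.ceil(file_bytes / cluster_bytes) * cluster_bytes

small_file = 5 * 1024          # a 5 KB file
fat32_cluster = 16 * 1024      # 16 KB FAT32 clusters
slack = on_disk_size(small_file, fat32_cluster) - small_file  # wasted bytes
```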
Also, using the filesystem directly makes the links dependent on the physical location of the files. This makes using a different server name (for stored UNC links) or a different local drive (for drive-letter references) problematic. Plus, local drive references are not immediately usable over a network.
These, together with the custom compression/decompression done by the client app, make the system I described far more advantageous for us than using the filesystem directly.
December 26, 2002 at 10:29 am
I think I agree with mromm. One of my previous jobs involved documents uploaded by end users, converted to Adobe PDF, and stored on the system. The finished system stored the server name, filenames, and UNC paths in the database as pointers, with business logic naming the subdirectories. There simply wasn't a way, from a performance viewpoint, to deal with them as effectively. We tried many different scenarios and profiled them to come to this conclusion. By allowing the filesystem to handle the files, the database was not tied up with that overhead. The files were served by a secured file server via an ASP page on the web server, and never touched the database at all after storing or retrieving the path info. We created a job that ran on a schedule and verified the files by checking the paths, to handle integrity issues. We also had similar requirements to bkelley's, and approached it likewise.
And although I agree with omen about the efficiency (from a space perspective) of storing the data as BLOBs, I don't believe the hit to overall database performance is worth it. Consider a single 1 MB file. In the scenario of loading the file to a file system and updating the DB with the path and filename, you have a write of a row of character data while the file system takes the file. I would expect that to be very quick. Under the BLOB scenario, you now wait for the whole 1 MB to be stored as a BLOB, which I would expect to take much longer. This ties up system resources in the DB for the length of the transfer, as opposed to a write of a pointer separate from the transfer. Another consideration is the BLOB handling itself and the resources used for that.

By the time the user uploads the file and turns around to view it to make sure it got there (a common scenario), the database has had a minimum of 2 MB of data transferred, plus the transaction overhead, plus the BLOB conversion, as compared to a write of varchar data and a read of the same data, with the file system overhead handled by a different server. Of course, if it's all on the same server to start with, these points become academic, but even then I would still give them a lot of consideration, as someday someone will start talking about breaking out the database to improve performance and giving it a dedicated machine.
December 26, 2002 at 12:11 pm
One other reason we went the direction we did was that the files are also within the Index Server catalogs. We have mixed content, including some pure HTML pages that have been built. Keeping things simple, everything is under Index Server, so a user can see both static content created with FrontPage and the "dynamic" content through the main intranet site.
K. Brian Kelley
December 27, 2002 at 11:33 pm
The decision to use the filesystem directly or to use BLOBs is always a trade-off. The filesystem excels at one thing: storing files. If the system you are developing uses mostly large files that need to be accessed by a large suite of commercial apps, mostly one file at a time, using the filesystem is probably the best way, as it makes the files automatically available for manipulation and management by other apps.
If you use mostly small files that need to be retrieved quite often and in sizeable batches, and you don't generally need them to be accessible to apps outside your development control, then using BLOBs is simpler, easier to administer and manage, easier to program (well, depending on your development tool), and probably even faster than using the filesystem.
December 30, 2002 at 7:02 am
Back to the original question. We have a database (55 GB) that is primarily one table that our custom CAD package uses to store its data. The data is stored in BLOBs and we have not had any weird problems. We have about 200 users on the box at any given time.
December 30, 2002 at 12:15 pm
OK people, we will be putting a cap on the size of the media and only allowing a certain number of items to be uploaded: about 500 KB per item and about 30 items per user, with about 20,000 users. Does it sound BAD to store these in the database as BLOBs? A simple Yes/No will work. If the majority of you say no, then I will not go this way.
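For scale, the worst case implied by those caps (reading "500k" as 500 KB) is every one of the 20,000 users uploading all 30 items at the full cap:

```python
cap_bytes = 500 * 1024     # 500 KB per item
items_per_user = 30
users = 20_000

worst_case_bytes = cap_bytes * items_per_user * users
worst_case_gb = worst_case_bytes / 1024**3   # roughly 286 GB
```

Actual usage would presumably be far below this ceiling, but it is the number a BLOB-based design would have to be able to grow into.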