SPEED. BLOBS in DB vs. Files in the file system

  • Ok I have been experimenting the other day to randomly retrieve files from the varbinary cell vs the same binary from the file system. I had 100k blobs and 100k files created from the same blobs on a file system spread over 1000 folders. The same machine was used with raid 0. First I executed 10 minutes of DB retrieval at random, then file system retrieval, also at random. To my dismay file system fared much slower that is to say that in the same 10 minutes I was able to fetch far fewer files from the file system.

    I need to feed files (blobs) as varbinary to the application that requests them. So the way I had files on the file system is to have a CLR function that returns varbinary.

    Are there any tricks to make file system BLOBs strategy work on par with varbinary in terms of speed?

  • So, let me be sure I understand this.

    You wrote a function (in a .NET language) to read the contents of a file from disk. You then added that function to SQL Server as a CLR function. Then you call that SQL Server function remotely from an application you wrote.

    You find that slower than storing the same data in a VARBINARY column (you didn't say if it was MAX or not), and retrieving it using a SELECT.

    And, to summarise further, you are surprised by this.

    Is that a fair assessment of the situation?

    😉

  • Paul White (11/19/2009)


    So, let me be sure I understand this.

    ...

    yes your understanding is absolutely correct. Basically I have

    SELECT myfunc(..) as fileBLOB

    vs.

    SELECT fileBLOB from mytable where fileID=...

    I made it so I dont have to redo underlying app too much. Anyway both return varbinary(MAX) except the former has no DB operations at all. It picks the indexed file straight from the file system while the latter has to rummage through the Db and exctract the blob from the cell.

    I would welcome any suggestions on how to feed the file from the file system back to the app more efficiently. Or can the file system even compete in speed of retrieval to the DB all else being equal?

  • What is your app structure? The way that most people have done this < SS2K8, is to feed the "path" of the file back to the application. The path being through the IIS server or a UNC path. The application then picks up the file as needed.

    In SS2K8, FILESTREAM was added to make this more efficient and allow the server to stream out of the file system.

  • Steve Jones - Editor (11/19/2009)


    What is your app structure? The way that most people have done this < SS2K8, is to feed the "path" of the file back to the application. The path being through the IIS server or a UNC path. The application then picks up the file as needed.

    In SS2K8, FILESTREAM was added to make this more efficient and allow the server to stream out of the file system.

    Right now app is accessing SQL Server over the Internet, so feeding the path will require IIS support and reading from the URL (which is even slower than SQL server reading local FS and passing it as varbinary described above).

    Or am I worrying too much about reading blobs from the DB? I was thinking about breaking the DB into many filegroups to make its growth more manageable and call it a day. I am simply not sure at this stage which approach (DB blobs vs file system) is more viable in a long term to store files.

    I dont want my DB size to run away on me and I get stuck with a service I cant manage. FS approach feels more familiar but I never dealt with large DBs before that is the source of my concern. 70G now and counting (adding 5G a month).

  • 5GB/month, 60GB a year, not that much.

    Do you mean you have client apps connecting to SQL Server across the Internet? Nothing in the way to secure it? I assume a firewall, but if you aren't limiting by IP, you are creating a security risk here.

    IIS serves files incredibly quickly. I'm not sure why you would see this as slow, but pulling an image off IIS tends to work well, especially if it's cached.

    If you are doing lots of 100KB files, why not make a separate database for these? Easy to back up and control in case you need to scale this out. And it will work well for you in terms of just reading the varbinary.

  • I just want to add, for the record, that if your files are 1MB or more on average, consider a 2008 upgrade for the FILESTREAM stuff. Otherwise, and especially if the files are relatively small (say <= 256KB) I would probably stick with VARBINARY. Passing that much data through hosted CLR is always just going to add overhead.

    Paul

  • I'll just speak up for the contrary argument (most DBAs are in favor of advancing the "hand off the path" method and keeping the blobs out of the database). I have a strong suspicision that most have never tried it, but that's another story.

    For over 8 years, starting in SQL 7, we have been storing blobs, some rather large in the DB. We did it for reasons similar to yours, the client apps are located at a distance over the internet, and we like the ability to:

    get the data in one trip instead of two (very key over the internet at any distance)

    mirror or replicate the db including the blobs

    simplify our maintenance and disaster recovery

    synchronize copies of the blob at the client and at the central server

    retain the ability to make any disk or directory changes we feel like at the center of the trading hub without having to concern ourselves with mass editing of a file name table.

    It works great, we keep them in separate file groups. We don't have any exotic functions at the server, the client simply pulls them out of ADO recordsets.


    Student of SQL and Golf, Master of Neither

  • Bob, I think it depends on your app. Here we've done both. Originally dropping them in the file system since we have all this web code and we can easily pull them, have them cached on IIS, etc.

    However we also (now) have a number of the images served here coming out of the DB now for some things. I personally prefer them in the DB if it's not a web app. It definitely makes things easier.

  • Google for the paper "To blob or not to blob", it contains information from Microsoft performance tests on storing large objects in the database or file system.

    If speed is your only issue, then objects smaller than about 1MB perform faster when stored in the DB than in the file system, using the tested configuration.

    This 1MB figure is likely to change over time, and I would expect the threshold value to increase in size in the future just as it has increased in the past. The 1MB figure also applies only to SQL Server, and will be different for Oracle or DB2, etc.

    Your application may have other issues than just retrieval speed to consider, which could easily change the threshold for when you want to store objects in the DB or in the file system.

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara

Viewing 10 posts - 1 through 9 (of 9 total)

You must be logged in to reply to this topic. Login to reply