Large binary data set - database or file system?

  • I have a well-structured but also very large binary data-set that is generated by a C++ application every five minutes. The data needs to be accessed by SQL applications. Since data is generated every five minutes, performance is key, both for write and read. The data set is about 500MB.

    If data is written to the file system, the write performance doesn't involve SQL server. For reading it, I have a CLR to read the portions of the data that I need based on offset and length. That works and is very fast. The problem is that data is stored in the file system, so it is not self-contained within the database.

    A second option that I haven't explored yet, is to write the data into a table as VARBINARY(MAX). I would read the data using SUBSTRING with appropriate offset and length.However, I am afraid the performance will be quite slow, both for writing and reading.

    I was wondering if anyone has had experience with performance of SQL write/read of binary data of this size, and whether there is a third option I haven't thought off. I'm using SQL Server 2014.

    Thanks

  • If it's "well-structured", that might suggest that loading it into an appropriate table design via SSIS might make sense, and depending on what testing bears out, you might even script any needed INDEX creation as something to occur AFTER the initial INSERT so as not to slow that down. As you haven't provided any details, it's impossible to know if there are any data-related "gotcha" scenarios to worry about. I'm at a loss as to why the data needs to remain in binary form...

    Steve (aka sgmunson) 🙂 🙂 🙂
    Rent Servers for Income (picks and shovels strategy)

  • Thanks for the response. The data is well structure. For a while we've been doing exactly what you suggested, storing the data in a table and creating the indexes after the data is loaded. The data was persistent for a short time, so we only could keep different data uploads in different tables. However, our customer now requires the data to be accessible for quite some time, and we tried creating partitioned tables to store the data - write to a temporary table, create index, and switch the data into the partitioned table.

    The performance however, is still not what is needed. I did try writing the data to the file system and the performance for write is many times faster since it is just a stream of bytes and no indexes are created. Accessing the data via CLR is also significantly faster than reading from database. The key is whether we could store the data in database as a VARBINARY(MAX) with similar performance as in the file system, perhaps even reading the data with a C# CLR.

  • Since it appears you are using SQL Server 2008 based on the forum you posted your question, I would suggest looking into FILESTREAM, https://msdn.microsoft.com/en-us/gg471497.aspx and https://technet.microsoft.com/en-us/library/bb933993(v=sql.105).aspx.

  • I'm actually using SQL 2014. Anyhow, thanks for the advice. I'll look into that as I never used FILESTREAM.

  • N_Muller (6/3/2015)


    I'm actually using SQL 2014. Anyhow, thanks for the advice. I'll look into that as I never used FILESTREAM.

    Check out FileTables as well. They were introduced in SQL Server 2012.

    https://msdn.microsoft.com/en-us/ff929144.aspx

Viewing 6 posts - 1 through 5 (of 5 total)

You must be logged in to reply to this topic. Login to reply