Ideas on Archiving

  • All -

    I've been working on a solution for archiving a VLDB (currently 1.6 TB) where most of the size is in a few tables containing IMAGE fields. The requirements are modest, but we have to keep 7 years accessible at all times. That puts us in the range of a 25-30 TB database, of which roughly 98% will be images. I've gone through a few scenarios to isolate the images, but I'm curious whether anyone has faced a similar task and what they did to solve it.

    At this point, I'm open to any idea.

  • Have you looked at FILESTREAM?

    http://msdn.microsoft.com/en-us/library/bb895234%28v=sql.105%29.aspx

    FILESTREAM filegroups can be stored on compressed volumes.

Yes, and we actually had someone from MS come out to configure the FILESTREAM options on the server. They were not able to get it configured and working properly, so management backed off.

Compression ratios achieved are completely dependent on data composition: if the data in your IMAGE columns is JPEG images, or some other already-compressed, highly variable binary data, chances are you will see low compression ratios. My guess is around 10%, far less than for standard text-based data. That said, 10% of 30 TB is significant, but is it a fair tradeoff for the CPU cost of applying that compression? Only you can decide.

    I say use whatever method is easiest to transition away from the IMAGE datatype, at first. At the very least, forget compression and just get your IMAGE data converted to VARBINARY(MAX) columns before the DB is too large and that ceases to be a feasible option, if it hasn't already. If you really want to pursue FILESTREAM and compression, enlist a(nother) FILESTREAM expert.

    There are no special teachers of virtue, because virtue is taught by the whole community.
    --Plato
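
    A quick way to see the data-composition point for yourself. This is a hedged sketch using Python's stdlib zlib as a stand-in for volume-level compression, with random bytes simulating already-compressed JPEG payloads (not your actual data):

    ```python
    import os
    import zlib

    def ratio(data: bytes) -> float:
        """Fraction of space saved by DEFLATE (0.0 means incompressible)."""
        return 1 - len(zlib.compress(data, 9)) / len(data)

    # Already-compressed image data looks like random noise to a compressor...
    jpeg_like = os.urandom(1_000_000)   # stand-in for JPEG payloads
    # ...while text is highly redundant.
    text_like = b"INSERT INTO archive VALUES (...);\n" * 30_000

    print(f"image-like: {ratio(jpeg_like):.1%} saved")  # near 0%, can go negative
    print(f"text-like:  {ratio(text_like):.1%} saved")
    ```

    Running something like this against a sample of your real images is the cheapest way to find out whether that ~10% guess holds for your data.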

  • I'd be curious why they couldn't get Filestream to work. Was it something with the application changes? The coding is a little tricky.

    You could get away from Filestream and go back to storing the images in the file system and a path in the db, then let people retrieve the images with SMB (file explorer) access. You can compress the volumes then, potentially even offloading the image data to another server.

    The flip side is that you have to manage security, administration and backup/restore of the files yourself. That's what Filestream is there to avoid.

    Compression can vary, but I'd grab a random sample, maybe 5% of the images, drop them in a zip engine and see what compression you get.

    If you have control of the application, you might even consider compressing the images before they go in the DB, less backup/restore/etc to manage, and have the application do the decompression after the client gets the data.
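
    The 5% sampling test above can be sketched in a few lines. This assumes the images are already accessible as files (e.g. exported from the IMAGE columns to a staging folder); zlib's DEFLATE is the same algorithm a zip engine uses, so the ratio is a reasonable proxy:

    ```python
    import random
    import zlib
    from pathlib import Path

    def sample_compression_ratio(paths, fraction=0.05, seed=0):
        """DEFLATE space-saved fraction over a random sample of files."""
        rng = random.Random(seed)          # fixed seed so the test repeats
        sample = rng.sample(list(paths), max(1, int(len(paths) * fraction)))
        raw = comp = 0
        for p in sample:
            data = Path(p).read_bytes()
            raw += len(data)
            comp += len(zlib.compress(data, 9))
        return 1 - comp / raw              # e.g. 0.10 means 10% saved
    ```

    If the number that comes back is in the single digits, volume compression probably isn't worth the CPU; pre-compressing in the application would fare no better on the same data.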

I was actually looking at converting these over to VARBINARY(MAX), since I've seen deprecation warnings about the IMAGE/TEXT/NTEXT etc. column types. My question on that: when the application inserts an image into the DB, does any conversion need to be done on the application side to prepare the incoming data stream for the VARBINARY column?

    As far as why it wouldn't work, I am not sure. I wasn't present at the time, but from what I was told, when they tried to insert the record into the DB from the application, it inserted 'corrupted data' and they could not fetch the record once it was stored. They did everything from reinstalling SQL Server to changing the code and just couldn't get it working.

Is the concern largely the cost of the storage? If it's an archive system that is infrequently accessed, perhaps performance is not a great concern, in which case you get the cheapest (slowest) storage you can. I believe SQL Server even supports NAS.

    I am not endorsing that last statement but since you said you are open to any ideas I'll put it out there.

    Here's another thought: if permissible, reduce the resolution of your images.

No change is needed to move from IMAGE to VARBINARY(MAX). You could play with the 'text in row' table option to better tune how the data is stored, but no change is required for the field itself.

    Weird on the Filestream stuff. The coding is a little tricky to work with file handles and the Win32 APIs, but I wouldn't expect corrupt data. My guess is someone didn't really encode the data and was trying to send it as text somehow. Not a Filestream issue, AFAIK, but a programming problem. Inserting into Filestream shouldn't be harder than an image column.
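
    The "send it as text" failure mode is easy to demonstrate. A hedged sketch using stdlib sqlite3 purely as a stand-in for SQL Server (the same principle applies to a parameterized VARBINARY(MAX) insert through ADO.NET or ODBC): bind the raw bytes as a parameter and they round-trip intact; run them through a text encoding and they come back mangled.

    ```python
    import os
    import sqlite3

    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, img BLOB)")

    payload = os.urandom(4096)  # stand-in for an image

    # Right: bind the raw bytes as a parameter; the driver sends them as binary.
    con.execute("INSERT INTO docs (img) VALUES (?)", (payload,))
    stored = con.execute("SELECT img FROM docs").fetchone()[0]
    assert stored == payload    # round-trips byte-for-byte

    # Wrong: decoding/re-encoding binary as text mangles it, which looks
    # exactly like "corrupted data" on the way back out.
    mangled = payload.decode("latin-1").encode("utf-8")
    assert mangled != payload
    ```

    If the original team's code built the INSERT by concatenating the image bytes into a SQL string, that alone would explain both the corruption and the failed fetch.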
