Splitting up a large database

  • Our production-level database that sits behind our company's main web application is currently around 55 GB. Most of that size comes from images stored inside several tables. I would like to split those images off somehow, so backups/restores of everything but the images won't take so long. I have a few ideas:

    1) Put the images in their own DB. If we ever need to restore from backups, we can have the majority of our application up and running quickly; we can just temporarily shut off all functionality related to images until the image DB is fully restored. The only problem I see with this solution is retaining referential integrity between the non-image DB and the image DB.

    2) Horizontally partition all tables containing images, so we can put all the images in their own filegroup. This would allow us to do separate backups of the filegroups, which means backing up everything but the images would take less time, but it still wouldn't help with our restore problem. If data corruption ever occurred and we needed to restore from a backup, we'd still need to wait until all the filegroups were fully restored before we could let people connect to our application again.
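
    For illustration, here's a rough sketch of what option 2 might look like in T-SQL; the database, filegroup, table, and path names are all made up:

        -- Add a filegroup and a data file to hold the image data (names/paths hypothetical)
        ALTER DATABASE MyAppDB ADD FILEGROUP Images;
        ALTER DATABASE MyAppDB ADD FILE
            (NAME = MyAppDB_Images, FILENAME = 'E:\Data\MyAppDB_Images.ndf')
            TO FILEGROUP Images;

        -- LOB data doesn't move with an index rebuild, so the image table has to be
        -- recreated with TEXTIMAGE_ON pointing at the new filegroup and then reloaded
        CREATE TABLE dbo.PersonImage_New
        (
            PersonImageID INT NOT NULL PRIMARY KEY,
            PersonID      INT NOT NULL,
            ImageData     VARBINARY(MAX) NOT NULL
        ) ON Images TEXTIMAGE_ON Images;

        -- Back up everything but the images, and the images on their own schedule
        BACKUP DATABASE MyAppDB FILEGROUP = 'PRIMARY' TO DISK = 'F:\Backup\MyAppDB_Primary.bak';
        BACKUP DATABASE MyAppDB FILEGROUP = 'Images' TO DISK = 'F:\Backup\MyAppDB_Images.bak';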

    Does anyone have any recommendations, info, input, etc.?

    Thanks in advance!

  • A third option would be to speed up the backups and restores themselves, either through third-party compression or by striping the backups over multiple drives; this has the added benefit of requiring no code or DB changes. I would investigate a combination of the above and option 2.
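
    For example, a striped backup just lists multiple target files, ideally on different physical drives (paths here are hypothetical):

        -- SQL Server writes to all listed files in parallel, raising total IO throughput
        BACKUP DATABASE MyAppDB
            TO DISK = 'F:\Backup\MyAppDB_1.bak',
               DISK = 'G:\Backup\MyAppDB_2.bak',
               DISK = 'H:\Backup\MyAppDB_3.bak';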

  • Thanks for the reply. However, I don't see how striping the backups across drives will help - space is not currently an issue for us. Also, given the number of images in our DB, I'm not sure third-party compression tools will be much help. We tested LiteSpeed and only got 12% compression, and the restores still took a long time. If we got those images out of there, though...

  • Heh... lots of folks scoff at it, but I prefer to save just the file names of the images in the database, with the images stored as one file per image. What does that buy you? A couple of things... the SQL Server backups are certainly smaller, for one. The other thing it buys you is the "Archive" bit... unless you tell it to do the whole directory, the backup system that handles the image directory will only back up the images that have changed. Think of it as an incremental backup for the images.

    The bad part is that it may require a separate trip back to the server, but it's not quite so painful if those images are stored on the web server instead of the SQL Server.

    Of course, you could just put the buggers in a separate database on the same server and get most of the backup benefits you seek.
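
    Something along these lines - the table and column names are just for illustration:

        -- Store a pointer to the image file instead of the bytes themselves (names hypothetical)
        CREATE TABLE dbo.PersonImage
        (
            PersonImageID INT IDENTITY(1,1) PRIMARY KEY,
            PersonID      INT NOT NULL REFERENCES dbo.Person (PersonID),
            ImagePath     NVARCHAR(260) NOT NULL  -- e.g. '\\webserver\images\12345.jpg'
        );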

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
        Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • I'm with Jeff if you can make this happen. The other benefit is that a web server, or the client browser, can cache the images, resulting in better performance - that is, if you can make your application work with files instead of BLOBs.

    Are you on Enterprise Edition? If not, I don't think the filegroup approach works. My recollection is that online restore of a filegroup is an EE-only feature.

    I like a separate database, and what I'd suggest is regular jobs to look for disconnects between the databases. I'd set this up with the idea that eventually the two databases would be on separate servers, so plan for that in your code.
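
    As a sketch, such a job could run something like this (database and table names are hypothetical):

        -- Find image rows whose owning person row has gone missing (cross-database join)
        SELECT i.PersonImageID
        FROM   ImageDB.dbo.PersonImage AS i
        WHERE  NOT EXISTS (SELECT 1
                           FROM   MainDB.dbo.Person AS p
                           WHERE  p.PersonID = i.PersonID);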

  • Yeah... I think the separate DB thing is probably gonna be the way we have to go. I'm 99% sure that they're not gonna go for taking the images out of the DB and storing them in the file system - although I do plan to propose that as a solution anyway.

    Basically what we have is a bunch of tables that hold data about people, and one of those tables holds images related to those people, with those people's IDs as a foreign key. We want to maintain that relationship between the people and the images.

    Would the best way to do that between separate databases be with triggers? And then maybe have some type of cleanup job like the one mentioned in the earlier post?

  • Yeah... unfortunately, triggers are the only way to do any type of DRI between databases. Make sure they're not RBAR and that they're well written (i.e., performant) and you should be OK.
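
    A set-based sketch of such a trigger, created in the image database; all the names here are hypothetical:

        -- Reject inserts/updates whose PersonID has no match in the main database
        CREATE TRIGGER trg_PersonImage_CheckPerson
        ON dbo.PersonImage
        AFTER INSERT, UPDATE
        AS
        BEGIN
            SET NOCOUNT ON;
            IF EXISTS (SELECT 1
                       FROM   inserted AS i
                       WHERE  NOT EXISTS (SELECT 1
                                          FROM   MainDB.dbo.Person AS p
                                          WHERE  p.PersonID = i.PersonID))
            BEGIN
                RAISERROR('PersonID does not exist in MainDB.dbo.Person.', 16, 1);
                ROLLBACK TRANSACTION;
            END;
        END;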

    --Jeff Moden


  • Clint (7/23/2009)


    Thanks for the reply. However, I don't see how striping the backups across drives will help - space is not currently an issue for us. Also, given the number of images in our DB, I'm not sure third-party compression tools will be much help. We tested LiteSpeed and only got 12% compression, and the restores still took a long time. If we got those images out of there, though...

    Striping works because the majority of the time spent on a backup is in writing to disk; if you can double your IO throughput, you will (almost) halve the time taken. By splitting the backup across multiple drives you increase the throughput, subject to any IO controller limitations. As I mentioned, this can all be done with small changes to your backup/restore scripts - and it avoids the code and database changes that every other solution is going to require.
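
    The restore side mirrors it - every stripe must be present (paths again hypothetical):

        -- All stripes are read in parallel during the restore
        RESTORE DATABASE MyAppDB
            FROM DISK = 'F:\Backup\MyAppDB_1.bak',
                 DISK = 'G:\Backup\MyAppDB_2.bak',
                 DISK = 'H:\Backup\MyAppDB_3.bak'
            WITH REPLACE;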

    As for compression, it depends entirely on the data - if your images are already compressed or otherwise not conducive to compression, you won't see much benefit. I mentioned it because if the data can be compressed effectively, the backup file is smaller and therefore takes less time to write; compression typically costs more CPU time, but the IO savings dwarf that. I don't have good information on how the different compression/backup products compare, but I'd guess they're similar, so this probably isn't an option for you.

  • Thanks to everyone for the replies. Some good info.
