Quickly Copy Data

  • lptech (6/6/2013)


    Looks like we need to introduce deduplication to the DBMS world. In the virtual world, VMware and Hyper-V only keep one copy of the same block in memory, and when doing backups for that matter. Granted, we have transaction log and differential backups, but deduplication for database backups should be built into the backup subsystem. So if only 1% of the database blocks have changed, the backup command should only back up those blocks. During restores, ALL of the blocks should be retrieved without DBA intervention.

    Backup and restore is an area where SQL Server isn't even where DB2 was in V3, back in the early 1990s.

    I would agree. Some of the de-dupe products don't work well with SQL Server over time, since those original blocks have to be maintained and assembled with the changed ones.

    However, I'd think the backup process could be streamlined and at the same time smart enough to produce some "changed blocks" from a base backup, excluding indexes. If we throw those out because they get re-org'd, then I think lots of data rarely changes.

    The thing is, we'd need to write those blocks out in the same layout instead of just streaming out pages. After index maintenance moves data around, including in the clustered indexes, the blocks are not necessarily in the same spot each day when they are streamed out from the database.
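    As an aside: "only back up the changed blocks" is roughly what SQL Server's differential backup already does at the extent level (it just isn't deduplication across backup sets). A minimal sketch, with a hypothetical database name and path:

        -- Weekly full backup: the base that later differentials refer to.
        -- (WITH COMPRESSION needs SQL Server 2008 or later.)
        BACKUP DATABASE SalesDB
            TO DISK = N'\\backupshare\SalesDB_full.bak'
            WITH COMPRESSION, INIT;

        -- Daily differential: contains only the extents changed since the
        -- last full, so if ~1% of the database changed, the file stays small.
        BACKUP DATABASE SalesDB
            TO DISK = N'\\backupshare\SalesDB_diff.bak'
            WITH DIFFERENTIAL, COMPRESSION, INIT;

        -- Restore: apply the full, then the latest differential; all blocks
        -- come back without the DBA picking individual extents.
        RESTORE DATABASE SalesDB
            FROM DISK = N'\\backupshare\SalesDB_full.bak'
            WITH NORECOVERY, REPLACE;
        RESTORE DATABASE SalesDB
            FROM DISK = N'\\backupshare\SalesDB_diff.bak'
            WITH RECOVERY;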

  • 1) Compression Tools - the free tools do such a great job, why bother paying for something? 7zip works great and I usually get a SQL backup down to 5-10% of the original size. If you want to pay for proprietary tools, have at it - but consider whether you are actually getting a better tool. I'm not aware of any magic tool that gives you significantly better compression just because you pay for it (in fact, I (very unscientifically) seemed to get better results with 7zip than with WinZip).

    2) I was also going to suggest rsync. There is a product called Syncrify Backup that incorporates rsync packaged up with a backup client and a backup web server, very reasonably priced. Works great. It will also compress the network traffic, so you don't need to compress beforehand (and it probably works better uncompressed, since the SQL backup is likely to have fewer changed blocks of data that way - it depends on the specific workload for your database, of course). It allows for versioning on the backup server as well, so you could potentially restore a backup from days or weeks ago (and it only uses up space for the changed blocks, so you don't need a full copy of the database for each version).

  • Being a resource-constrained (i.e., money-constrained) part-time DB admin, I use free tools to transfer my backups.

    I compress with gzip using the --rsyncable flag, then transfer with rsync. For those wanting to use rsync while storing compressed data, make sure the compression is rsync-friendly. Plenty of information on this is out there.

  • To copy a backup of a large database over a WAN, I recommend the following:

    1. Make a compressed backup of the database to multiple output files. Example: make 20 x 5 GB files instead of one 100 GB file. The compression and multiple-output-file options are available with the BACKUP command (see the sketch after this list).

    2. Set up multiple streams of file copies so that you are copying multiple files at a time, say 5 in parallel. This will help to make sure that the WAN pipe stays full.

    3. If possible, make sure that the source and target machines for the file copy are running the Server Message Block (SMB) 2.0 or higher protocol (Windows Server 2008 and above, Windows Vista, Windows 7, Windows 8). There were big enhancements to this protocol that dramatically improve performance on high-latency links (WAN connections).
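    As a rough sketch of step 1 (the database name, stripe count, and paths below are made up for illustration), one compressed backup striped across several files looks like this:

        -- A single compressed backup written as four stripes; each file is
        -- roughly a quarter of the compressed size and can be copied in parallel.
        BACKUP DATABASE BigDB
            TO  DISK = N'D:\Backup\BigDB_1.bak',
                DISK = N'D:\Backup\BigDB_2.bak',
                DISK = N'D:\Backup\BigDB_3.bak',
                DISK = N'D:\Backup\BigDB_4.bak'
            WITH COMPRESSION, STATS = 10;

        -- The restore on the far side simply lists all of the stripes again.
        RESTORE DATABASE BigDB
            FROM DISK = N'D:\Backup\BigDB_1.bak',
                 DISK = N'D:\Backup\BigDB_2.bak',
                 DISK = N'D:\Backup\BigDB_3.bak',
                 DISK = N'D:\Backup\BigDB_4.bak'
            WITH RECOVERY;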

  • It comes down to doing analysis. Too often we get into the rut of doing things exactly the same way every time.

    When faced with a problem that seems overwhelming, throw out all assumptions. Everything. Then ANALYZE the problem. Determine the facts and constraints, and make sure they are real and not imagined. Then start thinking about how to fix it. Take things a step at a time.

    A good example occurred to me today. Someone had a complex spreadsheet that had data elements used to make a decision. Instead of helping them write a formula to do what they wanted, I wrote a formula to do a portion of what they wanted. Then another, and another, and another. Once those were done and tested, I explained that they could combine them all into one formula (good luck!) or simply write one more that based the answer on the ones we wrote. All I did was break the work down into manageable pieces.

    What seemed impossible (to them) became obvious once I showed them the methodology I use. I didn't do anything special, and nothing they could not have done themselves. The difficulty they had was that they were focused entirely on the big picture of columns A through AZ and 7,000 rows. When they explained each piece to me one at a time, it was easy to do.

    Dave

  • Sure, but when you've encountered and solved similar problems enough times, you can often skip the analysis and use a "good enough" solution that works well in most cases, rather than spending time treating every issue as brand new and starting from scratch. Then, if the standard solution (e.g., compress and copy) doesn't meet the requirements (too slow for the database size, or whatever), you can regroup and perform the analysis from scratch. Otherwise you are wasting time and resources optimizing where it isn't really necessary.

  • jason 63816 (6/6/2013)


    Sure, but when you've encountered and solved similar problems enough times, you can often skip the analysis and use a "good enough" solution that works well in most cases, rather than spending time treating every issue as brand new and starting from scratch. Then, if the standard solution (e.g., compress and copy) doesn't meet the requirements (too slow for the database size, or whatever), you can regroup and perform the analysis from scratch. Otherwise you are wasting time and resources optimizing where it isn't really necessary.

    Yes, but I didn't think it was necessary to say that. The point I got out of Steve's post was an issue where "normal" wasn't working.

    I am a believer in automating everything I can. I then design and document steps to handle things that can't be automated, but can be handled similarly every time. When we see something where our normal solutions don't work, and we have "tried everything" but can't find a solution, obviously we need to step back and rethink things. I believe the majority of the requests for help on Internet sites fall into that category. Usually the person is stuck in that rut and has forgotten to climb out and think.

    One complaint I have, in line with what you posted, is classes on SQL Server where the instructor refuses to give an answer for a basic, fits-most-situations backup job. They always seem to say every situation is different. Well, of course, but why can't they start with the basics? It could be that the classes I went to had poor instructors, but the example could be adapted for a lot of situations. There are times when a general solution works, and times when it doesn't. When it doesn't, we need to analyze. Sometimes, even if it does work, we still need to analyze due to performance issues.
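    As a hedged illustration of what such a basic, fits-most-situations backup job might look like (the database name, paths, and schedule are placeholders; the log backups assume the FULL recovery model):

        -- Nightly full backup (e.g., one SQL Agent job step).
        BACKUP DATABASE MyAppDB
            TO DISK = N'E:\SQLBackup\MyAppDB_Full.bak'
            WITH COMPRESSION, INIT, CHECKSUM;

        -- Log backup every 15-30 minutes (a separate Agent job). Each run
        -- appends a backup set to this device; one file per run is also common.
        BACKUP LOG MyAppDB
            TO DISK = N'E:\SQLBackup\MyAppDB_Log.trn'
            WITH COMPRESSION, CHECKSUM;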

    Dave

  • I think this falls under "not normal."

    I have had to resolve this for a number of DB solutions and reasons.

    Ever since SQL 2005 was new, I have used publishing and replication to resolve it.

    Maybe I had to do some schema work or add some manual scripts to update all DB tables this way, but once it is started with the right settings, it will solve all the remote copy/backup issues. It even auto-recovers gracefully after any part of it has been offline for long periods.

    You just set up the initial copy at the remote site, set up replication, and then run the backup jobs and any other access to the data at the remote site.

    These transactional-replication-updated copies also have another advantage for reducing the size and bandwidth used when backing up a database: they do not need all the non-clustered indexes that the original version has. You have no idea how much backup storage can be saved for a VLDB if you move the backup and restoration of NC indexes to scripts. I know this sounds like you are adding time to a DR restore, but in practice it does not. I have yet to see a DR restore plan that did not need a rebuild of all indexes after the database was restored from backup in the new environment.
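    As a hedged sketch of the "rebuild all indexes after the restore" step (the catalog views are generic; nothing here is specific to the poster's environment), the statements can be generated and run like this:

        -- Build and execute an ALTER INDEX ... REBUILD for every user table,
        -- rather than relying on the index state captured in the backup.
        DECLARE @sql nvarchar(max) = N'';

        SELECT @sql = @sql
             + N'ALTER INDEX ALL ON '
             + QUOTENAME(s.name) + N'.' + QUOTENAME(t.name)
             + N' REBUILD;' + CHAR(13)
        FROM sys.tables AS t
        JOIN sys.schemas AS s
          ON s.schema_id = t.schema_id;

        EXEC sys.sp_executesql @sql;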

    Using replication to publish transactions to a DB copy used only for backup, DR, and maybe reporting or analysis might sound like a lot of moving parts. Trust me when I tell you it makes these things so easy; it amazes me every day how much of my life is made easier by using replication over log shipping or DB mirroring.

    :smooooth:

  • krowley (6/6/2013)


    Put the files on a USB drive and drop them in the mail, or get in your car and drive them to the other location. :hehe:

    +1000 to that.

    I worked for a company that was trying to move the data on their servers from Michigan to their "new" data center in New York. The key was that they had to be able to do it in one weekend because they couldn't afford to be down for more than that. They failed for 2 years even when they tried compression.

    After the systems were down, I made a tape backup and FedEx'd it. We had everything back up and online with nearly half a day to spare.

    --Jeff Moden


    RBAR is pronounced "ree-bar" and is a "Modenism" for Row-By-Agonizing-Row.
    First step towards the paradigm shift of writing Set Based code:
    ________Stop thinking about what you want to do to a ROW... think, instead, of what you want to do to a COLUMN.

    Change is inevitable... Change for the better is not.


    Helpful Links:
    How to post code problems
    How to Post Performance Problems
    Create a Tally Function (fnTally)

  • Does this count as using compression?

    Using RAR instead of xcopy to speed up the network transfer?

    Once we had to move an entire JDE 8.X database with 12 years of data in one night.

    We were able to be down during this period, and we had a WAN connection to the new system. We mapped a drive from the old server to the new one and used command-line RAR to copy the detached database files (no tran logs) to the target location on the new server.

    Once all the files had been copied and were in place on the new server, a SQL Agent job was executed that ran a series of scripts to restore the databases and repair any login or dbo issues.

    This replaced a SQL Server backup-and-restore process that moved the files via FTP, and it was 20 hours faster.
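    A hedged guess at what such post-copy scripts might look like (file paths, database, and user names below are hypothetical, not the original poster's): attach the copied data files without a log file, then re-map an orphaned database user.

        -- Attach the copied data file(s); since no log file came across,
        -- ATTACH_REBUILD_LOG creates a new, empty log for the database.
        CREATE DATABASE JDE_PROD
            ON (FILENAME = N'E:\SQLData\JDE_PROD.mdf'),
               (FILENAME = N'E:\SQLData\JDE_PROD_1.ndf')
            FOR ATTACH_REBUILD_LOG;

        -- Fix a database user orphaned by the move by pointing it at the
        -- matching login on the new server (repeat per affected user).
        ALTER USER JDE_User WITH LOGIN = JDE_User;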

  • PHYData DBA (6/7/2013)


    Does this count as using compression?

    Using RAR instead of xcopy to speed up the network transfer?

    Once we had to move an entire JDE 8.X database with 12 years of data in one night.

    We were able to be down during this period, and we had a WAN connection to the new system. We mapped a drive from the old server to the new one and used command-line RAR to copy the detached database files (no tran logs) to the target location on the new server.

    Once all the files had been copied and were in place on the new server, a SQL Agent job was executed that ran a series of scripts to restore the databases and repair any login or dbo issues.

    This replaced a SQL Server backup-and-restore process that moved the files via FTP, and it was 20 hours faster.

    I've not worked with RAR... not sure why... just haven't, so pardon my ignorance. Does RAR handle files of more than 2 GB in size after the final compression?

    --Jeff Moden



  • Jeff Moden (6/7/2013)


    PHYData DBA (6/7/2013)


    Does this count as using compression?

    Using RAR instead of xcopy to speed up the network transfer?

    Once we had to move an entire JDE 8.X database with 12 years of data in one night.

    We were able to be down during this period, and we had a WAN connection to the new system. We mapped a drive from the old server to the new one and used command-line RAR to copy the detached database files (no tran logs) to the target location on the new server.

    Once all the files had been copied and were in place on the new server, a SQL Agent job was executed that ran a series of scripts to restore the databases and repair any login or dbo issues.

    This replaced a SQL Server backup-and-restore process that moved the files via FTP, and it was 20 hours faster.

    I've not worked with RAR... not sure why... just haven't, so pardon my ignorance. Does RAR handle files of more than 2 GB in size after the final compression?

    Yes, RAR will do that (along with its GUI companion, WinRAR).

    RAR/WinRAR also supports high-security encryption (AES-128) of compressed archives, something that you should probably do before you copy them to a USB drive and drop them in an overnight express envelope.

  • Michael Valentine Jones (6/7/2013)


    Jeff Moden (6/7/2013)


    PHYData DBA (6/7/2013)


    Does this count as using compression?

    Using RAR instead of xcopy to speed up the network transfer?

    Once we had to move an entire JDE 8.X database with 12 years of data in one night.

    We were able to be down during this period, and we had a WAN connection to the new system. We mapped a drive from the old server to the new one and used command-line RAR to copy the detached database files (no tran logs) to the target location on the new server.

    Once all the files had been copied and were in place on the new server, a SQL Agent job was executed that ran a series of scripts to restore the databases and repair any login or dbo issues.

    This replaced a SQL Server backup-and-restore process that moved the files via FTP, and it was 20 hours faster.

    I've not worked with RAR... not sure why... just haven't, so pardon my ignorance. Does RAR handle files of more than 2 GB in size after the final compression?

    Yes, RAR will do that (along with its GUI companion, WinRAR).

    RAR/WinRAR also supports high-security encryption (AES-128) of compressed archives, something that you should probably do before you copy them to a USB drive and drop them in an overnight express envelope.

    Actually, you have a great point. That's something you would have to do with many databases that are transferred over any "public" network.

    I was just pointing out how it could be used to speed up a network file copy operation.

  • 7-Zip is my compression tool of choice; it also has GUI and CLI versions, supports AES encryption, and its native 7z format offers the best compression I have seen. I have backup jobs that back up databases to the local server, then compress and encrypt the backups using 7-Zip, and then FINALLY robocopy the 7z file over the LAN to the storage where it is kept.

  • There are other ways of tackling this issue, but they do not exist yet for SQL Server.

    Back in the late 1990s, a number of DB2 shops stopped taking full backups of their databases. They used functionality in some third-party backup products called 'offline database merge'.

    The idea was to take an existing full backup and apply to it either a differential or a log backup, giving you a new full backup. If you applied a log backup, you could choose any point in time in the log that the new full backup should relate to.

    One advantage of this approach is that the merge process could take place away from the database server. If you already had your full backup on an offsite server, you only needed to get your log backup sent there, and you could do the backup merge. The only time this process needed to connect to a database server was to register the new full backup.

    IMHO this is a process that vendors of SQL backup products could add to their products, assuming they can work around any patent issues that may exist. Anyone want to take up this challenge?

    Original author: https://github.com/SQL-FineBuild/Common/wiki/ 1-click install and best practice configuration of SQL Server 2019, 2017, 2016, 2014, 2012, 2008 R2, 2008 and 2005.

    When I give food to the poor they call me a saint. When I ask why they are poor they call me a communist - Archbishop Hélder Câmara
