In addition to being a SQL DBA, I'm also a network administrator, or at least I pretend to be. This is a bit off-topic from my usual SQL Server fare, but I figured my pain is your gain. Today, July 31, is System Administrator Appreciation Day. If you use DFS Replication in your company, give your sysadmin a gift by forwarding this on to him or her.
Background - What Is DFS Anyways?
DFS, or Distributed File System, is a feature built into Windows Server that allows you to organize multiple SMB shares into a single DFS share. Accessing the DFS share automatically redirects you (under the covers) to one of the SMB shares that are part of the group. It's a neat bit of technology that provides location transparency and redundancy at the same time. Another part of DFS is the ability to replicate changes made to a file (or files) in one of the SMB shares to all of the other SMB shares in the group. Prior to Windows Server 2003 R2 this feature was called File Replication Service (FRS) and, while it worked, it was less than efficient because any change to a file resulted in the entire file being copied (imagine making a one-line change to a 100 MB file). Beginning with Server 2003 R2 it was renamed DFS Replication. Along with the new name came some nice improvements, including scheduling, bandwidth throttling, and, most importantly, Remote Differential Compression, which detects and replicates only the part of the file that changed rather than the entire file.
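To make the "replicate only what changed" idea concrete, here is a deliberately simplified sketch. It is not the actual Remote Differential Compression algorithm; it just compares fixed-size block hashes of an old and a new version of a file and reports which blocks would need to be sent. The function names and the tiny block size are made up for illustration.

```python
import hashlib

BLOCK_SIZE = 4  # unrealistically small, purely so the demo is easy to follow


def block_hashes(data: bytes, block_size: int = BLOCK_SIZE) -> list[str]:
    """Hash each fixed-size block of the data."""
    return [
        hashlib.sha256(data[i:i + block_size]).hexdigest()
        for i in range(0, len(data), block_size)
    ]


def changed_blocks(old: bytes, new: bytes, block_size: int = BLOCK_SIZE) -> list[int]:
    """Return indexes of blocks in `new` that differ from (or are missing in) `old`."""
    old_hashes = block_hashes(old, block_size)
    new_hashes = block_hashes(new, block_size)
    return [
        index
        for index, digest in enumerate(new_hashes)
        if index >= len(old_hashes) or old_hashes[index] != digest
    ]


# One byte changed in the second block, so only block 1 needs to travel.
print(changed_blocks(b"AAAABBBBCCCC", b"AAAAXBBBCCCC"))
```

The point is only the payoff: a small edit means a small transfer, instead of the whole file going over the wire.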
DFS (and DFS replication) is really nifty when it works. My company uses it to keep content synchronized on servers across multiple datacenters. However, there are a few caveats that you need to be aware of. I'll call them lessons learned, and unfortunately I had to learn them the hard way. I'm going to share them with you in the hopes that you won't have to go through the same pain that I did.
Lesson #1 – Be Very Careful When Removing And Re-Adding Members To A DFS Replication Group
Let's pretend you've got DFS Replication set up and you want to add a server (referred to as a member) to your group. You start with a blank target directory on the member, and a short while after adding it to the group it gets populated with the files & folders from the other members in the group. At some point in the future you need to remove that member temporarily and then re-add it. Thinking you need to start with a blank target directory like the first time, you wipe the directory clean and then add the member back to the group. A short time later you start to see the opposite of what you expect: instead of the member receiving the files & folders from the other members, they start disappearing from every member in the group. What the heck happened?!?
It turns out that when you delete a member from a DFS replication group, information about the member isn't actually deleted from the DFS replication database. Instead, the member is marked with a 30-day tombstone flag. If the member is added back into the DFS replication group within that window, the flag is deleted and the original objects for the member are reused. Any changes made to the re-added member are then replicated to the other members. So those files you deleted from the member before re-adding it? The deletions get picked up and replicated to the other members.
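The behavior described above can be sketched with a toy model. This is only an illustration of the logic as I've described it, not how the DFS replication database actually works; the `ReplicationGroup` class and everything in it are hypothetical.

```python
class ReplicationGroup:
    """Toy model of the tombstone behavior (not DFSR's real implementation)."""

    def __init__(self):
        self.members = {}        # member name -> set of file names on disk
        self.known = {}          # member name -> files recorded when it was removed
        self.tombstoned = set()  # removed members whose records still exist

    def add_member(self, name, files_on_disk):
        files_on_disk = set(files_on_disk)
        if name in self.tombstoned:
            # Re-add inside the tombstone window: the old records are reused,
            # so files that were recorded but are now missing look like deletes.
            self.tombstoned.discard(name)
            deleted = self.known[name] - files_on_disk
            for other_files in self.members.values():
                other_files -= deleted  # the deletes replicate outward
        else:
            # Brand-new member: it receives content from the existing members.
            for other_files in self.members.values():
                files_on_disk |= other_files
        self.members[name] = files_on_disk

    def remove_member(self, name):
        # The member leaves the group, but its records are only tombstoned.
        self.known[name] = self.members.pop(name)
        self.tombstoned.add(name)


group = ReplicationGroup()
group.add_member("ServerA", {"report.docx", "budget.xlsx"})
group.add_member("ServerB", set())     # blank directory, gets populated
group.remove_member("ServerB")
group.add_member("ServerB", set())     # wiped clean before re-adding
print(group.members["ServerA"])        # ServerA's files are gone too
```

In the toy model, the only difference between the first add and the re-add is whether old records exist for the member, and that is exactly what turns an empty directory from "please fill me" into "delete everything".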
This "feature" and 3 workarounds are documented in Microsoft KB article 961655. Do yourself a favor and read it.
Lesson #2 – Back Up Your DFS Shares
Lesson #1 leads to lesson #2, which should be a no-brainer: back up your DFS shares. DFS Replication will replicate all changes to files, including deletes. Would you like to explain to management how you just lost all your files permanently because someone mistakenly deleted all of them and you weren't taking backups? I wouldn't.
Lesson #3 – Recover Deleted Files In A Pinch
Suppose you didn't learn from lesson #2, files got deleted by mistake, and you don't have backups. All hope is not lost. It turns out that DFS Replication keeps a hidden, private folder (DfsrPrivate\ConflictAndDeleted, under the root of the replicated folder) which contains a copy of the deleted files. It's limited in size, so it's not foolproof, but it just might save you in a pinch.
Ned Pyle, a Technical Lead for the Directory Services team at Microsoft, posted a handy VBScript that you can use to restore data if you're in disaster recovery mode. I've used it and it saved my butt. By the way, remember lesson #2 about backing up your DFS shares? (hint, hint)
Lesson #4 – Files With The Temporary Attribute Won't Replicate
Filters can be applied to exclude files from replicating based on their extension (e.g. .BAK), but what about when a non-excluded file just won't seem to replicate? It might have the temporary attribute set. DFS Replication won't pick up changes to those files. You wouldn't know that unless you found the single line mentioning it in this TechNet article (see if you can find the line!) or came across this post on the Microsoft Storage Team's blog.
How do you fix that? One way is to use Robocopy to strip the temporary attribute off the file(s) when copying into the DFS share. The switch is: /A-:T
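If you want to check whether this is what's biting you before reaching for Robocopy, something like the following sketch can help. It walks a folder looking for files with the temporary attribute set and builds the Robocopy command line with the switch mentioned above. The function names are my own, the /E switch (copy subfolders) is my addition, and the attribute check only returns anything on Windows, since that's the only place Python exposes `st_file_attributes`.

```python
import os

# Windows FILE_ATTRIBUTE_TEMPORARY bit
FILE_ATTRIBUTE_TEMPORARY = 0x100


def has_temporary_attribute(attributes: int) -> bool:
    """Return True if the temporary bit is set in a Windows attribute mask."""
    return bool(attributes & FILE_ATTRIBUTE_TEMPORARY)


def find_temporary_files(root: str) -> list[str]:
    """List files under `root` that carry the temporary attribute."""
    found = []
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            # st_file_attributes exists only on Windows; treat it as 0 elsewhere.
            attributes = getattr(os.stat(path), "st_file_attributes", 0)
            if has_temporary_attribute(attributes):
                found.append(path)
    return found


def robocopy_strip_temporary(source: str, destination: str) -> list[str]:
    """Build a Robocopy command that removes the temporary attribute on copy."""
    # /E copies subfolders, including empty ones; /A-:T strips the temporary attribute.
    return ["robocopy", source, destination, "/E", "/A-:T"]


print(robocopy_strip_temporary(r"D:\Staging", r"D:\DFSShare"))
```

The paths in the last line are placeholders. Pass the resulting list to `subprocess.run` on a Windows box if you'd rather script the copy than type it.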
Lesson #5 – Monitor DFS Replication Performance
One big downside to DFS Replication is that, unlike other Microsoft products, there's no shiny GUI to monitor DFS replication performance. That doesn't mean it can't be done; it just requires a little extra work. There's a command line executable included with DFS called dfsradmin that will create an HTML report showing DFS Replication's health status. There are nice writeups here and here on how to automate DFS replication health reports. I highly recommend that you take the time to read them and implement your own automated reports.
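As a starting point for automating this, here is a small sketch that builds a dfsradmin health report command. The switches (/rgname, /refmemname, /repname, /fscount) are how I remember the tool's syntax, so treat them as an assumption and confirm with `dfsradmin health new /?` on your own server. The function name, group name, server name, and report path are all placeholders.

```python
def build_health_report_command(group: str, reference_member: str, report_path: str) -> list[str]:
    """Build a dfsradmin command line for an HTML health report.

    Switch names are assumed from memory of the tool's help output;
    verify them with `dfsradmin health new /?` before relying on this.
    """
    return [
        "dfsradmin", "health", "new",
        f"/rgname:{group}",                 # replication group to report on
        f"/refmemname:{reference_member}",  # member used as the reference copy
        f"/repname:{report_path}",          # where the HTML report is written
        "/fscount:true",                    # include file counts in the report
    ]


command = build_health_report_command("Content", "SERVER01", r"C:\Reports\dfsr-health.html")
print(" ".join(command))
```

From there, a scheduled task that runs the command (for example with `subprocess.run(command, check=True)`) and emails or archives the report gets you most of the way to the automated reports described in the writeups.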
Conclusion
I hope that my lessons learned the hard way will save you some of the pain that I had to go through. Despite the hiccups that I've had with DFS I remain a big fan of using it to keep content synchronized across multiple locations. One last bit of advice – be sure to check out the File Cabinet blog from the Storage Team at Microsoft. It's a fantastic resource for DFS information that's helped me out many times and will no doubt help you too.