However, I take image backups in case both our DCs go down at the same time. To combat what you describe below, first of all, I take the backup in the middle of the night when no one should be doing anything (password changes, etc.). Secondly, I stop the NtFrs and Netlogon services while the backup is taken. Thirdly, I back up the two DCs at exactly the same time, restarting the services once the backup is complete.
Taking into consideration the scenario I describe above, I would gratefully welcome any further thoughts on this subject you might have to offer.
I hear you on the image backups, but I wonder if you've got a slight house-of-cards going there. I understand that you only intend to use them in case of severe circumstances, but you're going to have a lot of work and could possibly end up with a situation where your backups aren't going to be good. I suspect one might work, but it's just unpredictable what's going to happen with two images.
I guess my main question is: why not just do the 'official' thing and take a Microsoft backup of the System State (or any third-party product that supports Windows; we use Backup Exec [from Symantec Corp.], but any will probably work)?
That guarantees you've got a nice consistent System State which can be restored. (Note, by the way, that in Longhorn you can stop the directory service and get a very solid backup of the files on disk in a consistent state.)
It's just that any database which is 'open' is going to have consistency problems when you back it up and restore it, unless it is totally quiescent. So even database self-maintenance is going to cause problems. But there are these mechanisms within the "application" that guarantee a consistent state--those reduce the window of risk for you from "probably it will work" to "Microsoft says that this will work." (I do recognize that "Microsoft says" is no guarantee...)
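To make the 'official' route concrete, here is a minimal sketch in Python that only builds the command line for a System State backup and runs nothing. The function name, the job-name format, and the target paths are made up for illustration. The two commands are the stock Microsoft tools as I understand them (ntbackup on Windows Server 2003, wbadmin on Longhorn and later), so check the switches against the documentation for your own version before relying on them.

```python
from datetime import date


def system_state_backup_command(target, use_wbadmin=False, today=None):
    """Return the argument list for a System State backup; nothing is executed here.

    `target` is a .bkf file path for ntbackup, or a volume such as "E:" for wbadmin.
    The job name format is a hypothetical convention, not anything Windows requires.
    """
    today = today or date.today()
    if use_wbadmin:
        # Longhorn (Windows Server 2008) and later ship wbadmin instead of ntbackup.
        return ["wbadmin", "start", "systemstatebackup",
                f"-backupTarget:{target}", "-quiet"]
    # Windows Server 2003: ntbackup writes the System State to a .bkf file.
    job = f"SystemState-{today.isoformat()}"
    return ["ntbackup", "backup", "systemstate", "/J", job, "/F", target]


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch is safe anywhere.
    print(system_state_backup_command(r"D:\backup\dc1.bkf"))
```

On the DC itself you would hand the returned list to something like subprocess.run(cmd, check=True) from a scheduled task, once per DC. Unlike the image approach, there is no need to stop services or line up the timing across servers.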
Here's another option if you really love the disk image approach: get a VM and make it a third DC. Then boot it, let it synchronize. Then shut it down or (depending on how you are doing VMs) snapshot the disk. That would give you a more reliable image and something that you could 'cling to' in the case that your main DCs both go offline at the same instant.
Q: I use Acronis [Inc.] products to create backup images of DCs (while online), and I have successfully restored the images of these DCs in a test environment: replicated AD between two restored DCs, added objects, deleted objects, etc. Although I will say that *months* later now my replication is messed up... Is there something I should know about imaging and AD that I don't already know?
Disk imaging is an important tool--I use Ghost [from Symantec Corp.] all the time, but it can't work correctly with AD.
Imagine this situation. We have a database which is, effectively, replicated across two servers. Now, take one server down for a few hours and make a backup. While that is happening, the database is changing. People are changing their passwords; they're logging in and out; all sorts of stuff can be happening. DNS is changing the database (if you have it in AD), and so are other things like printers.
OK, the first image is done. Now take down the second server, and image that one.
Those databases are different.
OK, now, let's suppose that 23 hours later, you trip over the first server (or the second; it doesn't really matter) and the hard drive crashes. Fortunately, AD is still operating--and the database is still changing. Head out to your local electronics parts store, pick up a new hard drive, swap it in.
Now, what happens if you restore the server? You've got two different databases, each of which is 'authoritative and complete.' Depending on an enormous number of variables (such as how long it takes you to get the new hard drive in, when you took the backup, what people were doing), you may simply have a badly configured database or you may corrupt things or you may go "back in time" and so on.
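If it helps to see the problem in miniature, here is a toy model in Python. It is emphatically not AD's real replication algorithm: the Replica class, its fields, and the sample keys are all invented for illustration. The only idea borrowed from real life is that each server numbers its own changes and each partner remembers the highest number it has already pulled.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Toy domain controller: a key/value store plus replication bookkeeping."""
    name: str
    usn: int = 0                              # counter for changes made on this server
    data: dict = field(default_factory=dict)  # the "directory" contents
    log: list = field(default_factory=list)   # (usn, key, value) for changes made here
    seen: dict = field(default_factory=dict)  # partner name -> highest usn already pulled

    def write(self, key, value):
        self.usn += 1
        self.data[key] = value
        self.log.append((self.usn, key, value))

    def pull_from(self, partner):
        """Apply only the partner's changes numbered above what we already pulled."""
        mark = self.seen.get(partner.name, 0)
        for usn, key, value in partner.log:
            if usn > mark:
                self.data[key] = value
                mark = usn
        self.seen[partner.name] = mark


def simulate_image_restore():
    dc1, dc2 = Replica("DC1"), Replica("DC2")
    dc1.write("alice_pw", "v1")
    dc2.pull_from(dc1)
    image = copy.deepcopy(dc1)        # disk image of DC1 taken here

    dc1.write("alice_pw", "v2")       # life goes on after the image
    dc1.write("printer", "floor3")
    dc2.pull_from(dc1)                # DC2 has now seen DC1's changes 1 through 3

    dc1 = copy.deepcopy(image)        # hard drive dies; DC1 restored from the image
    dc1.write("bob_pw", "new")        # numbered 2, a number DC2 believes it already has
    dc2.pull_from(dc1)                # so DC2 skips it
    dc1.pull_from(dc2)                # and DC2 has nothing of its own to offer
    return dc1, dc2


if __name__ == "__main__":
    dc1, dc2 = simulate_image_restore()
    print(dc1.data)
    print(dc2.data)
```

In this toy run, the restored DC1 has gone "back in time" on Alice's password and has lost the printer, while DC2 never hears about Bob's new password because it believes it already has that change number. Neither server reports an error. Real AD has more bookkeeping than this, but the flavor of the failure is the same.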
The only safe way to use imaging is to take your entire set of AD servers down, image them, and then if you have a failure of any one server, you have to restore them all from the same database version. That's not a 24x7 kind of solution, so I'm sure you aren't doing that.
Alternatively, you can restore the dead server, but leave it disconnected from the network. Then, demote it out of being a DC (and make sure the other DCs also have it demoted), bring it onto the network, and re-promote it.
In other words, if you think of AD as an application, you're really de-installing and re-installing the application. That works fine, but the point of [the original Active Directory backup tip] was to remind you that you can't just restore a server; you have to think about the database changing all the time and how you are going to get things re-synchronized.