Managing Active Directory health usually comes down to three main tasks: installation and deployment, maintenance, and break-fix repairs when something goes wrong. In this tip, we're going to review how to build a reliable Active Directory, focusing particularly on preventative maintenance.
Once your Active Directory is up and running, you do need to perform regular maintenance on it. Every AD guru has their own set of procedures for checking Active Directory health; in this article, I'll share mine.
- Check your backups. In fact, this is so important that I wrote a whole separate Active Directory management Tech Tip about it.
- Make sure replication is working. AD depends on multiple copies of the directory database, one on each domain controller, being kept closely synchronized by passing updates and changes between them. This process is called replication, and only the changes are replicated, not the entire directory.
The process isn't tremendously complicated, but if replication stops working properly, your directory won't be reliable. You can always run "repadmin /showrepl" to see the status of recent replication and whether changes are synchronizing properly. Replication failures are unusual on a LAN, but if your domain spans a WAN, you can see delays.
When a change is made, it doesn't replicate everywhere instantaneously. Nevertheless, in a healthy AD forest, the last successful replication on every domain controller should be within a few hours of the others. Repadmin tells you when the last replication happened, and all of the servers should be on the same timetable.
When I have more than two domain controllers to look at, I use "repadmin /replsummary * /bysrc /bydest" to get a snapshot of replication across the entire domain controller network. Do this monthly.
- Check the event logs. As far as I can tell, it's impossible to eliminate all errors from the event logs, especially at boot time. But on an AD domain controller that has been up for at least a few hours, the Directory Service log should contain nothing but informational messages. Check the event logs both when things are working properly and when you think you have a problem, so you learn which messages are "normal" for your Active Directory deployment. If you regularly see anything other than informational messages (usually about defragmentation and backups) in your Directory Service or DNS Server logs, you have a problem that needs to be resolved. This is another monthly task.
- Know when to defragment. If you have a large directory that runs for years and years, the Active Directory database can become large and fragmented, and periodic maintenance can improve performance. In Windows Server 2008, you can stop and start AD as a service and perform database maintenance while the server stays running. In earlier versions, you have to boot into Directory Services Restore Mode to get direct access to the database. In either case, the tool to use is Ntdsutil, which lets you check database integrity and reclaim space from, or defragment, the database. This is more of an annual task than a monthly one, but plan for it at least once a year.
Ntdsutil has another important job: resetting the Directory Services Restore Mode Administrator password, something you need to do every time a system administrator leaves your company. (In Windows Server 2003 and later, this can be done without booting into Restore Mode.)
- Use Dcdiag. I saved the best for last, because I love this tool. Dcdiag has almost 30 different tests it can run to verify the health of your Active Directory, ranging from basic connectivity and security-setting errors on directory servers to very specific issues such as missing machine accounts.
Yes, it's cryptic, it's confusing, and it's about as hard to use as anything Microsoft has published. But it includes an abundance of tests, and it can catch all sorts of interesting errors. I start with "dcdiag /a /v /c" (/a tests all domain controllers in the current site, /v turns on verbose output and /c runs the comprehensive set of tests; /e in place of /a covers every domain controller in the enterprise) to see the big picture. There are almost always a few errors to look at, even if they turn out to be innocuous. Some errors that Dcdiag finds, such as system log errors and KCC errors, are common but transient, often because a system has recently been rebooted. Others, such as a failure of the role holder test (KnowsOfRoleHolders), indicate a serious problem.
(Note: Repadmin and Dcdiag are both command-line tools. On Windows Server 2003 they're part of the Windows Support Tools, found in the Support\Tools folder on the installation CD or available from Microsoft as part of KB892777; on Windows Server 2008 they're installed with the AD DS role.)
If you get a clean Dcdiag run for your domain controllers, you are almost guaranteed a healthy, properly operating Active Directory. Not every Dcdiag error is a big deal; some won't affect operations at all. However, you should run this tool regularly and make sure you understand every error and whether it needs fixing. I run Dcdiag monthly on systems that are not throwing errors, but if I have recently fixed a problem, I run it more often, such as once a week, to be sure other problems don't creep into the directory.
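If you want to automate the monthly replication check, a short script can pull the same information repadmin shows. The following is a minimal Python sketch, not part of the procedure above. It assumes `repadmin /showrepl * /csv` is available on the machine running it and that its CSV header uses the column names shown (`Destination DSA`, `Source DSA`, `Naming Context`, `Number of Failures`, `Last Success Time`, `Last Failure Status`); check them against your own output first. `ReplLink`, `parse_showrepl_csv`, `failing_links` and `run_showrepl` are hypothetical helper names.

```python
import csv
import io
import shutil
import subprocess
from typing import NamedTuple


class ReplLink(NamedTuple):
    """One inbound replication link as reported by repadmin."""
    destination: str
    source: str
    naming_context: str
    failures: int
    last_success: str
    last_status: str


def parse_showrepl_csv(text: str) -> tuple[list[ReplLink], list[dict]]:
    """Parse `repadmin /showrepl * /csv` output (column names are assumed).

    Returns (links, unparsed): rows that match the assumed layout, and rows
    that don't (for example, an error line for an unreachable DC).
    """
    links: list[ReplLink] = []
    unparsed: list[dict] = []
    for row in csv.DictReader(io.StringIO(text)):
        count = (row.get("Number of Failures") or "").strip()
        if not count.isdigit():
            unparsed.append(row)  # keep it for manual review, don't drop it
            continue
        links.append(ReplLink(
            destination=row.get("Destination DSA") or "",
            source=row.get("Source DSA") or "",
            naming_context=row.get("Naming Context") or "",
            failures=int(count),
            last_success=row.get("Last Success Time") or "",
            last_status=row.get("Last Failure Status") or "",
        ))
    return links, unparsed


def failing_links(links: list[ReplLink]) -> list[ReplLink]:
    """Links with at least one recorded replication failure."""
    return [link for link in links if link.failures > 0]


def run_showrepl() -> str:
    """Run repadmin against every DC (requires the Windows tools)."""
    result = subprocess.run(["repadmin", "/showrepl", "*", "/csv"],
                            capture_output=True, text=True, check=True)
    return result.stdout


# Only does anything on a machine where repadmin is installed.
if __name__ == "__main__" and shutil.which("repadmin"):
    links, unparsed = parse_showrepl_csv(run_showrepl())
    for link in failing_links(links):
        print(f"{link.source} -> {link.destination} [{link.naming_context}]: "
              f"{link.failures} failure(s), last success {link.last_success}, "
              f"status {link.last_status}")
    for row in unparsed:
        print("Check manually:", row)
```

The sketch only flags links that report failures; it doesn't judge how stale the last success is, because repadmin prints timestamps in a locale-dependent format. Rows that don't fit the assumed layout are returned separately so an unreachable DC isn't silently ignored.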
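The monthly event-log review can also be narrowed to just the entries that matter. This Python sketch assumes Windows Server 2008 or later, where the built-in `wevtutil` command can query the `Directory Service` and `DNS Server` logs with an XPath filter. Event levels 1 to 3 are Critical, Error and Warning, so the filter drops informational events. The helper names are hypothetical, and the 50-event limit is an arbitrary choice.

```python
import shutil
import subprocess

# Windows event levels: 1 = Critical, 2 = Error, 3 = Warning, 4 = Information.
# Keep everything that is not purely informational.
NON_INFO_QUERY = "*[System[(Level=1 or Level=2 or Level=3)]]"

# Log names on a domain controller that also runs DNS (assumed setup).
LOGS = ("Directory Service", "DNS Server")


def wevtutil_command(log_name: str, count: int = 50) -> list[str]:
    """Build a wevtutil query for the newest non-informational events."""
    return [
        "wevtutil", "qe", log_name,
        f"/q:{NON_INFO_QUERY}",
        f"/c:{count}",   # maximum number of events to return
        "/rd:true",      # newest events first
        "/f:text",       # human-readable output
    ]


def dump_non_info_events(count: int = 50) -> None:
    """Print recent non-informational events from each log in LOGS."""
    for log_name in LOGS:
        print(f"===== {log_name} =====")
        result = subprocess.run(wevtutil_command(log_name, count),
                                capture_output=True, text=True)
        print(result.stdout or result.stderr or "(no matching events)")


# Only does anything on a machine where wevtutil is installed.
if __name__ == "__main__" and shutil.which("wevtutil"):
    dump_non_info_events()
```

Running it both on a quiet month and during an incident gives you the "normal" baseline the tip recommends building.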
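For the two Ntdsutil jobs described above, it helps to keep the exact command sequences on hand, since ntdsutil accepts its menu commands as successive arguments. This Python sketch only builds and runs those sessions. It assumes Windows Server 2008 or later, where `activate instance ntds` must come before file-level operations; the helper names are hypothetical. It doesn't cover a full offline defragmentation, which also involves compacting the database to a new location and copying the compacted file back.

```python
import subprocess

# Assumes Windows Server 2008 or later, where ntdsutil needs
# "activate instance ntds" before any file-level operation.
# Each list item is passed to ntdsutil as one menu command, in order.


def dsrm_reset_command(server: str = "null") -> list[str]:
    """Session that resets the DSRM Administrator password.

    "null" means the local DC; ntdsutil prompts for the new password.
    """
    return ["ntdsutil", "set dsrm password",
            f"reset password on server {server}", "quit", "quit"]


def integrity_check_command() -> list[str]:
    """Session that checks the directory database file for corruption.

    Stop AD DS first (net stop ntds) or boot into DS Restore Mode.
    """
    return ["ntdsutil", "activate instance ntds", "files", "integrity",
            "quit", "quit"]


def run_interactive(cmd: list[str]) -> int:
    """Run with the console attached so ntdsutil's prompts are visible."""
    return subprocess.run(cmd).returncode
```

On a domain controller, `run_interactive(dsrm_reset_command())` should prompt for and then set the new DSRM password, which is the step to repeat whenever an administrator leaves.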
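Because a comprehensive Dcdiag run produces a lot of output, a small filter that lists only the failed tests makes the monthly review quicker. This Python sketch assumes Dcdiag's usual per-test summary lines, of the form `......................... DC01 failed test KnowsOfRoleHolders`; the regular expression and the helper names are illustrative, so confirm the pattern against your own output.

```python
import re
import shutil
import subprocess
from collections import defaultdict

# Assumed shape of Dcdiag's per-test summary lines, e.g.
#   ......................... DC01 passed test Connectivity
#   ......................... DC01 failed test KnowsOfRoleHolders
RESULT_LINE = re.compile(r"\.{3,}\s*(\S+)\s+(passed|failed)\s+test\s+(\S+)")


def summarize_dcdiag(output: str) -> dict[str, list[tuple[str, str]]]:
    """Group results as {"passed": [(target, test)], "failed": [...]}.

    The target is usually a DC name; partition and enterprise tests
    report a partition or domain name instead.
    """
    summary: dict[str, list[tuple[str, str]]] = defaultdict(list)
    for match in RESULT_LINE.finditer(output):
        target, verdict, test = match.groups()
        summary[verdict].append((target, test))
    return dict(summary)


def run_dcdiag() -> str:
    """Run Dcdiag with the flags used in this tip: /a (all DCs in the
    current site), /v (verbose) and /c (comprehensive tests).
    Swap /a for /e to cover the whole enterprise."""
    result = subprocess.run(["dcdiag", "/a", "/v", "/c"],
                            capture_output=True, text=True)
    return result.stdout


# Only does anything on a machine where dcdiag is installed.
if __name__ == "__main__" and shutil.which("dcdiag"):
    results = summarize_dcdiag(run_dcdiag())
    for target, test in results.get("failed", []):
        print(f"FAILED  {test} on {target}")
    print(f"{len(results.get('passed', []))} test(s) passed")
```

A failure list this short is easy to compare month to month, which makes it clearer which errors are your deployment's "normal" transients and which are new.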
Building a reliable Active Directory should give you the confidence to use AD for other applications. For example, most network and security devices can use RADIUS to authenticate administrators; pointing them at a RADIUS server that checks credentials against Active Directory (such as Microsoft's IAS or NPS) centralizes password and account management. Similarly, almost all user-aware security devices (such as SSL VPN systems) can authenticate directly against Active Directory.
This was first published in March 2010.
About the author: Joel Snyder is a senior partner at Opus One, an IT consulting firm specializing in security and messaging.