I just finished a disaster recovery with a client that really drove home the need for good network security documentation. During a summer thunderstorm, one of their UPSes went berserk, and half the equipment in one rack was taken out, including their main firewall and several key Ethernet switches. The problem was that their main IT person was unreachable on a sailboat, and we were left putting the pieces back together in the dark without any guidance. Rebuilding their network was like starting from scratch. After it was all done, I jotted down these five key items that would have cut the downtime from days to hours.
- Basic Topology: If you can't quickly explain how your network fits together with some simple pictures, you've got a problem. You don't need a PhD in Visio to do this, but basic topology information showing the relationship between the Internet, your internal networks, and any firewalls or routers is your starting point. Add IP addresses and subnet masks, at a minimum, and that's 90% of the job. With the proliferation of VLANs, subnets, and multi-zone firewalls in many networks, the old strategy of "tracing the cable" just won't tell you how the network is put together when you're under a lot of stress and equipment isn't running. Even outside of a disaster, an accurate network topology map is the starting point for any discussion of network operations or any technical support call. The most important requirement is that the topology data be easy to update and keep current, which means that fancy graphics and layout get in your way more than anything.
- Passwords: You must have a password book that has emergency "root" or "admin" passwords for every piece of equipment. Not online, not in someone's head, but printed out and put in a safe where the executives of your company can get to it when they need it. Your best bet is to set up these emergency users separate from your normal authentication system. In other words, your password is your own (hopefully centrally controlled from a RADIUS server somewhere) and that's what you use. But when something bad happens (like you getting hit by a bus), or when your RADIUS service dies, the password book will be critical.
- IP address management: If you're a big company, you probably have a commercial product or a homegrown tool to handle IP address allocation. If you don't, make sure you have network security documentation that shows who is using which IP addresses, and for what. All this information needs to be in the DNS as well, so that both forward and reverse lookups (at least on your LAN) work properly. Being able to work from a list, and from the DNS, to diagnose problems is a timesaver and ensures you won't make the fatal mistake of assigning the same IP address to two devices. If you try to figure out whether an IP address is in use by pinging it, then you've got a serious security and networking deficiency that needs to be fixed: a device that is powered off, or that doesn't answer pings, looks exactly like a free address. And while you're at it, make a rule that nothing gets connected to the network without a label on it that shows its IP address.
- Configuration backups: Every device has a configuration, and you must have a backup of it available, preferably on a standalone system that you can get to even if the network has melted down. The more complicated the device, like your firewall, the more difficult it will be to recover if you have to replace it. This means that keeping these backups up to date is a business-critical task. If you're waiting for people to make these backups by hand, change that immediately and get a tool to automatically save every device's configuration daily.
- Philosophy: Network security changes over time, and everyone has a different idea of how to build things. As the team taking care of network security changes over time, basic conventions for maintaining consistency that seem obvious today may be lost. For example, when writing firewall rules, you probably have a standard way of adding objects like networks and services. Document these little standards and conventions, so that continued growth will be consistent and easy to understand.
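
To make the IP address management item concrete, here is a minimal Python sketch of the kind of check a plain address list makes possible without pinging anything. Everything in it is an assumption for illustration: the inventory rows, the hostnames under example.com, the documentation-range addresses, and the idea that forward and reverse DNS data have been exported into two dictionaries. A real site would load these from its own records.

```python
import ipaddress
from collections import defaultdict

# Hypothetical inventory rows: (ip, hostname, owner, purpose).
INVENTORY = [
    ("192.0.2.10", "fw1.example.com", "netops", "main firewall"),
    ("192.0.2.11", "sw1.example.com", "netops", "core switch"),
    ("192.0.2.11", "printer3.example.com", "facilities", "printer"),
]

# Hypothetical DNS snapshots exported from the zone files.
FORWARD = {"fw1.example.com": "192.0.2.10", "sw1.example.com": "192.0.2.11"}
REVERSE = {"192.0.2.10": "fw1.example.com", "192.0.2.11": "sw1.example.com"}


def find_duplicate_ips(rows):
    """Return {ip: [hostnames]} for every address listed more than once."""
    by_ip = defaultdict(list)
    for ip, host, _owner, _purpose in rows:
        # Parsing the address also rejects malformed entries early.
        by_ip[ipaddress.ip_address(ip)].append(host)
    return {str(ip): hosts for ip, hosts in by_ip.items() if len(hosts) > 1}


def find_dns_mismatches(rows, forward, reverse):
    """Return (item, problem) pairs where the DNS disagrees with the list."""
    problems = []
    for ip, host, _owner, _purpose in rows:
        if forward.get(host) != ip:
            problems.append((host, "missing or wrong A record"))
        if reverse.get(ip) != host:
            problems.append((ip, "missing or wrong PTR record"))
    return problems


if __name__ == "__main__":
    print("Duplicate addresses:", find_duplicate_ips(INVENTORY))
    for item, problem in find_dns_mismatches(INVENTORY, FORWARD, REVERSE):
        print(f"{item}: {problem}")
```

With the sample data, the duplicate check flags 192.0.2.11 as assigned to both the switch and the printer, which is exactly the mistake the list is meant to prevent.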
Having this documentation in place is an obvious way to speed up the resolution of large and small disasters. But documentation isn't only for use when something is wrong -- much of what I listed will be useful in the day-to-day operations and growth of any network, large or small.
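
As one example of that day-to-day use, the daily configuration backups described above can be checked routinely rather than discovered missing during an outage. The following Python sketch is only an illustration under stated assumptions: the device names, the timestamps, and the one-day threshold are hypothetical, and how the "last backup" times are collected depends entirely on the backup tool in use.

```python
from datetime import datetime, timedelta


def stale_backups(last_backup, now, max_age=timedelta(days=1)):
    """Return sorted device names whose backup is missing or older than max_age.

    last_backup maps a device name to the datetime of its newest saved
    configuration, or to None if no backup has ever been recorded.
    """
    stale = []
    for device, when in last_backup.items():
        if when is None or now - when > max_age:
            stale.append(device)
    return sorted(stale)


# Hypothetical data: in practice these would come from the backup system.
BACKUPS = {
    "fw1": datetime(2024, 7, 1, 2, 0),   # backed up this morning
    "sw1": datetime(2024, 6, 20, 2, 0),  # last backup 11 days ago
    "sw2": None,                         # never backed up
}
NOW = datetime(2024, 7, 1, 12, 0)

if __name__ == "__main__":
    for device in stale_backups(BACKUPS, NOW):
        print(f"WARNING: no current configuration backup for {device}")
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the check easy to test and to rerun against old records.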
About the author:
Joel Snyder is a senior partner with Opus One, a consulting firm in Tucson, Arizona.