BP’s Gulf Oil Spill and IT Best Practices
Monday, June 28th, 2010
BP’s oil spill is horrific of course, but there are a number of “lessons learned” which are very applicable to the way technology is managed.
Documentation. We are all guilty of a sick laugh over the oil companies’ collective safety plans essentially being carbon copies of each other, with an emphasis on protecting non-existent walruses from spills in the Gulf. But… when there is a disaster in IT, the written Disaster Recovery and Business Continuity plan is where everyone looks for salvation. If that Plan isn’t kept up to date and reviewed objectively on a regular basis, then when an IT disaster strikes (note I said “when”, not “if”…) that disaster will almost assuredly last longer and cost more than it otherwise would have. In our experience, keeping Disaster Recovery and Business Continuity plans up to date is pretty cheap insurance, and while we understand completely that this activity generally gets deferred to accommodate more pressing matters, we consider it our responsibility to prod clients constructively on this front.
Testing Backups. All Disaster Recovery and Business Continuity plans rely on having good, accessible backups. You can be the best at rotating tapes off site, but if the office burns down you’ll need to get another tape backup device just to do the restores. And who knows if the tapes are any good? This is one good reason why we are in most cases migrating clients away from expensive tape backups to less expensive, easily verifiable, encrypted off site disk storage. We often wonder why it’s called “Backup software” when all anyone really cares about are the restores. Unless you periodically test that your backups can actually be restored, the best Disaster Recovery and Business Continuity plan is pretty worthless, with or without walruses.
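For readers who like to see what a restore test might look like in practice, here is a minimal sketch in Python. It assumes you have already restored a backup into a scratch folder using whatever backup product you run, and it simply compares checksums of the live files against the restored copies. The function names (sha256_of, verify_restore) and the folder layout are our own illustrative assumptions, not part of any particular backup tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """List files that are missing from, or differ in, a test restore.

    Hypothetical helper: source_dir holds the live data, restored_dir holds
    the copy produced by a test restore. An empty list means every source
    file was found in the restore with identical contents.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue  # skip directories
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif sha256_of(source_file) != sha256_of(restored_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

A check like this only makes sense for data that has not changed since the backup was taken, so in practice it would be pointed at a static folder or a snapshot. The point is less the script than the habit: a restore that has never been tested is only a hope.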
Single Points of Failure. The news media has devoted considerable coverage to the several “single points of failure” in the blowout preventer. In IT, eliminating all single points of failure is very, very expensive. But eliminating many common single points of failure is surprisingly inexpensive. For example, disk drives are dirt cheap nowadays, so having a fast RAID10 (versus a slower RAID5 or RAID6 system) doesn’t cost all that much more. Similarly, SonicWall for example sells the second unit of a failover pair of firewalls at a considerable discount over the primary unit. We generally recommend that once our clients have a good understanding of what an hour of downtime really costs them, they consider making “insurance” investments in hardware and software appropriate for their risk tolerance and the revenue they would lose to downtime. If you can eliminate one four-hour outage every three years for a few thousand dollars when an hour of downtime costs you a few thousand dollars, isn’t that a good return on investment?
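To make that arithmetic concrete, here is a small sketch in Python. The dollar figures are purely hypothetical stand-ins for “a few thousand dollars” (we assume $3,000 per hour of downtime and a $5,000 investment), and the function name downtime_roi is ours. Plug in your own numbers.

```python
def downtime_roi(cost_per_hour, outage_hours, outages_avoided, investment):
    """Return (avoided_loss, net_benefit) for a redundancy investment.

    avoided_loss is the downtime cost you no longer incur;
    net_benefit is that amount minus what the redundancy cost you.
    """
    avoided_loss = cost_per_hour * outage_hours * outages_avoided
    net_benefit = avoided_loss - investment
    return avoided_loss, net_benefit


# Hypothetical figures: $3,000 per hour of downtime, one four-hour outage
# avoided over three years, and $5,000 spent on redundant hardware.
avoided, net = downtime_roi(
    cost_per_hour=3000,
    outage_hours=4,
    outages_avoided=1,
    investment=5000,
)
print(f"Avoided loss: ${avoided:,}  Net benefit: ${net:,}")
```

With these assumed numbers the single avoided outage is worth $12,000, so the $5,000 investment comes out $7,000 ahead over the three years. The calculation ignores softer costs such as lost customer goodwill, which would only strengthen the case.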
In the same way that “every author benefits from a good editor”, we work collaboratively with our clients to help ensure their documentation, backups and level of technology investments are uniquely appropriate and cost-effective.
If you think your company could benefit from a “fresh set of eyes” on your Disaster Recovery and Business Continuity plan, backups and/or levels of IT spend, please give us a call at (207) 772-5678. Remember, we are intentionally not a reseller, so we have no incentive to suggest you buy anything you don’t really need.
All the best,
Mark
CIO