Zoho Cloud Down Due To Power Outage at Equinix SV4 Data Center
Friday, January 20th, 2012
Earlier today Zoho, a leading cloud services provider whose CRM solution is known as a solid competitor to Salesforce.com, went off the air. The root cause, it turns out, was a power outage at their colocation provider’s data center. Their colo provider, Equinix, is considered a top-tier provider. Power at the data center has been restored, but Zoho is still down hours later, trying to repair all the data corruption caused by what was effectively pulling the power cords out of the backs of running servers.
Now, Zoho has several million users, including us, so fixing data corruption of that magnitude is not like letting Windows chkdsk run for a few minutes after the server reboots. We’ll have to wait and see what the final outcome is, and how long Zoho CRM (along with SugarCRM and the 214 other customers Equinix says it hosts at that data center) remains down.
We suffered the same fate a few years ago at our former colocation host. That and other issues led us to move to a new colocation facility. What happened at our former host was this: the power went out, the data center UPS (uninterruptible power supply) kicked in, and the system waited for the generator to start. Only the generator didn’t start, and the UPS had just a few minutes of juice in its batteries, so every server in the data center crashed, quite hard. Fortunately, we had recovery plans in place, so we were able to recover quickly.
When we ran a bake-off among candidate colocation facilities, one of the detailed questions we asked was: what happens if the power goes out and the generator fails to start? Most data centers told us things like “We test the generator weekly! That won’t happen!” (which is exactly what our former data center provider had told us). Well, guess what? You-know-what does happen periodically.
In the end, we chose BayRing Communications, a New Hampshire-based phone company with two data centers at the former Pease Air Force Base. When we asked them that same question, they laughed, literally, and said that in their experience gear fails all the time, so you need to be prepared. In their case, they bought a lot of batteries for their UPSs. When the power goes out, their UPS can run everything for several hours: plenty of time to either fix the generator or get a portable generator trucked in and hooked up. They reminded us that, as a phone company, they get in big trouble if services like 911 don’t work for any length of time.
Indeed, by the end of our due diligence, we understood in much more intimate detail what “carrier-grade” really means, and why, if you are hosting your own and your clients’ mission-critical applications (electronic health records, for example, or email for regulated companies), “carrier-grade” has to be the minimum standard.
Does that cost more? More than some and less than others.
Will we survive without access to our CRM application through Zoho for a few hours? Sure. A few days, though, would be a real problem.
The takeaway here is that whether you are taking care of a few dozen customers or a few million, you need to do your due diligence carefully when you choose a data center provider. Something clearly went horribly wrong at Equinix, and as of this writing, although power has been back for a few hours, they haven’t disclosed the root cause. We’ll have to wait and see…
If you have mission-critical applications and you have concerns about their hosting, we’d be happy to help you through a due diligence process that we organized for ourselves and our clients who host with us. Just give us a call at (207) 772-5678.
Mark


