February 28th, 2014
Lessons From The Cold War
Back in the Cold War era, the US and Soviet air forces took radically different approaches to fighter jet design.
The US optimized designs for combat first, so the F-16 has its engine intake on the bottom of the aircraft for aerodynamic reasons. If you recall what happened to Capt. Sully’s aircraft over the Hudson River in New York City, you know what happens to a jet engine when it ingests a bird, or anything other than clean air (in the trade it’s called “FOD,” for Foreign Object Debris…). Consequently, the US Air Force has crews whose job it is to keep runways totally clean.
The Soviets believed war is always dirty, and that you will always be short-handed, so they architected their tools to work in really messy conditions. AK-47s never jam, even when doused with mud, for example. Their MiG-29 fighter aircraft has doors that close off the forward-facing jet engine intakes for takeoff and landing, while the engines draw air in through louvers on the top of the aircraft. It’s neither efficient nor aerodynamic, but we’ve heard you could land a MiG-29 on a dirt airstrip and not cause any damage to the engines.
There’s an old saying that “All hardware eventually fails; all software has bugs.” So on our Private Cloud infrastructure, we take from both the US Air Force’s and the Soviets’ playbooks and combine meticulous maintenance and monitoring with highly resilient and redundant architectural design.
This week it all paid off.
Earlier this week we were alerted that one of the network cards on one of our physical cloud servers (each of which hosts between 20 and 30 of our clients’ servers) had negotiated its link speed down. We replaced the network cable, and that worked for about an hour. And then the same thing happened on one of our other physical cloud servers. So definitely not a network cable.
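For readers wondering how a link speed drop gets noticed at all, here is a minimal sketch of one way to check for it on a Linux host. It is not our monitoring system, just an illustration. It assumes the host exposes the negotiated speed in Mbit/s at /sys/class/net/<interface>/speed, as Linux does; the function names, the interface names and the expected speeds are all hypothetical.

```python
from pathlib import Path
from typing import Optional


def read_link_speed(iface: str, sysfs_root: str = "/sys/class/net") -> Optional[int]:
    """Return the negotiated link speed in Mbit/s, or None if it cannot be read."""
    try:
        speed = int(Path(sysfs_root, iface, "speed").read_text().strip())
    except (OSError, ValueError):
        # Missing interface, link down, or a driver that does not report speed.
        return None
    # The kernel reports -1 when the speed is unknown.
    return speed if speed > 0 else None


def link_is_degraded(actual_mbps: Optional[int], expected_mbps: int) -> bool:
    """Treat an unreadable speed, or one below the expected rate, as degraded."""
    return actual_mbps is None or actual_mbps < expected_mbps


if __name__ == "__main__":
    # Hypothetical interfaces and the speeds they are expected to negotiate.
    expected_speeds = {"eth0": 1000, "eth1": 1000}
    for iface, expected in expected_speeds.items():
        actual = read_link_speed(iface)
        if link_is_degraded(actual, expected):
            print(f"ALERT: {iface} reports {actual} Mbit/s, expected {expected}")
```

Run on a schedule, a check like this would flag a card that has quietly fallen back from 1000 to 100 Mbit/s, which is the kind of failure that never shows up as a simple up/down outage.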
After running a series of diagnostic tests, we determined that one of our Cloud’s two redundant core switches was flaking out (that’s a technical term BTW…) but not failing outright. We opened a ticket with HP (it’s a ProCurve switch), which agreed with our assessment and overnighted a warranty replacement to us. We put it in the next day.
If the core switches were not redundant (and too many providers do not have fully redundant switching), a core switch failure would have caused an outage for our entire Private Cloud.
But in our case, the switch flakiness and replacement happened with no service outage whatsoever. In fact, if we hadn’t sent a maintenance notice out to our clients (we are a full-disclosure kind of shop), no one but us would have been the wiser.
The US Air Force’s and the Soviets’ methodologies were entirely complementary in this case, to the mutual benefit of our clients and our engineers’ blood pressure readings.
So next time you are considering a Cloud hosting provider, ask them what would happen if a core switch failed outright, and whether they would notice if the switch didn’t fail but the line speed just dropped some. And then ask them the same thing about their firewall (we have a redundant pair of those too), the network cards on the servers (yup, redundant there too) and everywhere else along the chain.
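To see why redundancy at every link in the chain matters, here is a rough back-of-the-envelope sketch. It uses the textbook formulas for components in parallel and in series, and it assumes failures are independent, which real hardware does not always honor. The 99% per-device availability is a made-up number for illustration, not a measurement of our gear or anyone else's.

```python
from math import prod


def redundant_pair(availability: float) -> float:
    """Availability of two devices in parallel: down only when both are down."""
    return 1 - (1 - availability) ** 2


def chain(availabilities) -> float:
    """Availability of layers in series: up only when every layer is up."""
    return prod(availabilities)


# Hypothetical per-device availability; an assumption, not a measured figure.
single_device = 0.99
# Three layers named in the post: firewall, core switch, server network card.
layers = 3

no_redundancy = chain([single_device] * layers)
with_redundancy = chain([redundant_pair(single_device)] * layers)

print(f"single devices at every layer:  {no_redundancy:.6f}")
print(f"redundant pairs at every layer: {with_redundancy:.6f}")
```

Under these assumptions, every non-redundant layer drags the whole chain down, while pairing each layer pushes the combined figure much closer to 1. One unpaired device anywhere along the path undoes most of that gain.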
And when you are ready for our trademarked “Uptime. All the time.” please give us a call at (207) 772-5678.
L. Mark Stone
General Manager, Managed and Private/Hybrid Cloud Services
A Division of OTT Communications
The information provided in this blog is intended for informational and educational purposes only. The views expressed herein are those of Mr. Stone and do not necessarily reflect those of Reliable Networks, OTT Communications or Otelco Inc. The contents of this site are not intended as advice for any purpose and are subject to change without notice. We make no warranties of any kind regarding the accuracy or completeness of any information on this site, and we make no representations regarding whether such information is up-to-date or applicable to any particular situation. All copyrights are reserved by Mr. Stone. Any portion of the material on this site may be used for personal or educational purposes provided appropriate attribution is given to Mr. Stone and this blog.