Core Switch Failure = Zero Downtime on our Private Cloud

February 28th, 2014

Lessons From The Cold War
Back in the Cold War era, the US and Soviet air forces took a radically different approach to fighter jet design.

The US optimized designs for combat first, so the F-16 has its engine intake on the bottom of the aircraft for aerodynamic reasons. If you recall what happened to Capt. Sully’s aircraft over the Hudson River in New York City, you know what happens to a jet engine when it ingests a bird, or anything other than clean air (it’s called “FOD”: Foreign Object Debris in the trade…). Consequently, the US Air Force has crews whose job it is to keep runways totally clean.

The Soviets believed war is always dirty and you will always be short-handed, so they architected their tools to work in really messy conditions. AK-47s famously keep firing even when doused with mud, for example. The MiG-29 fighter aircraft has doors that close off the forward-facing jet engine intakes for takeoff and landing, sucking air in instead through louvers on the top of the aircraft. It’s neither efficient nor aerodynamic, but we’ve heard you could land a MiG-29 on a dirt airstrip and not cause any damage to the engines.

Lessons Applied
There’s an old saying that “All hardware eventually fails; all software has bugs.” So on our Private Cloud infrastructure, we take from both the US Air Force’s and the Soviets’ playbooks and combine meticulous maintenance and monitoring with highly resilient and redundant architectural design.

This week it all paid off.

Uh Oh…
Earlier this week we were alerted that one of the network cards on one of our physical cloud servers (each of which hosts between 20 and 30 of our clients’ servers) had negotiated its connection speed down. We replaced the network cable, and that worked for about an hour. Then the same thing happened on another of our physical cloud servers. Definitely not a network cable.

After running a series of diagnostic tests, we determined that one of our Cloud’s two redundant core switches was flaking out (that’s a technical term BTW…) but not failing outright. We opened a ticket with HP (it’s a ProCurve switch) who agreed with our assessment and overnighted a warranty replacement to us, which we put in the next day.

If the core switches were not redundant (and too many providers do not have fully redundant switching), a core switch failure would have caused an outage for our entire Private Cloud.

Whew!
But in our case, the switch flakiness and replacement happened with no service outage whatsoever. In fact, if we hadn’t sent a maintenance notice out to our clients (we are a full-disclosure kind of shop), no one but us would have been any the wiser.

The US Air Force’s and the Soviets’ methodologies were entirely complementary in this case, to the mutual benefit of our clients and our engineers’ blood pressure readings.

Conclusions
So next time you are considering a Cloud hosting provider, ask them what would happen if a core switch failed outright, and whether they would notice if the switch didn’t fail but the line speed just dropped some. And then ask them the same thing about their firewall (we have a redundant pair of those too), the network cards on the servers (yup, redundant there too) and everywhere else along the chain.

And when you are ready for our trademarked “Uptime. All the time.” please give us a call at (207) 772-5678.

Take care,
L. Mark Stone
General Manager, Managed and Private/Hybrid Cloud Services
Reliable Networks
A Division of OTT Communications

The information provided in this blog is intended for informational and educational purposes only. The views expressed herein are those of Mr. Stone and do not necessarily reflect those of Reliable Networks, OTT Communications or Otelco Inc. The contents of this site are not intended as advice for any purpose and are subject to change without notice. We make no warranties of any kind regarding the accuracy or completeness of any information on this site, and we make no representations regarding whether such information is up-to-date or applicable to any particular situation. All copyrights are reserved by Mr. Stone. Any portion of the material on this site may be used for personal or educational purposes provided appropriate attribution is given to Mr. Stone and this blog.

Zimbra Password Strength – New Study From Dashlane Recommends Our Policies

January 27th, 2014

Most Websites Allow Crackable Passwords Like “123456” and “password”
When we first deployed our Zimbra hosting farm years ago, we insisted on complex passwords, password rotations 3x per year and other security protections for our clients. Some prospective clients didn’t like that we wouldn’t let them use three-character passwords, or that they had to change their passwords every four months.

But we stuck to our guns and helped clients to make easy-to-remember passwords that were very complex. If they got stuck changing their password on their mobile device, we gladly helped them through it.  We know it can be frustrating, but we knew the alternative was worse. We use those password policies pretty much everywhere now too.

So it was quite gratifying to see the folks at Dashlane publish a study ranking websites on their password policies — especially because Dashlane’s recommended policies are almost exactly what our policies have been for years.

You can read the PDF press release here or the ArsTechnica article here.

Use a Password Manager Please
One thing the Dashlane press release didn’t explicitly recommend (doing so would appear a little too self-serving…) is to use a unique password for every website or service; i.e., never use the same password on two or more sites. That way, if one site gets hacked, you don’t have to worry about quickly changing your password on all of the other sites where you used that same password (like your online banking website…). To do that, you of course need a password manager (like Dashlane or one of the myriad other password managers out there).

I confess I’m a big proponent of password managers and typically use 24-character passwords everywhere, relying on the password generator function within my password manager to generate good, random passwords with complex characters.  It’s now at the point where I have no idea what my passwords are; I just know the (complex) password which unlocks my password manager.  And I back up the password manager’s database regularly.
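For the curious, the “password generator function” mentioned above is conceptually simple. Here is a minimal sketch in Python of what such a generator does (an illustration, not the code any particular password manager actually uses), built on the standard library’s cryptographically secure `secrets` module and the 24-character length we favor:

```python
import secrets
import string

def generate_password(length: int = 24) -> str:
    """Generate a random password drawn from letters, digits and symbols,
    guaranteeing at least one character from each class."""
    classes = [string.ascii_lowercase, string.ascii_uppercase,
               string.digits, string.punctuation]
    alphabet = "".join(classes)
    while True:
        candidate = "".join(secrets.choice(alphabet) for _ in range(length))
        # Retry (rarely needed at this length) until every class appears.
        if all(any(c in cls for c in candidate) for cls in classes):
            return candidate

print(generate_password())  # prints a different 24-character password every run
```

The point of `secrets` (rather than the `random` module) is that it draws from the operating system’s cryptographic randomness source, which is what you want for credentials.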

If you would like some help with your password challenges, please feel free to call us at (207) 772-5678.

Hope that helps,
Mark
General Manager, Managed and Private/Hybrid Cloud Services
Reliable Networks
A Division of OTT Communications



MPLS, VPNs, BGP Exploits and Maybe Your Data Isn’t as Secure as You Thought? (Zimbra Can Help.)

December 9th, 2013

Wow! Three acronyms right off the bat before we even get to the topic at hand: Data Security.

What’s the specific problem here, really?

Well, the problem is that we have stumbled across a few prospective clients using MPLS networks who thought their data traveling across those MPLS networks was secure.

When we showed them how it wasn’t, they freaked. And rightly so, because when you combine MPLS with a BGP hack, you may as well be sitting on an open unencrypted wireless connection in a hacker’s favorite coffee shop accessing your company’s most sensitive data — it’s just about as secure.

Let’s do a quick level set:
If your company’s applications are hosted in multiple data centers, chances are pretty good that either the data center provider, your ISP or both have tried to sell you on the benefits of deploying an MPLS (“Multi-Protocol Label Switching”) network. At the risk of oversimplifying, MPLS lets you treat multiple, complex networks as if they were one simple network, like the one in your office. MPLS adds tags (labels) to the data packets that say to the routers, “This packet should be routed to the Dallas data center please!”
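To make the label idea concrete, here is a toy sketch of label switching in Python. The labels, table entries and router names are entirely made up for illustration; real MPLS routers do this in hardware, with a label table per interface:

```python
# Toy illustration of label switching: each hop forwards on the label
# alone, swapping it per a local table -- no IP routing decision needed.
LABEL_TABLE = {
    # incoming label: (outgoing label, next hop)
    100: (200, "router-chicago"),
    200: (300, "router-dallas"),
    300: (None, "dallas-datacenter"),  # None = pop the label, deliver
}

def forward(label):
    """Follow a packet's label-switched path to its final hop."""
    hop = None
    while label is not None:
        label, hop = LABEL_TABLE[label]
    return hop

print(forward(100))  # -> dallas-datacenter
```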

So that’s nice for application developers and network administrators, who no longer need to worry about complex routing issues when doing things like deploying an Exchange Witness server in one data center, and two replicated mailbox servers in two other data centers.

But MPLS says nothing about the specific route packets are to take between data centers. That responsibility has always fallen to BGP (Border Gateway Protocol), and MPLS does not at all replace BGP. For the non-techies, BGP is kind of like getting directions on Google Maps. Sometimes you get a choice of routes, and sometimes you get offered just one route between the starting location and your destination. Same with BGP. Internet backbone providers, major carriers and telecoms “announce” BGP routes most typically as a way to load balance between routes and also to steal market share. AT&T for example might do a deal with a major multinational data center provider to move a bunch of their traffic to AT&T for a week or so in exchange for a price break. BGP is what controls how those data packets are routed between here and there. And by some estimates, there are as many as 100,000 BGP announce changes every day.

BGP Hacking 101
Going back to our Google Maps example… You know how once in a while you get offered a route that just looks weird, and you kind of suspect it’s wrong, but you’ve never been to the destination before so you follow the route anyway? Well, BGP hacks work pretty much the same way; they’ll route the traffic someplace where they can examine it. Imagine a carjacking gang who hacked your Google Maps app to route you and your Porsche into an abandoned industrial park you think represents a nice shortcut to your destination. With a carjacking, the traffic stops at the carjacking itself. With a BGP hack, the traffic gets copied but passed on straightaway, so neither the sender nor the receiver is likely to notice the traffic has been rerouted.

Since BGP has historically been accessible only to the big boys and sovereign nations, there is explicit trust when one provider’s router advertises BGP routes for certain IP addresses to the entire Internet. Remember when Pakistan unwittingly took YouTube off the air? Yup, that was a bad BGP announce, accidental yes, but you get the idea.
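The YouTube incident worked because routers prefer the most specific (longest) matching prefix, so a rogue, more-specific announcement beats the legitimate, broader one. A toy longest-prefix-match lookup in Python shows the effect (the prefixes below are illustrative):

```python
import ipaddress

# Routers pick the most specific (longest) matching prefix, so a
# hijacker announcing a /24 inside a legitimate /22 captures its traffic.
routes = {
    ipaddress.ip_network("208.65.152.0/22"): "legitimate origin",
    ipaddress.ip_network("208.65.153.0/24"): "hijacker",
}

def best_route(addr: str) -> str:
    ip = ipaddress.ip_address(addr)
    matches = [net for net in routes if ip in net]
    # Longest prefix (largest prefixlen) wins.
    return routes[max(matches, key=lambda net: net.prefixlen)]

print(best_route("208.65.153.10"))  # -> hijacker
print(best_route("208.65.152.10"))  # -> legitimate origin
```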

The problem is that making intentionally bad BGP announces to route “interesting” traffic through your routers lets you examine all of that traffic. Indeed, Renesys makes a living analyzing the tens of thousands of BGP changes made each day with a view towards protecting their carrier customer base from BGP errors as well as malevolent BGP changes.

In a recent Renesys blog post, they note that this “Let’s make a BGP announcement so I can look at some interesting traffic!” hack has become more prevalent. We expect this attack vector to increase radically as routers capable of making BGP announcements come down in price from about $35,000 ten years ago to under $1,000 today (and anyone can buy them).

So What’s The Exposure for MPLS Users?
The exposure is that we have now seen several companies that thought traffic routed over an MPLS network was secure (presuming the MPLS network itself was secure), and so did not encrypt that traffic. That means that anyone on the route between the company’s data centers had full access to that unencrypted data. Subject to HIPAA? Uh oh, you just had a de facto Security Rule violation. Did you sign an NDA or other contract requiring you to keep data confidential and not share it with any third parties? Oops, if you copied the data between data centers unencrypted, you just breached your agreement. If you are old enough to remember, this exposure is not so different from listening in on an old phone system party line without letting anyone know you are on the line.

In Google’s case, this is why they recently announced that they are encrypting all traffic between all of their data centers, even though their data centers are connected via private networks.

What’s The Solution?
First, any time data moves between servers, we recommend you encrypt it if you can.

Well-architected software will do this out of the box. Our email system is Zimbra, and Zimbra has a “secure interprocess communications” setting which is enabled in the default configuration. With Zimbra, this means that each Zimbra component — even when Zimbra itself is installed on only one server — exchanges data with the other components only after encrypting that data. When you go to scale Zimbra to multiple servers, you can then domicile those servers in different data centers and the traffic between them will be encrypted automagically.
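For reference, the relevant knob is, as we recall it, a Zimbra localconfig key; a quick way to check it on an existing install (the key name here is from memory, so please verify against your release’s documentation before relying on it) looks like:

```shell
# Hypothetical check of Zimbra's interprocess-security setting.
# Key name from memory -- verify against your release's documentation.
su - zimbra -c "zmlocalconfig zimbra_require_interprocess_security"
# A value of 1 means components insist on TLS when talking to each other.
```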

Second, if you don’t control the software at that level (or even if you do, but you want to be absolutely sure), we recommend you deploy site-to-site VPNs between your multiple data centers, and not rely on the data center provider to do that for you. VPNs and MPLS play nicely together, so it’s not like you have to choose one or the other.

Third, don’t forget about your remote access users. Yes, Microsoft now encrypts Remote Desktop traffic (128-bit encryption only though…), but deploying a Citrix, SonicWall or other gateway which provides an on-demand VPN and two-factor authentication provides enhanced security with little added usability burden to end users.

Lastly, consider upgrading your encryption to 256 bits (“banking grade”). In early 2014 we will be doing so on our Zimbra Hosting farm, and the marginally higher cost for a 256-bit SSL certificate is really cheap insurance against some of the weak ciphers commonly used for 128-bit encryption.

As always, if you’d like some help wading through any of this, please do not hesitate to give us a call!

All the best,
Mark
CIO

Yahoo and Google Encryption Catch Up

November 18th, 2013

Lately we’ve been reading a lot of articles about how Yahoo and Google are encrypting communications in the wake of the Edward Snowden revelations regarding NSA spying. Here’s one example from today.

Their current efforts are to be applauded, but to be frank, their engineers — and executives — knew better a long time ago.

It used to be that when you logged in to most email services, your username and password were encrypted, but after you logged in, all your emails flying between their servers and your laptop in Starbucks were unencrypted. To claim now that these email providers are “Shocked; shocked!” that the NSA was listening in, when any thirteen-year-old sitting in the Starbucks with you could do the same thing with freely available hacking tools, is only slightly more believable than Captain Louis Renault (from the movie Casablanca) being shocked to learn that gambling was taking place in the club. Sure, the scale of what the NSA is alleged to be doing may be shocking, but the tech to do so is pretty rudimentary and widely available.

So kudos to Marissa Mayer, Yahoo and Google for catching up and starting to encrypt all end-user communications coming in and out of their data centers — the way we and our clients have been operating for years.

Want to get your Zimbra email? We have always encrypted both the login and all subsequent transfers of your email; both receiving and sending. Need to get remote access to your corporate Citrix Desktop from Starbucks? Two-factor authentication is required. You have how many mobile devices checking your email? Sorry, we still force password changes every 120 days and yes, you do need to use a “complex” password.

By the way, while Marissa Mayer is getting started at encrypting customer data in and out of Yahoo’s data centers, we will be upgrading our Zimbra encryption to 256-bit banking-grade encryption as part of our next Zimbra upgrade.

Don’t get us wrong, we think there is a valid, valuable role for the NSA to perform. We welcome reasoned debate over how, and the extent to which, the NSA et al. should be involved in pre-emptive data harvesting on as large a scale as has been reported and alleged. But at the end of the day, we are more worried about the traditional bad guys and unethical, unscrupulous competitors trying to access your data. After all, we know that your data is valuable. So is ours. We keep our data in the same infrastructure as our clients. We have a SOC 2 Type II audit covering Security, Confidentiality and Availability because, well… we should. Indeed, our data center providers have their own SOC 2 Type II audits too.

So if you are looking for a Zimbra and/or Private/Hybrid Cloud provider who works hard every day to keep your data safe, please give us a call at (207) 772-5678.

Take care!
Mark
CIO

OS X Mavericks Subtle Security Tweak

October 28th, 2013

In Corporate America, a computer’s Lock Screen or screensaver feature is often used as a security feature. Either with a few keystrokes or just enough idle time, a workstation will lock itself, clear the screen of any sensitive, confidential or regulated (think HIPAA ePHI) information, and then wait for a human to come back and unlock the system before any further sensitive information is displayed.

Not so with Mavericks, unfortunately.

As you can see from the screenshot below (System Preferences > Notifications), Apple now turns on by default the display of Notifications even when the system’s screen is locked. So, if you get a text message with some login credentials, or a cancer diagnosis, it will show on the screen even while the computer is locked.

The solution (again, as seen in the screenshot below) is to uncheck the tick box “Show notifications on lock screen”. Note that you need to do this for every application which utilizes Notifications.

Further, although we haven’t yet tested to confirm, we suspect that when you install a new application which makes use of Notifications, Mavericks will turn that on by default as well as part of the installation process. In other words, after you install a new application you may need to come back to System Preferences > Notifications and turn off lock screen notifications for that newly installed application.

We very much like the enhanced inter-application integration in Mavericks and the improved usability features (the multi-monitor display enhancements are our personal favorites so far), but we confess we were a little surprised Apple chose to set up Lock Screen Notifications this way. Why use a lock screen in the first place, after all?

If you need help with this, please give us a call at (207) 772-5678.

Mark
CIO

Mavericks Lock Screen Notifications Security Issue

It (Should) Take Two To Tango – Knight Capital Goes Bust

October 23rd, 2013

Slashdot today reported that the forensics on the Knight Capital implosion are now public (see link to story below).

The long and the short of it is that there was no dual-control environment over patching: a multi-server patch wasn’t fully applied, and mixed versions of code never intended to work together consequently blew $400 million out the door within 45 minutes.

We too are always worried about change management, which is why more than a year ago we deployed an ITIL-compliant CMDB and change management system that requires two people to sign off on changes of this magnitude, and only after a review/approval of the deployment plan. It’s part of our SOC 2 Type II audit process as well.

To be fair, at first we thought the system was a bit of a pain. But it forces a team to take the proverbial “deep breath” before doing anything major to a client’s or our own systems, and that’s a good thing. Nowadays we don’t mind it at all, and indeed have come to love it, because in one place we get a comprehensive change management history.

If you’d like to learn how using ITIL-compliant workflow processes might prevent your company from losing $172,222 a second for 45 minutes like Knight Capital did (and no, they weren’t a client of ours…), please give us a call at (207) 772-5678.

Here’s the story link and URL: http://news.slashdot.org/story/13/10/22/2253211/how-to-lose-172222-a-second-for-45-minutes

Take care,
Mark
CIO

Plan To Upgrade Zimbra to 7.2.5/8.0.5 Sooner Rather Than Later

September 3rd, 2013

Zimbra versions 7.2.5 and 8.0.5 look to be released before the end of September, and we have been following the Zimbra bugzilla issue tracking system closely in anticipation.

These companion releases contain a very significant amount of code cleanup (“refactoring,” in software development parlance), with 8.0.5 addressing nearly 200 individual bugzilla entries.

More importantly, these releases contain upgrades to support Outlook 2013, Windows 8 Mail and Internet Explorer 11; improvements in Zimbra’s anti-spam system; better integration with touchscreen interfaces and mobile devices; S/MIME fixes for Firefox 22/23 and a number of social media fixes in addition to a plethora of updates and fixes to correct mostly benign but annoying, productivity-depleting behaviors.

Get Ready For These Upgrades Starting Now
In our view, these releases look to be “go-to” upgrades for most, and we are strongly recommending that our clients prepare to upgrade, which we expect to begin sometime in October.

Two caveats:

First, we and our clients are very, very risk averse (you can call us ‘fraidy-cats if you wish, it’s OK because we are!) so we typically wait a few weeks after a new Zimbra release before applying these updates in production – even after testing in our own labs. With so many fixes for these releases, there is the possibility of an unanticipated regression even with Zimbra’s better-than-most QA. The good news is that the Zimbra community discovers these quickly and Zimbra has been historically quick to release a Patch to resolve these kinds of regressions.

Second, please, please, please read the Release Notes very carefully before upgrading an existing production system. Often there are gems and warnings in the Release Notes one ignores at their peril. If you are a SuSE Linux Enterprise Server shop for example, 8.0.5 requires SLES 11 SP3. Early developer builds installed successfully on SLES 11 SP2 but after the installer-upgrade completed, Apache would not start.

Zimbra is Now Telligent. No, Wait! It’s The Other Way Around…
Probably you have heard by now that VMware has sold Zimbra to Telligent, and Telligent has renamed itself Zimbra. In the few weeks since the announcement we have seen a number of encouraging signs that this could be a 1+1 = 3 deal. At the end of September, Telligent (I mean Zimbra) is holding their annual 3-day love-fest (The Big Social) in Dallas and has added a track just for Zimbra Partners. We will be decamping to Dallas to get a first-hand view of things, but we really do think their strategy of gluing together social with email to enhance internal collaboration and customer engagement is the right way to go. Recall that back in the day Zimbra was actually called the Zimbra Collaboration Suite? Stop by and say hello if you are going!

Hope that helps, and if you need help with these upgrades, give us a call at (207) 772-5678.
Mark
CIO

Zimbra MySQL Tuning for Large Deployments

August 29th, 2013

Zimbra contains a number of databases to speed up things like attachment indexing, but MySQL is one of the most important. In Zimbra, MySQL runs as a separate instance on every mailbox server and holds mailbox metadata: things like which emails are in which mailbox folders. Zimbra’s MySQL is configured to use the transaction-safe InnoDB database engine, which gets a specific amount of memory dedicated to the InnoDB buffer pool via a setting in /opt/zimbra/conf/my.cnf.

In most larger Zimbra installations we have historically seen a lack of RAM for Java as the primary bottleneck, but lately we are seeing sub-optimal MySQL settings more frequently. In our view, the root cause of this is virtualization; let me first explain why this is important, how this comes about and then what you can do about it.

What’s The Problem?
Pretty much every MySQL tuning blog/whitepaper/textbook we know says that MySQL’s InnoDB buffer pool should be set to be at least 10%–20% larger than the size of the InnoDB database(s). The InnoDB buffer pool size is the amount of RAM MySQL allocates for InnoDB database uses. InnoDB creates a lot of temporary objects for manipulating data, which are typically 10% or so of the size of the entire InnoDB databases. Having some room for growth is also good. If the InnoDB buffer pool is smaller than the InnoDB databases plus those temporary objects, MySQL is forced to continually evict pages and re-read them from disk, and performance suffers.
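The sizing rule is simple arithmetic; here is a quick sketch (the 20% headroom figure is the rule of thumb above, not a value Zimbra enforces):

```python
def recommended_buffer_pool_bytes(innodb_data_bytes, headroom=0.20):
    """InnoDB buffer pool should exceed the InnoDB data size by
    ~10-20% to leave room for temporary objects and growth."""
    return int(innodb_data_bytes * (1 + headroom))

GB = 1024 ** 3
# A 4GB InnoDB database needs roughly:
needed = recommended_buffer_pool_bytes(4 * GB)
print(round(needed / GB, 1))  # -> 4.8
```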

The InnoDB buffer pool size is configurable via a single parameter within /opt/zimbra/conf/my.cnf.  The Zimbra installer creates my.cnf the first time Zimbra is installed, and sets the InnoDB buffer pool size based on the amount of RAM present at the time Zimbra is first installed.

Most templated virtual machine environments (we use XenServer for our Zimbra Hosting farm) launch newly created virtual machines with a small amount of RAM. If the system admin neglects to increase the amount of RAM before installing Zimbra, the InnoDB buffer pool size will be set way too small. And here’s the catch: subsequent runs of the Zimbra installer – even during Zimbra upgrades – make no changes whatsoever to /opt/zimbra/conf/my.cnf.

So the first thing to do is to give your newly created Zimbra server the amount of RAM it should have before you install Zimbra.

Large Deployment Specifics
Even when you allocate sufficient RAM to a Zimbra server before you install Zimbra, we are seeing that dedicated mailbox servers with adequately provisioned RAM but more than a few thousand mailboxes quickly “outgrow” Zimbra’s default InnoDB buffer pool settings. The Zimbra installer’s InnoDB buffer pool tuning algorithm has no clue whether the server is a single multi-function Zimbra server (where there will be a lot of Zimbra daemons clamoring for RAM) or a dedicated mailbox server where more RAM may safely be allocated to MySQL. You get what you get…

How Do I Determine If I Have A Problem With My Zimbra Server?
Easy-Peasy! The mysqltuner.pl script from Major Hayden will do the job for you. Here’s how to download, configure and run it:

  1. Become root and download the mysqltuner.pl script to /opt/zimbra.
  2. Change the ownership and permissions so the zimbra user can run the script.
  3. Run the script and check the results.

Below is output from a dedicated mailbox server with ~6,000 mailboxes. As you can see, the Zimbra installer allocated 3.5GB to the InnoDB buffer pool, but the InnoDB database is already larger, at 4GB. We need to fix this!


root@mailboxsrvr5:/zcs/8.0.4-Patch_1# cd /opt/zimbra
root@mailboxsrvr5:/opt/zimbra# wget mysqltuner.pl/mysqltuner.pl
--2013-08-29 09:58:33-- http://mysqltuner.pl/mysqltuner.pl
Resolving mysqltuner.pl... 208.97.148.173, 2607:f298:5:104b::417:6481
Connecting to mysqltuner.pl|208.97.148.173|:80... connected.
HTTP request sent, awaiting response... 302 Found
Location: https://raw.github.com/major/MySQLTuner-perl/master/mysqltuner.pl [following]
--2013-08-29 09:58:33-- https://raw.github.com/major/MySQLTuner-perl/master/mysqltuner.pl
Resolving raw.github.com... 199.27.72.133
Connecting to raw.github.com|199.27.72.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 41847 (41K) [text/plain]
Saving to: `mysqltuner.pl'

100%[=====================================================>] 41,847 --.-K/s in 0.02s

2013-08-29 09:58:34 (1.81 MB/s) - `mysqltuner.pl' saved [41847/41847]

root@mailboxsrvr5:/opt/zimbra# chown zimbra.zimbra mysqltuner.pl
root@mailboxsrvr5:/opt/zimbra# chmod 700 mysqltuner.pl
root@mailboxsrvr5:/opt/zimbra# su - zimbra
zimbra@mailboxsrvr5:~$ ./mysqltuner.pl

>> MySQLTuner 1.2.0 - Major Hayden
>> Bug reports, feature requests, and downloads at http://mysqltuner.com/
>> Run with '--help' for additional options and output filtering
[!!] Successfully authenticated with no password - SECURITY RISK!

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.30-log
[OK] Operating on 64-bit architecture

-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 0B (Tables: 1)
[--] Data in InnoDB tables: 3G (Tables: 1412)
[--] Data in PERFORMANCE_SCHEMA tables: 0B (Tables: 17)
[!!] Total fragmented tables: 302

-------- Security Recommendations -------------------------------------------
[OK] All database users have passwords assigned

-------- Performance Metrics -------------------------------------------------
[--] Up for: 22d 20h 0m 51s (47M q [24.152 qps], 113K conn, TX: 33B, RX: 5B)
[--] Reads / Writes: 43% / 57%
[--] Total buffers: 3.6G global + 2.6M per thread (110 max threads)
[OK] Maximum possible memory usage: 3.8G (32% of installed RAM)
[OK] Slow queries: 0% (12K/47M)
[OK] Highest usage of available connections: 25% (28/110)
[OK] Key buffer size / total MyISAM indexes: 8.0M/101.0K
[OK] Key buffer hit rate: 100.0% (4K cached / 0 reads)
[!!] Query cache is disabled
[OK] Sorts requiring temporary tables: 0% (948 temp sorts / 780K sorts)
[OK] Temporary tables created on disk: 6% (5K on disk / 87K total)
[OK] Thread cache hit rate: 99% (28 created / 113K connections)
[OK] Table cache hit rate: 27% (1K open / 4K opened)
[OK] Open file limit used: 0% (51/524K)
[OK] Table locks acquired immediately: 100% (18M immediate / 18M locks)
[!!] InnoDB data size / buffer pool: 4.0G/3.5G

-------- Recommendations -----------------------------------------------------
General recommendations:
Run OPTIMIZE TABLE to defragment tables for better performance
Variables to adjust:
query_cache_size (>= 8M)
innodb_buffer_pool_size (>= 3G)

zimbra@mailboxsrvr5:~$

The Fix Methodology
The fix is simple: make a backup copy of /opt/zimbra/conf/my.cnf, increase the InnoDB buffer pool parameter in /opt/zimbra/conf/my.cnf and restart Zimbra. Here’s the before/after line in my.cnf:

Before, as set by the Zimbra installer:

innodb_buffer_pool_size = 3773829120

After editing with joe, nano, vi or a similar editor (the actual database size of 4GB plus 20% is ~4.8GB, so to provide some room for growth we’ll set it at 5GB):

# Increased buffer pool size per results from mysqltuner.pl run on 28 August 2013.
# See Change ticket C00058295.
# innodb_buffer_pool_size = 3773829120
innodb_buffer_pool_size = 5120M

All that’s left is to restart Zimbra and you are good to go!

Caution!
Do NOT change ANY of the other values in my.cnf unless you really, really know what you are doing!

Technical Background
For more detailed reading on the subject, please see the blog at Percona.

P.S. According to this post in the Zimbra forums, Zimbra 9 should be shipping with MariaDB.

Hope that helps,
Mark
CIO

VMware Selling Zimbra to Telligent – Good News In Principle, But…

July 15th, 2013

You may have seen in this morning’s news that VMware have sold Zimbra to Telligent, a Texas-based provider of corporate collaboration, social media and instant messaging solutions.

Some of you know that before I founded Reliable Networks I did media and technology mergers and acquisitions for more than thirteen years, so I’ve been busy this morning analyzing the deal based on my M&A experience as well as our experience as a long-time Zimbra Hosting Partner and Zimbra Solutions Provider.

Bottom Line: This looks to be a good deal for Zimbra and Zimbra’s customers, but early signs are the company may not be ready to execute properly.

 

History
When Zimbra was part of Yahoo, Yahoo did not have Marissa Mayer on board yet, and so Yahoo was never able to leverage Zimbra’s strong brand and solid code base. Still, Yahoo left Zimbra pretty much alone (a good thing), so core developers stayed and Zimbra just kept adding millions and millions of mailboxes while adding cash to Yahoo’s bottom line.

VMware was looking to move up from increasingly lower-margin hypervisor software to actual applications, which have higher margins with higher end-user “stickiness” and brand identity. So they bought Zimbra, because, well, email uses a lot of storage (remember EMC effectively owns VMware) and Zimbra scales horizontally by adding (virtual) servers.

Unfortunately, we saw VMware taking a firm hand in directing Zimbra code development to those things which benefitted VMware and not always Zimbra’s broad user base. For example, VMware never released Project Horizon for Zimbra, even after touting it as the corporate file sharing app. VMware dropped Instant Messaging from Zimbra 8 after delaying declaring it as Generally Available (many Zimbra shops had been running Zimbra’s IM for years with no issues whatsoever). We also saw VMware do things like add in Zimlets to support VMware’s High Availability (Site Recovery Manager), while stalling (at least it looked like stalling to us) code development at the application layer to make Zimbra more geographically site resilient on its own.

 

Why We Like This Deal So Far

  • The combined company will be called “Zimbra”. It’s always important to know in a deal like this which end is the dog and which end is the tail. Calling the combined company “Zimbra” bodes well for the new entity investing primarily in growing Zimbra.
  • Telligent “Gets” that Mobile and Social Media complement email but do not replace email.  If they do a good job of integrating their social media/IM and other non-email collaboration products within Zimbra they should have a winner. A number of our clients, especially in marketing, like the integrated Social Media zimlet but wish it could do more.
  • VMware is investing in the deal (as are others).  Normally having to answer to too many masters makes for management issues, but if VMware have no real management say, then their investment provides incentive for them to cooperate later on when/if needed.

 

What We Are Nervous About

  • Well, our clients for one! Our typical client sees email as a mission-critical application. They look to us to host their Zimbra systems in a highly secure, resilient and redundant environment. It’s why we got a SOC 2 Type II audit covering Security, Availability and Confidentiality. We and our clients need New-Zimbra to be focussed 110% on demanding corporate clients where collaboration is key and where clients can self-host (or have us host for them).
  • Clarity.  Telligent’s web site is big on marketing speak but short on technical resources. In our experience, even the non-technical managers who make decisions about Zimbra insist on a kind of technical due diligence before choosing Zimbra over Exchange.
  • Telligent’s ability to execute.  Sure, all the Telligent execs have great-looking bios, but when I called to get more information about their products, I got stuck and saw some red flags. I wound up calling the company four times this morning. The first time I called I asked for technical pre-sales and got transferred to a mailbox that was full. The next time I called I asked for technical support and got another mailbox that was full. The third time I called, I just asked the receptionist if customers could host Telligent’s software themselves. No reply; she simply transferred me to an auto-attendant that played a way-too-long recording about going to the web site for information, and then finally gave me a choice to enter 1 for sales, 2 for tech support, etc. This time I got an unidentified, generic-sounding voicemail box in which I could actually leave a message. Did I mention that on all three of those calls their VoIP system was clipping and dropping like crazy? The last call I made was actually clear, to a different receptionist, and this time I identified myself as a Zimbra Partner and Zimbra Forums moderator wanting to speak to someone about the deal. The receptionist took my name and number and said someone would call me back “shortly”.

 

Hopefully I’ll hear back from Telligent soon, and when I do I’ll update this blog post.

In the interim, we’ll be reading the tea leaves for whatever nuggets we can find and will update here as and when.

All the best,
Mark

How To Stop Hating Your Internet Service Provider

March 26th, 2013

“The Internet’s down again!”

Such is the rallying cry heard daily in offices across America.  Then the words we can’t repeat in polite company get used as adjectives, paired with the corporate name of the ISP, to help vent frustration.  Later that afternoon, over beers or whatever, we continue to curse the ISP and its incredibly deceitful, incompetent, etc. engineers who caused the problem in the first place, along with the ISP’s support staff who would never acknowledge the outage as the fault of the ISP (and yes, we already rebooted the modem before calling, thank you).

In a day or two everyone’s systolic blood pressure has eased back down to normal, and no one thinks any longer about the incredibly expensive lost productivity from that X-hour outage – until the same thing happens again in a week or a month or a few months later.

The fact is that technology is fragile; much more fragile than most vendors are willing to admit (“Buy our stuff! It doesn’t fail all that often!” “Really?”).  That’s why servers come with redundant power supplies, redundant disks, error-correcting RAM, and multiple network cards which can be configured into bonds, just like those redundant disks.  Those servers are glued together into virtualization pools (with at least N+1 physical servers) connected to redundant switches, firewalls and routers.  All that hardware comes with hardware warranties from the manufacturers, who staff 24 x 7 engineers because, well, that stuff fails and we don’t want service to be interrupted when something fails.

The good news is that all that redundancy and resiliency architected in really does work most of the time.

Why so many companies mightily resist putting comparable redundancy in place for their Internet connection boggles us.  “You have 2TB of mission-critical data?  Great! Let’s get a single 3TB disk for that new server of yours!”  Now, if we said that to a client we’d be shown the door.

But when we say to a prospective client “You have your mission-critical application servers in a good colocation facility? Great! Let’s get you carrier-diverse redundant Internet connectivity for your office so your employees can always have access to your core applications!” all too often we are told that spending an extra $100 – $300/month to avoid several hours of downtime each year is “not in the budget” or some such.

For most of our clients each hour of downtime is a four- or five-figure loss of cash revenue, not to mention the soft costs associated with frustrated employees and customers.  The insurance equation argues strongly for deploying carrier-diverse redundant Internet connectivity in almost all cases as a matter of course.

That’s why we say, if you want to stop hating your Internet service provider, just get another one. Let us deploy a firewall/router for you that will load balance or failover automagically between both Internet connections.
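For the curious, the failover half of what such a firewall/router does boils down to logic like this. This is a hypothetical sketch: the gateway addresses and the route commands shown in the comments are illustrative assumptions, not any specific vendor’s configuration:

```shell
#!/bin/sh
# Hypothetical sketch of ping-based WAN failover logic. The gateway IPs
# are documentation addresses and the "ip route" commands in the comments
# are illustrative; a real dual-WAN firewall does this (and more) for you.

PRIMARY_GW="203.0.113.1"    # first carrier's gateway (assumed)
BACKUP_GW="198.51.100.1"    # second, diverse carrier's gateway (assumed)

# Pick the gateway to use, given whether the primary answered a ping probe.
choose_gateway() {
    primary_up="$1"   # "yes" or "no", as reported by the probe
    if [ "$primary_up" = "yes" ]; then
        echo "$PRIMARY_GW"
    else
        echo "$BACKUP_GW"
    fi
}

# On a real router, a loop would probe and repoint the default route:
#   if ping -c 3 -W 2 "$PRIMARY_GW" >/dev/null 2>&1; then up=yes; else up=no; fi
#   ip route replace default via "$(choose_gateway "$up")"
```

A production device also rate-limits flapping and probes a host beyond the gateway (so a dead upstream, not just a dead gateway, triggers failover), but the decision itself is this simple.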

Then when Time-Warner stops carrying ESP packets (as is happening for one of our clients as I write this blog post) and all the IPsec VPNs go down, the infrastructure can fail over to the other ISP with minimal if any outage perceived by the end users.  Just remember: today it is Time-Warner, but tomorrow it could be your ISP, whoever they are!  Are you ready?

If you need help planning for and deploying carrier-diverse, redundant Internet connectivity, please give us a call at (207) 772-5678 and we would be glad to help.

Hope that helps,
Mark
Founder and Managing Member