High Availability Server Clustering

czanec
Posts: 8
Joined: Tue Oct 16, 2018 12:26 pm

High Availability Server Clustering

Post by czanec »

Hi,

Is there any way I can use two different DCs (they may have different public IP addresses) for HA clustering?

Thank you,
phoenix
Ambassador
Posts: 27272
Joined: Fri Sep 12, 2014 9:56 pm
Location: Liverpool, England

Re: High Availability Server Clustering

Post by phoenix »

What does this question have to do with a Zimbra mail server? It's not clear from your question what you're trying to achieve, or with which product.
Regards

Bill

Rspamd: A high performance spamassassin replacement

Per ardua ad astra
czanec
Posts: 8
Joined: Tue Oct 16, 2018 12:26 pm

Re: High Availability Server Clustering

Post by czanec »

Hi,

Thanks for the update. I'm planning to purchase the "Professional Edition" to configure an HA cluster with Zimbra. My requirement is to configure two servers in two DCs, so that if there is any problem in one DC, I can still get my service from the other DC. Before purchasing, I just want to make sure whether that's possible or not.

Thank you
Zane
L. Mark Stone
Ambassador
Posts: 2796
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.6 Network Edition

Re: High Availability Server Clustering

Post by L. Mark Stone »

Zane,

HA is supported at the hypervisor level with VMware; see https://wiki.zimbra.com/wiki/VMware_HA_ ... laboration

Other hypervisors in combination with storage vendors' cross data-center replication functions provide similar functionality, e.g. Citrix's XenServer Disaster Recovery (formerly named Site Recovery).

The kind of application-level HA you are describing does not exist in Zimbra, and IMHO is not needed. I presented at the North American Partners meeting two weeks ago where I covered Zimbra Hosting Best Practices on Amazon Web Services, and covered three different Disaster Recovery scenarios with various RPO/RTO targets. I also explained why running Zimbra across two data centers, while doable, is not recommended for Disaster Recovery.

First, the levels of redundancy and resiliency at most public cloud providers these days are well ahead of what all but the largest companies can achieve in their own on-premises or colocation environments. So outages at the data center level are much, much less frequent than you may be estimating.

Second, Zimbra 8.8 includes an entirely new backup engine (it performs continuous backups), which greatly simplifies Disaster Recovery and shortens RPO/RTO targets, like so:

-- Deploy Zimbra Production in one data center, and have a cool/warm set of bare Zimbra servers ready to go in the D/R data center.

-- Using whatever method works for you (Storage replication is best), keep the contents of /opt/zimbra/backup continuously or near-continuously copied from the production data center to the backup data center.

-- When the Production data center fails, perform a "provisioning only" restore in the D/R data center. This takes a few minutes for each thousand or so mailboxes, and results in a fully working Zimbra system with all of the mailboxes provisioned, shares recreated, distribution lists in place and the production Classes of Service applied as in the production environment. Only the mailboxes are empty, but you can now point DNS to the D/R environment and the system will receive new mail and allow users to log in and send emails. This gives you an initial RTO of perhaps 30 minutes or so, and if you use a backup MX service like DNS Made Easy (under $25/year for each domain), you'll never lose any inbound emails.

-- Once you've failed over after the "provisioning only" process, you then run a second restore, which populates the mailboxes with email. Your ultimate RTO is dependent on your mailstore size and the speed of your storage system for playing back the restore. Your RPO is based on how you are syncing /opt/zimbra/backup in the Production data center to the D/R data center. If via SAN replication, your RPO is zero. If via scheduled rsync jobs, your RPO is the schedule interval.

-- If you don't have cross data center SAN replication and you need a zero RPO, then you can get Mimecast (which I sell), Proofpoint, Email Laundry or similar protection. These services provide inbound and outbound email relaying and caching (as well as spam filtering, DLP and a variety of other services that you are likely purchasing from multiple other vendors), so you just "play back" into the Zimbra D/R system the emails sent/received after the last successful /opt/zimbra/backup sync.
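
As a rough illustration of the scheduled rsync option above, here is a minimal Python sketch. The host name dr.example.com is a placeholder, the choice of rsync flags is an assumption rather than a Zimbra recommendation, and adding the copy duration to the schedule interval is my own refinement of the "RPO is the schedule interval" rule of thumb.

```python
import shlex
from datetime import timedelta

# Path taken from the post; the trailing slash tells rsync to copy the
# directory's contents rather than the directory itself.
BACKUP_DIR = "/opt/zimbra/backup/"


def build_rsync_command(dr_host, backup_dir=BACKUP_DIR):
    """Build an rsync invocation that copies backup_dir to the same path on dr_host.

    dr_host is a hypothetical D/R server name supplied by the caller.
    """
    return [
        "rsync",
        "-a",  # archive mode: recurse and preserve permissions, times, symlinks
        backup_dir,
        f"{dr_host}:{backup_dir}",
    ]


def worst_case_rpo(schedule_interval, copy_duration):
    """Estimate the worst-case data-loss window for scheduled syncs.

    Assumption: a change made just after one run starts is only safe on the
    D/R side once the next run has finished, i.e. one interval plus one copy.
    """
    return schedule_interval + copy_duration


if __name__ == "__main__":
    cmd = build_rsync_command("dr.example.com")
    # Print the command instead of running it, so the sketch has no side effects.
    print(shlex.join(cmd))
    print(worst_case_rpo(timedelta(minutes=15), timedelta(minutes=5)))
```

To actually run the sync you could pass the list to subprocess.run(cmd, check=True) from a cron job or systemd timer. Whether to add --delete is a judgment call: it keeps the D/R copy tidy, but it also propagates deletions made on the production side.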

It may sound complex at first, but it's just different than what you may be used to, and it's straightforward to set up. Plus, it enables you to perform test restores periodically without having to actually fail over, so you can be sure that if/when you do need to fail over, you can and it will all work OK.
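
To put rough numbers on the "provisioning only" step, here is a small back-of-the-envelope sketch in Python. The 3 minutes per thousand mailboxes and the 5 minute DNS TTL are illustrative assumptions, not measured Zimbra figures, and counting the DNS TTL towards the initial RTO is my own simplification.

```python
from datetime import timedelta


def initial_rto(mailboxes, minutes_per_thousand, dns_ttl):
    """Estimate time until the D/R site can accept logins and new mail.

    mailboxes:            number of mailboxes to provision
    minutes_per_thousand: assumed provisioning time per 1,000 mailboxes
    dns_ttl:              assumed TTL clients must wait out after the DNS change
    """
    provisioning = timedelta(minutes=minutes_per_thousand * mailboxes / 1000)
    return provisioning + dns_ttl


if __name__ == "__main__":
    # Hypothetical site: 5,000 mailboxes, 3 min per thousand, 5 min DNS TTL.
    print(initial_rto(5000, 3.0, timedelta(minutes=5)))
```

The second restore that repopulates the mailboxes is deliberately left out, since its duration depends on mailstore size and storage speed rather than on mailbox count.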

Even other mail systems that claim to have the kind of HA you describe (like on-premises Exchange with DAGs) don't always fail over cleanly. And it is not pleasant having to manually clean up after a "split-brain" incident caused by an incomplete/unclean failover.

Having said all that, there are various Zimbra HA scripts out there, and I know some partners are working on trying to build their own HA bolt-ons to Zimbra that are close, but not fully ready at this writing.

Hope that helps,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
tplecko
Posts: 3
Joined: Fri Sep 25, 2015 4:03 am

Re: High Availability Server Clustering

Post by tplecko »

I would disagree.
A high availability cluster is exactly that: high availability.
What you are describing has downtime.

And storage replication doesn't account for a service failure where the OS and the VM are still fully operational.
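
To illustrate the difference, here is a minimal Python sketch of a service-level probe: instead of asking whether the VM is up, it checks whether the mail service actually answers. It assumes SMTP is reachable on port 25 and treats a "220" greeting as healthy; the host name in the usage line is a placeholder, and a real cluster manager would check more than this.

```python
import socket


def is_smtp_greeting(banner):
    """Return True if the bytes look like a healthy SMTP greeting (reply code 220)."""
    return banner.startswith(b"220")


def smtp_banner_ok(host, port=25, timeout=5.0):
    """Connect to host:port and check that the service greets us.

    Returns False if the connection fails, times out, or the greeting is not
    a 220 reply, which is the "VM up, service down" case.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            banner = conn.recv(512)
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False
    return is_smtp_greeting(banner)


if __name__ == "__main__":
    # mail.example.com is a placeholder for the server being monitored.
    print(smtp_banner_ok("mail.example.com"))
```

A hypervisor or storage-level setup never runs a check like this, so a hung MTA on a healthy VM goes unnoticed until someone looks.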