zxsuite backup NG performance

JDunphy
Outstanding Member
Posts: 901
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

zxsuite backup NG performance

Post by JDunphy »

I have been testing and retesting external restores with Backup NG and have been confused by what I am seeing compared with the restore performance of the other methods I have available. This morning (May 18, 2021) I saw this article, which explains that we should expect a 20x improvement with 3.1.11:

https://community.zextras.com/external- ... imization/

I think we (Zimbra Network customers) are currently two releases behind this announced version as of today, but that looks like solid progress for the future.

A few questions I have run into that I am asking on the Zextras forums.

1) How do you control latest-to-oldest ordering in restores for an external/disaster recovery where you have already provisioned the accounts? Users most likely want their latest email first, as that is probably what they are currently working with, and do not initially care about older email, especially for accounts that have existed for a number of years. A priority scheme seems like it might make the new host usable sooner. This is a guess on my part, as I saw my 2005 email before my latest email during my testing.

2) Given the multiple steps involved in an external restore with Backup NG, how have others scripted unattended recovery for disaster recovery? I can do the OS and the initial Zimbra install unattended, but then zxsuite looks like it is going to require me to parse the output of the zxsuite command to find the log, and then parse that log file, before I can move to the next command.

3) Is it possible to add other files or directories to the customization tar image that Backup NG takes, e.g. /opt/zimbra/acme.sh?

4) Concurrency: I gather this is the number of accounts you want to restore concurrently. Is there a best practice for estimating this number to maximize aggregate external restore throughput based on CPU/memory/IO, or is the bottleneck really per-account restoration? In other words, if I have one account that I want restored and I set concurrency to 50, will it complete in the same time as with concurrency set to 1, provided memory/CPU/IO are in excess supply? I need to test this in a VM where I can modify the host parameters to know for sure, but thought I would ask in case anyone knows.

Jim
Klug
Ambassador
Posts: 2767
Joined: Mon Dec 16, 2013 11:35 am
Location: France - Drôme
ZCS/ZD Version: All of them

Re: zxsuite backup NG performance

Post by Klug »

The restore speed increase is real (I've seen it) on huge mailboxes (several hundred thousand items).
From one item every 8 seconds to 8-12 items per second.
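
To put those rates in perspective, here is a rough back-of-the-envelope calculation. The item count of 300,000 is only an assumed figure standing in for "several hundred thousand items"; the two rates are the ones quoted above, and the model assumes a constant injection rate, which a real restore will not have.

```python
# Back-of-the-envelope restore time for one large mailbox.
# ITEMS is an assumption for illustration; the rates are the ones quoted in the post.
ITEMS = 300_000        # assumed mailbox size ("several hundred thousand items")
OLD_RATE = 1 / 8       # items per second (one item every 8 seconds)
NEW_RATES = (8, 12)    # items per second after the optimization


def restore_seconds(items: int, items_per_second: float) -> float:
    """Time to inject `items` one by one at a constant rate."""
    return items / items_per_second


old_days = restore_seconds(ITEMS, OLD_RATE) / 86_400
new_hours = [restore_seconds(ITEMS, rate) / 3_600 for rate in NEW_RATES]
speedups = [rate / OLD_RATE for rate in NEW_RATES]

print(f"old: {old_days:.1f} days")
print(f"new: {new_hours[1]:.1f} to {new_hours[0]:.1f} hours")
print(f"speedup: {speedups[0]:.0f}x to {speedups[1]:.0f}x")
```

Under these assumptions the same mailbox goes from roughly 27.8 days to somewhere between 6.9 and 10.4 hours, i.e. a 64x to 96x speedup for that single mailbox.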

1. You can't control anything.
If you ask for a domain ExternalRestore or several mailboxes at the same time, you can't even control the order in which the mailboxes are restored (it is based on the internal Zextras ID).

2. The steps I used last time:
. recreate the servers (can be batched/scripted), with SSL certificates and so on
. HSM and deduplication don't work on ExternalRestore, so you need to provision a hell of a lot of primary storage
. restore the infrastructure-specific variables/parameters (you have to have backed them up on your own), such as max message size
. launch ExternalRestore on each server, for all accounts on server, with a concurrency of 4

Disaster restore based on ExternalRestore only is super-über-over-mega-too-long.
I'm talking about several days (several weeks actually) for a server with about 2000 users and 7 TB of deduplicated storage space, even if the destination is fully local SSD (with 4 current vCores and 16 GB RAM).
That is because of the way it works, injecting all items one by one.
You can accelerate it by:
. restoring several accounts in parallel (4 to 6; more could be counterproductive, you have to find the sweet spot for your infrastructure)
. disabling/tuning indexing (because of the way it works, in batch)
. restoring to several servers (set up several servers to restore to in parallel, then mailboxmove back to fewer servers once everything is done) - you need to be able to share the backup volume with several servers
. having smaller servers in the beginning (number of accounts and overall storage)
. storing the backup on "local" volume and not NFS nor S3 (costs increase)
. using the currently-beta-but-working-nicely "LocalVolume" feature of backup (if you use NFS or S3)
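
For the "restoring to several servers" option, one way to decide which accounts go to which temporary server is a greedy largest-first split by mailbox size. This is only a sketch: the account names and sizes below are made up, the function name `split_accounts` is hypothetical, and nothing here calls zxsuite. It just produces per-server account lists you could then feed to whatever restore command you use.

```python
import heapq


def split_accounts(sizes_gb: dict[str, float], servers: int) -> list[list[str]]:
    """Greedy largest-first split so each server gets a similar total size."""
    # Min-heap of (GB assigned so far, server index); the least-loaded server pops first.
    heap = [(0.0, i) for i in range(servers)]
    groups: list[list[str]] = [[] for _ in range(servers)]
    by_size = sorted(sizes_gb.items(), key=lambda kv: kv[1], reverse=True)
    for account, size in by_size:
        total, idx = heapq.heappop(heap)
        groups[idx].append(account)
        heapq.heappush(heap, (total + size, idx))
    return groups


# Hypothetical accounts and mailbox sizes in GB, for illustration only.
sizes = {
    "a@example.com": 40,
    "b@example.com": 25,
    "c@example.com": 20,
    "d@example.com": 10,
    "e@example.com": 5,
}
for server, accounts in enumerate(split_accounts(sizes, 2)):
    print(server, accounts)
```

With these sample sizes both servers end up with 50 GB each. Mailbox size is used as the balancing key here for simplicity; since the restore injects items one by one, item count may be a better key if you have it.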

Steps I'll use if it ever happens again - you need to prepare this before it happens:
. don't rely only on ZeXtras Backup/Backup-NG but on storage snapshots (and storage backups)
. restore the storage snapshots (don't recreate servers but restore the storage, OS, primary, HSM, etc)
. ExternalRestore the delta: https://docs.zextras.com/zextras-suite- ... _snapshots
This will cost some backup space (double the space used, volumes and "internal" Zimbra backup) but considering the time lost in ExternalBackup I consider it actually cheap.
If you can/want to use blobless backup (disappeared from ZeXtras 3.1.11 documentation, I don't know if it's a documentation issue or if the feature is gone), the needed space is not so big.

3. I don't think this is possible.
That's another reason to have a double backup (volumes and internal).
You can also add other files, such as zmprov or zxsuite dumps, LDAP dump, SQL dumps, etc.
Don't forget the Connect/Team backup too. And Drive.

4. You have to do some tests on your own setup (CPU, RAM, storage speed), because it depends on this.
It also depends on whether you run ExternalRestore while users are working on the servers (receiving/sending mail, etc.) or not.
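
A minimal sketch of how such a test could be structured: time the same sample restore at several concurrency levels and keep the fastest. Everything here is an assumption for illustration. `run_restore` is a hypothetical callable that you would write yourself to launch a test restore of a fixed set of accounts at the given concurrency and block until it finishes; the `fake_restore` stand-in below only sleeps, using an invented cost model, so the script runs on its own.

```python
import time
from typing import Callable, Iterable


def sweep(run_restore: Callable[[int], None], levels: Iterable[int]) -> dict[int, float]:
    """Run the same test restore once per concurrency level; return elapsed seconds."""
    results = {}
    for concurrency in levels:
        start = time.perf_counter()
        run_restore(concurrency)  # must block until the restore is complete
        results[concurrency] = time.perf_counter() - start
    return results


def best_level(results: dict[int, float]) -> int:
    """Concurrency level with the shortest elapsed time."""
    return min(results, key=results.get)


def fake_restore(concurrency: int) -> None:
    # Stand-in only: an invented model where parallelism helps until
    # contention overhead dominates. Replace with a real test restore.
    time.sleep(0.24 / concurrency + 0.02 * concurrency)


if __name__ == "__main__":
    timings = sweep(fake_restore, levels=(1, 2, 4, 6, 8))
    for level, seconds in timings.items():
        print(f"concurrency {level}: {seconds:.2f}s")
    print("fastest:", best_level(timings))
```

Run each level against the same sample accounts, and repeat the sweep both with and without user load on the server, since the result depends on that too.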
Klug
Ambassador
Posts: 2767
Joined: Mon Dec 16, 2013 11:35 am
Location: France - Drôme
ZCS/ZD Version: All of them

Re: zxsuite backup NG performance

Post by Klug »

I'll have to try a concurrency of 50 like Luca did to see my server explode under load.
L. Mark Stone
Ambassador
Posts: 2802
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.7 Network Edition

Re: zxsuite backup NG performance

Post by L. Mark Stone »

You may also want to consider using Centralized Storage for the Secondary (HSM) volume, and using an HSM policy that moves emails older than ~2 days to the HSM volume.

In this way, when you do a restore, you need only restore ~2 days' worth of email blobs.

Of course, you'll want to have a D/R strategy for your Centralized Storage, but if using AWS S3, you have a number of options.

Hope that helps,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
JDunphy
Outstanding Member
Posts: 901
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: zxsuite backup NG performance

Post by JDunphy »

Klug wrote: Steps I'll use if it ever happens again - you need to prepare this before it happens:
. don't rely only on ZeXtras Backup/Backup-NG but on storage snapshots (and storage backups)
. restore the storage snapshots (don't recreate servers but restore the storage, OS, primary, HSM, etc)
. ExternalRestore the delta: https://docs.zextras.com/zextras-suite- ... _snapshots
This will cost some backup space (double the space used, volumes and "internal" Zimbra backup) but considering the time lost in ExternalBackup I consider it actually cheap.
If you can/want to use blobless backup (disappeared from ZeXtras 3.1.11 documentation, I don't know if it's a documentation issue or if the feature is gone), the needed space is not so big.

That is perfect! We already have frequent backups of /opt/zimbra/backup to a SAN that we can attach/detach to different hosts, so this should work well and minimize the downtime for this type of DR scenario.

Thanks

Jim