[SOLVED] ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

bloom
Posts: 21
Joined: Sat Sep 13, 2014 3:01 am

[SOLVED] ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by bloom »

Hello,

I have been running two separate, similar installs of ZCS. Both are one-box installs. I started around 2012 with 7.x, then successfully went through all the upgrades to 8.0, 8.5, 8.6 and even to 8.7 after some struggling. Both ZCS installs were running quite smoothly on CentOS 6 based VMs under ESXi.

Last Sunday, I downloaded 8.8.3 from the zimbra.org site. It was marked as GA then; now it has changed back to RC2. As usual, I took these steps to upgrade:
  • stopped the VM, made snapshot,
  • updated ZCS,
  • after a few tests everything was looking OK and I put the server online for users.
Unfortunately both installs became terribly slow. Users are complaining they cannot send or receive emails. Since the upgrade we have been observing very high CPU usage by the java processes. Restarting with 'zmcontrol restart' helps for a while, but after some time the server becomes slow again. The problem occurs during office hours. When the load is low, there is no sign of it.

Both installs have about 150 mailboxes each; total storage is about 800-900 GB. When upgrading, I accidentally installed the new zimbra-imapd package on one of the hosts. It has not been configured and is not active. This host is performing terribly. I omitted zimbra-imapd on the second one, which has been performing a bit better, but I still had to restart it on Tuesday morning (1.5 days after the upgrade).

What can I do? As I understand it, I can
  1. wait for the stable version hoping it will fix it, or
  2. try to downgrade.
Option (1) - fix it. Is there anything we can check or change to investigate the problem? I definitely need some help here.
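One way to look into option (1) is to find out which threads of the busy java process are burning CPU. The following is a minimal sketch, assuming a Linux procps `ps`; `hot_threads` is a hypothetical helper name, and the PID is whatever top shows for the busy java process. It prints thread IDs in the hexadecimal `nid=0x...` form used in JVM thread dumps (for example from `jstack`, if a JDK is available on the host), so the busiest threads can be matched to stack traces.

```shell
#!/bin/sh
# Sketch only: list the busiest threads of one process.
# Note: pcpu from ps is an average over the thread's lifetime, not an
# instantaneous reading, so treat the ranking as a rough hint.

# hot_threads PID [N]: print the N busiest threads (default 5) as "nid pcpu".
hot_threads() {
    ps -L -o tid=,pcpu= -p "$1" | sort -k2,2 -nr | head -n "${2:-5}" |
    while read -r tid pcpu; do
        # JVM thread dumps show the native thread id in hex (nid=0x...).
        printf '0x%x %s\n' "$tid" "$pcpu"
    done
}

# Usage: hot_threads 12345 3
```

The hex IDs can then be searched for in a thread dump taken at the same moment.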

Option (2). I know downgrading is not possible with the install process. However, I have a snapshot of the whole drive containing /opt/zimbra, taken before the upgrade. For one of the hosts I also have a snapshot of the whole CentOS VM, taken before the upgrade.

My thoughts and questions, which are giving me sleepless nights right now:
  1. Is it possible to save the present state of the mailbox store, restore the ZCS install to its state before the upgrade, and then make ZCS learn the new mailbox store? We cannot lose emails from the last 4 days.
  2. If the above is not so easy, I am thinking about saving all the mailboxes with zmmailbox (learning from https://wiki.zimbra.com/wiki/Zmmailbox) and then restoring them after restoring the previous install from the snapshot. However, it would probably take ages, am I right? Is there a way to save only messages from the last 4 or 5 days and then import them into the existing mailboxes? If that were possible, it would let me restore the previous install and make users happy.
I am looking for your help and will appreciate every hint towards solving the problem.

With regards
Piotr
Last edited by bloom on Sun Sep 24, 2017 8:12 pm, edited 1 time in total.
bloom
Posts: 21
Joined: Sat Sep 13, 2014 3:01 am

Re: ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by bloom »

phoenix wrote: ZCS 8.8.4 is the current version for download. You haven't mentioned the specifications of your VM nor the ESXi server itself and no details of how many users nor what type of connections they're making (IMAP, Web UI, Activesync) and the amount of traffic per day or hour - and no mention of whether this is an OSS or NE version but I'm guessing an OSS version.

If you've taken a snapshot and still have that it could be causing the slowness on your server, a snapshot is only a good idea in the very short term (24-72 hrs) and it should then be removed, for reference: https://www.veeam.com/blog/why-snapshot ... ckups.html
It is the OSS version.
I have upgraded from 8.8.3 to 8.8.4 but it did not help.
There are around 70-90 active users in peak hours. They mainly use IMAP; a few users may use the Web UI.
The VM now has 10 GB RAM and 8 vCPUs on a Xeon X5650 CPU.
How do I measure traffic? Are there any specific stats I could provide?
And the snapshots are not slowing anything down. The snapshots are on the lower-level ZFS filesystem which provides storage to ESXi over NFS. There are no snapshots on the ESXi itself.

Is it possible to back up and restore mailbox messages for a given period of time? It would do the trick, I think.

Regards,
Piotr
phoenix
Ambassador
Posts: 27272
Joined: Fri Sep 12, 2014 9:56 pm
Location: Liverpool, England

Re: ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by phoenix »

ZCS 8.8.4 is the current version for download. You haven't mentioned the specifications of your VM nor the ESXi server itself and no details of how many users nor what type of connections they're making (IMAP, Web UI, Activesync) and the amount of traffic per day or hour - and no mention of whether this is an OSS or NE version but I'm guessing an OSS version.

If you've taken a snapshot and still have that it could be causing the slowness on your server, a snapshot is only a good idea in the very short term (24-72 hrs) and it should then be removed, for reference: https://www.veeam.com/blog/why-snapshot ... ckups.html

If you really wanted to move these users to a new VM then take a look at the ZeXtras Migration Tool, it is by far the easiest method of moving from one server to a new server.
Regards

Bill

Rspamd: A high performance spamassassin replacement

Per ardua ad astra
phoenix
Ambassador
Posts: 27272
Joined: Fri Sep 12, 2014 9:56 pm
Location: Liverpool, England

Re: ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by phoenix »

It makes no difference where the snapshot is located; once it's created, wherever it's located, it will still cause you problems after a period of time - they should never be kept on a production server. By 'traffic' I only meant the number of emails per day/hour or whatever period you measure them over. I also think you have too many vCPUs attached to that VM, I would have thought two to four vCPUs and some more RAM, if you have it. There is a wiki article on the subject; the wiki seems not to be available at the moment, but here's the link: https://wiki.zimbra.com/wiki/Performanc ... re_vSphere
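For the traffic figure (emails per hour), a rough number can be pulled from the mail log. This is a minimal sketch, assuming Postfix-style syslog lines that start with "Mon DD HH:MM:SS" and mark successful deliveries with "status=sent"; `deliveries_per_hour` is a hypothetical helper name, and /var/log/zimbra.log is assumed to be where the MTA logs on this install. It counts log lines, not unique messages, so a message with several recipients, or one that passes through a content filter and is re-injected, may be counted more than once.

```shell
#!/bin/sh
# Sketch only: count "status=sent" log lines per hour.

# deliveries_per_hour LOGFILE: prints "Mon DD HH:00 count", in log order.
deliveries_per_hour() {
    awk '/status=sent/ {
        # $1 = month, $2 = day, $3 = HH:MM:SS; keep only the hour.
        hour = $1 " " $2 " " substr($3, 1, 2) ":00"
        if (!(hour in n)) order[++k] = hour
        n[hour]++
    }
    END { for (i = 1; i <= k; i++) print order[i], n[order[i]] }' "$1"
}

# Usage: deliveries_per_hour /var/log/zimbra.log
```

Comparing the busiest office hour with a quiet hour gives a first idea of the load at which the slowdown starts.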

The point about a snapshot is that once you create it the original (virtual) disk does not get updated, it's the snapshot that gets updated (and grows) and all the reads/writes go to that - further information here (and in the older article that link mentions): http://blog.erben.sk/2015/05/13/perform ... snapshots/ - as also mentioned in those links, snapshots are no good for highly transactional VMs such as mail or DB servers (guess what's inside ZCS - a mail and DB server). :)
Regards

Bill

bloom
Posts: 21
Joined: Sat Sep 13, 2014 3:01 am

Re: ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by bloom »

phoenix wrote: It makes no difference where the snapshot is located; once it's created, wherever it's located, it will still cause you problems after a period of time - they should never be kept on a production server. By 'traffic' I only meant the number of emails per day/hour or whatever period you measure them over. I also think you have too many vCPUs attached to that VM, I would have thought two to four vCPUs and some more RAM, if you have it. There is a wiki article on the subject; the wiki seems not to be available at the moment, but here's the link: https://wiki.zimbra.com/wiki/Performanc ... re_vSphere

The point about a snapshot is that once you create it the original (virtual) disk does not get updated, it's the snapshot that gets updated (and grows) and all the reads/writes go to that - further information here (and in the older article that link mentions): http://blog.erben.sk/2015/05/13/perform ... snapshots/ - as also mentioned in those links, snapshots are no good for highly transactional VMs such as mail or DB servers (guess what's inside ZCS - a mail and DB server). :)
Thanks, but now we are talking about snapshots, which are absolutely not the cause of the problem here. I am talking about ZFS snapshots (an overview here: http://docs.oracle.com/cd/E19253-01/819 ... index.html ). ZFS is a copy-on-write filesystem and snapshots have NO negative impact on performance. I do know what snapshots in VMware are, and while ZFS snapshots serve the same purpose of restoring a filesystem to a previous point in time, these two kinds of snapshots really have little in common.

As for the number of vCPUs - yes, we had around 4 before the upgrade to 8.8. It was increased in the hope that it would help, but it did not.

Meanwhile I have an idea how to achieve some kind of downgrade.

EDIT: It seems to be possible. I just need to export new emails with proper zmmailbox command:

zmmailbox -z -m name@domain.com getRestURL "//?fmt=zip&meta=0&query=after:09/16/2017" > backup/backup-name@domain.com.zip
The whole plan is to
  • back up all new mail with the above command
  • move the backup files to a safe place
  • restore my old ZCS 8.7 (with the email store containing emails up to last Sunday)
  • import the backups with the zmmailbox postRestURL command
I'll post my results.
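The plan above can be sketched as a pair of shell functions that repeat the single-account command for every mailbox. This is only an illustration under stated assumptions: `export_recent` and `import_recent` are hypothetical helper names, the account list is assumed to come from `zmprov -l gaa`, the import URL `//?fmt=zip` is assumed to mirror the export, and everything runs as the zimbra user.

```shell
#!/bin/sh
# Sketch only: per-account export of recent mail, and re-import after restore.

# export_recent SINCE OUTDIR: reads account names on stdin, one per line.
export_recent() {
    since=$1
    outdir=$2
    mkdir -p "$outdir" || return 1
    while IFS= read -r acct; do
        [ -n "$acct" ] || continue
        # Same getRestURL query as in the post, one zip per account.
        # stdin is redirected so the command cannot eat the account list.
        if ! zmmailbox -z -m "$acct" getRestURL \
                "//?fmt=zip&meta=0&query=after:$since" \
                < /dev/null > "$outdir/backup-$acct.zip"; then
            echo "export failed for $acct" >&2
        fi
    done
}

# import_recent OUTDIR: run on the restored 8.7 server.
import_recent() {
    outdir=$1
    for f in "$outdir"/backup-*.zip; do
        [ -s "$f" ] || continue        # skip missing or empty archives
        acct=${f##*/backup-}           # strip directory and prefix
        acct=${acct%.zip}              # strip extension
        zmmailbox -z -m "$acct" postRestURL "//?fmt=zip" "$f" \
            || echo "import failed for $acct" >&2
    done
}

# Usage (as the zimbra user):
#   zmprov -l gaa | export_recent 09/16/2017 backup
#   import_recent backup
```

Accounts with no mail after the cutoff may produce an empty file or an error on export; the import step simply skips empty archives.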

EDIT2: Done. Backing up 4 GB of email from ~160 accounts with the above query took around 1 h. It was easy to restore the backups on the 8.7 install restored from my (ZFS) snapshot.

So, my downgrade succeeded.

I treat 8.8.4 as not ready and will wait for success stories after stable is released before I upgrade again.


Regards,
Piotr
Last edited by bloom on Fri Sep 22, 2017 7:23 pm, edited 1 time in total.
bloom
Posts: 21
Joined: Sat Sep 13, 2014 3:01 am

Re: ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by bloom »

Replying in my own thread, I've got one more question. I succeeded in downgrading one of the hosts, where I could revert the whole CentOS system to the state when it had ZCS 8.7. On the other host I have a (ZFS) snapshot of /opt/zimbra, but not of the whole system. My plan is to shut down the system, swap /opt/zimbra for the non-upgraded 8.7 version, and then start the system. Will that work? Does ZCS make any system-wide changes I have to look into? I know about cron jobs. Anything else?

EDIT: Downgrade succeeded.

With regards,
Piotr
marco.manenti
Posts: 5
Joined: Tue Dec 09, 2014 5:00 am

Re: [SOLVED] ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by marco.manenti »

Hi,

I've got the same problem here. The IMAP server doesn't work and CPU usage is at maximum.
Proxmox with an LXC container.

I've investigated: could it be a kernel error?
We see the same issue just before a kernel dump; rebooting the server resolves the problem temporarily.

WARNING: CPU: 0 PID: 647 at net/core/dev.c:2576 skb_warn_bad_offload+0xd1/0x120

Now I'll try ethtool and set gro, lro, and sg to "off".
marco.manenti
Posts: 5
Joined: Tue Dec 09, 2014 5:00 am

Re: [SOLVED] ZCS kills CPU after 8.8.3 upgrade / zmmailbox question towards downgrade

Post by marco.manenti »

Yes, SOLVED

Proxmox virtualizer, Zimbra in a CentOS 6 LXC container

ethtool -K enp1s0f0 sg off
ethtool -K enp1s0f0 lro off
ethtool -K enp1s0f0 gro off

After that, IMAP no longer hangs!
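Before changing anything, it can help to see which of these offloads are actually enabled. The following is a minimal sketch, assuming the usual `ethtool -k` output format ("feature-name: on" or "off"); `list_enabled_offloads` and `disable_offloads` are hypothetical helper names, and enp1s0f0 is the interface from this post, so substitute your own. Note that lowercase `-k` only shows features, while uppercase `-K` changes them.

```shell
#!/bin/sh
# Sketch only: report and disable sg/gro/lro on one interface.

# list_enabled_offloads IFACE: prints sg, gro or lro for each feature that is on.
list_enabled_offloads() {
    ethtool -k "$1" | awk -F': *' '
        $1 == "scatter-gather"          && $2 ~ /^on/ { print "sg" }
        $1 == "generic-receive-offload" && $2 ~ /^on/ { print "gro" }
        $1 == "large-receive-offload"   && $2 ~ /^on/ { print "lro" }'
}

# disable_offloads IFACE: needs root; the change is lost on reboot
# unless it is also added to the network configuration.
disable_offloads() {
    for feature in $(list_enabled_offloads "$1"); do
        ethtool -K "$1" "$feature" off
    done
}

# Usage (as root):
#   list_enabled_offloads enp1s0f0
#   disable_offloads enp1s0f0
```

With a container setup like this one, the commands would be run on the host that owns the physical interface.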