[SOLVED] Very High CPU Usage

Discuss your pilot or production implementation with other Zimbra admins or our engineers.
Post Reply
Something
Posts: 10
Joined: Wed Aug 24, 2016 5:15 pm

[SOLVED] Very High CPU Usage

Post by Something »

Hello,

I have a 8.8.8 Patch 3 installation, running as a VM with 12GB RAM, 4 cores, and 100GB of disk space, covering ~50 mailboxes (most are light usage, with about 10 users concurrently connected at any given time via IMAP). The OS is CentOS 7.4, with `cat /etc/centos-release` showing "CentOS Linux release 7.5.1804 (Core)", with all OS updates applied through yesterday.

The system had been running fine for about 2 years or so, until one day I rebooted the system (due to host maintenance unrelated to the VM), and when it came back up, the CPU usage was pegged with load average of anywhere between 5-16! Occasionally it'll drop to the high 2's, but then jumps back up again shortly after. This is the only VM on the host pool that has this issue, and many of the other guest VM's are the same OS and version, so it doesn't seem to be a compatibility issue.

The server only has ports 25, 587 and 993 exposed to the public, the web interface and ssh interface are only allowed within the internal network.

Some info about the setup:

Code: Select all

zmcontrol -v
Release 8.8.8_GA_2009.RHEL7_64_20180322150747 RHEL7_64 FOSS edition, Patch 8.8.8_P3.

Code: Select all

zmlocalconfig | grep heap
imapd_java_heap_new_size_percent = ${mailboxd_java_heap_new_size_percent}
imapd_java_heap_size =
mailboxd_java_heap_new_size_percent = 25
mailboxd_java_heap_size = 1920
zimbra_activesync_syncstate_item_cache_heap_size = 10M

Code: Select all

innodb_buffer_pool_size        = 2430609408
When looking at the `top` output, it's almost always java processes that are eating the CPU.

The system used to be 8.7.5 until recently when I upgraded to 8.7.11 and then applied Patch 2 out of desperation (I had read of a bug with an email eating up cpu in the spam/virus filters). After that didn't make things any better (or worse), I upgraded the system to 8.8.8 (knowing that would be suggested by the community) and then used the package manager to apply the latest patch versions. This still had no effect.

Here's Top:

Code: Select all

top - 09:21:15 up 35 min,  1 user,  load average: 8.26, 6.55, 7.56
Tasks: 217 total,   4 running, 213 sleeping,   0 stopped,   0 zombie
%Cpu(s): 43.3 us, 45.0 sy,  0.0 ni,  9.6 id,  0.0 wa,  0.0 hi,  0.7 si,  1.4 st
KiB Mem : 12293780 total,  7562592 free,  3592068 used,  1139120 buff/cache
KiB Swap:  7995388 total,  7995388 free,        0 used.  8411252 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
11672 zimbra    20   0 4873800  13884   7608 S  60.2  0.1   0:02.03 java
11630 zimbra    20   0 4873804  13492   7596 S  59.3  0.1   0:02.52 java
11486 zimbra    20   0  176416  17012   4172 S  55.5  0.1   0:06.71 zmcontrol
 3246 zimbra    20   0 5837524   1.1g  19944 S  27.0  9.6  14:29.36 java
11675 zimbra    20   0  116952    948    772 S  21.1  0.0   0:00.71 sed
11677 zimbra    20   0  113556    996    816 S  18.4  0.0   0:00.62 awk
11676 zimbra    20   0  113556    996    820 S  17.2  0.0   0:00.58 awk
11673 zimbra    20   0  112724    956    796 S  16.9  0.0   0:00.57 grep
 4459 zimbra    20   0  428196 134360   7116 R  15.7  1.1   1:22.62 /opt/zimbra/com
 3118 zimbra    20   0 2985972 512564  10856 S  13.9  4.2   2:01.05 mysqld
11671 zimbra    20   0  113324    812    388 S  13.6  0.0   0:00.46 zmlocalconfig
 8066 zimbra    20   0 6076424 127524  18096 S   8.3  1.0   1:22.78 java
 2577 zimbra    20   0   46464  11204   1192 S   5.6  0.1   0:11.21 unbound
  992 zimbra    20   0   80.2g  75520   4124 S   4.7  0.6   0:28.29 slapd
 5161 zimbra    20   0  397572   7088    744 S   4.7  0.1   0:24.15 httpd
 1016 zimbra    20   0 3715936 319256  19104 S   3.3  2.6   3:11.87 java
11634 root      20   0  162112   2392   1568 R   3.0  0.0   0:03.06 top
   12 root      rt   0       0      0      0 S   2.7  0.0   0:02.99 watchdog/1
 3956 zimbra    20   0  322636   2088    640 S   2.7  0.0   0:04.81 memcached
11688 zimbra    20   0  176416  12896      0 R   2.1  0.1   0:00.07 zmcontrol
 4036 zimbra    20   0   48640   8624   2172 S   1.5  0.1   0:17.10 nginx
 5119 zimbra    20   0  105968  35720   4464 S   1.5  0.3   0:46.78 perl
Here's IOStat:

Code: Select all

Linux 3.10.0-862.2.3.el7.x86_64 (mail.example.net)        05/16/2018      _x86_64_        (4 CPU)

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          42.65    0.00   27.25    1.10    2.76   26.24

Device:            tps    kB_read/s    kB_wrtn/s    kB_read    kB_wrtn
xvda             41.69       588.37       596.26    1270595    1287622
dm-0             40.03       581.42       595.31    1255579    1285574
dm-1              0.20         1.61         0.00       3468          0
In zmmailboxd.out, I have a lot of these:

Code: Select all

2018-05-15 17:14:29.674:WARN:oejm.ThreadMonitor:Thread-36:
org.eclipse.jetty.monitor.thread.ThreadMonitorException: Thread 'qtp998351292-115:https:https://127.0.0.1/service/extension/nginx-lookup'[RUNNABLE,id:115,cpu:73.54%] STACK TRACE
    at java.lang.Class.getDeclaredMethods0(Native Method)
    at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
    at java.lang.Class.getDeclaredMethods(Class.java:1975)
    at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.getDeclaredMethods(ReflectionNavigator.java:280)
    at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.getDeclaredMethods(ReflectionNavigator.java:47)
    at com.sun.xml.internal.bind.v2.model.impl.ClassInfoImpl.collectGetterSetters(ClassInfoImpl.java:1018)
    at com.sun.xml.internal.bind.v2.model.impl.ClassInfoImpl.findGetterSetterProperties(ClassInfoImpl.java:931)
    at com.sun.xml.internal.bind.v2.model.impl.ClassInfoImpl.getProperties(ClassInfoImpl.java:298)
    at com.sun.xml.internal.bind.v2.model.impl.RuntimeClassInfoImpl.getProperties(RuntimeClassInfoImpl.java:171)
    at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:246)
    at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:88)
    at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:69)
    at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:212)
    at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:83)
    at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:69)
    at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getTypeInfo(ModelBuilder.java:360)
Here's a `ps aux | grep <PID of one of those high CPU java processes>

Code: Select all

ps aux | grep 3246
zimbra    3246 43.7  9.6 5837524 1184224 ?     Sl   08:49  16:10 /opt/zimbra/common/bin/java -Dfile.encoding=UTF-8 -server -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -Djdk.tls.client.protocols=TLSv1,TLSv1.1,TLSv1.2 -Djava.awt.headless=true -Dsun.net.inetaddr.ttl=60 -Dorg.apache.jasper.compiler.disablejsr199=true -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:-OmitStackTraceInFastThrow -Xloggc:/opt/zimbra/log/gc.log -XX:-UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=4096K -Djava.net.preferIPv4Stack=true -Xss256k -Xms1920m -Xmx1920m -Xmn480m -Djava.io.tmpdir=/opt/zimbra/mailboxd/work -Djava.library.path=/opt/zimbra/lib -Djava.endorsed.dirs=/opt/zimbra/mailboxd/common/endorsed -Dzimbra.config=/opt/zimbra/conf/localconfig.xml -Djetty.base=/opt/zimbra/mailboxd -Djetty.home=/opt/zimbra/common/jetty_home -DSTART=/opt/zimbra/mailboxd/etc/start.config -jar /opt/zimbra/common/jetty_home/start.jar --module=zimbra,server,servlet,servlets,jsp,jstl,jmx,resources,websocket,ext,plus,rewrite,monitor,continuation,webapp,setuid jetty.home=/opt/zimbra/common/jetty_home jetty.base=/opt/zimbra/mailboxd /opt/zimbra/mailboxd/etc/jetty.xml
zimbra   12681 98.0  0.0   9092   928 pts/0    S+   09:26   0:00 grep --color=auto 3246
I've looked in the tmp directories on the OS out of fear of a bitcoin miner or something (had it happen a long while back), but nothing looks suspicious. All the processes look legit enough, nothing abnormal looking. I don't see any abnormal zimlets installed, however all the dates are within 30 days now because of the system upgrades.

I removed all logs thinking it might be a logger issue like has happened before to others, but no dice. Restarting the server has no effect, restarting zmcontrol has no effect, nor does restarting mailboxd or whatever. The high CPU usage comes right back.

Even if I do a `zmcontrol shutdown`, the CPU usage gets a lot better (down to mid 2's in system load), but when the zimbra cron jobs fire off, the load jumps around. Anything under the zimbra user it seems.

What else should I try?
Last edited by Something on Wed May 16, 2018 6:57 pm, edited 1 time in total.
Something
Posts: 10
Joined: Wed Aug 24, 2016 5:15 pm

Re: Very High CPU Usage

Post by Something »

Well, I tried running an old kernel just for fun, and it seems to have normalized the system load:

Code: Select all

uname -a
Linux mail.example.net 3.10.0-693.5.2.el7.x86_64 #1 SMP Fri Oct 20 20:32:50 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

Code: Select all

top - 11:22:51 up 17 min,  2 users,  load average: 0.70, 0.74, 0.82
Tasks: 203 total,   1 running, 202 sleeping,   0 stopped,   0 zombie
%Cpu(s): 13.4 us,  7.5 sy,  0.0 ni, 79.0 id,  0.0 wa,  0.0 hi,  0.0 si,  0.1 st
KiB Mem : 12040892 total,  7774996 free,  3389600 used,   876296 buff/cache
KiB Swap:  7995388 total,  7995388 free,        0 used.  8368264 avail Mem

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
 1025 zimbra    20   0 3715940 284352  19136 S   6.0  2.4   1:04.92 java
  487 root      20   0   40212   7948   7536 S   0.7  0.1   0:07.27 systemd-journal
 3192 zimbra    20   0 5609800   1.1g  20172 S   0.7  9.5   3:08.63 java
16872 root      20   0  162112   2380   1568 R   0.3  0.0   0:00.04 top
    1 root      20   0  125516   3872   2500 S   0.0  0.0   0:01.97 systemd
    2 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kthreadd
    3 root      20   0       0      0      0 S   0.0  0.0   0:00.01 ksoftirqd/0
    5 root       0 -20       0      0      0 S   0.0  0.0   0:00.00 kworker/0:0H
    6 root      20   0       0      0      0 S   0.0  0.0   0:00.00 kworker/u30:0
    7 root      rt   0       0      0      0 S   0.0  0.0   0:00.22 migration/0
    8 root      20   0       0      0      0 S   0.0  0.0   0:00.00 rcu_bh
    9 root      20   0       0      0      0 S   0.0  0.0   0:01.70 rcu_sched
Perhaps something changed in zimbra that doesn't play nice with the latest RHEL/CentOS kernels? Or perhaps this isn't a zimbra issue at all :oops:

Here's a list of kernels in the /boot directory:

Code: Select all

vmlinuz-0-rescue-f0f04a8c04ab4ae9ada5e51eeac87959
vmlinuz-3.10.0-514.10.2.el7.x86_64
vmlinuz-3.10.0-514.el7.x86_64
vmlinuz-3.10.0-693.21.1.el7.x86_64
vmlinuz-3.10.0-693.5.2.el7.x86_64
vmlinuz-3.10.0-862.2.3.el7.x86_64
The system was running on `vmlinuz-3.10.0-862.2.3.el7.x86_64`

Maybe this will help someone else!
Himanshu_N
Posts: 4
Joined: Wed Sep 03, 2014 2:24 pm

Re: [SOLVED] Very High CPU Usage

Post by Himanshu_N »

Have Same Behavior Since Many Zimbra Versions. My Kernel as Follows.

3.10.0-693.el7.x86_64 #1 SMP Tue Aug 22 21:09:27 UTC 2017 x86_64 x86_64 x86_64 GNU/Linux

cat /etc/centos-release
CentOS Linux release 7.4.1708 (Core)

Can U let me know How You Down Graded Linux Version ? I have Physical Server .

Release 8.8.8_GA_2009.RHEL7_64_20180322150747 RHEL7_64 FOSS edition, Patch 8.8.8_P4.
kbulgrien
Posts: 10
Joined: Mon Nov 05, 2018 6:51 pm

Re: [SOLVED] Very High CPU Usage

Post by kbulgrien »

I'm not sure how to search for available old kernels, but on RHEL 7, the kernel version he mentioned was usable as-is:

$ sudo yum install kernel-3.10.0-693.5.2.el7.x86_64

Manual installation of a kernel like this set the old one to default for me. All I had to do was reboot, but, see also:

https://access.redhat.com/solutions/1605183

That said, the old kernel did not resolve a java 100% CPU condition I was trying to fix, though we were on Zimbra 8.6.0 (P12), so I uninstalled the old one to revert back to current.
Post Reply