I have a 8.8.8 Patch 3 installation, running as a VM with 12GB RAM, 4 cores, and 100GB of disk space, covering ~50 mailboxes (most are light usage, with about 10 users concurrently connected at any given time via IMAP). The OS is CentOS 7.4, with `cat /etc/centos-release` showing "CentOS Linux release 7.5.1804 (Core)", with all OS updates applied through yesterday.
The system had been running fine for about 2 years or so, until one day I rebooted the system (due to host maintenance unrelated to the VM), and when it came back up, the CPU usage was pegged with load average of anywhere between 5-16! Occasionally it'll drop to the high 2's, but then jumps back up again shortly after. This is the only VM on the host pool that has this issue, and many of the other guest VM's are the same OS and version, so it doesn't seem to be a compatibility issue.
The server only has ports 25, 587 and 993 exposed to the public, the web interface and ssh interface are only allowed within the internal network.
Some info about the setup:
Code: Select all
zmcontrol -v
Release 8.8.8_GA_2009.RHEL7_64_20180322150747 RHEL7_64 FOSS edition, Patch 8.8.8_P3.
Code: Select all
zmlocalconfig | grep heap
imapd_java_heap_new_size_percent = ${mailboxd_java_heap_new_size_percent}
imapd_java_heap_size =
mailboxd_java_heap_new_size_percent = 25
mailboxd_java_heap_size = 1920
zimbra_activesync_syncstate_item_cache_heap_size = 10M
Code: Select all
innodb_buffer_pool_size = 2430609408
The system used to be 8.7.5 until recently when I upgraded to 8.7.11 and then applied Patch 2 out of desperation (I had read of a bug with an email eating up cpu in the spam/virus filters). After that didn't make things any better (or worse), I upgraded the system to 8.8.8 (knowing that would be suggested by the community) and then used the package manager to apply the latest patch versions. This still had no effect.
Here's Top:
Code: Select all
top - 09:21:15 up 35 min, 1 user, load average: 8.26, 6.55, 7.56
Tasks: 217 total, 4 running, 213 sleeping, 0 stopped, 0 zombie
%Cpu(s): 43.3 us, 45.0 sy, 0.0 ni, 9.6 id, 0.0 wa, 0.0 hi, 0.7 si, 1.4 st
KiB Mem : 12293780 total, 7562592 free, 3592068 used, 1139120 buff/cache
KiB Swap: 7995388 total, 7995388 free, 0 used. 8411252 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
11672 zimbra 20 0 4873800 13884 7608 S 60.2 0.1 0:02.03 java
11630 zimbra 20 0 4873804 13492 7596 S 59.3 0.1 0:02.52 java
11486 zimbra 20 0 176416 17012 4172 S 55.5 0.1 0:06.71 zmcontrol
3246 zimbra 20 0 5837524 1.1g 19944 S 27.0 9.6 14:29.36 java
11675 zimbra 20 0 116952 948 772 S 21.1 0.0 0:00.71 sed
11677 zimbra 20 0 113556 996 816 S 18.4 0.0 0:00.62 awk
11676 zimbra 20 0 113556 996 820 S 17.2 0.0 0:00.58 awk
11673 zimbra 20 0 112724 956 796 S 16.9 0.0 0:00.57 grep
4459 zimbra 20 0 428196 134360 7116 R 15.7 1.1 1:22.62 /opt/zimbra/com
3118 zimbra 20 0 2985972 512564 10856 S 13.9 4.2 2:01.05 mysqld
11671 zimbra 20 0 113324 812 388 S 13.6 0.0 0:00.46 zmlocalconfig
8066 zimbra 20 0 6076424 127524 18096 S 8.3 1.0 1:22.78 java
2577 zimbra 20 0 46464 11204 1192 S 5.6 0.1 0:11.21 unbound
992 zimbra 20 0 80.2g 75520 4124 S 4.7 0.6 0:28.29 slapd
5161 zimbra 20 0 397572 7088 744 S 4.7 0.1 0:24.15 httpd
1016 zimbra 20 0 3715936 319256 19104 S 3.3 2.6 3:11.87 java
11634 root 20 0 162112 2392 1568 R 3.0 0.0 0:03.06 top
12 root rt 0 0 0 0 S 2.7 0.0 0:02.99 watchdog/1
3956 zimbra 20 0 322636 2088 640 S 2.7 0.0 0:04.81 memcached
11688 zimbra 20 0 176416 12896 0 R 2.1 0.1 0:00.07 zmcontrol
4036 zimbra 20 0 48640 8624 2172 S 1.5 0.1 0:17.10 nginx
5119 zimbra 20 0 105968 35720 4464 S 1.5 0.3 0:46.78 perl
Code: Select all
Linux 3.10.0-862.2.3.el7.x86_64 (mail.example.net) 05/16/2018 _x86_64_ (4 CPU)
avg-cpu: %user %nice %system %iowait %steal %idle
42.65 0.00 27.25 1.10 2.76 26.24
Device: tps kB_read/s kB_wrtn/s kB_read kB_wrtn
xvda 41.69 588.37 596.26 1270595 1287622
dm-0 40.03 581.42 595.31 1255579 1285574
dm-1 0.20 1.61 0.00 3468 0
Code: Select all
2018-05-15 17:14:29.674:WARN:oejm.ThreadMonitor:Thread-36:
org.eclipse.jetty.monitor.thread.ThreadMonitorException: Thread 'qtp998351292-115:https:https://127.0.0.1/service/extension/nginx-lookup'[RUNNABLE,id:115,cpu:73.54%] STACK TRACE
at java.lang.Class.getDeclaredMethods0(Native Method)
at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
at java.lang.Class.getDeclaredMethods(Class.java:1975)
at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.getDeclaredMethods(ReflectionNavigator.java:280)
at com.sun.xml.internal.bind.v2.model.nav.ReflectionNavigator.getDeclaredMethods(ReflectionNavigator.java:47)
at com.sun.xml.internal.bind.v2.model.impl.ClassInfoImpl.collectGetterSetters(ClassInfoImpl.java:1018)
at com.sun.xml.internal.bind.v2.model.impl.ClassInfoImpl.findGetterSetterProperties(ClassInfoImpl.java:931)
at com.sun.xml.internal.bind.v2.model.impl.ClassInfoImpl.getProperties(ClassInfoImpl.java:298)
at com.sun.xml.internal.bind.v2.model.impl.RuntimeClassInfoImpl.getProperties(RuntimeClassInfoImpl.java:171)
at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:246)
at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:88)
at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:69)
at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getClassInfo(ModelBuilder.java:212)
at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:83)
at com.sun.xml.internal.bind.v2.model.impl.RuntimeModelBuilder.getClassInfo(RuntimeModelBuilder.java:69)
at com.sun.xml.internal.bind.v2.model.impl.ModelBuilder.getTypeInfo(ModelBuilder.java:360)
Code: Select all
ps aux | grep 3246
zimbra 3246 43.7 9.6 5837524 1184224 ? Sl 08:49 16:10 /opt/zimbra/common/bin/java -Dfile.encoding=UTF-8 -server -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -Djdk.tls.client.protocols=TLSv1,TLSv1.1,TLSv1.2 -Djava.awt.headless=true -Dsun.net.inetaddr.ttl=60 -Dorg.apache.jasper.compiler.disablejsr199=true -XX:+UseConcMarkSweepGC -XX:SoftRefLRUPolicyMSPerMB=1 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:-OmitStackTraceInFastThrow -Xloggc:/opt/zimbra/log/gc.log -XX:-UseGCLogFileRotation -XX:NumberOfGCLogFiles=20 -XX:GCLogFileSize=4096K -Djava.net.preferIPv4Stack=true -Xss256k -Xms1920m -Xmx1920m -Xmn480m -Djava.io.tmpdir=/opt/zimbra/mailboxd/work -Djava.library.path=/opt/zimbra/lib -Djava.endorsed.dirs=/opt/zimbra/mailboxd/common/endorsed -Dzimbra.config=/opt/zimbra/conf/localconfig.xml -Djetty.base=/opt/zimbra/mailboxd -Djetty.home=/opt/zimbra/common/jetty_home -DSTART=/opt/zimbra/mailboxd/etc/start.config -jar /opt/zimbra/common/jetty_home/start.jar --module=zimbra,server,servlet,servlets,jsp,jstl,jmx,resources,websocket,ext,plus,rewrite,monitor,continuation,webapp,setuid jetty.home=/opt/zimbra/common/jetty_home jetty.base=/opt/zimbra/mailboxd /opt/zimbra/mailboxd/etc/jetty.xml
zimbra 12681 98.0 0.0 9092 928 pts/0 S+ 09:26 0:00 grep --color=auto 3246
I removed all logs thinking it might be a logger issue like has happened before to others, but no dice. Restarting the server has no effect, restarting zmcontrol has no effect, nor does restarting mailboxd or whatever. The high CPU usage comes right back.
Even if I do a `zmcontrol shutdown`, the CPU usage gets a lot better (down to mid 2's in system load), but when the zimbra cron jobs fire off, the load jumps around. Anything under the zimbra user it seems.
What else should I try?