Insane cpu usage by zmmailboxd

Discuss your pilot or production implementation with other Zimbra admins or our engineers.
matteo.fracassetti
Posts: 16
Joined: Thu Feb 08, 2018 3:56 pm

Insane cpu usage by zmmailboxd

Post by matteo.fracassetti »

Zimbra Release 8.7.6_GA_1776.RHEL7_64_20170326144124 RHEL7_64 FOSS edition.
CentOS Linux release 7.4.1708 (Core)
2 servers (VMs on ESXi - HP DL380G6). Mailstore on VM with 6vCPU and 16Gb RAM. 650 Users, and more or less 350 always active sessions.

Since few months we are facing an huge CPU usage by zmmailboxd and in the last days it becames unsustainable.
CPU and Load average grows up to up over 40/60 due to java starts many threads that nevers ends. CPU usage still grows also during nightime or in the weekend, when the server usage is likely to be near null. The ESXi hosts has 2 socket with 4 cores (8 cores and 16 with HT) but Zimbra is eating all the CPU time.

We followed the guidelines for large deployment and increased the "zimbraHttpNumThreads" up to 1000: It gives us just few hours before the maillog fills up again of messages like this:

2018-02-07 09:23:35,497 WARN [qtp1286783232-91802:http://webmail.ourdomain.it/service/soap/] [ip=172.16.39.222;port=43978;] misc - Exceeded the max requests limit. Suspending org.eclipse.jetty.continuation.Servlet3Continuation@5ce9070e

At this point we can only restart the zmmailboxd: The LA goes down and log became clear but it starts grows again. We grow from 4/5 of LA to > 40 in about 15 hours.

The other machine (4vCPU, 6Gb RAM on another ESXi host with same hardware) own the proxy and MTA role but is absolutley idle (Load average around 0,40 for 15 minutes).

The proxy machine:
zmcontrol status
amavis Running
antispam Running
antivirus Running
dnscache Running
memcached Running
mta Running
proxy Running
snmp Running
stats Running
zmconfigd Running

The mailstore/LDAP server:
zmcontrol status
ldap Running
logger Running
mailbox Running
service webapp Running
snmp Running
spell Running
stats Running
zimbra webapp Running
zimbraAdmin webapp Running
zimlet webapp Running
zmconfigd Running


Please, any suggestions?
Klug
Ambassador
Ambassador
Posts: 2747
Joined: Mon Dec 16, 2013 11:35 am
Location: France - Drôme
ZCS/ZD Version: All of them
Contact:

Re: Insane cpu usage by zmmailboxd

Post by Klug »

Are the users using the web interface or IMAP or a mix?
Is NIO enabled (in case they're using IMAP)?
matteo.fracassetti
Posts: 16
Joined: Thu Feb 08, 2018 3:56 pm

Re: Insane cpu usage by zmmailboxd

Post by matteo.fracassetti »

A mix, users are using Web interface and IMAP. Someone (less then 40 users) also uses ActiveSync through Z-Push but this should not be the issues, because CPU usage grows also with the Z-Push server turned off.
IMAP NIO is turned ON since a couple of days: I found it was disabled but nothing changes (or, at least, nothing of noticeable) after enabling it.
matteo.fracassetti
Posts: 16
Joined: Thu Feb 08, 2018 3:56 pm

Re: Insane cpu usage by zmmailboxd

Post by matteo.fracassetti »

Despite the huge CPU usage, server statistics seems to be not useful...

$ iostat -x
Linux 3.10.0-693.17.1.el7.x86_64 (mailprod01vm.cup2000.priv) 02/09/18 _x86_64_ (6 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
95.29 0.10 0.67 0.36 0.00 3.58

Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
sda 0.00 2.48 0.19 1.80 3.96 21.70 25.86 0.08 41.94 194.43 26.22 3.19 0.63
sdb 0.01 14.59 34.08 14.98 994.26 269.29 51.52 0.53 10.82 9.01 14.95 1.83 8.98
sdd 0.00 0.02 7.50 2.16 553.55 30.29 120.84 0.12 12.34 10.74 17.92 4.71 4.55
sdc 0.00 0.00 5.03 0.67 335.32 8.74 120.70 0.05 8.72 7.90 14.79 3.65 2.08
sde 0.00 0.01 12.92 1.73 866.36 26.57 121.85 0.15 10.14 9.43 15.42 5.20 7.62
sdf 0.00 0.04 3.68 3.47 282.58 97.11 106.10 0.11 15.97 11.94 20.24 4.06 2.91
dm-0 0.00 0.00 0.00 0.00 0.01 0.01 11.63 0.00 13.32 4.80 14.59 6.27 0.00
dm-1 0.00 0.00 0.18 4.28 3.93 21.66 11.47 0.14 31.53 195.18 24.47 1.41 0.63
dm-2 0.00 0.00 63.23 37.68 3032.05 432.00 68.66 1.08 10.74 9.39 13.00 2.02 20.35

$ vmstat
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
r b swpd free buff cache si so bi bo in cs us sy id wa st
53 0 3948 526352 427512 4624668 0 0 508 76 7 7 95 1 4 0 0
7224jobe
Outstanding Member
Outstanding Member
Posts: 283
Joined: Sat Sep 13, 2014 1:55 am
ZCS/ZD Version: 8.8.15_FOSS Patch38

Re: Insane cpu usage by zmmailboxd

Post by 7224jobe »

Did you take a look in /opt/zimbra/log on store node? Especially zmmailboxd.out? Maybe there are some errors in those files.
Check also logs in the same folder on the proxy node.
Try also zmmytop command as zimbra user to see how the database is performing.
matteo.fracassetti
Posts: 16
Joined: Thu Feb 08, 2018 3:56 pm

Re: Insane cpu usage by zmmailboxd

Post by matteo.fracassetti »

Here attacched some screenshoot.

zmmytop seems to be normal
Image

There are a lot of thread from jetty (/opt/zimbra/common/bin/java) which are eating all the CPU resource (I restarted the mailboxd some hours ago during nightime and, actually, I have less the 50 active sessions):
Image
$ top
top - 10:34:11 up 1 day, 17:22, 1 user, load average: 19.35, 18.49, 18.31
Tasks: 159 total, 1 running, 158 sleeping, 0 stopped, 0 zombie
%Cpu(s): 97.5 us, 2.3 sy, 0.0 ni, 0.2 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st
KiB Mem : 16264040 total, 325856 free, 8927440 used, 7010744 buff/cache
KiB Swap: 8425468 total, 8425376 free, 92 used. 6618092 avail Mem

PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
19554 zimbra 20 0 8530524 2.217g 21488 S 381.4 14.3 887:39.84 java
There are 130 active thread from jetty and it always increase.
$ ps huH p 19554 | wc -l
131
Very interesting is the content of zmmailboxd.out:
2018-02-11 06:30:42.347:WARN:oejut.QueuedThreadPool: Activity Thread: qtp1286783232{STOPPING,10<=161<=1000,i=0,q=40} rejected HttpChannelOverHttp@56bfabe1{r=1,c=false,a=ASYNC_WOKEN,uri=//webmail.mydomain.tld/service/soap/ZxChatRequest}
2018-02-11 06:30:42.347:WARN:oejut.QueuedThreadPool: Activity Thread: qtp1286783232{STOPPING,10<=161<=1000,i=0,q=40} rejected HttpChannelOverHttp@136923ac{r=1,c=false,a=ASYNC_WOKEN,uri=//webmail.mydomain.tld/service/soap/ZxChatRequest}
2018-02-11 06:30:58.759:WARN:oejut.QueuedThreadPool:Thread-12: qtp1286783232{STOPPING,10<=161<=1000,i=0,q=40} Couldn't stop Thread[qtp1286783232-24350:http://webmail.mydomain.tld/service/soap/,5,main]
2018-02-11 06:30:58.759:WARN:oejut.QueuedThreadPool:Thread-12: qtp1286783232{STOPPING,10<=161<=1000,i=0,q=40} Couldn't stop Thread[qtp1286783232-7333:http://webmail.mydomain.tld/service/soap/,5,main]
2018-02-11 06:30:58.759:WARN:oejut.QueuedThreadPool:Thread-12: qtp1286783232{STOPPING,10<=161<=1000,i=0,q=40} Couldn't stop Thread[qtp1286783232-1424:http://webmail.mydomain.tld/service/soap/,5,main]
2018-02-11 06:30:58.759:WARN:oejut.QueuedThreadPool:Thread-12: qtp1286783232{STOPPING,10<=161<=1000,i=0,q=40} Couldn't stop Thread[qtp1286783232-33563:http://webmail.mydomain.tld/service/soap/,5,main]
2018-02-11 06:30:58.760:WARN:oejut.QueuedThreadPool:Thread-12: qtp1286783232{STOPPING,10<=161<=1000,i=0,q=40} Couldn't stop Thread[qtp1286783232-41794:http://webmail.mydomain.tld/service/soap/,5,main]
Why those threads couldn't be stopped???
It contains also a lot of those messages:
2018-02-11 06:34:21.888:WARN:oejm.ThreadMonitor:Thread-37:
org.eclipse.jetty.monitor.thread.ThreadMonitorException: Thread 'qtp1286783232-150:http://webmail.mydomain.tld/service/soap/'[RUNNABLE,id:150,cpu:71.15%] STACK TRACE
at java.util.regex.Pattern$BmpCharProperty.match(Pattern.java:3797)
at java.util.regex.Pattern$Branch.match(Pattern.java:4604)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$GroupHead.match(Pattern.java:4658)
at java.util.regex.Pattern$NotBehind.match(Pattern.java:5202)
at java.util.regex.Pattern$Curly.match0(Pattern.java:4272)
at java.util.regex.Pattern$Curly.match(Pattern.java:4234)
at java.util.regex.Pattern$Start.match(Pattern.java:3461)
at java.util.regex.Matcher.search(Matcher.java:1248)
at java.util.regex.Matcher.find(Matcher.java:637)
at java.util.regex.Matcher.replaceAll(Matcher.java:951)
at com.zimbra.cs.html.DefangFilter.sanitizeStyleValue(DefangFilter.java:469)
at com.zimbra.cs.html.DefangFilter.characters(DefangFilter.java:450)
at org.cyberneko.html.filters.DefaultFilter.characters(DefaultFilter.java:152)
at org.cyberneko.html.filters.Purifier.characters(Purifier.java:299)
at org.cyberneko.html.HTMLScanner$SpecialScanner.scanCharacters(HTMLScanner.java:3292)
at org.cyberneko.html.HTMLScanner$SpecialScanner.scan(HTMLScanner.java:3211)
at org.cyberneko.html.HTMLScanner.scanDocument(HTMLScanner.java:907)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:499)
at org.cyberneko.html.HTMLConfiguration.parse(HTMLConfiguration.java:452)
at com.zimbra.cs.html.HtmlDefang.defang(HtmlDefang.java:86)
at com.zimbra.cs.html.HtmlDefang.defang(HtmlDefang.java:45)
at com.zimbra.cs.service.mail.ToXML.addContent(ToXML.java:2784)
at com.zimbra.cs.service.mail.ToXML.addPart(ToXML.java:2679)
at com.zimbra.cs.service.mail.ToXML.addParts(ToXML.java:2537)
at com.zimbra.cs.service.mail.ToXML.encodeMessageAsMP(ToXML.java:1559)
at com.zimbra.cs.service.mail.ToXML.encodeMessageAsMP(ToXML.java:1358)
at com.zimbra.cs.service.mail.GetMsg.handle(GetMsg.java:116)
at com.zimbra.soap.SoapEngine.dispatchRequest(SoapEngine.java:607)
at com.zimbra.soap.SoapEngine.dispatch(SoapEngine.java:440)
at com.zimbra.soap.SoapEngine.dispatch(SoapEngine.java:273)
at com.zimbra.soap.SoapServlet.doWork(SoapServlet.java:303)
at com.zimbra.soap.SoapServlet.doPost(SoapServlet.java:213)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:707)
at com.zimbra.cs.servlet.ZimbraServlet.service(ZimbraServlet.java:206)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:790)
at org.eclipse.jetty.servlet.ServletHolder.handle(ServletHolder.java:821)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1685)
at com.zimbra.cs.servlet.CsrfFilter.doFilter(CsrfFilter.java:169)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at com.zimbra.cs.servlet.RequestStringFilter.doFilter(RequestStringFilter.java:54)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at com.zimbra.cs.servlet.SetHeaderFilter.doFilter(SetHeaderFilter.java:59)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at com.zimbra.cs.servlet.ETagHeaderFilter.doFilter(ETagHeaderFilter.java:47)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at com.zimbra.cs.servlet.ContextPathBasedThreadPoolBalancerFilter.doFilter(ContextPathBasedThreadPoolBalancerFilter.java:107)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at com.zimbra.cs.servlet.ZimbraQoSFilter.doFilter(ZimbraQoSFilter.java:116)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at com.zimbra.cs.servlet.ZimbraInvalidLoginFilter.doFilter(ZimbraInvalidLoginFilter.java:117)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at org.eclipse.jetty.servlets.DoSFilter.doFilterChain(DoSFilter.java:473)
at org.eclipse.jetty.servlets.DoSFilter.doFilter(DoSFilter.java:318)
at org.eclipse.jetty.servlets.DoSFilter.doFilter(DoSFilter.java:288)
at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1158)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1090)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:109)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
at org.eclipse.jetty.rewrite.handler.RewriteHandler.handle(RewriteHandler.java:318)
at org.eclipse.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:437)
at org.eclipse.jetty.server.handler.DebugHandler.handle(DebugHandler.java:84)
at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:119)
at org.eclipse.jetty.server.Server.handle(Server.java:517)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:306)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:242)
at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:261)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:75)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:213)
at org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:147)
at org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)
User avatar
manfred.gipp
Advanced member
Advanced member
Posts: 51
Joined: Tue Feb 03, 2015 7:08 am
Location: Germany
ZCS/ZD Version: Zimbra 8.8.6_GA_1906 (build 2017113
Contact:

Re: Insane cpu usage by zmmailboxd

Post by manfred.gipp »

I have nearly the same Problem. After two days one java Process eats CPU with 100%
the zmmailboxd.out says
2018-02-10 10:38:06.368:WARN:oejs.HttpChannel:qtp998351292-15859: https://lxmail.ova.local/service/home/~ ... t=2&disp=a java.io.IOException: Broken pipe
Only a Reboot of the Mailserver helps to fix this Problem.
Restart with
zmcontrol restart
does not help
Most of our users are using Webmail.
The last time it consumes 100% CPU was on last Sunday. it started at exact 10 o'clock AM.
At this time no one was online. And there is no cron entry to start at 10 AM.
My zimbra installtion is 8.8.6 on centos 7 in vmware environment.
4 CPU with 12GB RAM.
67GB free disk space (260GB disk)
7224jobe
Outstanding Member
Outstanding Member
Posts: 283
Joined: Sat Sep 13, 2014 1:55 am
ZCS/ZD Version: 8.8.15_FOSS Patch38

Re: Insane cpu usage by zmmailboxd

Post by 7224jobe »

Try with zmthrdump, to have a dump of those threads, as suggested here: https://forums.zextras.com/general-info ... l#post5784

Code: Select all

for i in {1..10}; do /opt/zimbra/bin/zmthrdump > /tmp/thrdump.$i.log ; sleep 1 ; done
I also noticed that you are running Centos 7.4, which is the latest release...Did you do a 'yum update' recently? Maybe the problems started after the CentOS release update?

Since I do not have a certain answer for this problem, I'd suggest to ask in the #zimbra IRC channel on freenode...I solved a couple of problems in the past asking the guys there.
matteo.fracassetti
Posts: 16
Joined: Thu Feb 08, 2018 3:56 pm

Re: Insane cpu usage by zmmailboxd

Post by matteo.fracassetti »

Digging into system logs, I notice we upgraded the CentOS release about a month ago on Jan 12 and updated it again on 02 Feb.

CPU issues has started on Fri Jan 19 around 14.00: Load average for 15min start increasing from usual average value near 0,8 up to 22 at 18.30 and it never decrease up to Sat Jan 27, when CPU Load average 15min start decreasing since >20 at 16.00 to <5 at 19.00 and its average values remain less than 4 for all the subsequent week.
On Web 31 Jan at 20.00 average value of "CPU Load average 15min" start increasing again and goes over 20 at midnight. Since that moment the average value of "CPU Load Average 15min" always remain > 20 with peak values around 40, with some drop due to "zmmailboxdctl restart" when the system became unusable.

Image
Andre81
Posts: 9
Joined: Sat Sep 13, 2014 2:59 am

Re: Insane cpu usage by zmmailboxd

Post by Andre81 »

We have a similar problem, and we have noticed this:

Code: Select all

[root@mail01 ~]# top | grep -E "PID|java"
   PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND
  8155 zimbra    20   0 10.050g 1.968g  22276 S   6.2 17.0   5:32.90 java
  8155 zimbra    20   0 10.050g 1.968g  22276 S   2.0 17.0   5:32.96 java
 44893 zimbra    20   0 2962536 117704  17556 S  55.1  1.0   0:01.66 java
  8155 zimbra    20   0 10.050g 1.968g  22276 S   6.3 17.0   5:33.15 java
  1372 zimbra    20   0 3996616 494816  18252 S   2.7  4.1  21:12.85 java
  8155 zimbra    20   0 10.050g 1.968g  22276 S   6.6 17.0   5:33.35 java
  1372 zimbra    20   0 3996616 494816  18252 S   3.3  4.1  21:12.95 java
  8155 zimbra    20   0 10.050g 1.968g  22276 S   9.3 17.0   5:33.63 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S  32.9 17.0   5:34.62 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S  44.9 17.0   5:35.97 java
  1372 zimbra    20   0 3996616 494816  18252 S   0.3  4.1  21:12.96 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S  12.6 17.0   5:36.35 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   7.6 17.0   5:36.58 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   2.3 17.0   5:36.65 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   2.6 17.0   5:36.73 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   3.3 17.0   5:36.83 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S  12.3 17.0   5:37.20 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S  15.6 17.0   5:37.67 java
  1372 zimbra    20   0 3996616 494816  18252 S   0.3  4.1  21:12.97 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   3.0 17.0   5:37.76 java
  1372 zimbra    20   0 3996616 494816  18252 S   0.3  4.1  21:12.98 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   1.0 17.0   5:37.79 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   0.3 17.0   5:37.80 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   1.0 17.0   5:37.83 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   1.0 17.0   5:37.86 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   1.3 17.0   5:37.90 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   0.7 17.0   5:37.92 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   5.0 17.0   5:38.07 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   2.0 17.0   5:38.13 java
  1372 zimbra    20   0 3996616 494816  18252 S   0.3  4.1  21:12.99 java
 45338 zimbra    20   0 2933744  94980  17392 S  41.9  0.8   0:01.26 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   4.7 17.0   5:38.27 java
  1372 zimbra    20   0 3996616 494816  18252 S   2.7  4.1  21:13.07 java
  8155 zimbra    20   0 10.050g 1.969g  22284 S   8.6 17.0   5:38.53 java
  1372 zimbra    20   0 3996616 494816  18252 S   3.7  4.1  21:13.18 java
  8155 zimbra    20   0 10.050g 1.969g  22308 S   3.7 17.0   5:38.64 java

[root@mail01 ~]#

As you can see, sometimes, zimbra starts a process with high PID (44893 and 45338) that using high CPU resource.

A.
Post Reply