Stopping zimlet webapp... takes 10-15 minutes

Discuss your pilot or production implementation with other Zimbra admins or our engineers.
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

When shutting Zimbra down either via a system reboot or "zmcontrol stop", the process hangs at "stopping zimlet webapp" for 10-15 minutes. If I close all firewall ports (except SSH) and restart Zimbra it will take the same 10-15 minutes to shutdown, however on all following attempts it takes only 1 minute. This shows to me that it's established connections which are related to the issue somehow.

A netstat shows 20-30 IMAPS connections before the firewall is closed and Zimbra is restarted, afterward it shows only the LDAP and other internal connections, and SSH. The logs show E-Mail being received as well so SMTP connections were obviously active.

Single server Zimbra install, version is 8.7.11 FOSS on CentOS 7x64 with Firewalld and Selinux enabled.

The only thing that I think points to something in the logs are these (taken when firewalld allows traffic):

Dec 4 20:56:37 mail zmmailboxdmgr[9641]: stop requested
Dec 4 21:07:14 mail amavis-mc[26756]: Process [26757] exited (/opt/zimbra/common/sbin/amavis-services msg-forwarder) after 332559.5 s: KILLED, signal 9 (0009)

Dec 4 21:29:49 mail zmmailboxdmgr[27589]: stop requested
Dec 4 21:40:28 mail amavis-mc[23759]: Process [23760] exited (/opt/zimbra/common/sbin/amavis-services msg-forwarder) after 867.0 s: KILLED, signal 9 (0009)

All other attempts to shutdown with the firewall blocking traffic takes about 1 minute.

Any ideas on other items I might try to narrow this down further?

Thank you,

Jacob
cpaul_carling
Posts: 33
Joined: Sat Sep 13, 2014 1:36 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by cpaul_carling »

I am having the same issue, on 8.7.9. Reached out to support, but no solution so far.
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

Still trying to resolve this issue. I've compared logs of a server with same host OS and Zimbra version.

Release 8.7.11_GA_1854.RHEL7_64_20170531151956 RHEL7_64 FOSS edition.

The difference found was from Zimbra.log
Server with delay:

Code: Select all

Apr  3 23:37:24 mail zmconfigd[24046]: Shutting down. Received signal 15
Apr  3 23:37:25 mail zimbramon[14031]: 14031:info: Stopping zimlet via zmcontrol
Apr  3 23:37:25 mail zmmailboxdmgr[14122]: stop requested
Apr  3 23:37:25 mail zmmailboxdmgr[14122]: waiting for manager process 26108 to die
Apr  3 23:37:25 mail zmmailboxdmgr[26108]: shutdown requested, sending TERM signal to 26109
Apr  3 23:37:26 mail zmmailboxdmgr[26108]: mailboxd/JVM process exited (waitpid expected 26109 got 26109)
Apr  3 23:37:26 mail zmmailboxdmgr[26108]: manager woke up from wait on mailboxd/JVM with pid 26109
Apr  3 23:37:27 mail zmmailboxdmgr[14122]: manager process 26108 died, shutdown completed
Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 commands=4
...
Apr  3 23:47:44 mail zimbramon[14031]: 14031:info: Stopping zimbraAdmin via zmcontrol
Server without delay:

Code: Select all

Apr 17 21:18:54 mail zimbramon[491]: 491:info: Stopping zimlet via zmcontrol
Apr 17 21:18:55 mail slapd[20055]: slap_queue_csn: queueing 0xf6c2e40 20180418041855.422478Z#000000#000#000000
Apr 17 21:18:55 mail slapd[20055]: slap_graduate_commit_csn: removing 0xf6c2e40 20180418041855.422478Z#000000#000#000000
Apr 17 21:18:55 mail zmmailboxdmgr[592]: stop requested
Apr 17 21:18:55 mail zmmailboxdmgr[592]: waiting for manager process 21758 to die
Apr 17 21:18:55 mail zmmailboxdmgr[21758]: shutdown requested, sending TERM signal to 21759
Apr 17 21:18:57 mail zmmailboxdmgr[21758]: mailboxd/JVM process exited (waitpid expected 21759 got 21759)
Apr 17 21:18:57 mail zmmailboxdmgr[21758]: manager woke up from wait on mailboxd/JVM with pid 21759
Apr 17 21:18:58 mail zmmailboxdmgr[592]: manager process 21758 died, shutdown completed
Apr 17 21:19:02 mail zimbramon[491]: 491:info: Stopping zimbraAdmin via zmcontrol
Notice on the server with the delay it takes 10 minutes between "stopping zimlet" and "stopping zimbraAdmin" where the other takes 8 seconds.

Both servers are on CentOS7, same patch level and running same underlying hardware, similar # of users. The server with the delay has been updated from 6.x versions where the other started at 8.5+.

On the server with a delay the 2 extra lines that have me wondering are:

Code: Select all

Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 commands=4
Any ideas on what I can do next to try to drill down further?
zulhairiseman
Posts: 3
Joined: Wed May 24, 2017 8:34 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by zulhairiseman »

Same issue and still not find any solution. :(
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2796
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.6 Network Edition
Contact:

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by L. Mark Stone »

PaperAdvocate wrote:Still trying to resolve this issue. I've compared logs of a server with same host OS and Zimbra version.

Release 8.7.11_GA_1854.RHEL7_64_20170531151956 RHEL7_64 FOSS edition.

The difference found was from Zimbra.log
Server with delay:

Code: Select all

Apr  3 23:37:24 mail zmconfigd[24046]: Shutting down. Received signal 15
Apr  3 23:37:25 mail zimbramon[14031]: 14031:info: Stopping zimlet via zmcontrol
Apr  3 23:37:25 mail zmmailboxdmgr[14122]: stop requested
Apr  3 23:37:25 mail zmmailboxdmgr[14122]: waiting for manager process 26108 to die
Apr  3 23:37:25 mail zmmailboxdmgr[26108]: shutdown requested, sending TERM signal to 26109
Apr  3 23:37:26 mail zmmailboxdmgr[26108]: mailboxd/JVM process exited (waitpid expected 26109 got 26109)
Apr  3 23:37:26 mail zmmailboxdmgr[26108]: manager woke up from wait on mailboxd/JVM with pid 26109
Apr  3 23:37:27 mail zmmailboxdmgr[14122]: manager process 26108 died, shutdown completed
Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 commands=4
...
Apr  3 23:47:44 mail zimbramon[14031]: 14031:info: Stopping zimbraAdmin via zmcontrol
Server without delay:

Code: Select all

Apr 17 21:18:54 mail zimbramon[491]: 491:info: Stopping zimlet via zmcontrol
Apr 17 21:18:55 mail slapd[20055]: slap_queue_csn: queueing 0xf6c2e40 20180418041855.422478Z#000000#000#000000
Apr 17 21:18:55 mail slapd[20055]: slap_graduate_commit_csn: removing 0xf6c2e40 20180418041855.422478Z#000000#000#000000
Apr 17 21:18:55 mail zmmailboxdmgr[592]: stop requested
Apr 17 21:18:55 mail zmmailboxdmgr[592]: waiting for manager process 21758 to die
Apr 17 21:18:55 mail zmmailboxdmgr[21758]: shutdown requested, sending TERM signal to 21759
Apr 17 21:18:57 mail zmmailboxdmgr[21758]: mailboxd/JVM process exited (waitpid expected 21759 got 21759)
Apr 17 21:18:57 mail zmmailboxdmgr[21758]: manager woke up from wait on mailboxd/JVM with pid 21759
Apr 17 21:18:58 mail zmmailboxdmgr[592]: manager process 21758 died, shutdown completed
Apr 17 21:19:02 mail zimbramon[491]: 491:info: Stopping zimbraAdmin via zmcontrol
Notice on the server with the delay it takes 10 minutes between "stopping zimlet" and "stopping zimbraAdmin" where the other takes 8 seconds.

Both servers are on CentOS7, same patch level and running same underlying hardware, similar # of users. The server with the delay has been updated from 6.x versions where the other started at 8.5+.

On the server with a delay the 2 extra lines that have me wondering are:

Code: Select all

Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 commands=4
Any ideas on what I can do next to try to drill down further?
It's not clear to me from your posts, but are you saying that the Zimbra server which is running firewalld has the delay, and the server which is not running firewalld does not have the shutdown delay?

FWIW Zimbra typically suggests not running a local firewall on the Zimbra servers themselves, but instead to run a firewall in front of the Zimbra server(s).

Hope that helps,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

Thank you for your reply Mark. It wasn't that firewalld was present on one and not the other, it was that if the firewall (at the network gateway) was used to block traffic to the Zimbra server then the shutdown delay didn't occur. Once I opened the ports back up (allowing normal traffic) the delay returned.

I also found that when running an upgrade with the firewall blocking all ports the issue occurs; so regular traffic and the upgrade process both trigger something that when zmcontrol stop is issued, it takes 15 minutes to stop.

I've contacted support (paid OSS support) about this and they just said it was a bug but wouldn't give anymore detail (this will be the topic of another post...).

Thank you,

Jacob
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2796
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.6 Network Edition
Contact:

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by L. Mark Stone »

So the server that's been upgraded in place has the issue, but the clean install server does not?

And the server that has the issue doesn't have the issue if the firewall in front of it is turned off?

Is the firewall stateful or stateless? And do you have any outbound traffic rules other than ALLOW ALL?

Last question... does the server's network interface have both an IPv4 and an IPv6 MAC address?

Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

Yes, for the logs that I had posted earlier, the server with a delay is a server that's been in operation and updated through about 6 years; the server without a delay is 2 years old.

The firewall at the gateway is stateful and the only outbound rule is allow all.

The server has both an IPV4 and an IPV6 mac address.

This forum topic appears similar, but the original post states that "stopping zimlet webapp" eventually fails where mine doesn't (hence my post): viewtopic.php?f=15&t=60903

Someone also opened a bug report about the issue from the other forum post: https://bugzilla.zimbra.com/show_bug.cgi?id=107669

I haven't yet tried one of the solutions in the bug report yet
The problem comes from this script: /opt/zimbra/bin/zmstorectl

As part of the shutdown process, it does a "set global innodb_max_dirty_pages_pct=0" in MySQL, then starts a 600-second loop waiting for the number of dirty pages in MySQL to become 0, before it shuts down MySQL. The number of dirty pages never hits 0, so the loop always runs full duration.

You can modify the number of seconds on line 50 of the script from 600 to something else, then you won't have 10 minute shutdowns/restarts, at least until support gives a "proper" fix.
Will report back after trying.
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2796
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.6 Network Edition
Contact:

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by L. Mark Stone »

Ah well ~/conf/my.cnf never gets updated ever after the initial install, so I’ll speculate that if you compare the versions on the fresh server and the old legacy server you’ll see the dirty pages variable is set in the new server.

Years ago there was discussion abou whether setting this was “safe”. It is.

Please compare both my.cnf files and let us know the differences.

Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

Both files are the same except for 2 things, that the order of the line items are different and the value of innodb_buffer_pool_size is different; 4163895296 for the server with the delay and 2511535718 for the server that is fine.

The 4163895296 value was given to me by Zimbra support when I sent them my logs, so this is why it's different, but the issue was present prior to changing this value.

On both the innodb_max_dirty_pages_pct = 30.
Post Reply