Stopping zimlet webapp... takes 10-15 minutes

zmcontrol
Posts: 34
Joined: Fri Jul 24, 2020 12:43 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by zmcontrol »

liverpoolfcfan wrote: Tue Nov 28, 2023 2:47 pm I implemented an additional timer in the loop. I count the number of loops at a steady state, and if that number gets to 60, I terminate the loop at that point (assuming that it is going to remain at that level until the 600-second timeout would occur). This allows the majority of cleanup to occur uninterrupted.
liverpoolfcfan,
Thanks for sharing.
For years I have set the timeout to 10 seconds without issue; it will be interesting to see the results of your script.
In your experience, what is the longest it has taken to reach a steady number of dirty pages?
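The early-exit idea quoted above could be sketched roughly like this (an illustrative bash sketch, not the actual zmstorectl patch; here the dirty-page readings are passed in as arguments so the logic stands alone, whereas the real script would poll mysql once per second up to the 600-second timeout):

```shell
#!/bin/bash
# wait_for_flush READING... : walk through a series of dirty-page counts
# and print how many polls ran before the loop exited. In zmstorectl each
# reading would come from something like:
#   mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty'"
# with a 1-second sleep between polls.
wait_for_flush() {
  local prev=-1 steady=0 polls=0 dirty
  for dirty in "$@"; do
    polls=$((polls + 1))
    # All dirty pages flushed - normal exit.
    [ "$dirty" -eq 0 ] && break
    # Count consecutive identical readings (steady state).
    if [ "$dirty" -eq "$prev" ]; then
      steady=$((steady + 1))
    else
      steady=1
    fi
    # 60 identical readings in a row: assume the count will stay there
    # until the 600-second timeout would fire, so stop waiting now.
    [ "$steady" -ge 60 ] && break
    prev=$dirty
  done
  echo "$polls"
}
```

With readings that fall to zero the loop exits immediately; with a count stuck at, say, 7 it gives up after 60 polls instead of waiting out the full 600.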
liverpoolfcfan
Elite member
Posts: 1122
Joined: Sat Sep 13, 2014 12:47 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by liverpoolfcfan »

I have only bothered to look at the actual timings on a test system with low traffic. On that the longest duration to reach steady state was ~20 seconds. The count went up for about 10 seconds, then dropped over the next 10 seconds, then remained static.
zmcontrol
Posts: 34
Joined: Fri Jul 24, 2020 12:43 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by zmcontrol »

For those interested in determining how long mysql takes to flush all dirty pages, check /opt/zimbra/log/mysql_error.log during shutdown.
For example, it took 4 seconds here.

Code:

2023-11-13  1:36:54 140242111273728 [Note] InnoDB: Starting shutdown...
2023-11-13  1:36:55 140242111273728 [Note] InnoDB: Waiting for page_cleaner to finish flushing of buffer pool
2023-11-13  1:36:58 140242111273728 [Note] InnoDB: Shutdown completed; log sequence number
Then edit zmstorectl to use a more accurate shutdown timeout.
Check shutdown times with

Code:

grep -A2 'Starting shutdown' /opt/zimbra/log/mysql_error.log
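The elapsed time can also be computed from the log instead of read by eye. A small sketch (assumes the timestamp layout shown above, and that the shutdown does not cross midnight; `shutdown_seconds` is an illustrative helper name):

```shell
# shutdown_seconds LOGFILE : print the seconds between the preceding
# "Starting shutdown" line and each "Shutdown completed" line.
shutdown_seconds() {
  awk '
    # Convert an H:MM:SS field to seconds since midnight.
    function secs(t,  a) { split(t, a, ":"); return a[1]*3600 + a[2]*60 + a[3] }
    /Starting shutdown/  { start = secs($2) }   # $2 is the time field
    /Shutdown completed/ { print secs($2) - start }
  ' "$1"
}
```

If the log records several restarts, this prints one duration per completed shutdown.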
liverpoolfcfan
Elite member
Posts: 1122
Joined: Sat Sep 13, 2014 12:47 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by liverpoolfcfan »

zmcontrol wrote: Fri Dec 01, 2023 12:07 am For those interested in determining how long mysql takes to flush all dirty pages, check /opt/zimbra/log/mysql_error.log during shutdown.
For example it took 4 seconds here.
That's the time mysql actually took to flush and stop. But Zimbra doesn't ask mysql to stop until it has finished its wait-for-zero-dirty-buffer-pages loop. That is the wait I was looking to shortcut.
joho
Advanced member
Posts: 74
Joined: Tue Apr 26, 2016 9:24 am
ZCS/ZD Version: Release 8.8.15.GA.4177.UBUNTU20.64

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by joho »

rainer_d wrote: Tue Nov 28, 2023 10:15 am I've also got this problem on 8.8.15P44.

I don't quite remember but it also exists on my 10.0.5 test-upgrade.

For a very long time, we ran the mail store on NFS - and Zimbra was happily suggesting this may be the cause of the delay.

After migrating everything to local storage, nothing changed ;-)
Still about 12 minutes delay.
I don't understand how something as annoying as this can survive at least TWO major versions of ANY software?!
stubbzord
Posts: 1
Joined: Mon Jan 22, 2024 4:50 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by stubbzord »

Thank you for your reply, Mark. It wasn't that firewalld was present on one and not the other; it was that if the firewall (at the network gateway) was used to block traffic to the Zimbra server, then the shutdown delay didn't occur. Once I opened the ports back up (allowing normal traffic) the delay returned.
mocha
Posts: 3
Joined: Wed Oct 19, 2022 10:16 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by mocha »

Hello all,
I've contacted Zimbra support about this issue and have been given the number ZBUG-4074.

I am not a database expert, but as far as I have been able to analyse this, the problem seems to come down to the old, outdated MariaDB version: even with innodb_max_dirty_pages_pct set to 0, it will not reduce innodb_buffer_pool_pages_dirty to 0, which is what the if statement in the zmstorectl script waits for. It just lowers it to about 1 percent and keeps it there. So when the number of modified db pages is higher than 0 but the percentage of dirty pages (relative to the buffer pool) is below 1, the script will very often wait the full 600 seconds for no reason, causing unnecessary Zimbra downtime for users.

For MySQL, a similar bug is described here: https://bugs.mysql.com/bug.php?id=62534
I suspect that something similar happens in MariaDB 10.1, which Zimbra still uses (3.5 years after EoL...).
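To make the mismatch concrete: zmstorectl waits for the dirty-page *count* to hit zero, while MariaDB's flushing target is a *percentage*. With integer math (hypothetical numbers; the real values come from `SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%'`):

```shell
# dirty_pct DIRTY TOTAL : integer percentage of dirty pages in the
# buffer pool - roughly the quantity MariaDB compares against
# innodb_max_dirty_pages_pct.
dirty_pct() { echo $(( 100 * $1 / $2 )); }

# Example: 80 dirty pages out of an 8192-page pool is under 1%, so the
# page cleaner can consider itself done - yet zmstorectl keeps waiting,
# because 80 is not 0.
```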

Also according to the information I've found here:
https://bugzilla.zimbra.com/show_bug.cgi?id=37231
https://www.percona.com/blog/how-to-dec ... own-times/

the whole point of flushing dirty pages in advance is to reduce the time it takes to *actually* shut down and restart the server, by *preparing* it for shutdown. The application can stay online and responsive while the dirty pages are flushed from the buffer pool. But zmstorectl instead stops the application (mailbox) first and does the flushing afterwards, so there is no reduction in downtime at all.

Correct me if I am wrong, but I think the correct order should be
1. SET GLOBAL innodb_max_dirty_pages_pct = 0; to start flushing dirty pages
2. Monitor the number of dirty pages; once it reaches (or approaches) 0, stop the mailbox
3. Stop mysql

instead of the current order:
1. Stop the mailbox (from that moment Zimbra is unavailable to users)
2. SET GLOBAL innodb_max_dirty_pages_pct = 0; to flush dirty pages
3. Stop mysql
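The difference between the two orderings can be sketched with placeholder helpers (hypothetical names; the real steps would be the mysql SET GLOBAL statement, a dirty-page polling loop, and the zmstorectl mailbox/mysql stop commands):

```shell
# Each helper just echoes its name so the ordering is visible; the real
# implementations are the commands named in the comments.
flush_dirty_pages() { echo flush; }        # SET GLOBAL innodb_max_dirty_pages_pct = 0;
wait_for_zero()     { echo wait; }         # poll Innodb_buffer_pool_pages_dirty
stop_mailbox()      { echo stop-mailbox; } # service goes offline here
stop_mysql()        { echo stop-mysql; }

# Proposed: flush while the mailbox is still serving users.
proposed_shutdown() { flush_dirty_pages; wait_for_zero; stop_mailbox; stop_mysql; }

# Current: users lose service for the whole flush wait.
current_shutdown()  { stop_mailbox; flush_dirty_pages; wait_for_zero; stop_mysql; }
```

In the proposed order the only user-visible downtime is between stop-mailbox and stop-mysql; in the current order it also includes the entire flush wait.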