Stopping zimlet webapp... takes 10-15 minutes

Discuss your pilot or production implementation with other Zimbra admins or our engineers.
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2806
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.7 Network Edition
Contact:

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by L. Mark Stone »

PaperAdvocate wrote:Both files are the same except for 2 things, that the order of the line items are different and the value of innodb_buffer_pool_size is different; 4163895296 for the server with the delay and 2511535718 for the server that is fine.

The 4163895296 value was given to me by Zimbra support when I sent them my logs, so this is why it's different, but the issue was present prior to changing this value.

On both the innodb_max_dirty_pages_pct = 30.
Dunno, but the InnoDB Buffer Pool size should ideally be 1.25x or larger the size of the InnoDB databases. If the databases are bigger than the buffer pool, some portions of the databases get paged out to disk. If that's what's happening in your case, then likely MariaDB needs to pull that data in from swap and write it to disk before allowing itself to shut down, and that can add to the shutdown time (users will also notice periodic "stalls" of UI responsiveness).

You can use a tool like mysqltuner.pl to get the InnoDB database size, and you can also use the M parameter to set pool size in MB, rather than bytes. Much easier to read and less likely to cause a typo.

So here you can see on one system where, given the fullness of time, more and larger mailboxes, I needed to increase the size of the buffer pool eventually to 7GB. A number of clients have buffer pools 20GB or greater in size, so it's a good thing to check periodically. Be sure to add more RAM to your server if needed too!

Code: Select all

zimbra@zimbra:~$ cat conf/my.cnf | grep -i innodb_buffer
# innodb_buffer_pool_size        = 5047721164
# innodb_buffer_pool_size        = 6144M
innodb_buffer_pool_size        = 7168M
zimbra@zimbra:~$ 
Hope that helps,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

Thank you. I will check those values.

I haven't timed it perfectly, but it seems it's timing out at 10 minutes which is a common timeout variable. And due to the "timeout" listed in the shutdown process:

Code: Select all

Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: timeout after END-OF-MESSAGE from localhost[127.0.0.1]
Apr  3 23:37:49 mail postfix/amavisd/smtpd[11477]: disconnect from localhost[127.0.0.1] ehlo=1 mail=1 rcpt=1 data=1 commands=4
On researching "timeout after END-OF-MESSAGE from localhost" I found several posts linking the behavior to possibly spamassasin or amavis-new. So I looked through my conf folder for differences in those configs and found only that the server with the delay was missing the IPV6 address entries in @mynetworks. I will change this just in case and try.

On the server with the timeout, there are additional files related to spamassasin that aren't present on the server without issues. These are: a directory named ~/conf/sa with a file inside name salocal.cf, and in the root of the conf folder salocal.cf and salocal.cf.blacklist. I'm going to try and clean those up and see if they do anything.
User avatar
L. Mark Stone
Ambassador
Ambassador
Posts: 2806
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.7 Network Edition
Contact:

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by L. Mark Stone »

FWIW I always remove every trace of IPv6 before deploying Zimbra on an operating system. When IPv6 is ubiquitous, I'll remove every trace of IPv4 before deploying Zimbra on an operating system.

In my experience, Zimbra works well on either an IPv4 OR an IPv6 system, but I have always found "something" (as Rosanne Rosanadana would say...) when both are present on a Zimbra server.

Hope that helps,
Marks
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate
andrey.ivanov
Advanced member
Advanced member
Posts: 50
Joined: Wed Aug 08, 2018 8:44 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by andrey.ivanov »

I've seen sometimes the shutdowns that take a lot of time, usually it's because some connections are established from proxy to backend and they are active or in 'TIME_WAIT' state. What i usually do to avoid it:
* stop proxy and memcached services
* on mailbox server, i monitor the connections coming from the proxy using iptstate -t. I've found that the connections in the state 'TIME_WAIT' cause the shutdown delays (at least for me). So i wait until they disappear (usually about 2 minutes).
* then i stop all the other zimbra services.

Using this sequence of stopping zimbra mailbox i have no problems with delays that i had in the past.
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

@ L. Mark Stone

I cleaned up the extra spamassasin conf files and other misc items to make it match the server without issue, all without result. Set the vm.swappiness=1 (as 0 now means to disable the swap altogether), without result. Ran the mysqltuner.pl and got these results (which are the same as the results on the server without the shutdown delay:

Code: Select all

[OK] InnoDB File per table is activated
[OK] InnoDB buffer pool / data size: 3.9G/661.9M
[OK] Ratio InnoDB log file size / InnoDB Buffer pool size: 500.0M * 2/3.9G should be equal 25%
[!!] InnoDB buffer pool instances: 8
[--] InnoDB Buffer Pool Chunk Size not used or defined in your version
[OK] InnoDB Read buffer efficiency: 100.00% (754767404 hits/ 754803238 total)
[!!] InnoDB Write Log efficiency: 85.19% (206209 hits/ 242065 total)
[OK] InnoDB log waits: 0.00% (0 waits / 35856 writes)
I'm still curious about the amavis interaction in the delay... do you know if there is a way to bypass/disable amavis for troubleshooting?

@andrey.ivanov
Thank you for the info, I'll look into it. My concern is automating this for a graceful shutdown in the event of a power outage (which means I won't be logged on to do the steps manually).
PaperAdvocate
Posts: 23
Joined: Tue Oct 11, 2016 9:28 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by PaperAdvocate »

Posting it here as well... I wanted to post what ended up fixing it for me (I've posted about this issue in another thread). Zimbra support had an open ticket with me for a year and never resolved it (I still love Zimbra anyway), for an unrelated reason I had to do a database repair and ran the steps found here: https://wiki.zimbra.com/wiki/Mysql_Crash_Recovery

After that, no more huge delay on shutdown/restart. Hopefully this helps others...
Ace Suares
Advanced member
Advanced member
Posts: 63
Joined: Thu Aug 07, 2014 7:26 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by Ace Suares »

When looking at the log, the amount of 'Modified db pages' stays the same in all Buffer Pools.

This Flush Dirty Pages code does not seem to work all and just sits waiting for the timeout.
User avatar
joho
Advanced member
Advanced member
Posts: 74
Joined: Tue Apr 26, 2016 9:24 am
ZCS/ZD Version: Release 8.8.15.GA.4177.UBUNTU20.64

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by joho »

"Stopping zimlet webapp" still takes 10-15 minutes, and this is 8.8.15.P44.

I've had endless discussions with Zimbra support about this, and it seems like it's just not on their list of things to do any time soon.

Since Z8 is EOL this year, I wouldn't hold my breath 8-)
rainer_d
Advanced member
Advanced member
Posts: 97
Joined: Fri Sep 12, 2014 11:40 pm

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by rainer_d »

I've also got this problem on 8.8.15P44.

I don't quite remember but it also exists on my 10.0.5 test-upgrade.

For a very long time, we ran the mail store on NFS - and Zimbra was happily suggesting this may be the cause of the delay.

After migrating everything to local storage, nothing changed ;-)
Still about 12 minutes delay.
liverpoolfcfan
Elite member
Elite member
Posts: 1121
Joined: Sat Sep 13, 2014 12:47 am

Re: Stopping zimlet webapp... takes 10-15 minutes

Post by liverpoolfcfan »

In monitoring the dirty pages during a shutdown, the number generally goes up shortly after issuing the shutdown request, then may or may not drop down again before setting at some level and staying there until the 600 second timeout occurs. You can monitor the number of dirty pages by opening a shell and as the zimbra user running

Code: Select all

[zimbra@mail ~]$ mysqladmin ext -i1 | grep Innodb_buffer_pool_pages_dirty
while issuing the shutdown in another shell.

Based on this behaviour, combined with the comment in the original bugzilla report on this that someone had experienced no issues with reducing the timeout to 60 seconds, I implemented an additional timer in the loop. I count the number of loops at a steady state, and if that number gets to 60, I terminate the loop at that point (asuming that it is going to remain at that level until the 600 second timeout would occur.) This allows the majority of cleanup to occur uninterrupted.

This code is provided as-is. Use at your own risk.

The change is to /opt/zimbra/bin/zmstorectl - from line 48 (as of zcs 8.8.15_p44). Make sure to keep a copy of the original zmstorectl before modifying.

Code: Select all

  # wait for 600 seconds or until there are no more dirty pages
  local i=0
  local settled_dirty_pages=0
  local seconds_at_settled_dirty_pages=0
  while [ $i -lt 600 ]; do
    local pages=$(${MYSQL} -e "show engine innodb status\G" | grep '^Modified db pages' | grep -Eo '[0-9]+$')
    local total_pages=$(echo $pages | sed 's/ / + /g')
    total_pages=$(( $total_pages ))

    if [ "$total_pages" = "0" ]; then
      break
    fi
	
    #echo "Settled at: $settled_dirty_pages"
    if [ "$total_pages" = "$settled_dirty_pages" ]; then
      ((seconds_at_settled_dirty_pages++))
      #echo "$seconds_at_settled_dirty_pages seconds at Settled Dirty Pages: $settled_dirty_pages."
      if [ "$seconds_at_settled_dirty_pages" = "60" ]; then
        echo "60 seconds at Settled Dirty Pages: $settled_dirty_pages - exiting!"
        break
      fi
    else
      settled_dirty_pages=$total_pages
      echo "New Settled Dirty Pages established at $i seconds: $settled_dirty_pages "
    fi

    #echo -ne "$pages\r"
    i=$((i+1))
    sleep 1
  done
  echo "$seconds_at_settled_dirty_pages seconds at Settled Dirty Pages: $settled_dirty_pages."
You can test this by simply stopping zmstorectl. It is not necessary to shutdown zimbra entirely.

Sample output from running the modified code

Code: Select all

[zimbra@mail ~]$ zmstorectl stop
Stopping mailboxd...done.
New Settled Dirty Pages established at 0 seconds: 169
New Settled Dirty Pages established at 1 seconds: 144
60 seconds at Settled Dirty Pages: 144 - exiting!
60 seconds at Settled Dirty Pages: 144.
Stopping mysqld... done.
[zimbra@mail ~]$ 
Obviously if you want to log an entire 600 second time period to see all the changes (or lack of them) in Settled Dirty Pages, just change the 60 to 600 in line

Code: Select all

      if [ "$seconds_at_settled_dirty_pages" = "60" ]; then
Post Reply