When this happens, zmcontrol restart appears to stop and start the MTA normally:
Code:
su - zimbra
zmcontrol restart
...
Stopping mta...Done.
...
Starting mta...Done.
...
And the status check still reports the MTA as running:
Code:
su - zimbra
...
mta Running
...
More specifically, it can happen during a reboot or restart, and it can give you the impression that the MTA is running. It is a result of Zimbra using the "kill -0 <pid in master.pid>" pattern to determine whether the MTA is running. If that call succeeds, Zimbra assumes the MTA is already running and does not start it, which on its face is a sensible optimization. The catch is that master.pid remains in the filesystem after postfix stops and still contains the PID of the previous postfix master process. If another process reclaims that same PID after you have stopped postfix, the check succeeds for the wrong process. That is very unlikely, but it does happen from time to time, judging by reports in these forums and in Bugzilla, and by what I observed this week.
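To see why "kill -0" is a weak liveness test, here is a minimal sketch assuming bash on Linux. The functions pid_alive_kill0 and demo_reused_pid are hypothetical helpers written for illustration, not Zimbra's code (zmmtastatus itself is Perl). They mimic the pattern by reading a PID from a file and calling kill -0 on it, then show the check succeeding for a pid file that points at an unrelated sleep process.

```shell
#!/usr/bin/env bash
# Illustration of the "kill -0 <pid from pid file>" liveness test and its blind spot.
# kill -0 delivers no signal: it succeeds if *a* process with that PID exists
# and the caller may signal it. It cannot tell which program owns the PID.

# pid_alive_kill0 FILE (hypothetical helper): succeeds if the PID in FILE is live.
pid_alive_kill0() {
    local pid
    # Pid files may contain padding and a trailing newline; strip all whitespace.
    pid=$(tr -d '[:space:]' 2>/dev/null < "$1") || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    kill -0 "$pid" 2>/dev/null
}

# demo_reused_pid (hypothetical helper): write the PID of an unrelated process
# (a "sleep") into a throwaway pid file, as if postfix had stopped and its old
# PID had been reused, then run the kill -0 check against that file.
demo_reused_pid() {
    local tmp sleeper
    tmp=$(mktemp)
    sleep 30 &
    sleeper=$!
    echo "$sleeper" > "$tmp"
    if pid_alive_kill0 "$tmp"; then
        echo "kill -0: 'running' (but PID $sleeper is a sleep, not postfix)"
    else
        echo "kill -0: 'not running'"
    fi
    kill "$sleeper" 2>/dev/null
    rm -f "$tmp"
}
```

The check only proves that some process with that number exists, which is exactly the false positive described above.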
How can a process end up with the same PID as a previous postfix master? Easily: the kernel wraps around and reuses PIDs once they reach this limit:
Code:
cat /proc/sys/kernel/pid_max
32768
Code:
# echo $$
245
# cat /opt/zimbra/data/postfix/spool/pid/master.pid
350
Code:
grep 'starting the Postfix' /var/log/zimbra.log
Dec 10 07:39:44 relay3 /postfix-script[24222]: starting the Postfix mail system
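To check a live system for this condition, a small sketch (bash, Linux only since it reads /proc/<pid>/comm) can report which program currently owns the PID stored in master.pid. The functions pid_owner and check_master_pid are hypothetical helpers. The assumption that the Postfix master daemon's command name is "master" is mine, based on how it normally appears in ps output.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: does the PID in a master.pid file still belong to a
# Postfix master process? Linux-only (reads /proc/<pid>/comm).
# Assumption: the Postfix master daemon's command name is "master".

# pid_owner FILE: print the command name of the process whose PID is in FILE.
# Returns non-zero if FILE is unreadable or garbled, or no such process exists.
pid_owner() {
    local pid
    pid=$(tr -d '[:space:]' 2>/dev/null < "$1") || return 1
    case $pid in
        ''|*[!0-9]*) return 1 ;;   # empty or not a number
    esac
    cat "/proc/$pid/comm" 2>/dev/null
}

# check_master_pid FILE: print a one-line verdict on the pid file.
check_master_pid() {
    local owner
    if ! owner=$(pid_owner "$1"); then
        echo "stale: no live process for the PID in $1"
    elif [ "$owner" = "master" ]; then
        echo "ok: PID in $1 belongs to a 'master' process"
    else
        echo "stale: PID in $1 was reclaimed by '$owner'"
    fi
}
```

For example, running check_master_pid /opt/zimbra/data/postfix/spool/pid/master.pid while the MTA is supposed to be down should print one of the "stale" verdicts; "reclaimed by" is the case this post is about.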
Why does it happen? Instead of using postfix's status command, which tests a lock on master.pid, zmmtastatus uses the "kill -0 <pid in master.pid>" pattern, which succeeds because some process is running with that PID, just not the MTA. The fix appears simple.
Modify /opt/zimbra/libexec/zmmtastatus:
Code:
% grep -A 2 kill /opt/zimbra/libexec/zmmtastatus
#system("kill -0 $pid 2> /dev/null");
#JAD 12/14/2018 (/opt/zimbra/libexec/zmmtastatus)
system("/opt/zimbra/common/sbin/postfix status 2> /dev/null");
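Why a lock-based check such as postfix status is immune to PID reuse can be illustrated in a few lines. As I understand it, postfix status runs master -t, which per the master(8) manual tests whether master.pid is locked. The sketch below is not Postfix's code: it shows the same principle with util-linux flock(1) on a throwaway file, and lock_is_held and demo_lock_liveness are hypothetical helpers. flock(1) locks may not interact with the lock type Postfix uses on a given platform, so do not point this at the real master.pid.

```shell
#!/usr/bin/env bash
# Illustration only, not Postfix code: decide liveness by a lock, not a PID.
# The kernel releases a lock when its holder exits, so an old file cannot
# "look alive" the way a reused PID can under kill -0.

# lock_is_held FILE: succeeds if some other process holds an exclusive lock on FILE.
lock_is_held() {
    command -v flock >/dev/null 2>&1 || { echo "flock(1) not found" >&2; return 2; }
    # -n: fail at once instead of waiting. If we *can* take the lock, nobody
    # else holds it; flock releases it again as soon as "true" exits.
    if flock -n -x "$1" true 2>/dev/null; then
        return 1
    fi
    return 0
}

# demo_lock_liveness: a stand-in "daemon" (sleep under flock) holds the lock on
# a temp file for 2 seconds; the check sees it while the holder lives, not after.
demo_lock_liveness() {
    local f holder
    f=$(mktemp)
    flock -x "$f" sleep 2 &
    holder=$!
    sleep 1                        # give flock time to acquire the lock
    lock_is_held "$f" && echo "while holder runs: held"
    wait "$holder"                 # flock exits after its sleep child does
    lock_is_held "$f" || echo "after holder exits: free"
    rm -f "$f"
}
```

The design point is that nothing about the file's contents matters: once the holder is gone the lock is gone, whatever number was written in the file.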
Note: I have reported this to Zimbra along with my workaround, which may not be the proper fix; nor has this been confirmed as a bug.
How to work around this without a patch, keeping the kill -0 pattern: if the process that reclaimed the PID is associated with Zimbra, another zmcontrol restart will solve it. Otherwise, rm the master.pid file; this changes the logic and forces Zimbra to always start the MTA on a start. A reboot would likely have the same effect. So would killing the process associated with the PID in master.pid, restarting the MTA, and then restarting the non-Zimbra process that you killed.
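The rm master.pid workaround can be made a little safer by first checking that the PID in the file is not a live Postfix master. A sketch under the same assumptions as before (bash, Linux /proc, master daemon named "master"); remove_stale_master_pid is a hypothetical helper, and its default path is the master.pid location shown earlier in this post. It is meant for when the MTA is supposed to be down, followed by starting the MTA again.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Zimbra): remove master.pid only if it is stale.
# Assumptions: Linux /proc, and the Postfix master daemon's command name is "master".

PIDFILE_DEFAULT=/opt/zimbra/data/postfix/spool/pid/master.pid

# remove_stale_master_pid [FILE]
#   returns 0 if FILE is absent or was removed,
#           1 if FILE was kept because its PID is a live "master" process,
#           2 if FILE looked stale but could not be removed.
remove_stale_master_pid() {
    local file="${1:-$PIDFILE_DEFAULT}"
    local pid comm=""
    if [ ! -e "$file" ]; then
        echo "no pid file: $file"
        return 0
    fi
    # Pid files may contain padding and a trailing newline; strip all whitespace.
    pid=$(tr -d '[:space:]' 2>/dev/null < "$file") || pid=""
    case $pid in
        ''|*[!0-9]*) ;;                                   # empty or garbled: stale
        *) comm=$(cat "/proc/$pid/comm" 2>/dev/null) ;;   # empty if no such process
    esac
    if [ "$comm" = "master" ]; then
        echo "kept: PID $pid is a running master process"
        return 1
    fi
    if rm -f -- "$file"; then
        echo "removed stale $file (PID '$pid' owned by '${comm:-nothing}')"
        return 0
    fi
    echo "could not remove $file" >&2
    return 2
}
```

Refusing to delete the file when its PID belongs to a "master" process keeps the workaround from pulling the pid file out from under a genuinely running MTA.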