Greetings,
I am stumped. This is a CentOS 6.8 system, patched to the latest, running Zimbra 8.7.1 with rsyslogd.
% zmcontrol -v
Release 8.7.1_GA_1670.RHEL6_64_20161025035121 RHEL6_64 NETWORK edition.
I upgraded from a working 8.6 system to 8.7.1 and found that the Zimbra administrative interface thinks all the services are down: every service shows the red 'x'.
I spent most of the day chasing false leads from the many existing posts about similar problems, many of which have broken links to the old forums. So far I have re-initialized the logger, checked zmstat, and looked at zmstat-fd.
As an aside, I first went to 8.7, noticed the problem, and then went directly to 8.7.1 hoping for a fix.
% zmcontrol status
reports the correct results, and everything appears to work regardless of the management interface showing everything as down.
I think I should be seeing STATUS messages in /var/log/zimbra-stats.log but I do not. That is what I am currently focusing on.
% zmsoap -z GetServiceStatusRequest
shows all the services down with a '0' value. I don't know how this is implemented, but I am guessing that somewhere they are parsing the logs.
/var/log/zimbra-stats.log is populated by other Zimbra processes, so rsyslog appears to be set up correctly.
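For anyone following along, the check described above (is the log being written at all, and does it contain any STATUS lines?) can be sketched in shell roughly as follows. The function names are hypothetical helpers, not Zimbra tools, and the match pattern assumes the "STATUS: " marker seen in the log excerpts later in this thread.

```shell
#!/bin/sh
# Hypothetical helpers (not part of Zimbra) for inspecting a stats log.

# count_status_lines FILE: print how many lines in FILE contain "STATUS: ".
count_status_lines() {
    # grep -c prints 0 and exits non-zero when nothing matches; "|| true"
    # keeps the function's exit status at 0 in that case.
    grep -c 'STATUS: ' "$1" || true
}

# last_status_line FILE: print the most recent STATUS line, if any.
last_status_line() {
    grep 'STATUS: ' "$1" | tail -n 1
}

# Usage (assumed log path, taken from this thread):
#   count_status_lines /var/log/zimbra-stats.log
#   last_status_line /var/log/zimbra-stats.log
```

A count of 0 while other zimbramon lines keep arriving would match the symptom described here: syslog is working, but nothing is producing STATUS lines.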
I have a working 8.6 system to compare against, so I am using that to determine why this 8.7 upgrade failed. I can't find any errors or permission problems in the logs.
I have a support ticket open but am not making much progress with it, so I am asking here as well. We have been focusing mostly on zmloggerhostmap and the logger database. I can use the advanced statistics under the monitoring function of the administrative interface. I don't know if working statistics are related to the service status messages, but that is what support is focused on.
Anyone have any ideas why I might be getting all '0's when I run
% zmsoap -z GetServiceStatusRequest
for all the services? I am running out of ideas where to look next. I am going to focus on zimbramon next.
Thanks in advance.
8.6 to 8.7 status shows stopped (SOLVED)
- JDunphy
- Outstanding Member
- Posts: 889
- Joined: Fri Sep 12, 2014 11:18 pm
- Location: Victoria, BC
- ZCS/ZD Version: 9.0.0_P39 NETWORK Edition
Last edited by JDunphy on Fri Oct 28, 2016 12:32 am, edited 1 time in total.
- JDunphy
Re: 8.6 to 8.7 status shows stopped
Here is the solution.
The STATUS lines in /var/log/zimbra-stats.log come from zmstatuslog, which is kicked off from cron.
That program runs 'zmcontrol status', parses the output, and writes it via syslog, tagging each line with the hostname reported in the zmcontrol output.
I ran zmstatuslog by hand and finally saw the STATUS lines in /var/log/zimbra-stats.log.
ps auxw |grep cron
showed that cron had crashed or wasn't running. I started it by hand, waited 5-10 minutes, and now it works perfectly. All green.
If I had just rebooted the machine, it would have worked. I don't know if the update process attempts to restart cron, but I will be checking that cron is running from now on.
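A rough sketch of a "is cron running?" check that could be run after an upgrade is shown below. The helper name has_cron is hypothetical; it assumes the daemon shows up as crond (RHEL/CentOS) or cron (Debian/Ubuntu) in the process list.

```shell
#!/bin/sh
# has_cron: hypothetical helper. Reads command names, one per line, on stdin
# and succeeds if a cron daemon ("crond" or "cron") is among them.
has_cron() {
    grep -Eq '^crond?$'
}

# Usage:
#   ps -e -o comm= | has_cron || echo "cron is not running"
```

On a CentOS 6 system like the one in this thread, cron would normally be started with "service crond start"; other distributions use a different service name or init system.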
Re: 8.6 to 8.7 status shows stopped (SOLVED)
I had a similar issue and tried what you described here, but it didn't work for me.
I analysed the file /opt/zimbra/libexec/zmstatuslog and found that it sets a TIMEOUT of 60 seconds.
Running "/opt/zimbra/bin/zmcontrol status" took around 90 seconds, so the script hit its timeout before it could write the STATUS lines,
and this is why the status was not in sync.
My solution was:
1. Open the file for editing: # nano /opt/zimbra/libexec/zmstatuslog
2. Change the line "my $TIMEOUT=60;" to "my $TIMEOUT=360;"
3. Save the file and wait around 5 minutes.
This worked for me; hopefully it will also work for other people.
If you run "tail -f /var/log/zimbra-stats.log" you should see status lines like these in the log:
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: antispam: Running
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: antivirus: Running
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: dnscache: Running
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: ldap: Running
....
JDunphy, thank you for your post; it led me to solve my issue.
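Before raising TIMEOUT, it may be worth measuring how long the status command actually takes. The following is a minimal sketch with hypothetical helper names; it assumes a date command that supports +%s (GNU and BSD date do).

```shell
#!/bin/sh
# elapsed_seconds CMD [ARGS...]: run CMD quietly and print the whole
# seconds it took. Hypothetical helper, not part of Zimbra.
elapsed_seconds() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    end=$(date +%s)
    echo $((end - start))
}

# exceeds_timeout ELAPSED TIMEOUT: succeed if ELAPSED is greater than TIMEOUT.
exceeds_timeout() {
    [ "$1" -gt "$2" ]
}

# Usage (as the zimbra user; 60 is the TIMEOUT value quoted in this thread):
#   t=$(elapsed_seconds /opt/zimbra/bin/zmcontrol status)
#   exceeds_timeout "$t" 60 && echo "zmcontrol status took ${t}s, over the timeout"
```

If the measured time is well under the timeout, the cause is more likely elsewhere (cron or syslog) than in the TIMEOUT value.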
- JDunphy
Re: 8.6 to 8.7 status shows stopped (SOLVED)
linuradu wrote:
I had a similar issue and I've tried what you described here but didn't worked for me.
My solution was:
1. open and edit the file: #nano /opt/zimbra/libexec/zmstatuslog
2. change the line: "my $TIMEOUT=60;" to "my $TIMEOUT=360;"
3. Save the file and wait around 5 minutes.
This worked for me; Hopefully will work also for other people
Interesting. That TIMEOUT works as follows.
1) set alarm for TIMEOUT seconds
2) run zmcontrol status and parse output
3) write to syslog to put those STATUS lines in the log file
4) turn alarm off
The alarm forces the code to stop what it is doing if it is blocked and has not finished. That way you don't end up with a bunch of zmstatuslog scripts piled up in the background when cron starts the next run.
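The same "give up after N seconds" idea can be sketched in shell. Note that this is an analogue, not what zmstatuslog does: the Perl script uses alarm() with a SIGALRM handler, whereas this sketch relies on the GNU coreutils timeout command, which exits with status 124 when the deadline is hit. The function name is hypothetical.

```shell
#!/bin/sh
# run_with_deadline SECONDS CMD [ARGS...]: run CMD but stop it after SECONDS.
# Prints "timed out" if the deadline was hit, "finished" otherwise.
# Hypothetical helper built on GNU coreutils "timeout".
run_with_deadline() {
    secs=$1
    shift
    timeout "$secs" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "timed out"
    else
        echo "finished"
    fi
}

# Usage:
#   run_with_deadline 60 /opt/zimbra/bin/zmcontrol status >/dev/null
```

Either way, the point of the deadline is the same: a run that blocks is abandoned rather than left behind to overlap with the next one.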
So you are finding it takes longer than 1 minute to perform this? You have probably run zmcontrol status from the command line and noticed it takes 20-30 seconds, so it must be blocking on the write to syslogd. Very odd indeed.
I just upgraded from 8.7.1 to 8.7.9 and the stats failed. I saw your comment and looked at the timeout variable you set, but zmcontrol was returning quickly, so I restarted rsyslogd and waited a few minutes.
Thanks for the tip on TIMEOUT, which led me to restarting syslog.
/etc/init.d/rsyslog restart
-
- Posts: 5
- Joined: Fri Feb 17, 2017 7:00 am
Re: 8.6 to 8.7 status shows stopped (SOLVED)
Just wanted to post that this solution also worked for me on a recent fresh install of Zimbra 8.8.15 on Ubuntu 18.04.
Re: 8.6 to 8.7 status shows stopped (SOLVED)
Hello
I did everything mentioned above with no luck.
Then I found that the script /opt/zimbra/libexec/zmstatuslog ran fine when I executed it manually (it wrote STATUS lines to the log), but when run from crontab it wasn't writing STATUS lines to the log (/var/log/zimbra-stats.log).
I inspected the code inside the script, compared it with another Zimbra server that was working, and found some differences. So I copied the script from the working server to the server with the red X on the status web page, and everything worked again (green status on the web page and STATUS lines logged to /var/log/zimbra-stats.log).
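The comparison step can be sketched in shell as below, assuming a copy of the script from the working server has already been fetched to a local path. The helper name and the /tmp path in the usage comment are hypothetical.

```shell
#!/bin/sh
# scripts_differ FILE_A FILE_B: succeed if the two files are not
# byte-for-byte identical (a missing file also counts as "different").
# Hypothetical helper, not part of Zimbra.
scripts_differ() {
    ! cmp -s "$1" "$2"
}

# Usage (the /tmp path is an assumed location for the copy from the good server):
#   if scripts_differ /opt/zimbra/libexec/zmstatuslog /tmp/zmstatuslog.good; then
#       diff -u /opt/zimbra/libexec/zmstatuslog /tmp/zmstatuslog.good
#   fi
```

Keeping a backup of the original script before overwriting it makes it possible to review the differences later.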
This is the "good script" content in case someone needs to use it.
#!/usr/bin/perl
#
# ***** BEGIN LICENSE BLOCK *****
# Zimbra Collaboration Suite Server
# Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2013, 2014, 2015, 2016 Synacor, Inc.
#
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software Foundation,
# version 2 of the License.
#
# This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
# without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the GNU General Public License for more details.
# You should have received a copy of the GNU General Public License along with this program.
# If not, see <https://www.gnu.org/licenses/>.
# ***** END LICENSE BLOCK *****
#
use strict;
use lib "/opt/zimbra/common/lib/perl5";
use Zimbra::Util::Common;
use Zimbra::Mon::Logger;
# Exit if software-only node.
exit(0) unless (-f "/opt/zimbra/conf/localconfig.xml");
$SIG{ALRM} = \&catchAlarm;
my $platform=qx(/opt/zimbra/libexec/get_plat_tag.sh);
chomp $platform;
my $pidFile="/opt/zimbra/log/zmstatuslog.pid";
my $TIMEOUT=60;
my $MNTCMD;
my $DFCMD;
if ($platform eq "MACOSX") {
    $MNTCMD = "mount -t hfs";
    $DFCMD = "df -ml ";
} else {
    $MNTCMD = "mount -t ext3";
    $DFCMD = "df -ml ";
}
my $dt = qx(date "+%Y-%m-%d %H:%M:%S");
chomp $dt;
my $hostname;
checkPid();
logStatus();
clearPid();
exit 0;
sub logStatus {
    my @status = ();
    alarm($TIMEOUT);
    open STATUS, "/opt/zimbra/bin/zmcontrol status |" or die "Can't get status: $!";
    @status = <STATUS>;
    close STATUS;
    foreach my $s (@status) {
        if ($s =~ /is not/) {
            next;
        }
        chomp $s;
        if ($s =~ /^Host (.*)/) {
            $hostname = $1;
            next;
        }
        $s =~ s/ webapp//;
        my ($service, $stat) = split ' ', $s, 2;
        Zimbra::Mon::Logger::LogStats( "info", "$dt, STATUS: ${hostname}: $service: $stat" );
    }
    alarm(0);
}
sub checkPid {
    if (-f "$pidFile") {
        my $P = qx(cat $pidFile);
        chomp $P;
        if ($P ne "") {
            system("kill -0 $P 2> /dev/null");
            if ($? == 0) {
                print "$0 already running with pid $P\n";
                exit 0;
            }
        }
    }
    qx(echo $$ > "$pidFile");
}
sub clearPid {
    unlink($pidFile);
}
sub catchAlarm {
    Zimbra::Mon::Logger::LogStats( "info", "zmstatuslog timeout after $TIMEOUT seconds");
    exit 1;
}