8.6 to 8.7 status shows stopped (SOLVED)

JDunphy
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Post by JDunphy »

Greetings,

I am stumped. This is a CentOS 6.8 box, patched to the latest, running ZCS 8.7.1 with rsyslogd.

% zmcontrol -v
Release 8.7.1_GA_1670.RHEL6_64_20161025035121 RHEL6_64 NETWORK edition.

I upgraded a working 8.6 system to 8.7.1 and found that the Zimbra administrative interface thinks all the services are down. That 'red' x. ;-)
I spent most of the day chasing false leads from the many existing posts about this problem, a lot of them with dead links to the old forums: I re-initialized the logger, checked zmstat, and looked at zmstat-fd.
As an aside, I first went to 8.7, noticed the problem, and then went directly to 8.7.1 hoping for a fix.

% zmcontrol status

reports the correct results, and everything appears to work regardless of the management interface showing everything as down. :-)

I think I should be seeing STATUS messages in /var/log/zimbra-stats.log but I do not. That is what I am currently focusing on.

% zmsoap -z GetServiceStatusRequest

shows all the services down with a '0' value. I don't know how this is implemented but am guessing somewhere they are parsing the logs.

/var/log/zimbra-stats.log is populated by other Zimbra processes, so rsyslog appears to be set up correctly.
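
One quick way to confirm that picture (log is being written, but no STATUS entries) is to compare the total line count with the number of STATUS lines. This is only a sketch: `count_status_lines` is a made-up helper name, and the `STATUS:` match string is taken from the log format quoted later in this thread.

```shell
# count_status_lines is a hypothetical helper, not part of Zimbra.
# Prints "total=<n> status=<m>" for the given log file.
count_status_lines() {
    log=$1
    # grep -c '' counts every line; grep -c exits non-zero when the
    # count is 0, so the exit code is deliberately ignored.
    total=$(grep -c '' "$log" || true)
    status=$(grep -c 'STATUS:' "$log" || true)
    echo "total=$total status=$status"
}

# Example: count_status_lines /var/log/zimbra-stats.log
```

A large total with `status=0` would match the symptom described here: rsyslog is delivering messages, but nothing is producing STATUS lines.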

I have a working 8.6 system to compare against, so I am using that to work out why this 8.7 upgrade failed. I can't find any errors or permission problems in the logs.

I have a support ticket open but am not making much progress with it, so I am asking here as well. We have mostly been focusing on zmloggerhostmap and the logger database. I can use the advanced statistics under the monitoring function of the administrative interface. I don't know whether working statistics are related to the service status messages, but that is what support is focused on.

Anyone have any ideas why I might be getting all '0's when I run

% zmsoap -z GetServiceStatusRequest

for all the services? I am running out of ideas where to look next. I am going to focus on zimbramon next.

Thanks in advance.
Last edited by JDunphy on Fri Oct 28, 2016 12:32 am, edited 1 time in total.

Re: 8.6 to 8.7 status shows stopped

Post by JDunphy »

Here is the solution.

The STATUS lines in /var/log/zimbra-stats.log come from zmstatuslog which is kicked off from cron.

That program runs 'zmcontrol status', parses the output, and writes one STATUS line per service via syslog, tagged with the hostname that zmcontrol reports on its Host line.

I ran zmstatuslog and saw the STATUS lines in /var/log/zimbra-stats.log finally.
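
For anyone who wants to see what that parsing step produces without touching syslog, here is a rough shell approximation of the loop in the Perl script posted further down this thread. `format_status` is a made-up name, and the sample input in the comment is illustrative rather than captured from a real server. Unlike the Perl original, this sketch skips lines with fewer than two fields.

```shell
# format_status is a hypothetical helper that mimics zmstatuslog's parsing.
# Reads `zmcontrol status` style output on stdin and prints the message
# text that would be logged for each service.
format_status() {
    awk '
        /is not/ { next }                 # detail lines such as "slapd is not running."
        /^Host / { host = $2; next }      # remember the hostname from the Host line
        NF >= 2 {
            sub(/ webapp/, "")            # "zimbra webapp Running" becomes "zimbra Running"
            service = $1
            stat = $0
            sub(/^[ \t]*[^ \t]+[ \t]+/, "", stat)   # drop the service name, keep the state
            printf "STATUS: %s: %s: %s\n", host, service, stat
        }'
}

# Example, run as the zimbra user:
#   /opt/zimbra/bin/zmcontrol status | format_status
```

If this prints sensible lines by hand but nothing shows up in the log, the problem is more likely in how or whether the script gets run than in the parsing.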

ps auxw |grep cron

showed that cron had crashed or wasn't running. I started it by hand, waited 5-10 minutes, and it works perfectly. All green.

If I had just rebooted the machine, it would have worked. I don't know whether the update process attempts to restart cron, but I will be checking that cron is running from now on.
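
A slightly more robust version of the `ps | grep cron` check, as a sketch. `has_cron_process` is a made-up helper; it assumes the daemon is named `crond` (RHEL/CentOS) or `cron` (Debian/Ubuntu) and reads a list of command names on stdin, so it never matches its own grep.

```shell
# has_cron_process is a hypothetical helper, not a Zimbra tool.
# Reads one command name per line and succeeds if a cron daemon is listed.
has_cron_process() {
    awk '
        { n = split($1, parts, "/"); name = parts[n] }    # strip any leading path
        name == "crond" || name == "cron" { found = 1 }
        END { exit (found ? 0 : 1) }'
}

# Example:
#   ps -eo comm= | has_cron_process || echo "cron is not running"
```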
linuradu
Posts: 4
Joined: Mon Apr 24, 2017 6:18 pm

Re: 8.6 to 8.7 status shows stopped (SOLVED)

Post by linuradu »

I had a similar issue and tried what you described here, but it didn't work for me.

I analysed the file "zmstatuslog" (/opt/zimbra/libexec/zmstatuslog) and found that it sets a TIMEOUT of 60 seconds.
Executing "/opt/zimbra/bin/zmcontrol status" took around 90 seconds. Because of that, the script was cut off before it could write the STATUS lines, and this is the reason the status was out of sync.
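
To check whether a server is in the same situation, the run time can be measured and compared with the TIMEOUT value. A minimal sketch; `elapsed_seconds` is a made-up helper that reports whole seconds only.

```shell
# elapsed_seconds is a hypothetical helper.
# Runs the given command, discards its output, and prints how many
# whole seconds it took.
elapsed_seconds() {
    start=$(date +%s)
    "$@" > /dev/null 2>&1 || true
    end=$(date +%s)
    echo $((end - start))
}

# Example, run as the zimbra user:
#   elapsed_seconds /opt/zimbra/bin/zmcontrol status
```

A result above the script's TIMEOUT would point to the same cause as described in this post.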

My solution was:
1. open and edit the file: #nano /opt/zimbra/libexec/zmstatuslog
2. change the line: "my $TIMEOUT=60;" to "my $TIMEOUT=360;"
3. Save the file and wait around 5 minutes.
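
To confirm which value is actually in effect before and after the edit, the setting can be read back from the file. This is a sketch: `read_timeout` is a made-up helper, and it assumes the line has exactly the form quoted in step 2. Since the file ships with Zimbra, an upgrade may well replace it, so it is worth re-checking afterwards.

```shell
# read_timeout is a hypothetical helper.
# Prints the number from a line of the form: my $TIMEOUT=60;
read_timeout() {
    sed -n 's/^my \$TIMEOUT=\([0-9][0-9]*\);.*$/\1/p' "$1"
}

# Example: read_timeout /opt/zimbra/libexec/zmstatuslog
```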

This worked for me; Hopefully will work also for other people :)

If you run "tail -f /var/log/zimbra-stats.log" you should see status lines like these in the log:
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: antispam: Running
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: antivirus: Running
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: dnscache: Running
Apr 24 11:41:07 mail zimbramon[28337]: 28337:info: 2017-04-24 11:40:02, STATUS: mail.test.com: ldap: Running
....
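
Lines in that format are easy to boil down to service/state pairs. A sketch, assuming the log lines look exactly like the sample above; `status_pairs` and `not_running` are made-up helper names. Note that the log keeps history, so old entries show up alongside current ones.

```shell
# status_pairs and not_running are hypothetical helpers.
# status_pairs prints "<service> <state>" for every STATUS line in the
# given files (or stdin); not_running keeps only states other than Running.
status_pairs() {
    sed -n 's/^.*STATUS: [^:]*: \([^:]*\): \(.*\)$/\1 \2/p' "$@"
}

not_running() {
    status_pairs "$@" | awk '$2 != "Running"'
}

# Example: tail -n 200 /var/log/zimbra-stats.log | not_running
```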

JDunphy, thank you for your post; it led me to the solution.

Re: 8.6 to 8.7 status shows stopped (SOLVED)

Post by JDunphy »

linuradu wrote: I had a similar issue and tried what you described here, but it didn't work for me.
My solution was:
1. open and edit the file: #nano /opt/zimbra/libexec/zmstatuslog
2. change the line: "my $TIMEOUT=60;" to "my $TIMEOUT=360;"
3. Save the file and wait around 5 minutes.

This worked for me; Hopefully will work also for other people :)

Interesting. That TIMEOUT works as follows.

1) set alarm for TIMEOUT seconds
2) run zmcontrol status and parse output
3) write to syslog to put those STATUS lines in the log file
4) turn alarm off

The alarm forces the code to stop what it is doing if it's blocked and not finished. That way you don't have a bunch of zmstatuslog scripts running in the background before the next interval to be run by cron.
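
The same kind of guard can be sketched in shell with GNU coreutils `timeout`, which kills the command once the limit is reached and exits with status 124. This illustrates the pattern only; zmstatuslog itself uses Perl's alarm, which also covers the syslog writes, whereas this sketch only bounds the command. `run_with_limit` is a made-up name.

```shell
# run_with_limit is a hypothetical helper illustrating a timeout guard.
# Usage: run_with_limit <seconds> <command> [args...]
run_with_limit() {
    limit=$1
    shift
    status=0
    timeout "$limit" "$@" || status=$?
    # GNU timeout exits with 124 when it had to kill the command.
    if [ "$status" -eq 124 ]; then
        echo "command timed out after ${limit}s" >&2
    fi
    return "$status"
}

# Example, run as the zimbra user:
#   run_with_limit 60 /opt/zimbra/bin/zmcontrol status
```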

So you are finding it takes longer than 1 minute to perform this? Given that you have probably run zmcontrol status from the command line and noticed it takes 20-30 seconds, it must be blocking on the write to syslogd. Very odd indeed.

I just upgraded from 8.7.1 to 8.7.9 and had the status fail. I saw your comment and looked at the timeout variable you set, but zmcontrol was returning quickly, so I restarted rsyslogd and waited a few minutes.

Thanks for the tip on TIMEOUT which led me to restarting syslog.

Code:

/etc/init.d/rsyslog restart
PS... In this case the syslog problem was self-induced. I have some rsyslog filters where order is important, so I opened /etc/rsyslog.conf in vi while the update was running, since Zimbra updates this file. Once it had been updated, I forced my vi session to write the original back out. Not Zimbra's issue, and I failed to notice the status problem since I don't use the monitor much. I had assumed that rsyslog was smart enough to notice that the file's timestamp had changed and reread its config. I know better than to think like this; syslog needed a reload. :-)
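
A simple way to catch that situation is to compare the config file's modification time with a file written when the daemon started. This is a sketch under assumptions: `needs_reload` is a made-up helper, the pid file path in the example is a guess for this kind of system, and the `-nt` test is a common shell extension rather than something every `sh` guarantees.

```shell
# needs_reload is a hypothetical helper.
# Succeeds if the config file ($1) was modified more recently than the
# reference file ($2), e.g. the daemon's pid file.
needs_reload() {
    conf=$1
    ref=$2
    [ "$conf" -nt "$ref" ]
}

# Example (the pid file path is an assumption):
#   needs_reload /etc/rsyslog.conf /var/run/syslogd.pid && /etc/init.d/rsyslog restart
```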
adrastos2006
Posts: 5
Joined: Fri Feb 17, 2017 7:00 am

Re: 8.6 to 8.7 status shows stopped (SOLVED)

Post by adrastos2006 »

Just wanted to post that this solution also worked for me in a recent fresh install of zimbra 8.8.15 on ubuntu 18.04.
rpocamilo
Posts: 2
Joined: Wed Jul 01, 2020 5:18 pm

Re: 8.6 to 8.7 status shows stopped (SOLVED)

Post by rpocamilo »

Hello

I did everything mentioned above with no luck.

Then I found that the script /opt/zimbra/libexec/zmstatuslog ran fine when I executed it manually (it wrote STATUS lines to the log), but when it ran from crontab it wasn't writing STATUS lines to the log (/var/log/zimbra-stats.log).
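
When something works by hand but not from cron, one common cause is the much smaller environment cron provides. A sketch for reproducing that from a terminal: `run_like_cron` is a made-up helper, and the variable values are typical cron defaults assumed for illustration, not read from this server.

```shell
# run_like_cron is a hypothetical helper.
# Runs a command with an almost empty environment, roughly like cron does.
# The SHELL and PATH values are typical defaults and are an assumption.
run_like_cron() {
    env -i HOME="${HOME:-/}" SHELL=/bin/sh PATH=/usr/bin:/bin "$@"
}

# Example, run as the zimbra user:
#   run_like_cron /opt/zimbra/libexec/zmstatuslog
```

If the script misbehaves under this wrapper as well, the environment is a likely suspect; if it still works, the script contents or the crontab entry are the next things to compare.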

I inspected the code inside the script, compared it with another Zimbra server that was working, and found some differences. So I copied the script from the OK server to the server with the red X on the status web page, and everything worked again (green status on the web page, and STATUS lines logged to /var/log/zimbra-stats.log).
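
Before overwriting anything, it may be worth keeping a record of what actually differed. A small sketch; `same_file` is a made-up helper, and the second path in the example is a hypothetical copy fetched from the working server.

```shell
# same_file is a hypothetical helper.
# Succeeds if the two files are byte-for-byte identical.
same_file() {
    cmp -s "$1" "$2"
}

# Example (/tmp/zmstatuslog.good is a hypothetical copy from the working server):
#   same_file /opt/zimbra/libexec/zmstatuslog /tmp/zmstatuslog.good \
#       || diff -u /opt/zimbra/libexec/zmstatuslog /tmp/zmstatuslog.good
```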

This is the "good script" content in case someone needs to use it.

Code:

#!/usr/bin/perl
#
# ***** BEGIN LICENSE BLOCK *****
# Zimbra Collaboration Suite Server
# Copyright (C) 2005, 2006, 2007, 2008, 2009, 2010, 2013, 2014, 2015, 2016 Synacor, Inc.
#
# This program is free software: you can redistribute it and/or modify it under
# the terms of the GNU General Public License as published by the Free Software Foundation,
# version 2 of the License.
#
# This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY;
# without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
# See the GNU General Public License for more details.
# You should have received a copy of the GNU General Public License along with this program.
# If not, see <https://www.gnu.org/licenses/>.
# ***** END LICENSE BLOCK *****
#

use strict;

use lib "/opt/zimbra/common/lib/perl5";
use  Zimbra::Util::Common;
use Zimbra::Mon::Logger;

# Exit if software-only node.
exit(0) unless (-f "/opt/zimbra/conf/localconfig.xml");

$SIG{ALRM} = \&catchAlarm;

my $platform=qx(/opt/zimbra/libexec/get_plat_tag.sh);
chomp $platform;

my $pidFile="/opt/zimbra/log/zmstatuslog.pid";

my $TIMEOUT=60;
my $MNTCMD;
my $DFCMD;
if ($platform eq "MACOSX") {
        $MNTCMD = "mount -t hfs";
        $DFCMD = "df -ml ";
} else {
        $MNTCMD = "mount -t ext3";
        $DFCMD = "df -ml ";
}

my $dt = qx(date "+%Y-%m-%d %H:%M:%S");
chomp $dt;

my $hostname;

checkPid();
logStatus();
clearPid();

exit 0;

sub logStatus {
        my @status = ();
        alarm($TIMEOUT);
        open STATUS, "/opt/zimbra/bin/zmcontrol status |" or die "Can't get status: $!";
        @status = <STATUS>;
        close STATUS;
        foreach my $s (@status) {
                if ($s =~ /is not/) {
                        next;
                }
                chomp $s;
                if ($s =~ /^Host (.*)/) {
                        $hostname = $1;
                        next;
                }
                $s =~ s/ webapp//;
                my ($service, $stat) = split ' ', $s, 2;
                Zimbra::Mon::Logger::LogStats( "info", "$dt, STATUS: ${hostname}: $service: $stat" );
        }
        alarm(0);
}

sub checkPid {
  if (-f "$pidFile") {
    my $P = qx(cat $pidFile);
    chomp $P;
    if ($P ne "") {
      system("kill -0 $P 2> /dev/null");
      if ($? == 0) {
        print "$0 already running with pid $P\n";
        exit 0;
      }
    }
  }
  qx(echo $$ > "$pidFile");
}

sub clearPid {
  unlink($pidFile);
}

sub catchAlarm {
  Zimbra::Mon::Logger::LogStats( "info", "zmstatuslog timeout after $TIMEOUT seconds");
  exit 1;
}