Broken 8.6 install

illydone
Posts: 4
Joined: Fri Jan 24, 2020 1:53 pm

Post by illydone »

Hello

Until yesterday I had a fully functional hosted Zimbra server. It is a custom install: Ubuntu 14.04, ZCS 8.6.0.

My provider decided to reboot the hosted server, and I now have a broken install: since the reboot, I no longer have the mailbox or mailbox_manager services.

The last lines of zmmailboxd.out are below. The service never restarts, even though I have rebooted the server and launched zmcontrol restart.

# tail /opt/zimbra/log/zmmailboxd.out

zmthrdump: Requested thread dump [PID 26172] at Thu Jan 23 12:22:30 2020
2020-01-23 12:22:32.229:INFO:oejs.ServerConnector:Thread-12: Stopped ServerConnector@59ec2012{HTTP/1.1}{localhost:8080}
2020-01-23 12:22:32.252:INFO:oejs.ServerConnector:Thread-12: Stopped ServerConnector@4b952a2d{SSL-http/1.1}{0.0.0.0:8443}
2020-01-23 12:22:32.258:INFO:oejs.ServerConnector:Thread-12: Stopped ServerConnector@73846619{SSL-http/1.1}{0.0.0.0:7071}
2020-01-23 12:22:32.260:INFO:oejs.ServerConnector:Thread-12: Stopped ServerConnector@29ca901e{HTTP/1.1}{0.0.0.0:7072}
2020-01-23 12:22:32.288:INFO:oejsh.ContextHandler:Thread-12: Stopped o.e.j.w.WebAppContext@5e25a92e{/zimlet,[file:/opt/zimbra/jetty-distribution-9.1.5.v20140505/webapps/zimlet/, file:/opt/zimbra/zimlets-deployed/],UNAVAILABLE}{/zimlet}
2020-01-23 12:22:32.383:INFO:oejsh.ContextHandler:Thread-12: Stopped o.e.j.w.WebAppContext@76329302{/zimbraAdmin,file:/opt/zimbra/jetty-distribution-9.1.5.v20140505/webapps/zimbraAdmin/,UNAVAILABLE}{/zimbraAdmin}
2020-01-23 12:22:32.503:INFO:oejsh.ContextHandler:Thread-12: Stopped o.e.j.w.WebAppContext@71a794e5{/,file:/opt/zimbra/jetty-distribution-9.1.5.v20140505/webapps/zimbra/,UNAVAILABLE}{/zimbra}
2020-01-23 12:22:35.752:INFO:oejsh.ContextHandler:Thread-12: Stopped o.e.j.w.WebAppContext@4e41089d{/service,file:/opt/zimbra/jetty-distribution-9.1.5.v20140505/webapps/service/,UNAVAILABLE}{/service}
According to zmcontrol, all services are running, but that is wrong:

$ zmcontrol status
Host myhost.mydomain.com
        amavis                  Running
        dnscache                Running
        ldap                    Running
        logger                  Running
        mailbox                 Running
        memcached               Running
        mta                     Running
        opendkim                Running
        proxy                   Running
        service webapp          Running
        snmp                    Running
        spell                   Running
        stats                   Running
        zimbra webapp           Running
        zimbraAdmin webapp      Running
        zimlet webapp           Running
        zmconfigd               Running
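Some services aren't actually listening (the webapps): none of the mailboxd ports from the log above (8080, 8443, 7071, 7072) appear here.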

# netstat -tlnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 127.0.0.1:53            0.0.0.0:*               LISTEN      6504/unbound    
tcp        0      0 0.0.0.0:22              0.0.0.0:*               LISTEN      829/sshd        
tcp        0      0 0.0.0.0:25              0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      7011/nginx.conf 
tcp        0      0 127.0.0.1:23232         0.0.0.0:*               LISTEN      7040/perl       
tcp        0      0 127.0.0.1:23233         0.0.0.0:*               LISTEN      7042/perl       
tcp        0      0 0.0.0.0:993             0.0.0.0:*               LISTEN      7011/nginx.conf 
tcp        0      0 127.0.0.1:7171          0.0.0.0:*               LISTEN      16191/java      
tcp        0      0 0.0.0.0:995             0.0.0.0:*               LISTEN      7011/nginx.conf 
tcp        0      0 195.96.97.98:389        0.0.0.0:*               LISTEN      27136/slapd     
tcp        0      0 127.0.0.1:10663         0.0.0.0:*               LISTEN      6929/zmlogger: zmrr
tcp        0      0 127.0.0.1:10024         0.0.0.0:*               LISTEN      7068/amavisd (maste
tcp        0      0 127.0.0.1:10025         0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 127.0.0.1:10026         0.0.0.0:*               LISTEN      7068/amavisd (maste
tcp        0      0 127.0.0.1:7306          0.0.0.0:*               LISTEN      6913/mysqld     
tcp        0      0 127.0.0.1:10027         0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 0.0.0.0:587             0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 0.0.0.0:11211           0.0.0.0:*               LISTEN      6995/memcached  
tcp        0      0 127.0.0.1:10028         0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 127.0.0.1:10029         0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 127.0.0.1:10030         0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 0.0.0.0:110             0.0.0.0:*               LISTEN      7011/nginx.conf 
tcp        0      0 0.0.0.0:143             0.0.0.0:*               LISTEN      7011/nginx.conf 
tcp        0      0 127.0.0.1:10032         0.0.0.0:*               LISTEN      7068/amavisd (maste
tcp        0      0 0.0.0.0:465             0.0.0.0:*               LISTEN      7361/master     
tcp        0      0 127.0.0.1:8465          0.0.0.0:*               LISTEN      7111/opendkim   
tcp6       0      0 :::22                   :::*                    LISTEN      829/sshd        
tcp6       0      0 :::7780                 :::*                    LISTEN      7138/httpd      
tcp6       0      0 ::1:10024               :::*                    LISTEN      7068/amavisd (maste
tcp6       0      0 ::1:10026               :::*                    LISTEN      7068/amavisd (maste
tcp6       0      0 :::11211                :::*                    LISTEN      6995/memcached  
tcp6       0      0 ::1:10032               :::*                    LISTEN      7068/amavisd (maste
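A quick way to compare what zmcontrol reports with what is actually listening is to check the netstat output for the ports mailboxd had open in the zmmailboxd.out log above (8080, 8443, 7071, 7072), plus port 7025, which postfix/lmtp tries to reach later in this thread. The function below is only a sketch: `missing_ports` is a hypothetical helper name, and the port list comes from the logs in this thread, not from Zimbra documentation.

```shell
#!/bin/sh
# Print every expected TCP port that has no listener in `netstat -tln` output.
# Hypothetical helper; reads the netstat listing on stdin, ports as arguments.
missing_ports() {
    # Read the whole listing once so it can be scanned for each port.
    listing=$(cat)
    for port in "$@"; do
        # Field 4 is the local address; the port is the part after the last colon.
        # This also handles IPv6 entries such as ":::7780".
        if ! printf '%s\n' "$listing" |
            awk -v p="$port" '$6 == "LISTEN" { n = split($4, a, ":"); if (a[n] == p) found = 1 } END { exit !found }'
        then
            echo "$port"
        fi
    done
}
```

Usage, as root or the zimbra user: `netstat -tln | missing_ports 8080 8443 7071 7072 7025`. Any port it prints has no listening socket, whatever zmcontrol status says.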
Start and stop aren't working any more. A few days ago, the stop action stopped all services; now it ends right after the zimlet webapp.

$ zmcontrol stop; zmcontrol start
Host myhost.mydomain.com
        Stopping vmware-ha...skipped.
                /opt/zimbra/bin/zmhactl missing or not executable.
        Stopping zmconfigd...Done.
        Stopping zimlet webapp...Terminated
Host myhost.mydomain.com
        Starting zmconfigd...Done.
        Starting dnscache...Done.
        Starting logger...Done.
        Starting mailbox...Done.
        Starting memcached...Done.
        Starting proxy...Done.
        Starting amavis...Done.
        Starting opendkim...Done.
        Starting snmp...Done.
        Starting spell...Done.
        Starting mta...Done.
        Starting stats...Done.
        Starting service webapp...Done.
        Starting zimbra webapp...Done.
        Starting zimbraAdmin webapp...Done.
        Starting zimlet webapp...Done.
I tried zmfixperms and an 8.6 -> 8.6 upgrade (same version) to reset everything, but it didn't change anything.
During this "upgrade", my LDAP DB and message DB seemed OK (no errors encountered during the process).

I also checked this: https://wiki.zimbra.com/wiki/Zimbra_Adm ... t_is_blank
But the enabled services are OK.

I don't know where to search. Any help appreciated.
Last edited by illydone on Sun Jan 26, 2020 6:59 pm, edited 5 times in total.
DualBoot
Elite member
Posts: 1326
Joined: Mon Apr 18, 2016 8:18 pm
Location: France - Earth
ZCS/ZD Version: ZCS FLOSS - 8.8.15 Multi servers

Re: Broken 8.6 install

Post by DualBoot »

Hello,

As far as I can understand from your post, your Zimbra is running well. The VMware HA service is not important.
What are the symptoms of your problem, and did you look into mailbox.log?

Regards
illydone
Posts: 4
Joined: Fri Jan 24, 2020 1:53 pm

Re: Broken 8.6 install

Post by illydone »

Before this problem, I didn't get any vmware-ha warning, and all services appeared in the "zmcontrol stop" output. Now I only get the vmware-ha, zmconfigd, and zimlet webapp lines, followed by "Terminated" as the command ends abruptly. And after zmcontrol stop, only zmconfigd is really stopped.

When I start the services (after a reboot or after killing them), everything seems OK, but the zimbra, zimbraAdmin, and zimlet webapps are not actually up, although they are all marked as running in the status. The admin console on :7071 isn't working at all. And on the site, I get this page:
HTTP ERROR 502

Problem accessing ZCS upstream server. Cannot connect to the ZCS upstream server. Connection is refused.
Possible reasons:

upstream server is unreachable
upstream server is currently being upgraded
upstream server is down

Please contact your ZCS administrator to fix the problem.


Powered by Nginx-Zimbra://
And I have had no mailbox.log or zmmailboxd.out output since the reboot on Thursday. I tried some zmXXXctl start commands (AFAIK it is zmstorectl), with no effect.

/var/log/zimbra.log has lots of errors. Here are the most common:

Jan 25 13:17:17 zimbra zmmailboxdmgr[17310]: file /opt/zimbra/log/zmmailboxd_manager.pid does not exist
Jan 25 13:17:17 zimbra zmmailboxdmgr[17310]: assuming no other instance is running
And sometimes:

Jan 25 13:16:20 zimbra postfix/lmtp[17209]: connect to zimbra.mydom.com[195.96.97.98]:7025: Connection refused
Jan 25 13:16:20 zimbra postfix/lmtp[17209]: D2526203235: to=<admin@zimbra.mydom.com>, orig_to=<root@zimbra.mydom.com>, relay=none, delay=84599, delays=84599/0.03/0/0, dsn=4.4.1, status=deferred (connect to zimbra.mydom.com[195.96.97.98]:7025: Connection refused)
I also tried connecting to the IMAP service, but it doesn't work:

2020/01/26 19:53:33 [info] 7012#0: *54843 client 1.2.3.4:22436 connected to 0.0.0.0:993
2020/01/26 19:53:33 [error] 7012#0: *54843 All nginx lookup handlers are unavailable while in mail zmauth state, client: 1.2.3.4:22436, server: 0.0.0.0:993, login: "mylogin@mydom.com"
2020/01/26 19:53:33 [error] 7012#0: *54843 zm lookup: all lookup handlers exhausted while in mail zmauth state, client: 1.2.3.4:22436, server: 0.0.0.0:993, login: "mylogin@mydom.com"
2020/01/26 19:53:33 [error] 7012#0: *54843 An error occurred in mail zmauth: no valid lookup handlers while in mail zmauth state, client: 1.2.3.4:22436, server: 0.0.0.0:993, login: "mylogin@mydom.com"
I don't know why the services don't start any more. I don't know how to force them to start, so that I would at least get some logs to work from. I don't know what's wrong or how to investigate the problem, as there seems to be nothing in the logs. It's as if the services were disabled after the abrupt reboot. It's the first time I have had such a problem with an install.
illydone
Posts: 4
Joined: Fri Jan 24, 2020 1:53 pm

Re: Broken 8.6 install

Post by illydone »

I can hardly believe it, but I just deleted /opt/zimbra/log/zmmailboxd.pid and everything seems to work again. I still have to investigate further to be sure everything is OK, but apparently a corrupted pid file can stop everything...
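For anyone landing here with the same symptoms: a pid file left behind by a process that no longer exists can be spotted before deleting anything. The function below is a minimal sketch, not part of Zimbra. `pidfile_state` is a hypothetical helper name, and it assumes the pid file holds a plain numeric PID as its first token. That the Zimbra control scripts get stuck on such a file is only what this thread suggests, not something confirmed elsewhere.

```shell
#!/bin/sh
# Report whether a pid file points at a live process.
# Hypothetical helper; prints "missing", "running <pid>", "stale <pid>",
# or "stale (unreadable pid)" when the content is not a number.
pidfile_state() {
    file=$1
    if [ ! -f "$file" ]; then
        echo "missing"
        return 0
    fi
    # Take the first token of the first line; a corrupted file may hold garbage.
    pid=$(awk 'NR == 1 { print $1 }' "$file")
    case $pid in
        ''|*[!0-9]*)
            echo "stale (unreadable pid)"
            return 0
            ;;
    esac
    # kill -0 sends no signal; it only tests whether the process can be signalled.
    # It also fails when the process belongs to another user, so run this as the
    # owner of the process (or as root).
    if kill -0 "$pid" 2>/dev/null; then
        echo "running $pid"
    else
        echo "stale $pid"
    fi
}
```

Usage: `pidfile_state /opt/zimbra/log/zmmailboxd.pid`. A "stale" result while the mailbox ports are closed would match the situation described in this thread; it is worth keeping a copy of the file before removing it.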