Issue with multi-server and multi-master replica

Dark-Vex
Posts: 3
Joined: Tue Feb 27, 2018 1:32 pm

Post by Dark-Vex »

Hello @all!

I'm testing Zimbra Collaboration 8.8.6 in a multi-server setup with LDAP multi-master replication (MMR) on CentOS 7, following these tutorials (https://zimbra.github.io/installguides/ ... multi.html) and (https://wiki.zimbra.com/wiki/LDAP_Multi ... eplication), but I have run into some issues. Let me explain my setup.

I have three nodes:
- 1 Node with Zimbra proxy+memcached+mta (zimbraproxy.domain.local)
- 1 Node with Zimbra LDAP(Master)+mta+mailbox (zimbra1.domain.local)
- 1 Node with Zimbra LDAP(Second Master)+mta+mailbox (zimbra2.domain.local)

When I power off the primary LDAP master (zimbra1.domain.local) or stop its Zimbra services, the Zimbra proxy returns an error message in the user's browser saying that it cannot contact the upstream servers, even though the secondary server (zimbra2.domain.local) is up.

From an SSH session on the Zimbra proxy, if I try to list all reverse proxy URLs I get the following error:

[zimbra@zimbraproxy ~]$ zmprov  -garpu 
[] INFO: master is down, falling back to replica...
[] FATAL: failed to initialize LDAP client
com.zimbra.cs.ldap.LdapException: LDAP error: : An error occurred while attempting to connect to server zimbra1.domain.local:389:  java.io.IOException: An error occurred while attempting to establish a connection to server zimbra1.domain.local:389:  java.net.ConnectException: Connection refused (Connection refused)
ExceptionId:main:1519145568985:0c603f46c564b918
Code:ldap.LDAP_ERROR
	at com.zimbra.cs.ldap.LdapException.LDAP_ERROR(LdapException.java:90)
	at com.zimbra.cs.ldap.unboundid.UBIDLdapException.mapToLdapException(UBIDLdapException.java:74)
	at com.zimbra.cs.ldap.unboundid.UBIDLdapException.mapToLdapException(UBIDLdapException.java:40)
	at com.zimbra.cs.ldap.unboundid.LdapConnectionPool.createConnPool(LdapConnectionPool.java:117)
	at com.zimbra.cs.ldap.unboundid.LdapConnectionPool.createConnectionPool(LdapConnectionPool.java:63)
	at com.zimbra.cs.ldap.unboundid.UBIDLdapContext.init(UBIDLdapContext.java:111)
	at com.zimbra.cs.ldap.unboundid.UBIDLdapClient.init(UBIDLdapClient.java:39)
	at com.zimbra.cs.ldap.LdapClient.getInstanceIfLDAPavailable(LdapClient.java:62)
	at com.zimbra.cs.ldap.LdapClient.getInstance(LdapClient.java:69)
	at com.zimbra.cs.ldap.LdapClient.initialize(LdapClient.java:94)
	at com.zimbra.cs.account.ldap.LdapProv.<init>(LdapProv.java:47)
	at com.zimbra.cs.account.ldap.LdapProvisioning.<init>(LdapProvisioning.java:279)
	at com.zimbra.cs.account.ldap.LdapProvisioning.<init>(LdapProvisioning.java:276)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
	at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
	at java.lang.Class.newInstance(Class.java:442)
	at com.zimbra.cs.account.Provisioning.getInstance(Provisioning.java:341)
	at com.zimbra.cs.account.Provisioning.getInstance(Provisioning.java:297)
	at com.zimbra.cs.account.ProvUtil.initProvisioning(ProvUtil.java:1004)
	at com.zimbra.cs.account.ProvUtil.main(ProvUtil.java:3955)
Caused by: LDAPException(resultCode=91 (connect error), errorMessage='An error occurred while attempting to connect to server zimbra1.domain.local:389:  java.io.IOException: An error occurred while attempting to establish a connection to server zimbra1.domain.local:389:  java.net.ConnectException: Connection refused (Connection refused)')
	at com.unboundid.ldap.sdk.LDAPConnection.connect(LDAPConnection.java:754)
	at com.unboundid.ldap.sdk.LDAPConnection.connect(LDAPConnection.java:686)
	at com.unboundid.ldap.sdk.LDAPConnection.<init>(LDAPConnection.java:518)
	at com.unboundid.ldap.sdk.SingleServerSet.getConnection(SingleServerSet.java:229)
	at com.unboundid.ldap.sdk.ServerSet.getConnection(ServerSet.java:98)
	at com.unboundid.ldap.sdk.LDAPConnectionPool.createConnection(LDAPConnectionPool.java:938)
	at com.unboundid.ldap.sdk.LDAPConnectionPool.<init>(LDAPConnectionPool.java:876)
	at com.unboundid.ldap.sdk.LDAPConnectionPool.<init>(LDAPConnectionPool.java:779)
	at com.unboundid.ldap.sdk.LDAPConnectionPool.<init>(LDAPConnectionPool.java:726)
	at com.zimbra.cs.ldap.unboundid.LdapConnectionPool.createConnPool(LdapConnectionPool.java:114)
	... 18 more
Caused by: java.io.IOException: An error occurred while attempting to establish a connection to server zimbra1.domain.local:389:  java.net.ConnectException: Connection refused (Connection refused)
	at com.unboundid.ldap.sdk.LDAPConnectionInternals.<init>(LDAPConnectionInternals.java:137)
	at com.unboundid.ldap.sdk.LDAPConnection.connect(LDAPConnection.java:744)
	... 27 more

On the secondary server I also see this error:

Feb 20 17:55:08 zimbra2 zmconfigd[12775]: Fetching All configs
Feb 20 17:55:08 zimbra2 zmconfigd[12775]: All configs fetched in 0.04 seconds
Feb 20 17:55:12 zimbra2 zmconfigd[12775]: Watchdog: service antivirus status is OK.
Feb 20 17:55:12 zimbra2 zmconfigd[12775]: All rewrite threads completed in 0.00 sec
Feb 20 17:55:12 zimbra2 zmconfigd[12775]: All restarts completed in 0.00 sec
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: Fetching All configs
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: Skipping Global system configuration update.
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: gacf <function getglobal at 0x3>(None) 1 com.zimbra.common.service.ServiceException: system failure: unable to get config ExceptionId:gc:1519145716210:c0714e0e2394485d Code:service.FAILURE
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: Skipping Configuration for server  update.
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: gs <function getserver at 0x5>(None) 1 com.zimbra.common.service.ServiceException: system failure: unable to lookup server by name: zimbraproxy.domain.local message: LDAP error:  - unable to get connection: ldap host=: An error occurred while attempting to connect to server zimbra1.domain.local:389:  java.io.IOException: An error occurred while attempting to establish a connection to server zimbra1.domain.local:389:  java.net.ConnectException: Connection refused (Connection refused) ExceptionId:sc:1519145716218:c0714e0e2394485d Code:service.FAILURE
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: Skipping All Reverse Proxy URLs update.
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: getAllReverseProxyURLs <function garpu at 0x4>(None) 1 com.zimbra.common.service.ServiceException: system failure: unable to list all servers ExceptionId:mc:1519145716221:c0714e0e2394485d Code:service.FAILURE
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: Skipping All Reverse Proxy Backends update.
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: getAllReverseProxyBackends <function garpb at 0x6>(None) 1 com.zimbra.common.service.ServiceException: system failure: unable to list all servers ExceptionId:mc:1519145716224:c0714e0e2394485d Code:service.FAILURE
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: Skipping All Memcached Servers update.
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: getAllMemcachedServers <function gamcs at 0x7>(None) 1 com.zimbra.common.service.ServiceException: system failure: unable to list all servers ExceptionId:mc:1519145716231:c0714e0e2394485d Code:service.FAILURE
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: Skipping All MTA Authentication Target URLs update.
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: getAllMtaAuthURLs <function gamau at 0x8>(None) 1 com.zimbra.common.service.ServiceException: system failure: unable to list all servers ExceptionId:mc:1519145716235:c0714e0e2394485d Code:service.FAILURE
Feb 20 17:55:16 zimbraproxy zmconfigd[11768]: All configs fetched in 0.04 seconds
Feb 20 17:55:18 zimbraproxy zmconfigd[11768]: Watchdog: service antivirus status is OK.
Feb 20 17:55:18 zimbraproxy zmconfigd[11768]: All rewrite threads completed in 0.00 sec
Feb 20 17:55:18 zimbraproxy zmconfigd[11768]: All restarts completed in 0.00 sec

I think I forgot to do something to get this working. What else can I check?
L. Mark Stone
Ambassador
Posts: 2800
Joined: Wed Oct 09, 2013 11:35 am
Location: Portland, Maine, US
ZCS/ZD Version: 10.0.7 Network Edition

Re: Issue with multi-server and multi-master replica

Post by L. Mark Stone »

One part of this is working as designed...

When a user contacts the proxy, until the user authenticates, the proxy has no idea on which mailbox server the user's mailbox resides. So, the proxy asks a random mailbox server to please paint the login page. After the user logs in, the proxy then knows to which mailbox server all future requests should be directed.

So, if you have two mailbox servers, and one is down, 50% of the login attempts will fail to paint the login screen. And yes, the proxy does zmprov garpu to Get All Reverse Proxy Upstream servers.

The other part is to check the localconfig variables ldap_url and ldap_master_url on the proxy server and be sure that both LDAP servers are listed in both variables (if you are using LDAP MMR), or, that at least both LDAP servers are listed in ldap_url if you are using a traditional master-replica LDAP configuration.
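
As a rough illustration of that check, the sketch below counts how many LDAP URLs a localconfig value contains. The helper name count_ldap_urls is hypothetical (it is not a Zimbra tool), and it assumes the value is a space-separated list of ldap:// or ldaps:// URLs. The sample value is hard-coded; on a real host it would come from zmlocalconfig.

```shell
#!/bin/sh
# Hypothetical helper (not part of Zimbra): count the LDAP URLs in a
# localconfig value, assuming a space-separated list of ldap:// or ldaps:// URLs.
count_ldap_urls() (
    n=0
    for url in $1; do
        case "$url" in
            ldap://*|ldaps://*) n=$((n + 1)) ;;
        esac
    done
    echo "$n"
)

# On a Zimbra host, as the zimbra user, the value could come from
# `zmlocalconfig -m nokey ldap_url`; here it is hard-coded for illustration.
value="ldap://zimbra1.domain.local:389"
if [ "$(count_ldap_urls "$value")" -lt 2 ]; then
    echo "only one LDAP server listed: no failover target"
fi
```

The same check would apply to ldap_master_url when LDAP MMR is in use.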

Hope that helps,
Mark
___________________________________
L. Mark Stone
Mission Critical Email - Zimbra VAR/BSP/Training Partner https://www.missioncriticalemail.com/
AWS Certified Solutions Architect-Associate

Re: Issue with multi-server and multi-master replica

Post by Dark-Vex »

Hello Mark

Thank you for your reply!
L. Mark Stone wrote:One part of this is working as designed...

When a user contacts the proxy, until the user authenticates, the proxy has no idea on which mailbox server the user's mailbox resides. So, the proxy asks a random mailbox server to please paint the login page. After the user logs in, the proxy then knows to which mailbox server all future requests should be directed.

So, if you have two mailbox servers, and one is down, 50% of the login attempts will fail to paint the login screen. And yes, the proxy does zmprov garpu to Get All Reverse Proxy Upstream servers.

OK, that's clear: it acts like a normal reverse proxy :)

L. Mark Stone wrote: The other part is to check the localconfig variables ldap_url and ldap_master_url on the proxy server and be sure that both LDAP servers are listed in both variables (if you are using LDAP MMR), or, that at least both LDAP servers are listed in ldap_url if you are using a traditional master-replica LDAP configuration.

Hope that helps,
Mark

This is a big hint, and I think I have figured out the problem: on the proxy server only one LDAP server is specified in ldap_url and ldap_master_url. Both servers were specified when I configured MMR on the two LDAP servers, but I didn't do the same during the proxy server configuration. I wrongly assumed that the installer would do this automatically (and silly me for not checking).

[zimbra@zimbraproxy root]$ zmlocalconfig | grep -Ei "ldap_url|master_url"
ldap_master_url = ldap://zimbra1.domain.local:389
ldap_url = ldap://zimbra1.domain.local:389

Tomorrow I will fix it and try again. Thank you again,
Daniele

Re: Issue with multi-server and multi-master replica

Post by L. Mark Stone »

Glad that gets you on the right track!

The installer really only configures the server on which Zimbra is being installed, though as you add servers, the installer does update LDAP with the new server info. But that's it.

So, recall that as you added servers you needed to run zmsshkeygen and zmupdateauthkeys on all the servers post-install? It's the same when you add things like LDAP servers: the installer doesn't update the ldap_master_url and ldap_url localconfig variables on the other servers. :-)
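
For illustration only, here is a minimal sketch of how the two variables might be updated by hand. The helper ldap_url_list is hypothetical, the host names and port 389 are taken from this thread, and the sketch assumes both variables accept a space-separated list of URLs. It prints the zmlocalconfig commands instead of running them, so they can be reviewed before being run as the zimbra user on each server that is missing the second LDAP master.

```shell
#!/bin/sh
# Hypothetical helper: join LDAP host names into the space-separated URL list
# that ldap_url and ldap_master_url are assumed to take (port 389 as in this thread).
ldap_url_list() (
    out=""
    for host in "$@"; do
        out="${out:+$out }ldap://$host:389"
    done
    echo "$out"
)

urls=$(ldap_url_list zimbra1.domain.local zimbra2.domain.local)

# Print the commands rather than running them, so they can be reviewed first.
echo "zmlocalconfig -e ldap_master_url=\"$urls\""
echo "zmlocalconfig -e ldap_url=\"$urls\""
```

Services that read these values will probably need a restart before the change takes effect; that step is left out here because it depends on the setup.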

Glad we could help and please post again once you get it sorted and/or if you have more questions!

All the best,
Mark

Re: Issue with multi-server and multi-master replica

Post by Dark-Vex »

OK, after fixing the ldap_master_url and ldap_url variables and checking the SSH keys on each server, it works! :-)
Obviously, with my setup I can only access the mailboxes hosted on the server that is currently online.
I have just one question: the tool "/opt/zimbra/libexec/zmreplchk" only returns the synchronization status when there is at least one LDAP master and one replica, right?

Re: Issue with multi-server and multi-master replica

Post by L. Mark Stone »

Awesome that it’s now working!!

Yes, zmreplchk requires at least two entries in the ldap_url localconfig variable to run successfully.

Ideally, run it on all LDAP hosts; if all is well, every host will report the same status, number of CSNs, etc.

All the best,
Mark