[Resolved]DNS cache seems corrupt

davidkillingsworth
Outstanding Member
Posts: 251
Joined: Sat Sep 13, 2014 2:26 am
ZCS/ZD Version: 8.8.15.GA.3869.UBUNTU14.64-Patch 24

Post by davidkillingsworth »

Hello,

I have a very weird problem that I have only just noticed. It all started because mail from a very reputable international insurance company (aetna.com) keeps going into the junk folder for all of our users. At first I tried to whitelist it, but today I realized that the real problem is a failing DKIM lookup (see the header below).

Server details:

Code: Select all

zimbra@zimbra:~$ cat /etc/issue
Ubuntu 14.04.6 LTS \n \l

zimbra@zimbra:~$ zmcontrol -v
Release 8.8.11.GA.3737.UBUNTU14.64 UBUNTU14_64 FOSS edition, Patch 8.8.11_P4.

Code: Select all

Authentication-Results: mail.mydomain.com (amavisd-new); dkim=neutral
	reason="invalid (public key: DNS query timeout for Mar2018._domainkey.aetna.com at /opt/zimbra/common/lib/perl5/Mail/DKIM/DNS.pm line 156, <GEN16> line 2304.)"
	header.d=aetna.com header.b=SZqPtx4l; dkim=fail (1024-bit key)
The part that originally didn't catch my attention was that there was a DNS query timeout. This causes the spam score in SpamAssassin to go above the spam threshold since DKIM fails.
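
To see how many messages or selectors are affected, the same failure can be pulled out of saved headers in bulk. This is only a minimal sketch: the function name `dkim_dns_timeouts` is hypothetical, and it assumes the input contains the exact phrase "DNS query timeout for NAME at ..." as in the header above.

```shell
#!/bin/sh
# Hypothetical helper: read text on stdin and print each DNS name that
# Mail::DKIM reported a query timeout for, with a count per name.
dkim_dns_timeouts() {
    # Keep only the name between "DNS query timeout for " and " at "
    sed -n 's/.*DNS query timeout for \([^ ]*\) at .*/\1/p' |
        sort | uniq -c | sort -rn
}
```

Feeding it a saved message, for example `dkim_dns_timeouts < message.eml` (the file name is just an example), lists the DKIM key names whose lookups timed out.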

So I ran this:

Code: Select all

dig -t txt Mar2018._domainkey.aetna.com
Lo and behold it failed.

Code: Select all

 <<>> DiG 9.9.5-3ubuntu0.19-Ubuntu <<>> -t txt Mar2018._domainkey.aetna.com
;; global options: +cmd
;; connection timed out; no servers could be reached
I ran the same query from my own computer and it worked fine. I ran the same query from other almost identical Zimbra servers that I manage and it was successful, but for some reason this server cannot run a dig command against aetna.com.

/etc/resolv.conf

Code: Select all

# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
#     DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 127.0.0.1
/etc/network/interfaces

Code: Select all

dns-nameservers 8.8.8.8 8.8.4.4
zimbra@zimbra:~$ zmprov getServer `zmhostname` | grep DNSMasterIP

Code: Select all

zimbraDNSMasterIP: 8.8.8.8
zimbraDNSMasterIP: 8.8.4.4
I have tried flushing the cache, but it doesn't help.
I have tried changing the DNS servers to the ISP DNS servers instead of Google, and that doesn't help.

If I shut down the dnscache service with the following command, the query starts working correctly.

Code: Select all

/opt/zimbra/bin/zmdnscachectl stop
The same type of queries to other domains work perfectly fine, so it's not a firewall blocking issue.
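
One way to make the same comparison without stopping the service is to point dig at each resolver explicitly. This is a minimal sketch, not something from the original post: the function name `compare_resolvers` is hypothetical, and it assumes that an empty `+short` answer or a non-zero dig exit status means the lookup failed.

```shell
#!/bin/sh
# Hypothetical helper: ask several resolvers for the same TXT record and
# report whether each one answered. Standard dig options used:
#   @server   resolver to query
#   +short    print only the answer data
#   +time=2   per-try timeout in seconds
#   +tries=1  do not retry
compare_resolvers() {
    name=$1; shift
    for server in "$@"; do
        answer=$(dig "@$server" +short +time=2 +tries=1 -t txt "$name" 2>/dev/null)
        status=$?
        if [ "$status" -eq 0 ] && [ -n "$answer" ]; then
            echo "$server: answered"
        else
            echo "$server: no answer (dig exit status $status)"
        fi
    done
}
```

For example, `compare_resolvers Mar2018._domainkey.aetna.com 127.0.0.1 8.8.8.8 8.8.4.4`. If only 127.0.0.1 fails, that would suggest the problem sits in the local cache or in the way it talks to its forwarders, rather than in the record itself.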

Any ideas on what might be going on here? This is really really weird.

The only thing I can think of is the virtual NIC type in the VMware guest settings.
JDunphy
Outstanding Member
Posts: 889
Joined: Fri Sep 12, 2014 11:18 pm
Location: Victoria, BC
ZCS/ZD Version: 9.0.0_P39 NETWORK Edition

Re: DNS cache seems corrupt

Post by JDunphy »

davidkillingsworth wrote:

Code: Select all

Authentication-Results: mail.mydomain.com (amavisd-new); dkim=neutral
	reason="invalid (public key: DNS query timeout for Mar2018._domainkey.aetna.com at /opt/zimbra/common/lib/perl5/Mail/DKIM/DNS.pm line 156, <GEN16> line 2304.)"
	header.d=aetna.com header.b=SZqPtx4l; dkim=fail (1024-bit key)
The part that originally didn't catch my attention was that there was a DNS query timeout. This causes the spam score in SpamAssassin to go above the spam threshold since DKIM fails.

So I ran this:

Code: Select all

dig -t txt Mar2018._domainkey.aetna.com
Lo and behold it failed.
Wild guess, in case you have a cached AAAA record for one of aetna's NS entries: do you have an IPv6 address on that NIC? Does this work any better on the problem machine while it is in failure mode?

Code: Select all

dig -4 -t txt Mar2018._domainkey.aetna.com
It seems like unbound supports extra debugging and verbose modes; I'm just not sure how to enable them the Zimbra way. One way is to add the verbose flag where $unbound is started in zmdnscachectl, line 87 (sudo $unbound). You would think there must be a signal or a control interface to enable logging/debugging on an already running instance, but I am only seeing flags on start.

Another thing you could do the next time your dig fails: change /etc/resolv.conf to 1.1.1.1 (Cloudflare) or 8.8.8.8 (Google) and run your dig command again. If that fails too, it points us away from unbound and toward a FW/networking issue.

I gather the failure persists at the command prompt (dig) once it happens, or does it come and go? I run BIND on my Zimbra servers and have no experience with unbound. Out of ideas at this time.

Ref: https://wiki.zimbra.com/wiki/DNS_cachin ... (dnscache)

Re: DNS cache seems corrupt

Post by davidkillingsworth »

JDunphy wrote: Wild guess in case you have a cached AAAA NS record present for aetna ...Do you have an ipv6 address on that nic? Does this work any better on the problem machine in failure mode?
I double-checked and IPv6 was disabled, though when I ran ifconfig it did show an IPv6 address on eth0. I used these instructions https://askubuntu.com/questions/440649/ ... untu-14-04 to remove IPv6 altogether.

Afterwards, it still showed that IPv6 was disabled; the only real difference is that ifconfig no longer lists an IPv6 address for eth0.

Can you elaborate on what you mean by "failure mode"?
JDunphy wrote: It seems like unbound supports extra debugging and verbose modes... just not sure how to do it the zimbra way. One way is to add the verbose flag when $unbound starts in zmdnscachectl ... line 87 (sudo $unbound) ... You would think there must be a way to send it a signal or method via a control interface to enable logging/debugging on an already running instance. I am only seeing flags on start however. Another thing you could do the next time your dig fails... change /etc/resolv.conf to 1.1.1.1 (cloudflare) or google (8.8.8.8) and run your dig command again. If that fails, it points us away from unbound and to a FW/Networking issue. I gather the failure remains from the command prompt (dig) once it happens or does it come and go? I run BIND on my zimbra servers and have no experience with unbound. Out of ideas at this time.
If I change /etc/resolv.conf to 8.8.8.8 instead of 127.0.0.1 and issue $dig -t -txt Mar2018._domainkey.aetna.com, it IS SUCCESSFUL.

However, $dig -t txt aetna.com is not.

That's at least some progress. Changing /etc/resolv.conf back to 127.0.0.1 instantly causes $dig -t -txt Mar2018._domainkey.aetna.com to fail again.

This is what my Zimbra DNS settings look like.

Code: Select all

zimbra@zimbra:/var/log$ zmprov gs `zmhostname` | grep -i dns
zimbraDNSMasterIP: 8.8.8.8
zimbraDNSTCPUpstream: no
zimbraDNSUseTCP: yes
zimbraDNSUseUDP: yes
zimbraMtaDnsLookupsEnabled: TRUE
zimbraMtaLmtpHostLookup: dns
zimbraMtaPostscreenDnsblAction: enforce
zimbraMtaPostscreenDnsblMaxTTL: ${postscreen_dnsbl_ttl?{$postscreen_dnsbl_ttl}:{1}}h
zimbraMtaPostscreenDnsblMinTTL: 60s
zimbraMtaPostscreenDnsblSites: b.barracudacentral.org*3
zimbraMtaPostscreenDnsblSites: zen.spamhaus.org*3
zimbraMtaPostscreenDnsblSites: bl.spamcop.net
zimbraMtaPostscreenDnsblSites: cbl.abuseat.org*3
zimbraMtaPostscreenDnsblSites: ubl.unsubscore.com*2
zimbraMtaPostscreenDnsblSites: ix.dnsbl.manitu.net*2
zimbraMtaPostscreenDnsblTTL: 1h
zimbraMtaPostscreenDnsblThreshold: 5
zimbraMtaPostscreenDnsblTimeout: 10s
zimbraMtaPostscreenDnsblWhitelistThreshold: 0
zimbraMtaSmtpDnsSupportLevel: enabled
zimbraReverseProxyDnsLookupInServerEnabled: TRUE
zimbraServiceEnabled: dnscache
zimbraServiceInstalled: dnscache

Re: DNS cache seems corrupt

Post by davidkillingsworth »

More progress on this. I figured out that the dns caching server that is built into Zimbra is called unbound.

I went into /opt/zimbra/conf/unbound.conf.in and changed the log level from 1 (the default) to 3.

Then I restarted the dnscache service with:

Code: Select all

/opt/zimbra/bin/zmdnscachectl restart
I then grep'd the zimbra.log for unbound.

Code: Select all

May  9 01:25:44 zimbra unbound: [10484:0] debug: validator[module 0] operate: extstate:module_state_initial event:module_event_new
May  9 01:25:44 zimbra unbound: [10484:0] info: validator operate: query Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:44 zimbra unbound: [10484:0] debug: iterator[module 1] operate: extstate:module_state_initial event:module_event_pass
May  9 01:25:44 zimbra unbound: [10484:0] info: resolving Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:44 zimbra unbound: [10484:0] info: processQueryTargets: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:44 zimbra unbound: [10484:0] info: sending query: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:44 zimbra unbound: [10484:0] debug: sending to target: <.> 8.8.8.8#53
May  9 01:25:44 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
May  9 01:25:46 zimbra unbound: [10484:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
May  9 01:25:46 zimbra unbound: [10484:0] info: iterator operate: query Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:46 zimbra unbound: [10484:0] info: processQueryTargets: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:46 zimbra unbound: [10484:0] info: sending query: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:46 zimbra unbound: [10484:0] debug: sending to target: <.> 8.8.8.8#53
May  9 01:25:46 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
May  9 01:25:49 zimbra unbound: [10484:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
May  9 01:25:49 zimbra unbound: [10484:0] info: iterator operate: query Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:49 zimbra unbound: [10484:0] info: processQueryTargets: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:49 zimbra unbound: [10484:0] info: sending query: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:49 zimbra unbound: [10484:0] debug: sending to target: <.> 8.8.8.8#53
May  9 01:25:49 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
May  9 01:25:49 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
May  9 01:25:54 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
May  9 01:25:55 zimbra unbound: [10484:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
May  9 01:25:55 zimbra unbound: [10484:0] info: iterator operate: query Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:55 zimbra unbound: [10484:0] info: processQueryTargets: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:55 zimbra unbound: [10484:0] info: sending query: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:25:55 zimbra unbound: [10484:0] debug: sending to target: <.> 8.8.8.8#53
May  9 01:25:55 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
May  9 01:26:02 zimbra unbound: [10484:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
May  9 01:26:02 zimbra unbound: [10484:0] info: iterator operate: query Mar2018._domainkey.aetna.com. TXT IN
May  9 01:26:02 zimbra unbound: [10484:0] info: processQueryTargets: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:26:02 zimbra unbound: [10484:0] info: sending query: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:26:02 zimbra unbound: [10484:0] debug: sending to target: <.> 8.8.8.8#53
May  9 01:26:02 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
May  9 01:26:15 zimbra unbound: [10484:0] debug: iterator[module 1] operate: extstate:module_wait_reply event:module_event_noreply
May  9 01:26:15 zimbra unbound: [10484:0] info: iterator operate: query Mar2018._domainkey.aetna.com. TXT IN
May  9 01:26:15 zimbra unbound: [10484:0] info: processQueryTargets: Mar2018._domainkey.aetna.com. TXT IN
May  9 01:26:15 zimbra unbound: [10484:0] debug: configured forward servers failed -- returning SERVFAIL
May  9 01:26:15 zimbra unbound: [10484:0] debug: return error response SERVFAIL
May  9 01:26:15 zimbra unbound: [10484:0] debug: validator[module 0] operate: extstate:module_wait_module event:module_event_moddone
May  9 01:26:15 zimbra unbound: [10484:0] info: validator operate: query Mar2018._domainkey.aetna.com. TXT IN
May  9 01:26:15 zimbra unbound: [10484:0] debug: cache memory msg=87188 rrset=79880 infra=2929 val=66344
I'm not sure if this really tells me much more than I knew before.
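
Read closely, the excerpt does narrow things down: unbound sends the query to 8.8.8.8#53 five times, each attempt ends in module_event_noreply, and only then does it return SERVFAIL. For longer logs, a small filter can produce the same tally. This is a minimal sketch; the function name `summarize_unbound_log` is hypothetical and the patterns assume the level 3 log format shown above.

```shell
#!/bin/sh
# Hypothetical helper: summarise unbound debug lines read from stdin.
# Counts queries sent per upstream target and SERVFAIL responses returned.
summarize_unbound_log() {
    awk '
        # The last field of a "sending to target" line is ADDRESS#PORT
        /debug: sending to target:/ { sent[$NF]++ }
        /return error response SERVFAIL/ { servfail++ }
        END {
            for (t in sent) printf "sent to %s: %d\n", t, sent[t]
            printf "SERVFAIL returned: %d\n", servfail + 0
        }
    '
}
```

Usage would be something like `grep unbound /var/log/zimbra.log | summarize_unbound_log`, assuming the log lives at /var/log/zimbra.log. Many sends with no answer and a SERVFAIL at the end points at the path between unbound and its forwarder rather than at the cache contents.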

Re: DNS cache seems corrupt

Post by JDunphy »

Interesting... I don't know if you noticed but this is what I see here:

Code: Select all

% dig -t txt aetna.com
;; Truncated, retrying in TCP mode.

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.68.rc1.el6_10.1 <<>> -t txt aetna.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 10312
;; flags: qr rd ra; QUERY: 1, ANSWER: 9, AUTHORITY: 9, ADDITIONAL: 10

;; QUESTION SECTION:
;aetna.com.			IN	TXT
;; ANSWER SECTION:
aetna.com.		3600	IN	TXT	"y38-mb7-j09"
aetna.com.		3600	IN	TXT	"facebook-domain-verification=mazliy0tvw578mlupibfih0oepqbkn"
aetna.com.		3600	IN	TXT	"J7t16QKXJnyXbLXmPu8O8KaN3WKTxPYFpwgH9+fOEsY9IwRj0i5T+zzBMtr5dPWMXyK4fFIdJ0M0tLNZyyyjow=="
aetna.com.		3600	IN	TXT	"v=spf1 mx include:_spf.aetna.com include:_spf2.aetna.com include:_spf.salesforce.com include:spf.constantcontact.com -all"
aetna.com.		3600	IN	TXT	"ciscocidomainverification=52d925c925a91c5299f48666e2cf74b5eede2b1a3263751b7af8a602d3e7cf14"
aetna.com.		3600	IN	TXT	"MS=ms71494333"
aetna.com.		3600	IN	TXT	"YICqsmFu4GmgvJIb5ZknybzvTg/4rj/2qCsxv8XASZPSNUy7ho+IwjyD5KoJc20qIitO3t21Z/ZddTfKMICQoQ=="
aetna.com.		3600	IN	TXT	"google-site-verification=3DJ6OBtvnHf9rT1-hmo4rE591yHbci3FW1sh7HmEg8"
aetna.com.		3600	IN	TXT	"adobe-idp-site-verification=7fb30cc6-25b8-47fc-b112-a898bb9aa6e9"

;; AUTHORITY SECTION:
aetna.com.		172800	IN	NS	a9-67.akam.net.
aetna.com.		172800	IN	NS	a8-67.akam.net.
aetna.com.		172800	IN	NS	a12-65.akam.net.
aetna.com.		172800	IN	NS	ns2.aetna.com.
aetna.com.		172800	IN	NS	ns7.aetna.com.
aetna.com.		172800	IN	NS	a16-67.akam.net.
aetna.com.		172800	IN	NS	a1-155.akam.net.
aetna.com.		172800	IN	NS	a13-65.akam.net.
aetna.com.		172800	IN	NS	ns1.aetna.com.

;; ADDITIONAL SECTION:
a9-67.akam.net.		48648	IN	A	184.85.248.67
a8-67.akam.net.		73207	IN	A	2.16.40.67
a8-67.akam.net.		172625	IN	AAAA	2600:1403:a::43
a1-155.akam.net.	90000	IN	A	193.108.91.155
a12-65.akam.net.	90000	IN	A	184.26.160.65
a13-65.akam.net.	9851	IN	A	2.22.230.65
a16-67.akam.net.	90000	IN	A	23.211.132.67
ns1.aetna.com.		172800	IN	A	206.213.251.100
ns2.aetna.com.		172800	IN	A	206.213.209.100
ns7.aetna.com.		172800	IN	A	12.10.217.80

;; Query time: 0 msec
;; SERVER: 127.0.0.1#53(127.0.0.1)
;; WHEN: Wed May  8 10:10:22 2019
;; MSG SIZE  rcvd: 1103
They have a lot of data, so dig gets a truncated UDP answer and switches to TCP for the query. See below for the cause. Double-check that your FW allows DNS queries over TCP and does not limit large UDP packets sent via the DNS extensions.

As for failure mode: it sounded like you said aetna.com would work after a restart, then fail and stay failed until you restarted unbound. So one of the things I was investigating was what happens when the cache entry expires for that RR (Mar2018._domainkey.aetna.com). Those records are set to expire in 1 hour, so your resolver would use the cached NS records to fetch the next answer rather than going out to the root servers again for aetna's NS records. The TTL on the NS records is measured in days by comparison. This is why I was hoping to enable some debugging, so we could look in the logs for the reason and see whether an odd NS was involved.

Perhaps add the +trace and +dnssec options to your failed dig command. It would be helpful to see the entire output too, specifically this part, to see whether any extensions are firing in success/fail mode with dig:

Code: Select all

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;aetna.com.			IN	TXT
I am leaning more toward the FW at this point, because you said dig failed for aetna.com with Google as the resolver when you went after ALL the TXT records for aetna.com. That points at TCP port 53.
https://serverfault.com/questions/404840/when-do-dns-queries-use-tcp-instead-of-udp wrote: DNS uses TCP when the size of the request or the response is greater than a single packet such as with responses that have many records or many IPv6 responses or most DNSSEC responses.
The maximum size was originally 512 bytes but there is an extension to the DNS protocol that allows clients to indicate that they can handle UDP responses of up to 4096 bytes.
DNSSEC responses are usually larger than the maximum UDP size.
Transfer requests are usually larger than the maximum UDP size and hence will also be done over TCP.
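
To check whether reply size or transport is what makes the difference, the same query can be repeated over UDP with different advertised EDNS buffer sizes and then over TCP. This is a minimal sketch under stated assumptions: the function name `probe_transport` is hypothetical, the buffer sizes are arbitrary examples, and a non-zero dig exit status is treated as "no reply".

```shell
#!/bin/sh
# Hypothetical helper: query one resolver for a record with several
# transport settings and report the outcome of each attempt.
# Standard dig options used:
#   +notcp +ignore  stay on UDP; do not retry truncated answers over TCP
#   +bufsize=N      advertise an N-byte UDP buffer via EDNS0
#   +dnssec         set the DO bit, which usually makes replies larger
#   +tcp            send the query over TCP
probe_transport() {
    server=$1; name=$2; type=$3
    for opts in "+notcp +ignore +bufsize=512" \
                "+notcp +ignore +bufsize=1232" \
                "+notcp +ignore +bufsize=4096" \
                "+notcp +ignore +dnssec" \
                "+tcp"; do
        # $opts is intentionally unquoted so it splits into separate options
        out=$(dig "@$server" $opts +time=2 +tries=1 -t "$type" "$name" 2>/dev/null)
        if [ $? -ne 0 ]; then
            result="no reply"
        elif printf '%s\n' "$out" | grep -q 'flags:[^;]* tc[ ;]'; then
            result="truncated"
        else
            result="answered"
        fi
        echo "$opts: $result"
    done
}
```

For example, `probe_transport 8.8.8.8 aetna.com txt`. If the small-buffer and TCP attempts get through but the larger UDP ones show "no reply", something on the path is probably dropping large UDP DNS packets.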

Re: DNS cache seems corrupt

Post by JDunphy »

I need to study this debugging log... perhaps the -v option would be a better choice.

Re: DNS cache seems corrupt

Post by davidkillingsworth »

It could be something on the firewall. It's an older Cisco ASA 5505.

However, I'm running a terminal monitor on it at the same time as doing a dig query and I don't see any traffic being blocked from the mail server.

The failure occurs when I have zimbra's dnscache running. If I turn zimbra dnscache off, the dig queries actually work.

The only other thing I can think of that might cause very strange issues is the virtual network card type. This is a VMware 6.5 server and the virtual network card for this Zimbra server is VMXNET 3. I have seen problems with virtual network card types before, so I may also try changing the card type to E1000 to see if that has any effect.

Re: DNS cache seems corrupt

Post by JDunphy »

I thought this was a caching DNS resolver... nope. From the documentation - "dnscache adds into the MTA servers a local DNS cache server that can keep all the external DNS request". Anyway, here is the root cause.

"configured forward servers failed -- returning SERVFAIL"

So you are looking at unbound + external resolver for possible cause. If it was me, I would use /etc/resolv.conf and figure out why the external resolver is failing at times with dig. After you rule out your FW then investigate if the external resolver has any limits that you might be hitting. You are close to figuring this out.

Another thing to consider is how to handle RBLs, which can fail in unexpected ways because some of them limit the number of queries per day per resolver IP.

Re: DNS cache seems corrupt

Post by JDunphy »

Hey David,

You might want to play with dnsping, dnseval and dnstraceroute. They will definitely show oddities like transparent proxying (ISP/NSP interception), throttling, FW interference, slowness, etc.
Something like this for udp and then tcp might shine a light.

Code: Select all

# dnsping -t TXT Mar2018._domainkey.aetna.com
# dnsping -T -t TXT Mar2018._domainkey.aetna.com

Code: Select all

% dnsping -h
dnsping version 1.6.4
usage: dnsping [-ehqv] [-s server] [-p port] [-P port] [-S address] [-c count] [-t type] [-w wait] hostname
  -h  --help      Show this help
  -q  --quiet     Quiet
  -v  --verbose   Print actual dns response
  -s  --server    DNS server to use (default: first entry from /etc/resolv.conf)
  -p  --port      DNS server port number (default: 53)
  -T  --tcp       Use TCP instead of UDP
  -4  --ipv4      Use IPv4 as default network protocol
  -6  --ipv6      Use IPv6 as default network protocol
  -P  --srcport   Query source port number (default: 0)
  -S  --srcip     Query source IP address (default: default interface address)
  -c  --count     Number of requests to send (default: 10)
  -w  --wait      Maximum wait time for a reply (default: 2 seconds)
  -i  --interval  Time between each request (default: 1 seconds)
  -t  --type      DNS request record type (default: A)
  -e  --edns      Disable EDNS0 (default: Enabled)
dnseval at work showing my resolvers from home forcing TCP only. Glad tcp isn't the default. :-)

Code: Select all

% dnseval -T -t TXT aetna.com
server              avg(ms)     min(ms)     max(ms)     stddev(ms)  lost(%)  ttl        flags
------------------------------------------------------------------------------------------------------------
X.X.X.1       93.341      74.294      203.824     39.061      %0       3444       QR -- -- RD RA -- --
X.X.X.2       43.587      35.005      75.768      11.668      %0       3443       QR -- -- RD RA -- --
X.X.X.3       84.904      78.863      103.839     7.160       %0       3443       QR -- -- RD RA -- --
X.X.X.4       41.856      36.096      47.739      4.131       %0       3443       QR -- -- RD RA -- --
Be on the lookout for any lost queries.
And now for the question of who has a faster resolver from your location - cloudflare or google with TCP and UDP queries.

Code: Select all

mimir:~/src:267> dnsping -s 8.8.4.4 -t TXT Mar2018._domainkey.aetna.com
dnsping DNS: 8.8.4.4:53, hostname: Mar2018._domainkey.aetna.com, rdatatype: TXT
457 bytes from 8.8.4.4: seq=0   time=22.635 ms
457 bytes from 8.8.4.4: seq=1   time=35.038 ms
457 bytes from 8.8.4.4: seq=2   time=51.156 ms
457 bytes from 8.8.4.4: seq=3   time=168.659 ms
457 bytes from 8.8.4.4: seq=4   time=23.281 ms
457 bytes from 8.8.4.4: seq=5   time=32.023 ms
457 bytes from 8.8.4.4: seq=6   time=25.748 ms
457 bytes from 8.8.4.4: seq=7   time=22.588 ms
457 bytes from 8.8.4.4: seq=8   time=51.474 ms
457 bytes from 8.8.4.4: seq=9   time=24.094 ms

--- 8.8.4.4 dnsping statistics ---
10 requests transmitted, 10 responses received, 0% lost
min=22.588 ms, avg=45.670 ms, max=168.659 ms, stddev=44.617 ms
mimir:~/src:268> dnsping -T -s 8.8.4.4 -t TXT Mar2018._domainkey.aetna.com
dnsping DNS: 8.8.4.4:53, hostname: Mar2018._domainkey.aetna.com, rdatatype: TXT
457 bytes from 8.8.4.4: seq=0   time=48.638 ms
457 bytes from 8.8.4.4: seq=1   time=41.929 ms
457 bytes from 8.8.4.4: seq=2   time=43.149 ms
457 bytes from 8.8.4.4: seq=3   time=36.503 ms
457 bytes from 8.8.4.4: seq=4   time=44.185 ms
457 bytes from 8.8.4.4: seq=5   time=36.796 ms
457 bytes from 8.8.4.4: seq=6   time=49.508 ms
457 bytes from 8.8.4.4: seq=7   time=49.160 ms
457 bytes from 8.8.4.4: seq=8   time=43.010 ms
457 bytes from 8.8.4.4: seq=9   time=44.224 ms

--- 8.8.4.4 dnsping statistics ---
10 requests transmitted, 10 responses received, 0% lost
min=36.503 ms, avg=43.710 ms, max=49.508 ms, stddev=4.617 ms
mimir:~/src:269> dnsping -T -s 1.1.1.1 -t TXT Mar2018._domainkey.aetna.com
dnsping DNS: 1.1.1.1:53, hostname: Mar2018._domainkey.aetna.com, rdatatype: TXT
457 bytes from 1.1.1.1: seq=0   time=43.067 ms
457 bytes from 1.1.1.1: seq=1   time=35.887 ms
457 bytes from 1.1.1.1: seq=2   time=41.554 ms
457 bytes from 1.1.1.1: seq=3   time=37.241 ms
457 bytes from 1.1.1.1: seq=4   time=41.034 ms
457 bytes from 1.1.1.1: seq=5   time=46.829 ms
457 bytes from 1.1.1.1: seq=6   time=45.495 ms
457 bytes from 1.1.1.1: seq=7   time=38.116 ms
457 bytes from 1.1.1.1: seq=8   time=41.857 ms
457 bytes from 1.1.1.1: seq=9   time=37.187 ms

--- 1.1.1.1 dnsping statistics ---
10 requests transmitted, 10 responses received, 0% lost
min=35.887 ms, avg=40.827 ms, max=46.829 ms, stddev=3.687 ms
mimir:~/src:270> dnsping -s 1.1.1.1 -t TXT Mar2018._domainkey.aetna.com
dnsping DNS: 1.1.1.1:53, hostname: Mar2018._domainkey.aetna.com, rdatatype: TXT
457 bytes from 1.1.1.1: seq=0   time=24.935 ms
457 bytes from 1.1.1.1: seq=1   time=19.436 ms
457 bytes from 1.1.1.1: seq=2   time=17.202 ms
457 bytes from 1.1.1.1: seq=3   time=16.838 ms
457 bytes from 1.1.1.1: seq=4   time=17.535 ms
457 bytes from 1.1.1.1: seq=5   time=18.322 ms
457 bytes from 1.1.1.1: seq=6   time=16.487 ms
457 bytes from 1.1.1.1: seq=7   time=18.229 ms
457 bytes from 1.1.1.1: seq=8   time=17.798 ms
457 bytes from 1.1.1.1: seq=9   time=18.551 ms
--- 1.1.1.1 dnsping statistics ---
10 requests transmitted, 10 responses received, 0% lost
min=16.487 ms, avg=18.533 ms, max=24.935 ms, stddev=2.411 ms
Now the question ... what does a local resolver look like on our zimbra servers for the same query.

Code: Select all

# dnsping -t TXT Mar2018._domainkey.aetna.com
dnsping DNS: 127.0.0.1:53, hostname: Mar2018._domainkey.aetna.com, rdatatype: TXT
457 bytes from 127.0.0.1: seq=0   time=0.562 ms
457 bytes from 127.0.0.1: seq=1   time=0.427 ms
457 bytes from 127.0.0.1: seq=2   time=0.487 ms
457 bytes from 127.0.0.1: seq=3   time=0.459 ms
457 bytes from 127.0.0.1: seq=4   time=0.507 ms
457 bytes from 127.0.0.1: seq=5   time=0.515 ms
457 bytes from 127.0.0.1: seq=6   time=0.608 ms
457 bytes from 127.0.0.1: seq=7   time=0.448 ms
457 bytes from 127.0.0.1: seq=8   time=0.517 ms
457 bytes from 127.0.0.1: seq=9   time=0.513 ms

--- 127.0.0.1 dnsping statistics ---
10 requests transmitted, 10 responses received, 0% lost
min=0.427 ms, avg=0.504 ms, max=0.608 ms, stddev=0.054 ms
Your numbers should be similar to mine above for unbound, after the initial latency of the first fetch from the external resolver. It looks like my test machine is in the same datacenter as one of Akamai's NS for aetna. I was not as lucky in a Toronto datacenter, where the initial fetch took 82ms, but the subsequent 9 were < 0.5ms. If there was ever any doubt about what a caching DNS server can do, it should be gone now. :-)

Ref: https://github.com/farrokhi/dnsdiag
Installed with a single command: pip3 install dnsdiag

Jim

Re: DNS cache seems corrupt

Post by davidkillingsworth »

I went back to looking at this. I'm still not sure what's going on.

I changed the VMware virtual network card type from VMXNET 3 to E1000, but that didn't help.

I ended up having to disable dnscache in Zimbra to get DNS queries working properly.

The specific domainkey lookup is now working, but a generic dig query for all TXT records for aetna.com does not work unless I specify +tcp in the dig command.

Note that both commands below were run with Zimbra's DNS cache disabled.

Code: Select all

david@zimbra:~$ dig -t txt aetna.com

; <<>> DiG 9.9.5-3ubuntu0.19-Ubuntu <<>> -t txt aetna.com
;; global options: +cmd
;; connection timed out; no servers could be reached

Code: Select all

david@zimbra:~$ dig +tcp -t txt aetna.com
;; Connection to 202.130.97.65#53(202.130.97.65) for aetna.com failed: connection refused.

; <<>> DiG 9.9.5-3ubuntu0.19-Ubuntu <<>> +tcp -t txt aetna.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23949
;; flags: qr rd ra ad; QUERY: 1, ANSWER: 9, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;aetna.com.			IN	TXT

;; ANSWER SECTION:
aetna.com.		900	IN	TXT	"MS=ms71494333"
aetna.com.		900	IN	TXT	"google-site-verification=3DJ6OBtvnHf9rT1-hmo4rE591yHbci3FW1sh7HmEg8"
aetna.com.		900	IN	TXT	"adobe-idp-site-verification=7fb30cc6-25b8-47fc-b112-a898bb9aa6e9"
aetna.com.		900	IN	TXT	"ciscocidomainverification=52d925c925a91c5299f48666e2cf74b5eede2b1a3263751b7af8a602d3e7cf14"
aetna.com.		900	IN	TXT	"v=spf1 mx include:_spf.aetna.com include:_spf2.aetna.com include:_spf.salesforce.com include:spf.constantcontact.com -all"
aetna.com.		900	IN	TXT	"y38-mb7-j09"
aetna.com.		900	IN	TXT	"facebook-domain-verification=mazliy0tvw578mlupibfih0oepqbkn"
aetna.com.		900	IN	TXT	"J7t16QKXJnyXbLXmPu8O8KaN3WKTxPYFpwgH9+fOEsY9IwRj0i5T+zzBMtr5dPWMXyK4fFIdJ0M0tLNZyyyjow=="
aetna.com.		900	IN	TXT	"YICqsmFu4GmgvJIb5ZknybzvTg/4rj/2qCsxv8XASZPSNUy7ho+IwjyD5KoJc20qIitO3t21Z/ZddTfKMICQoQ=="

;; Query time: 1457 msec
;; SERVER: 202.130.97.66#53(202.130.97.66)
;; WHEN: Fri May 31 02:48:11 HKT 2019
;; MSG SIZE  rcvd: 756
I suspect that maybe there is something wrong with the configuration of the Cisco ASA 5505 we have as a firewall at that site. We are not blocking outbound UDP or TCP packets. I did view the terminal monitor on the ASA to look for any blocked traffic, but couldn't see any errors through all of this testing.

I have a couple of other Ubuntu servers at this location (10.04 and 18.04) that don't seem to have these DNS query issues, though. Perhaps I just need to upgrade from Ubuntu 14.04 to 16.04.

Totally stumped.