Webmin Cluster Name Servers (Bind) Troubleshooting

Background

Here are some BIND troubleshooting tips for Webmin’s Cluster Name Servers, which Virtualmin uses when creating a name server hierarchy.

failed to determine a valid IP address for this system (got 127.0.1.1)

Adding new DNS zone ..
.. done

Adding secondary zone on ns1.example.com ns2.example.com ..
.. failed to determine a valid IP address for this system (got 127.0.1.1). You will need to set the IP in the BIND DNS Server module.

The above error may be accompanied by this one:

refused notify from non-primary: 1.2.3.4

Jul 18 04:23:12 ns2 named[110656]: client @0x7f91d106cb18 41.203.6.203#56022: received notify for zone 'host.example.com'
Jul 18 04:23:12 ns2 named[110656]: zone host.example.com/IN: refused notify from non-primary: 1.2.3.4#56022

Go here to fix:

Webmin -> Servers -> BIND DNS Server -> Cogwheel -> Cluster slave servers

Change to below:

Default master server IP for remote slave zones

IP address of hostname 1.2.3.4

This can happen when a server’s IP address has changed.
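Before changing the setting, it can help to confirm what the machine's own hostname resolves to, since 127.0.1.1 usually comes from an /etc/hosts entry. The following is a minimal sketch, assuming getent and hostname are available; the is_loopback helper is a hypothetical name and only handles IPv4.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the IPv4 address is in 127.0.0.0/8.
is_loopback() {
    case "$1" in
        127.*) return 0 ;;
        *)     return 1 ;;
    esac
}

# First address this host's own name resolves to.
# getent follows the lookup order in /etc/nsswitch.conf (usually /etc/hosts first).
addr=$(getent hosts "$(hostname)" | awk 'NR == 1 { print $1 }')

if [ -z "$addr" ]; then
    echo "hostname does not resolve at all"
elif is_loopback "$addr"; then
    echo "hostname resolves to loopback ($addr): set the IP in the BIND module"
else
    echo "hostname resolves to $addr"
fi
```

If the script reports a loopback address, Webmin has no usable address to hand to the slaves, which matches the error above.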

When creating a new DNS record, REFUSED from existing secondaries

This situation might occur when you’ve just added a new slave whose IP address has changed.

The error means the existing secondaries do not know the IP address of this new slave, so they have to be updated. If you look at the zone definitions on your existing secondaries, you’ll see this:

zone "example.com" {
   type slave;
   masters {
      A.B.C.D;
      E.F.G.H;
      I.J.K.L;
      M.N.O.P;
   };
   allow-transfer {
      A.B.C.D;
      E.F.G.H;
      I.J.K.L;
      M.N.O.P;
   };
   file "/var/lib/bind/example.com.hosts";
};

Essentially, the transfer is refused if the IP address of your new server isn’t in that list. You now need to update /etc/bind/named.conf.local with the new IP address.

cd /etc/bind
cp named.conf.local named.conf.local.backup
sed -i 's/A\.B\.C\.D/W.X.Y.Z/g' named.conf.local

When making these changes on CentOS, the file to edit is /etc/named.conf.

sed -i edits the file in place. If you run sed without -i, the result is printed to standard output and the file is left unchanged.

Reference: https://phoenixnap.com/kb/sed-replace
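The replacement can be rehearsed on a scratch file before touching the real configuration. This is a minimal sketch, not the exact procedure used above: the addresses are hypothetical values from the documentation ranges, and the file is a temporary stand-in for named.conf.local. It escapes the dots in the old address and anchors on the trailing semicolon so that a longer address sharing the same prefix is not rewritten by accident.

```shell
#!/bin/sh
# Hypothetical addresses from the documentation ranges; substitute your own.
old_ip='192.0.2.10'
new_ip='198.51.100.20'

# Escape the dots so each one matches a literal "." and not any character.
old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')

# Scratch file standing in for named.conf.local.
conf=$(mktemp)
printf 'masters {\n   192.0.2.10;\n   192.0.2.100;\n};\n' > "$conf"

# Without -i: print the result and leave the file untouched (a dry run).
# The trailing ";" in the pattern keeps 192.0.2.100 from being rewritten.
preview=$(sed "s/${old_re};/${new_ip};/g" "$conf")
printf '%s\n' "$preview"

# With -i.backup: edit in place and keep the original as "$conf.backup".
sed -i.backup "s/${old_re};/${new_ip};/g" "$conf"

# diff exits non-zero when the files differ, which is expected here.
diff "$conf.backup" "$conf" || true
```

After editing the real file, running named-checkconf is a reasonable way to catch syntax errors before BIND is reloaded.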

refused notify from non-master

When adding a new DNS record, you might see this problem:

zone example.com/IN: refused notify from non-master: A.B.C.D#42935

A.B.C.D is the new IP address of the new slave server. You might only see this on one of several name servers (your 4th, for example), so first make sure you know which server is logging the error.

A good place to look for a consistent setup is in:

/etc/bind/named.conf.options

You will find allow-transfer and also-notify settings there. Are these values consistent across all your slaves?
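One way to answer that question is to pull the addresses out of each block and compare them between servers. The sketch below is only an illustration: block_ips is a hypothetical helper, the two temporary files stand in for copies of named.conf.options fetched from two slaves, and the addresses are made up.

```shell
#!/bin/sh
# Hypothetical helper: print the sorted, unique IPv4 addresses found inside
# every "<name> { ... };" block of a config file.
block_ips() {
    awk -v name="$1" '
        index($0, name) && /[{]/ { inside = 1 }
        inside                   { print }
        inside && /[}]/          { inside = 0 }
    ' "$2" | grep -Eo '([0-9]{1,3}\.){3}[0-9]{1,3}' | sort -u
}

# Stand-ins for named.conf.options copied from two different slaves.
a=$(mktemp)
b=$(mktemp)
printf 'options {\n  allow-transfer {\n    192.0.2.1;\n    192.0.2.2;\n  };\n  also-notify { 192.0.2.2; };\n};\n' > "$a"
printf 'options {\n  allow-transfer {\n    192.0.2.1;\n  };\n  also-notify { 192.0.2.2; };\n};\n' > "$b"

for name in allow-transfer also-notify; do
    if [ "$(block_ips "$name" "$a")" = "$(block_ips "$name" "$b")" ]; then
        echo "$name: consistent"
    else
        echo "$name: DIFFERS"
    fi
done
```

With the sample files, allow-transfer is reported as differing because the second file is missing 192.0.2.2. On real servers you would copy each slave's file locally first (for example with scp) and pass those copies in.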

bad zone transfer request / non-authoritative zone

Scenario:

Your existing name server is complaining about a bad zone transfer request / non-authoritative zone.

One possible solution:

Look for the zone. Perhaps it was cancelled and never properly removed from the old server?

You can just delete it from the primary name server. Webmin cluster should offer to remove it from the slaves as well.
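To look for a leftover zone from the command line, a plain grep over the zone configuration is often enough. This is a small sketch with a hypothetical zone_defined helper; the temporary file stands in for /etc/bind/named.conf.local, and it assumes the zone statement is written as a quoted name on a single line.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when the file has a zone statement for this name.
# -F treats the name literally, so the dots are not regex wildcards.
zone_defined() {
    grep -Fq "zone \"$1\"" "$2"
}

# Scratch file standing in for /etc/bind/named.conf.local.
conf=$(mktemp)
printf 'zone "example.com" {\n   type slave;\n   file "/var/lib/bind/example.com.hosts";\n};\n' > "$conf"

for zone in example.com example.org; do
    if zone_defined "$zone" "$conf"; then
        echo "$zone: still configured on this server"
    else
        echo "$zone: not configured here"
    fi
done
```

Running the same check on the primary and on each slave shows where a cancelled zone was left behind.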

Restarting Slave Fails

Re-starting slave DNS servers ..
.. some slave servers failed

nsx.example.com :

This probably means that rndc is failing somewhere. You can check it on nsx.example.com like so:

rndc status

If you get this, it means something is wrong:

# rndc status
rndc: connection to remote host closed.
* This may indicate that the
* remote server is using an older
* version of the command protocol,
* this host is not authorized to connect,
* the clocks are not synchronized,
* the key signing algorithm is incorrect
* or the key is invalid.

For more detail, repeat the command with verbose logging enabled (-V):

rndc -V status

This can happen when the rndc key is defined in more than one place. If you have 3rd party DNS integration (e.g. Plesk), you have to be extra careful. See here for some guidance:

https://unix.stackexchange.com/questions/489748/bind-9-9-4-rndc-connection-to-remote-host-closed
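A quick way to see whether a key really is defined more than once is to search the configuration files for key statements with that name. The sketch below uses a hypothetical key_definitions helper and a temporary directory with made-up files and placeholder secrets; the key name rndc-key is an assumption, so use whatever name your own configuration shows.

```shell
#!/bin/sh
# Hypothetical helper: list every 'key "<name>" {' statement under the given
# paths as file:line:text. Lines such as 'default-key "<name>";' do not match.
key_definitions() {
    name=$1
    shift
    grep -rnE "(^|[[:space:]])key[[:space:]]+\"?${name}\"?[[:space:]]*[{]" "$@"
}

# Scratch directory standing in for /etc/bind; the secrets are placeholders.
dir=$(mktemp -d)
printf 'key "rndc-key" {\n  algorithm hmac-sha256;\n  secret "c2VjcmV0";\n};\n' > "$dir/rndc.key"
printf 'include "/etc/bind/rndc.key";\nkey "rndc-key" {\n  algorithm hmac-sha256;\n  secret "b3RoZXI=";\n};\n' > "$dir/named.conf"
printf 'options {\n  default-key "rndc-key";\n};\n' > "$dir/rndc.conf"

# Two definitions with different secrets: a likely source of rndc failures.
key_definitions rndc-key "$dir"
```

On a real server you would point it at the directories that hold your BIND and rndc configuration, then make sure named and rndc end up reading the same key.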

Permissions Problem in Cluster Slaves

This problem sometimes appears on slaves, even though the master is working properly. The easiest way to spot it is to watch the log using tail in combination with grep.

Variation 1

You may see this error when running tail -f /var/log/syslog | grep domain:

zone example.com/IN: refresh: could not set file modification time of '/var/lib/bind/example.co.za.hosts': permission denied

Variation 2

Jul 18 03:40:12 ns2 named[110656]: zone example.com/IN: loading from master file /var/lib/bind/example.com.hosts failed: end of file
Jul 18 03:40:12 ns2 kernel: [540002.553467] audit: type=1400 audit(1721266812.060:70): apparmor="DENIED" operation="link" profile="named" name="/var/lib/bind/db-tPmcdazf" pid=110656 comm="isc-net-0000" requested_mask="l" denied_mask="l" fsuid=113 ouid=0 target="/var/lib/bind/example.com.hosts"
Jul 18 03:40:13 ns2 named[110656]: zone example.com/IN: Transfer started.
Jul 18 03:40:13 ns2 named[110656]: transfer of 'example.com/IN' from 129.232.252.163#53: connected using 129.232.252.163#53
Jul 18 03:40:13 ns2 named[110656]: zone example.com/IN: transferred serial 1621244761
Jul 18 03:40:13 ns2 named[110656]: zone example.com/IN: transfer: could not set file modification time of '/var/lib/bind/example.com.hosts': permission denied

Variation 2 also shows an AppArmor denial.

Variation 1 Solution and Explanation

cd /var/lib/bind/

Check the ownership and permissions of the zone files. Some files might be:

bind:bind and rw-r--r--

These work. Other files might be:

root:bind and rwxrwxr-x

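Although the second set has broader permissions, those zones didn’t transfer, most likely because setting a file’s modification time requires owning the file and these files are owned by root rather than bind. The fix is a setting that is easy to miss: in Webmin -> Servers -> BIND DNS Server, click the small cogwheel icon on the left, open Zone file options, and set the ownership and permissions there.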

Variation 2 Solution and Explanation

Variation 2 might throw you off and make you think it’s an AppArmor issue. In fact, the cause was the same: the zone files on the secondary were not owned by bind:bind.
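Both variations come down to zone files with the wrong owner, so it is worth listing them in one go. This is a minimal sketch with a hypothetical not_owned_by helper; the demo runs against a temporary directory, and the commented lines show how it might be used on /var/lib/bind, assuming named runs as the bind user as on Debian and Ubuntu.

```shell
#!/bin/sh
# Hypothetical helper: list regular files directly inside a directory that are
# NOT owned by the given user (a user name or a numeric UID).
not_owned_by() {
    find "$1" -maxdepth 1 -type f ! -user "$2"
}

# Demo on a scratch directory: the file belongs to whoever runs the script.
dir=$(mktemp -d)
touch "$dir/example.com.hosts"
me=$(id -u)
other=$((me + 1))

echo "not owned by UID $me:"
not_owned_by "$dir" "$me"       # prints nothing
echo "not owned by UID $other:"
not_owned_by "$dir" "$other"    # prints the zone file

# On a slave (assumption: named runs as user "bind"):
#   not_owned_by /var/lib/bind bind
#   chown bind:bind /var/lib/bind/example.com.hosts
```

Fixing ownership by hand only covers existing files, so the Webmin zone file options still need to be set for zones created later.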