Webmin Cluster Name Servers (Bind) Troubleshooting

Background

Here are some Bind troubleshooting tips. These tips are mostly related to Webmin’s Cluster name servers which we use with Virtualmin.

When creating a new DNS record, REFUSED from existing secondaries

This situation might occur when you've just added a new slave whose IP address has changed.

The error means the existing secondaries do not know about the IP address of this new slave and have to be updated. If you look at the zone definitions on your existing secondaries, you'll see something like this:

zone "example.com" {
   type slave;
   masters {
      A.B.C.D;
      E.F.G.H;
      I.J.K.L;
      M.N.O.P;
   };
   allow-transfer {
      A.B.C.D;
      E.F.G.H;
      I.J.K.L;
      M.N.O.P;
   };
   file "/var/lib/bind/example.com.hosts";
};

Essentially, the transfer is refused if the IP address of your new server isn't in that list. You now need to update /etc/bind/named.conf.local with the new IP address.

/etc/bind# cp named.conf.local named.conf.local.backup
sed -i 's/A.B.C.D/W.X.Y.Z/g' named.conf.local

When making these changes on CentOS, the directory is /etc and the file is /etc/named.conf.

sed -i edits the file in place. If you use sed without -i, the result is written to standard output and the file is left unchanged.
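
Because sed without -i only prints its result, it can serve as a dry run before the real edit. The following is a minimal sketch of that idea, not a definitive procedure: the function name replace_ip and its arguments are made up for illustration, and it assumes GNU sed (for -i) and a diff that accepts - for standard input. It also escapes the dots in the old address so that each one matches a literal dot.

```shell
#!/bin/sh
# Sketch: back up, preview, then apply an IP replacement in a config file.
# replace_ip and its arguments are hypothetical names for illustration.
replace_ip() {
    conf=$1 old_ip=$2 new_ip=$3
    # Escape the dots so "." matches a literal dot, not any character.
    old_re=$(printf '%s' "$old_ip" | sed 's/\./\\./g')
    # Keep a backup next to the original before touching it.
    cp "$conf" "$conf.backup"
    # Dry run: sed prints the edited text, diff shows what would change.
    # diff exits non-zero when the inputs differ, hence the "|| true".
    sed "s/$old_re/$new_ip/g" "$conf" | diff "$conf" - || true
    # Real run: edit the file in place.
    sed -i "s/$old_re/$new_ip/g" "$conf"
}
```

On a Debian-style layout this might be called as replace_ip /etc/bind/named.conf.local A.B.C.D W.X.Y.Z, with the placeholders replaced by the real old and new addresses.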

Reference: https://phoenixnap.com/kb/sed-replace

refused notify from non-master

When adding a new DNS record, you might see this problem:

zone example.com/IN: refused notify from non-master: A.B.C.D#42935

A.B.C.D is the IP address of the new slave server. You might see this on, say, your 4th name server, so first orient yourself and confirm which server is actually logging the message.

A good place to look for a consistent setup is in:

/etc/bind/named.conf.options

You will find allow-transfer and also-notify settings there. Are these values consistent across all your slaves?
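
One way to compare these settings is to pull just the relevant blocks out of each server's options file and diff them. This is only a sketch under assumptions: the function name extract_acl_blocks is invented here, and it assumes the blocks contain no nested braces.

```shell
#!/bin/sh
# Sketch: print only the allow-transfer and also-notify blocks of a file.
# extract_acl_blocks is a hypothetical helper name.
extract_acl_blocks() {
    # Start printing at a line naming either setting and stop after the
    # first line that contains a closing brace (works for one-line blocks too).
    awk '/allow-transfer|also-notify/ { p = 1 }
         p                            { print }
         p && /}/                     { p = 0 }' "$1"
}
```

With a copy of each slave's named.conf.options fetched to the local machine (for example as ns1.options and ns2.options, names chosen purely for illustration), the outputs of extract_acl_blocks ns1.options and extract_acl_blocks ns2.options can be saved and compared with diff.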

bad zone transfer request / non-authoritative zone

Scenario:

Your existing name server is complaining about a bad zone transfer request / non-authoritative zone.

One possible solution:

Look for the zone. Perhaps it was cancelled and never properly removed from the old server?

You can just delete it from the primary name server. Webmin's cluster feature should then offer to remove it from the slaves as well.
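
To look for the zone on a given server, a literal search of the BIND configuration is usually enough. A minimal sketch, where has_zone is a made-up helper name and the config path passed to it depends on your distribution:

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the config file declares the zone.
# has_zone is a hypothetical helper name.
has_zone() {
    conf=$1 zone=$2
    # -F matches the text literally, so dots in the zone name are not
    # treated as wildcards; -q suppresses output and only sets the status.
    grep -qF "zone \"$zone\"" "$conf"
}
```

For example, has_zone /etc/bind/named.conf.local example.com && echo "still configured here" would report a leftover definition on a Debian-style server.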

Restarting Slave Fails

Re-starting slave DNS servers ..
.. some slave servers failed

nsx.example.com :

This probably means that rndc is failing somewhere. You can check it on nsx.example.com like so:

rndc status

If you get this, it means something is wrong:

# rndc status
rndc: connection to remote host closed.
* This may indicate that the
* remote server is using an older
* version of the command protocol,
* this host is not authorized to connect,
* the clocks are not synchronized,
* the key signing algorithm is incorrect
* or the key is invalid.

For more detail, run it again with verbose logging:

rndc -V status

This can happen when the rndc key is defined in multiple places. If you have a third-party DNS integration (e.g. Plesk), you have to be extra careful. See here for some guidance:

https://unix.stackexchange.com/questions/489748/bind-9-9-4-rndc-connection-to-remote-host-closed
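
To see where key statements are declared, a recursive grep over the BIND configuration can help. The sketch below is an illustration only: find_key_defs is an invented name, and the paths you pass to it are assumptions that vary by distribution.

```shell
#!/bin/sh
# Sketch: list every file and line that declares a key statement, so
# duplicate rndc key definitions are easy to spot.
# find_key_defs is a hypothetical helper name.
find_key_defs() {
    # -r: recurse into directories, -n: show line numbers, -E: extended regex.
    # Matches lines such as:  key "rndc-key" {
    # It does not match the "keys { ... }" list inside a controls statement.
    grep -rnE '^[[:space:]]*key[[:space:]]+"?[A-Za-z0-9._-]+"?[[:space:]]*\{' "$@"
}
```

On a Debian-style server this could be run as find_key_defs /etc/bind, and on CentOS as find_key_defs /etc/named.conf /etc/rndc.key /etc/rndc.conf 2>/dev/null (assuming those files exist on your system). More than one hit for the same key name is worth a closer look.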

Permissions Problem in Cluster Slaves

You may see this error when running tail -f /var/log/syslog:

zone example.com/IN: refresh: could not set file modification time of '/var/lib/bind/example.co.za.hosts': permission denied

Solution and Explanation

cd /var/lib/bind/

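Check the ownership and permissions of the zone files. Some might be:

bind:bind and rw-r--r--

These work. Others might be:

root:bind and rwxrwxr-x

In spite of the second lot having more permissions, the second lot didn't transfer. The likely reason is that setting a file's modification time to a specific value requires owning the file, not just having write access to it, so a name server running as bind cannot do this on files owned by root. The solution is an obscure button in Webmin / Servers / Bind DNS Server / the small cogwheel icon on the left. Click it, navigate to Zone file options, and set the ownership and permissions there.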
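
To find the affected zone files quickly, you can list everything in the directory that is not owned by the expected user. A minimal sketch, where list_not_owned_by is a made-up name and the user bind is taken from the working example above (the account name may differ on your distribution):

```shell
#!/bin/sh
# Sketch: list regular files under a directory that are NOT owned by the
# given user. list_not_owned_by is a hypothetical helper name.
list_not_owned_by() {
    dir=$1 owner=$2
    # "! -user" inverts the ownership test; ls -l shows owner, group and mode.
    find "$dir" -type f ! -user "$owner" -exec ls -l {} +
}
```

For example, list_not_owned_by /var/lib/bind bind prints nothing when every zone file already belongs to bind.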
