On a stable, up-to-date WHM/cPanel server that is rarely rebooted, a reboot to disable SELinux caused Dovecot to stop working. The first major clue that something was wrong was a client reporting that webmail was down with:
503 Service Unavailable The server is temporarily busy, try again later
At first glance one would assume it’s something to do with SELinux, but why? Could it really be?
The error shows up in the cPanel error log:
# tail -f /usr/local/cpanel/logs/error_log
dovecot auth service not ready
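To see how often the error is being logged rather than watching it scroll past, the log can be searched for the exact message. This is a minimal sketch: the function name `count_auth_errors` is made up for illustration, and the default path is simply the log file quoted above.

```shell
#!/bin/sh
# Hypothetical helper: count occurrences of the auth-service error in a log.
# Defaults to the cPanel error log mentioned in this article.
count_auth_errors() {
    log_file="${1:-/usr/local/cpanel/logs/error_log}"
    # -c prints the number of matching lines; -F matches the text literally.
    grep -cF 'dovecot auth service not ready' "$log_file"
}

# Usage (on the affected server):
#   count_auth_errors
#   count_auth_errors /path/to/a/copy/of/error_log
```

A count that keeps rising after a Dovecot restart suggests the restart alone has not fixed anything.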
Googling the message returned no exact matches, which is typically not a good sign.
Only two actions were tried before contacting WHM and logging a ticket:
- Rebuilding Dovecot configuration
- Restarting the Dovecot service
Neither resolved the problem. This is a perfectly stable server with numerous clients that has been running for more than a year, so it’s not worth messing about: Dovecot should always just start. A ticket was logged immediately.
Hats off to WHM support for resolving the issue. Along the way, numerous interesting commands were picked up while troubleshooting. This article documents some of them:
Restarting Dovecot on WHM/cPanel with verbose output
/usr/local/cpanel/scripts/restartsrv_dovecot --html --wait --verbose
Checking if Dovecot is actually running
systemctl status dovecot -l
...
├─1741 dovecot/auth
├─7862 dovecot/auth -w
...
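Since the eventual fix involved the number of authentication processes, it can help to count the `dovecot/auth` entries in that output instead of eyeballing them. The sketch below assumes the process lines look like the ones shown above; `count_auth_procs` is a hypothetical name, not a cPanel or Dovecot tool.

```shell
#!/bin/sh
# Hypothetical helper: count dovecot/auth processes listed in
# `systemctl status dovecot -l` output supplied on stdin.
count_auth_procs() {
    # Each auth process appears on its own line containing "dovecot/auth".
    grep -c 'dovecot/auth'
}

# Usage (on the server):
#   systemctl status dovecot -l | count_auth_procs
```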
Logging into an IMAP mailbox from the command line to test a WHM/cPanel server
First create a test mailbox, then:
doveadm auth login [email protected] secret
Using NMAP to determine if all important WHM ports are open
nmap -p 25,587,465,110,995,143,993,585,2096,2095 220.127.116.11 --reason
At this point the engineer noted that ports 2095 and 2096 were closed. Both are used for Webmail (2095 for plain HTTP and 2096 for SSL).
If they were open it would have looked like this:
2095/tcp open nbx-ser syn-ack
2096/tcp open nbx-dir syn-ack
But they were closed so it looked like this:
2095/tcp closed nbx-ser reset
2096/tcp closed nbx-dir reset
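With ten ports in the scan it is easy to miss a closed one, so the nmap output can be filtered down to just the ports that are not open. This is a rough sketch that assumes port lines have the form `PORT/tcp STATE SERVICE REASON` as in the samples above; `closed_ports` is a name invented for this example.

```shell
#!/bin/sh
# Hypothetical helper: read nmap output on stdin and print the number of
# every TCP port whose state is anything other than "open".
closed_ports() {
    # Port lines look like: 2095/tcp closed nbx-ser reset
    awk '$1 ~ /^[0-9]+\/tcp$/ && $2 != "open" { split($1, p, "/"); print p[1] }'
}

# Usage (replace SERVER_IP with the server's address):
#   nmap -p 25,587,465,110,995,143,993,585,2096,2095 SERVER_IP --reason | closed_ports
```

No output means every scanned port was reported as open.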
The solution to the problem was to increase authentication processes:
Maximum Number of Authentication Processes in WHM > Mailserver Configuration
Reason Why This Happened
It is difficult to say why this happened. It could be that when the server came back after the reboot, hundreds of IMAP clients that had been waiting to reconnect all did so at once and overwhelmed the authentication service.
- WHM/cPanel support ticket