Let’s Encrypt expirations can be a nightmare on unautomated systems. When renewal goes wrong on an automated system it’s even worse.
The bot will try to renew the certificate, but if the renewed certificate is not correctly bound to your services you’ll have all kinds of failures and support calls. Fortunately, as a fallback, you might be warned about this by the Let’s Encrypt email warning service.
Typically the renewal for Apache is one thing, but making sure the certificate is also renewed in other services such as Postfix and Dovecot is another thing altogether.
To solve this problem you’ll need to dig into several different areas of your operating environment:
- Where the CRON for automatic renewal is located (and is it working?)
- The directory(ies) where the certificates are installed
- The Postfix configuration file
- The Kopano (or other service) configuration file(s)
Here is a real-world example of a failure, and how it was troubleshot:
Email received from Let’s Encrypt and only SEVEN days to failure:
Subject: Let’s Encrypt certificate expiration notice for domain “cloud.example.com” (and 1 more)
The body of the email has two domains:
In this situation, we’re more worried about mail.example.com, because the moment this fails the phones start ringing.
Question: How do we see what’s what?
Kopano might be enabled for secure IMAP (IMAPS), so we can use the openssl s_client command to get a lot of information. Most of it is garbage and useless for our purposes, but the egrep below highlights the relevant sections in red on my system:
openssl s_client -connect mail.example.com:993 -servername mail.example.com -showcerts | openssl x509 -noout -text | egrep -i "validity|not before|not after"
Essentially the command connects to the mail server at the mail. address, asks for the certificate served under that name, and prints its validity dates.
The output, which takes extra long to complete, is:
<snip>
Validity
Not Before: Nov 9 03:43:32 2023 GMT
Not After : Feb 7 03:43:31 2024 GMT
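One likely reason the command is slow is that openssl s_client keeps the connection open while it waits for input; redirecting stdin from /dev/null usually makes it return as soon as the handshake is done. The following is a minimal sketch of that idea, not a definitive tool: the function names remote_not_after and days_left are made up for illustration, and days_left assumes GNU date (the -d option).

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# remote_not_after HOST PORT
# Print the notAfter date of the certificate served on an implicit-TLS port.
remote_not_after() {
    # </dev/null closes stdin so s_client exits right after the handshake
    openssl s_client -connect "$1:$2" -servername "$1" </dev/null 2>/dev/null \
        | openssl x509 -noout -enddate \
        | cut -d= -f2
}

# days_left NOT_AFTER [NOW_EPOCH]
# Whole days between now (or NOW_EPOCH) and the given date. Assumes GNU date.
days_left() {
    end=$(date -u -d "$1" +%s) || return 1
    now=${2:-$(date -u +%s)}
    echo $(( (end - now) / 86400 ))
}
```

Something like days_left "$(remote_not_after mail.example.com 993)" would then print the number of whole days remaining, which is easier to alert on than a block of text.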
Now since today is the 1st of February, we don’t really want to get stuck next week manning phones and talking to irate clients (like three months ago). So what went wrong?
First let’s check the CRON:
43 6 * * * certbot renew
Mmmm. The cron is there and attempting a renewal every day at 06:43, so maybe certbot is doing its job? Let’s see the dates on the certificates:
ls -lah /etc/letsencrypt/live/ -R
Mmmm. Next we see the certificate dates are still in November. Not good.
Maybe Let’s Encrypt has a log file?
In there we see the last activity was also in November.
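File dates under live/ can be ambiguous, because the entries there are normally symlinks into the archive directory, so reading the expiry date out of each certificate is a more direct check. A minimal sketch, assuming the default /etc/letsencrypt/live layout with one cert.pem per certificate name; the function name list_cert_expiry is made up for illustration:

```shell
#!/bin/sh
# Hypothetical helper; the name is illustrative only.

# list_cert_expiry [LIVE_DIR]
# Print "<name> <notAfter>" for every certificate directory under LIVE_DIR.
list_cert_expiry() {
    live=${1:-/etc/letsencrypt/live}   # default certbot location (assumed)
    for cert in "$live"/*/cert.pem; do
        [ -e "$cert" ] || continue     # the glob matched nothing
        name=$(basename "$(dirname "$cert")")
        end=$(openssl x509 -noout -enddate -in "$cert" | cut -d= -f2)
        echo "$name $end"
    done
}
```

Run as root, list_cert_expiry should print one line per certificate name. If the dates it shows are newer than what openssl s_client reports, the renewal worked and a service simply has not reloaded the certificate.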
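Maybe the CRON isn’t working? When does the CRON actually renew the certificate? Surely it doesn’t wait till the last minute? (With 90-day certificates certbot normally renews about 30 days before expiry, so this should have happened weeks ago.) Let’s run it by hand: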
certbot renew
Traceback (most recent call last):
  File "/usr/bin/certbot", line 6, in <module>
    from pkg_resources import load_entry_point
ModuleNotFoundError: No module named 'pkg_resources'
Well that’s unexpected. Certbot has been working on this system for sure, so something we did with Python libraries has screwed it up. The most likely candidate for this is one of these commands in the history file, which clearly shows some kind of Python manipulation. Deeper dives showed even more Python nonsense including uninstalling ‘certain’ versions:
- zypper install pip
- zypper install python3-dbus
- zypper install python-dbus
- zypper install dbus-python
So we have to delve into dependency hell, because now it seems our Python is broken.
First we notice Python has at least three versions installed:
python3 -V python3 python3.11 python3.8
How do we know which Python certbot is using?
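One way to answer this is to read the first line of the certbot launcher. This is a sketch under the assumption that /usr/bin/certbot is a Python script with a shebang line (the traceback above suggests it is); the function names script_interpreter and has_module are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# script_interpreter FILE
# Print the interpreter named on FILE's shebang line (nothing if there is none).
script_interpreter() {
    head -n 1 "$1" | sed -n 's/^#! *//p'
}

# has_module INTERPRETER MODULE
# Succeed if the given interpreter can import the given module.
has_module() {
    # $1 is left unquoted on purpose so "/usr/bin/env python3" splits into words
    $1 -c "import $2" 2>/dev/null
}
```

For example, py=$(script_interpreter /usr/bin/certbot) followed by has_module "$py" pkg_resources || echo "$py cannot import pkg_resources" shows whether the interpreter certbot really runs under is the one missing the module. pkg_resources ships with setuptools, so it has to be installed for that specific Python version, not just for whichever python3 is first in the PATH.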
Going back one step, on Stack it says try this:
pip install setuptools
Of course this is also failing with
Next we see on the internet there are a lot of ‘different repos’ for OpenSUSE, and we can maybe find out which version by doing this:
openSUSE Tumbleweed 20200801
After much thought I took the radical step:
zypper remove certbot
zypper install certbot
This made certbot come back!
We then ran the openssl command again, but the dates were still wrong.
We restarted the kopano-server, and dates were still wrong.
We restarted the kopano-gateway, and the dates were right!
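Since the new certificate only showed up after kopano-gateway was restarted, one option is to let certbot restart the services itself after each successful renewal. Certbot has a --deploy-hook option and also runs executable scripts placed in /etc/letsencrypt/renewal-hooks/deploy/. The sketch below is only a starting point: the function name restart_services is made up, and it assumes the services are systemd units with the names you pass in.

```shell
#!/bin/sh
# Hypothetical deploy-hook helper; the name is illustrative only.

# restart_services SERVICE...
# Restart each service, report any that fail, return non-zero if one failed.
# RESTART_CMD can be overridden (e.g. for a dry run); the systemctl default
# assumes the services are systemd units.
restart_services() {
    rc=0
    for svc in "$@"; do
        # The expansion is unquoted so "systemctl restart" splits into two words
        ${RESTART_CMD:-systemctl restart} "$svc" || {
            echo "restart failed: $svc" >&2
            rc=1
        }
    done
    return "$rc"
}
```

A hook script would end with a call such as restart_services kopano-gateway, adding other services only once you have confirmed they serve the same certificate. Setting RESTART_CMD=echo first gives a harmless dry run that just prints the service names.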
If you’ve read up to this point you might think you’re dealing with a systems administrator that doesn’t know what they’re doing. In fact, I’m very experienced, but Kopano is legacy and all admins can improve their skills. Kopano and Let’s Encrypt are great when they’re working, but if you only troubleshoot them once or twice a year you have to start from scratch each time.
So since we now have port 993 working we can safely assume port 995 should work. But what about 465? And 587? Maybe Postfix needs a boot too?
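Rather than assuming, the other ports can be checked the same way. Ports 465, 993 and 995 normally speak TLS from the first byte, while 587 usually starts in plain text and needs openssl s_client's -starttls smtp option. A minimal sketch, assuming the standard port assignments apply to this server; the function names starttls_arg and port_not_after are made up for illustration.

```shell
#!/bin/sh
# Hypothetical helpers; the names are illustrative only.

# starttls_arg PORT
# Print the s_client -starttls option a plaintext-first port needs, or nothing
# for implicit-TLS ports such as 465, 993 and 995 (standard assignments assumed).
starttls_arg() {
    case "$1" in
        25|587) echo "-starttls smtp" ;;
        110)    echo "-starttls pop3" ;;
        143)    echo "-starttls imap" ;;
        *)      ;;
    esac
}

# port_not_after HOST PORT
# Print the notAfter date of the certificate served on HOST:PORT.
port_not_after() {
    # $(starttls_arg ...) is unquoted so the option and its value split apart
    openssl s_client -connect "$1:$2" -servername "$1" $(starttls_arg "$2") \
        </dev/null 2>/dev/null | openssl x509 -noout -enddate | cut -d= -f2
}
```

A loop such as for p in 465 587 993 995; do echo "$p: $(port_not_after mail.example.com "$p")"; done then shows at a glance which services are still handing out the old certificate.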
To be continued… This troubleshooting took me an hour of complex work and my work queue is piling up.