A Transcript of an Ubuntu 18.04 to Ubuntu 20.04 upgrade gone wrong due to memory and CPU constraints

During an online upgrade of an Ubuntu 18.04 server to Ubuntu 20.04, the system became unresponsive for over an hour.

One idea was to reset the host, but interrupting the upgrade mid-way could have left the system with half-configured packages and caused serious problems.

At this point even the hosting provider's virtual machine manager interface was unresponsive, so we had to call their help line.

This sorry saga ended when we asked the host to temporarily upgrade the RAM and CPU so that the upgrade could complete.
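In hindsight, a quick resource check before starting the release upgrade might have flagged the problem early. Below is a minimal Python sketch, assuming the server exposes the standard Linux `/proc/meminfo`. The function names and the 1024 MB / 2 CPU thresholds are hypothetical, chosen only for illustration; they are not official Ubuntu requirements.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are in kB; a few (e.g. HugePages_Total) are plain counts.
    """
    info = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields:
            info[key.strip()] = int(fields[0])
    return info


def check_resources(info, cpus, min_available_mb=1024, min_cpus=2):
    """Return a list of problems; an empty list means the budget is met.

    The default thresholds are hypothetical, not official Ubuntu requirements.
    """
    # MemAvailable is reported by Linux 3.14 and later; fall back to MemFree.
    available_mb = info.get("MemAvailable", info.get("MemFree", 0)) // 1024
    problems = []
    if available_mb < min_available_mb:
        problems.append(
            f"only {available_mb} MB RAM available (< {min_available_mb} MB)")
    if cpus < min_cpus:
        problems.append(f"only {cpus} CPU(s) (< {min_cpus})")
    return problems


if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    with open("/proc/meminfo") as fh:
        meminfo = parse_meminfo(fh.read())
    issues = check_resources(meminfo, os.cpu_count() or 1)
    for issue in issues:
        print("WARNING:", issue)
    if not issues:
        print("Resource pre-flight check passed")
```

The check only looks at a snapshot, so a server that passes can still run short of memory once many package scripts and services restart at the same time; treat a warning as a reason to add headroom before upgrading rather than a hard rule.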

Here is a transcript of some of the events:

When the upgrade started, we got this message:

To make recovery in case of failure easier, an additional sshd will 
be started on port '1022'. If anything goes wrong with the running 
ssh you can still connect to the additional one. 
If you run a firewall, you may need to temporarily open this port. As 
this is potentially dangerous it's not done automatically. You can 
open the port with e.g.: 
'iptables -I INPUT -p tcp --dport 1022 -j ACCEPT'

To continue please press [ENTER]

Reading package lists... Done
Building dependency tree 
Reading state information... Done
Hit http://za.archive.ubuntu.com/ubuntu bionic InRelease 
...
Very long delay after finding the initrd image, then things went wrong
....
Found linux image: /boot/vmlinuz-4.15.0-153-generic
Found initrd image: /boot/initrd.img-4.15.0-153-generic
done

Setting up python3-gi (3.36.0-1) ...
Setting up libnet-libidn-perl (0.12.ds-3build2) ...
Setting up cloud-initramfs-copymods (0.45ubuntu1) ...
Setting up proftpd-basic (1.3.6c-2) ...
Installing new version of config file /etc/default/proftpd ...
Installing new version of config file /etc/init.d/proftpd ...
usermod: no changes

Failed to reload daemon: Connection timed out
Failed to reload daemon: Connection timed out
Failed to retrieve unit state: Connection timed out
Failed to start proftpd.service: Connection timed out
See system logs and 'systemctl status proftpd.service' for details.
invoke-rc.d: initscript proftpd, action "start" failed.
Failed to get properties: Connection timed out
invoke-rc.d: release upgrade in progress, error is not fatal
...
New CPU and RAM added
...
Setting up libvariable-magic-perl (0.62-1build2) ...
Setting up libb-hooks-op-check-perl (0.22-1build2) ...
...
Installation completed!
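Because the proftpd start failure was reported as "not fatal", a service can quietly remain in a failed state after the upgrade finishes. Here is a minimal Python sketch (it needs Python 3.7+; Ubuntu 20.04 ships 3.8) that lists such units by running `systemctl list-units --state=failed --plain --no-legend`. The function names are hypothetical, and the parser assumes the unit name is the first column of that output.

```python
import subprocess


def parse_failed_units(output):
    """Return unit names from `systemctl list-units --plain --no-legend` output."""
    units = []
    for line in output.splitlines():
        fields = line.split()
        # --plain should suppress the status bullet; drop one anyway if present.
        if fields and fields[0] in ("●", "*"):
            fields = fields[1:]
        if fields:
            units.append(fields[0])
    return units


def failed_units():
    """Ask systemd for units in the failed state (requires Python 3.7+)."""
    result = subprocess.run(
        ["systemctl", "list-units", "--state=failed", "--plain", "--no-legend"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_failed_units(result.stdout)
```

Calling `failed_units()` on the upgraded server returns the names of any units still in the failed state (for example `proftpd.service`, had it stayed down). Each one can then be inspected with `systemctl status`, as the error message in the transcript suggests.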

During the upgrade, many configuration files were updated, and in each case only the default option was selected. This is a highly complex server with many services, but in the end everything appears to be working.

These are some of the configuration files for which the default option was chosen:

postfix ("No configuration" was selected)
jail.conf
nginx.conf
/etc/services
/etc/logrotate.conf
/etc/bind/named.conf.default-zones
/etc/bind/named.conf.options
/etc/dovecot/conf.d/10-mail.conf
/etc/ssh/sshd_config
/etc/dovecot/conf.d/20-pop3.conf
/etc/default/snmpd
/etc/snmp/snmpd.conf
/etc/mysql/mysql.conf.d/mysqld.cnf
/etc/default/opendkim
/etc/opendkim.conf
/etc/default/spamassassin
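When dpkg or ucf asks about a locally modified configuration file, the version that was not installed is usually left next to the live file. A `.dpkg-old` or `.ucf-old` file holds your previous version when the maintainer's version was installed, and a `.dpkg-dist` or `.ucf-dist` file holds the maintainer's version when yours was kept. The Python sketch below lists these leftovers under `/etc` so that customizations replaced by a default choice can be diffed and restored by hand. The function name is hypothetical, and the suffix list covers only these four common cases.

```python
import os

# Suffixes dpkg and ucf use when they keep a second copy of a conffile:
#   .dpkg-old / .ucf-old   - your old version (maintainer's version installed)
#   .dpkg-dist / .ucf-dist - maintainer's new version (your version kept)
LEFTOVER_SUFFIXES = (".dpkg-old", ".dpkg-dist", ".ucf-old", ".ucf-dist")


def find_leftover_conffiles(root="/etc"):
    """Walk root and return sorted paths that end in a dpkg/ucf leftover suffix."""
    found = []
    # os.walk silently skips directories it cannot read.
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(LEFTOVER_SUFFIXES):
                found.append(os.path.join(dirpath, name))
    return sorted(found)


if __name__ == "__main__":
    for path in find_leftover_conffiles():
        # Dropping the last suffix gives the live file the copy belongs to.
        live = path.rsplit(".", 1)[0]
        print(f"{path}  ->  compare with {live}")
```

Each hit can then be compared with `diff`, for example `diff /etc/ssh/sshd_config.ucf-old /etc/ssh/sshd_config`, before deciding which settings to carry forward.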

Finally, the following cross-checks were done:

  • Check that both IP addresses were available afterwards
  • Send and receive test emails
  • Check billing system health including automatic invoice generation
  • Check SMS Gateway
  • Check BIND replication
  • Check Fail2ban
  • Check if the firewall is running
  • Check if email is working
  • Check whether AppArmor causes issues, e.g. with the control panel
  • Tail as many log files as possible, especially syslog
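For the last item in the checklist, here is a minimal Python sketch that reads only the tail of each log file and prints the lines that look like problems. The function names and the keyword pattern are hypothetical and will need tuning for the services on a particular server. `/var/log/syslog` is the default because the checklist singles it out.

```python
import re
import sys
from collections import deque

# Hypothetical keyword list; tune it for the services on your own server.
PROBLEM_RE = re.compile(
    r"\b(errors?|fail(ed|ure|s)?|denied|timed out|segfault)\b", re.IGNORECASE)


def tail_lines(path, n=500):
    """Return the last n lines of a text file, reading it line by line."""
    with open(path, errors="replace") as fh:
        # A bounded deque discards older lines as newer ones arrive.
        return list(deque(fh, maxlen=n))


def suspicious(lines, pattern=PROBLEM_RE):
    """Return the lines that match pattern, without trailing newlines."""
    return [line.rstrip("\n") for line in lines if pattern.search(line)]


if __name__ == "__main__":
    # Defaults to syslog; pass other log files (mail.log, fail2ban.log, ...) as arguments.
    for log in sys.argv[1:] or ["/var/log/syslog"]:
        try:
            hits = suspicious(tail_lines(log))
        except OSError as exc:
            print(f"{log}: cannot read ({exc})")
            continue
        print(f"== {log}: {len(hits)} suspicious line(s)")
        for line in hits:
            print(line)
```

Reading logs under `/var/log` usually requires root or membership in the `adm` group. Unreadable files are reported instead of aborting the run.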

In the end, believe it or not, the most complex problem to troubleshoot was getting ionCube and PHP 7.4 to work properly together.
