Another day, another crash.
Background
Proxmox VM with Laravel Forge installed, Ubuntu 24.04. I don’t really update these machines a lot because they just work, but as it happens today I updated it with apt dist-upgrade
and quite a lot of files were upgraded. During the day I also rebooted once and it worked. This is one of our oldest VMs.
I noticed swap files issues earlier in the day using Spatie’s Laravel Server Backup and decided to also upgrade the RAM from 8 GB to 12 GB. When I noticed hotplug was off, I added it for RAM and CPU.
Then I rebooted. Then, the VM got stuck at:
SeaBIOS …
…
“Booting from Hard Disk…”
Panic sets in.
I look at the log file and see this error:
kvm: total memory for NUMA nodes (0x3fffffff) should equal RAM size (0x40000000) TASK ERROR: start failed: QEMU exited with code 1
and then this one:
TASK ERROR: NUMA needs to be enabled for memory hotplug
Mmm I think, let’s revert those changes.
Nothing, the machine still doesn’t boot. I try 5 times.
Now I panic even more. This is an iSCSI TrueNAS Core NAS, and it’s on an internal network. I have to first raise the VPN to see what’s up.
In the meanwhile I start a tail on the Hypervisor using journalctl -f
and I don’t really see any out of the ordinary. I do notice a GrandWazoo spelling mistake and quickly do a pull request on GitHub.
Still no luck,No luck, VM stuck at booting SeaBIOS.
I panic even further, thinking I’m going to have to update TrueNAS, but I can’t even reach the internet:
TimeoutError(): Automatic update check failed. Please check system network settings.
Anyway, this is just a symptom of panic and anyway there are other VMs on this NAS that I don’t want to shut down for an upgrade and the inevitable reboot.
Next, I try using another BIOS, EFI, which I hate. This doesn’t work.
I put the entire log file into ChatGPT which notices I have been stuffing around with EFI and suggests maybe I have a BIOS issue.
Next I start another VM on the same disk, which starts just fine. Some of the drastic panic resides.
Eventually I paste the entire journal startup entry into Claud.ai which sanely tells me the VM is actually starting up, so next place to look would be the VM’s boot partition. It follows with some really easy to follow steps and this is what I did:
- SSH to Hypervisor
- cd /var/lib/vz/templates/iso
- wget latest Ubuntu 24.04 Desktop (to get the Live CD)
- Boot Ubuntu 24.04
- fdisk -l
- Thankfully, I immediately see the disk
- mkdir /mnt/repair
- mount /dev/sda1 /mnt/repair
- The disk mounts! Yay!
- ls -la /mnt/repair
- All the files are there!
- Claud thinks I have a separate boot partition – I don’t do that so /boot is also there
- Next, these are the commands I used:
mount --bind /dev /mnt/repair/dev mount --bind /proc /mnt/repair/proc mount --bind /sys /mnt/repair/sys mount --bind /run /mnt/repair/run chroot /mnt/repair grub-install /dev/sda update-grub
I get a message:
Installing for i386-pc platform.
Then I proceed to undo:
exit umount /mnt/repair/dev /mnt/repair/proc /mnt/repair/sys /mnt/repair/run cd / umount /mnt/repair reboot
Hail Mary. It worked!
Bootloader corruption sorted.