Proxmox disks sometimes break. They're not supposed to, but when it happens it can be devastating.
I only use defaults. Nothing fancy, just whatever the operating system install gave me. Yet I've still had crashes on many of my Debian/Ubuntu boxes.
Fortunately we also use Proxmox Backup Server, so we can generally recover. Still, it's not nice: it's a tense hole to find yourself in, and no fun at all when you're running VMs for clients.
Recently I had another one, the dreaded screen below:
Note the time delays: it took around 3 seconds to get to this:
`Btrfs loaded, crc32c=crc32c-generic, zoned=yes, fsverity=yes`
Then nothing happens. Minutes of waiting (apparently almost 3.5 minutes), then this cryptic message:
`random: crng init done`
The panic already starts setting in after 3 seconds, because it's blatantly obvious that the OS is stuck. The hopeful among us wait and wait, but after that long wait and that cryptic message, you're pretty much screwed.
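If you can get the boot console as text (say, by transcribing a screenshot or capturing a serial console), a small Python helper can point straight at the stall. This is a hypothetical sketch, assuming the usual `[seconds.microseconds]` prefix the kernel prints when printk timestamps are enabled; `largest_gap` is an illustrative name, and the sample log is made up to resemble the hang described above, not copied from the screenshots.

```python
import re

# Kernel console lines look like "[    3.141592] message" when printk timestamps are on.
TIMESTAMP = re.compile(r"^\[\s*(\d+\.\d+)\]\s?(.*)$")


def largest_gap(lines):
    """Return (gap_seconds, line_before, line_after) for the longest pause between
    consecutive timestamped lines, or None if fewer than two timestamps are found."""
    entries = []
    for line in lines:
        match = TIMESTAMP.match(line.strip())
        if match:
            entries.append((float(match.group(1)), match.group(2)))
    if len(entries) < 2:
        return None

    best = None
    for (t0, msg0), (t1, msg1) in zip(entries, entries[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, msg0, msg1)
    return best


if __name__ == "__main__":
    # Hypothetical log: timestamps are illustrative only.
    log = [
        "[    2.950000] sd 2:0:0:0: [sda] Attached SCSI disk",
        "[    3.012345] Btrfs loaded, crc32c=crc32c-generic, zoned=yes, fsverity=yes",
        "[  211.500000] random: crng init done",
    ]
    gap, before, after = largest_gap(log)
    print(f"Stalled for {gap:.1f}s after: {before!r}")
```

The last message before the biggest gap is the best clue to what the kernel was waiting on; in both hangs here, that was followed by `random: crng init done`.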
At this point you ask yourself things like:
- What did I do to deserve this?
- I pray for a recent backup
- I pray for any backup
- I hope yesterday’s backup is good enough
- Help, Proxmox forums, help
- Help, Google, help
- Help, ChatGPT, help
- Am I in the right job?
- etc.
The point really is that there aren't many people who can help you. Disk crashes are nasty and unpredictable, and the chances of recovery range from slim to poor.
Unfortunately Proxmox doesn't give you a quick "fsck" for a guest's disk. That would be the most obvious next step, right?
Before we carry on moaning, here is another screenshot of a broken disk. You'll notice there's no Btrfs output this time; instead, the long hang happens at:
`hid-generic 0003:0627:0001.0001: input,hidraw0: USB HID v0.01 Mouse [QEMU QEMU USB Tablet] on usb-0000:00:01.2-1/input0`
After that I got the same `random: crng init done`, but I didn't hang around to take screenshots, as I had to get on with the recovery.
Miraculously, I managed to fix the first problem like this:
- Download the latest Ubuntu 24.04 LTS live ISO
- Boot the broken VM from it (attach it as the VM's CD/DVD drive)
- Do not connect to the internet
- Choose "Try Ubuntu"
- Open a terminal
- Run `lsblk` to get your bearings
I then tried `fsck /dev/sda`.
That failed, because fsck checks filesystems, which live on partitions, not on the whole disk. You'll get `Superblock invalid` if you do this.
Then I tried `fsck /dev/sda1`.
That worked!
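Picking the right device is exactly where `fsck /dev/sda` went wrong. As a sketch, assuming the live session's util-linux `lsblk` supports JSON output (`lsblk -J -p -o NAME,TYPE,FSTYPE`), the following Python lists partitions and logical volumes that carry a filesystem, skipping whole disks, swap, LVM physical volumes and the live ISO's own loop devices. `fsck_candidates` and the sample JSON are hypothetical illustrations, not output captured from my VMs.

```python
import json

# Device types worth checking, and filesystem types fsck should not be pointed at.
CHECKABLE_TYPES = {"part", "lvm"}
NOT_CHECKABLE = {"swap", "LVM2_member", "crypto_LUKS"}

# Hypothetical output of: lsblk -J -p -o NAME,TYPE,FSTYPE
SAMPLE = """{"blockdevices": [
  {"name": "/dev/loop0", "type": "loop", "fstype": "squashfs"},
  {"name": "/dev/sda", "type": "disk", "fstype": null, "children": [
    {"name": "/dev/sda1", "type": "part", "fstype": "ext4"},
    {"name": "/dev/sda2", "type": "part", "fstype": null},
    {"name": "/dev/sda5", "type": "part", "fstype": "swap"}
  ]},
  {"name": "/dev/sr0", "type": "rom", "fstype": "iso9660"}
]}"""


def fsck_candidates(lsblk_json):
    """Return device paths of partitions/LVs that hold a checkable filesystem."""
    found = []

    def walk(devices):
        for dev in devices:
            fstype = dev.get("fstype")
            if dev.get("type") in CHECKABLE_TYPES and fstype and fstype not in NOT_CHECKABLE:
                found.append(dev["name"])  # -p makes "name" a full path like /dev/sda1
            walk(dev.get("children", []))  # partitions and LVs are nested children

    walk(json.loads(lsblk_json)["blockdevices"])
    return found


if __name__ == "__main__":
    print(fsck_candidates(SAMPLE))
```

On a real system you would feed it `subprocess.run(["lsblk", "-J", "-p", "-o", "NAME,TYPE,FSTYPE"], capture_output=True, text=True).stdout` instead of `SAMPLE`. Whatever it returns, make sure that filesystem is not mounted before running fsck on it.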
As for the second problem, after 5 tries Proxmox displayed the graphics and I was able to run fsck again.
Once I got going, I had to press Y about 10 times.
And boom!! VM running again.
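Pressing Y ten times works, but fsck can answer for you: options fsck itself doesn't recognise are passed to the filesystem-specific checker, and for ext2/3/4 that is e2fsck, where `-y` answers "yes" to every repair prompt. It is the blunt option, since it accepts every proposed fix unseen, which is more palatable when you have a backup in Proxmox Backup Server. Below is a minimal Python sketch that builds the command and refuses whole-disk paths; the `WHOLE_DISK` pattern is a hypothetical name-based heuristic covering only sdX/vdX/hdX and NVMe naming, and `run_fsck` defaults to a dry run.

```python
import re
import subprocess

# Hypothetical heuristic: names like /dev/sda, /dev/vdb or /dev/nvme0n1 are whole disks.
WHOLE_DISK = re.compile(r"^/dev/(?:[shv]d[a-z]+|nvme\d+n\d+)$")


def fsck_command(device, assume_yes=True):
    """Build an fsck command line for a partition; refuse whole-disk paths."""
    if WHOLE_DISK.match(device):
        raise ValueError(f"{device} looks like a whole disk; pass a partition instead (see lsblk)")
    cmd = ["fsck"]
    if assume_yes:
        cmd.append("-y")  # answer "yes" to every repair prompt instead of pressing Y
    cmd.append(device)
    return cmd


def run_fsck(device, dry_run=True):
    """Print the command (dry run) or execute it and return fsck's exit code."""
    cmd = ["sudo"] + fsck_command(device)
    if dry_run:
        print("Would run:", " ".join(cmd))
        return None
    # fsck exit codes are a bit mask: 0 = no errors, 1 = errors corrected,
    # 4 = errors left uncorrected (see fsck(8) for the other bits).
    return subprocess.run(cmd).returncode


if __name__ == "__main__":
    run_fsck("/dev/sda1")
```

Only switch `dry_run` off once `lsblk` has confirmed the device and the filesystem is unmounted.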
Upgrade of hypervisor started: 1AM
Fixed final problem: 6AM