When Your ZFS Pool Runs Out of Space on Proxmox

Background

When your ZFS pool runs out of space on Proxmox, the entire host and possibly many of the VMs will crash and stop working. You won’t even be able to log on.

By looking at journalctl -f you’ll see errors like the ones below repeating:

Oct 05 07:51:40 hvX pve-ha-lrm[3127]: unable to write lrm status file - unable to open file '/etc/pve/nodes/hvX/lrm_status.tmp.3127' - Input/output error

Oct 05 07:51:51 hvX pvestatd[3039]: authkey rotation error: cfs-lock 'authkey' error: got lock request timeout

You can easily identify the problem with df -h. You’re not completely out of disk space, but ZFS starts misbehaving well before a pool hits 100%:

root@hvX:~# df -h
Filesystem        Size  Used Avail Use% Mounted on
udev               63G     0   63G   0% /dev
tmpfs              13G  5.0M   13G   1% /run
rpool/ROOT/pve-1   97G  2.1G   95G   3% /
tmpfs              63G   40M   63G   1% /dev/shm
efivarfs          128K   25K   99K  21% /sys/firmware/efi/efivars
tmpfs             5.0M     0  5.0M   0% /run/lock
tmpfs              63G     0   63G   0% /tmp
tmpfs             1.0M     0  1.0M   0% /run/credentials/systemd-journald.service
rpool              95G  128K   95G   1% /rpool
rpool/var-lib-vz  5.6T  5.5T   95G  99% /var/lib/vz
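
You can also sanity-check the pool itself with zpool list; the CAP column shows how full the pool is overall (rpool is the pool name visible in the df output above):

zpool list rpool

Once CAP creeps into the mid-90s, you’re in the territory where ZFS starts refusing writes.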

ChatGPT describes it like this:

When ZFS pools get that tight, they start throwing “Input/output error” instead of a gentler “no space left” because ZFS wants a bit of headroom (typically 5–10%) for metadata and copy-on-write transactions.
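
A common precaution, which I did not have in place here, is to park a refreservation on an otherwise empty dataset so the pool can never be filled to the brim; in an emergency you release it to get writable space back. The dataset name and size below are purely illustrative:

zfs create -o refreservation=20G -o mountpoint=none rpool/reserved
zfs set refreservation=none rpool/reserved    # run this when you need the headroom back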

You can further confirm the problem by going to the directory mentioned in the journal and trying to create a test directory yourself:

cd /etc/pve/nodes/hv01
root@hv01:/etc/pve/nodes/hv01# mkdir t
mkdir: cannot create directory ‘t’: Input/output error

Next, let’s analyse the situation, starting with zfs list:

root@hvX:/etc/pve/nodes/hv01# zfs list
NAME                       USED  AVAIL  REFER  MOUNTPOINT
rpool                     6.75T  94.6G    96K  /rpool
rpool/ROOT                2.08G  94.6G    96K  /rpool/ROOT
rpool/ROOT/pve-1          2.08G  94.6G  2.08G  /
rpool/data                1.31T  94.6G    96K  /rpool/data
rpool/data/vm-100-disk-0  15.9G  94.6G  15.9G  -
rpool/data/vm-100-disk-1   120K  94.6G   120K  -
...
rpool/var-lib-vz          5.44T  94.6G  5.44T  /var/lib/vz
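
Side note: if du below didn’t account for the usage, snapshots could be holding the space instead of files; zfs list can break that down per dataset (not the culprit here, but cheap to rule out):

zfs list -o space -r rpool
zfs list -t snapshot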

Next, let’s drill into the dataset using all that space, /var/lib/vz, and sort its contents by size to see what’s big:

root@hvX:/etc/pve/nodes/hv01# du -h /var/lib/vz --max-depth=2 | sort -h | tail -n 20
512 /var/lib/vz/images
512 /var/lib/vz/template/cache
17G /var/lib/vz/template
17G /var/lib/vz/template/iso
5.5T /var/lib/vz
5.5T /var/lib/vz/dump

Finally, let’s see why dump is so big:

root@hvX:/etc/pve/nodes/hvX# ls -lhS /var/lib/vz/dump | head -n 20
total 5.5T
-rw-r--r-- 1 root root 507G Sep 28 03:35 vzdump-qemu-X-2025_09_28-03_13_33.vma.zst
-rw-r--r-- 1 root root 506G Sep 27 03:36 vzdump-qemu-X-2025_09_27-03_13_50.vma.zst
-rw-r--r-- 1 root root 467G Oct 3 03:40 vzdump-qemu-X-2025_10_03-03_17_56.vma.zst
-rw-r--r-- 1 root root 466G Oct 2 03:37 vzdump-qemu-X-2025_10_02-03_17_25.vma.zst
-rw-r--r-- 1 root root 466G Oct 1 03:34 vzdump-qemu-X-2025_10_01-03_14_14.vma.zst
-rw-r--r-- 1 root root 466G Sep 30 03:35 vzdump-qemu-X-2025_09_30-03_13_41.vma.zst
...

Well, there you have it. You set up backups at the Datacenter level and chose the default “snapshot” mode, thinking it would only store incremental changes like PBS does. But vzdump’s snapshot mode only controls how the running VM is captured; every run still writes out a full image of the VM. No deltas. What a disaster: your attempt at redundancy led to a failure.
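
If you do keep Datacenter-level backups, at least give the job a retention policy so old full dumps get pruned automatically. On current Proxmox VE that’s the Retention tab of the backup job, or a prune-backups setting on the storage; a sketch of what the local storage entry in /etc/pve/storage.cfg could look like (verify the exact option names against your PVE version):

dir: local
        path /var/lib/vz
        content iso,vztmpl,backup
        prune-backups keep-last=2,keep-weekly=1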

Next, it’s time to clean up; you’ll have to decide what to rm:

root@hvX:/etc/pve/nodes/hv01# cd /var/lib/vz/dump
root@hvX:/var/lib/vz/dump# rm WHATEVER YOUR DECISION IS
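
If you’d rather not pick files by hand, find can make the call for you. This keeps recent dumps and targets anything older than three days; the pattern and age are just examples, and the -print run lets you review before deleting:

find /var/lib/vz/dump -name 'vzdump-qemu-*' -mtime +3 -print
find /var/lib/vz/dump -name 'vzdump-qemu-*' -mtime +3 -delete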

Will Proxmox now just start working? Nope, you’ll have to restart some services. I was lucky: just restarting the service below brought the UI back up.
Note: some VMs were still showing a crashed status.

systemctl restart pve-cluster
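
In my case that was enough. If the UI or the status graphs are still unhappy after that, the other usual Proxmox services can be restarted too (I didn’t need to, so treat this as a hint rather than the fix):

systemctl restart pvedaemon pveproxy pvestatd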

I had to force stop and start a few VMs, and the VM whose backup caused the crash in the first place was still locked:

qm unlock Y
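
For completeness, the force stop/start was also done with qm; the VM ID below is a placeholder:

qm stop <vmid>
qm start <vmid>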

Conclusion

Don’t use Datacenter-level backups unless you know what you’re doing: every run is a full copy of each VM, so without a retention policy your local storage will fill up fast.
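
And whatever backup scheme you use, keep an eye on pool capacity so you notice before ZFS does. A minimal sketch of a cron-able check, assuming the pool is called rpool and an 85% threshold (both illustrative):

#!/bin/sh
# warn when the pool crosses 85% capacity (illustrative threshold and pool name)
CAP=$(zpool list -H -o capacity rpool | tr -d '%')
if [ "$CAP" -ge 85 ]; then
    echo "rpool is at ${CAP}% - clean up before ZFS starts throwing I/O errors" | wall
fi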
