Symptoms
One or even a few of your Proxmox VE VMs might just stop working. In the Proxmox resource tree the VM's icon shows a yellow triangle, and when you hover over it you see “io-error”. No other visible clues are present in the UI. When you go to the console you may see the VM's output, but the console will appear frozen. At times you may see disk errors being printed.
This occurs because you've over-provisioned LVM thin disks. In other words, you took a chance, one or more of the disks filled up too quickly, and now the whole server is in trouble.
This is not a good situation: the affected VMs are frozen for lack of space, and the chances of disk corruption are high.
To find the problem events in Proxmox, use journalctl. For example:
journalctl | grep -i "warning"
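The grep above returns every warning in the journal. A narrower filter for the thin-pool messages is sketched below against a small hypothetical journal excerpt (the sample lines and hostnames are made up; real output will differ):

```shell
# Hypothetical journal excerpt; real messages vary by host and device.
sample='Nov 27 18:06:19 host kernel: device-mapper: thin: 253:8: switching pool to out-of-data-space (queue IO) mode
Nov 27 18:06:28 host lvm[760]: WARNING: Thin pool pve-data-tpool data is now 100.00% full.
Nov 27 18:06:30 host sshd[812]: Accepted publickey for root from 10.0.0.5'

# Match only the thin-pool messages, case-insensitively
matches=$(printf '%s\n' "$sample" | grep -Ei 'out-of-data-space|thin pool')
printf '%s\n' "$matches"
```

On a live host the equivalent would be `journalctl | grep -Ei 'out-of-data-space|thin pool'`.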
If your journal is extremely large, in other words months or years of logs, you might want to vacuum it first:
journalctl --vacuum-time=8d
Now look for warnings again.
If there is an over provisioning issue, you will see events similar to the following:
Nov 27 18:06:19 hostname kernel: device-mapper: thin: 253:8: switching pool to out-of-data-space (queue IO) mode
Nov 27 18:06:28 hostname lvm[760]: WARNING: Thin pool pve-data-tpool data is now 100.00% full.
Nov 27 18:07:21 hostname kernel: device-mapper: thin: 253:8: switching pool to out-of-data-space (error IO) mode
The solution is to free up disk space on the full volume.
Here are two possibilities:
- Delete images that aren’t needed anymore
- Do a live migration to another disk
If you have a spare disk you can do a live migration of a VM to it. You can also delete images that are no longer in use by browsing to the storage's “VM Disks” tab. Use the Summary tab to see how much space you've freed.
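While cleaning up, you can watch the thin pool's fill level with `lvs` (the Data% column). A minimal sketch, parsing a hypothetical `lvs` output rather than querying a live system; the LV names and sizes are illustrative:

```shell
# Hypothetical `lvs` output for a Proxmox host; Data% is the pool fill level.
sample='  LV            VG  Attr       LSize   Pool Origin Data%  Meta%
  data          pve twi-aotz-- 150.00g             100.00 4.32
  vm-100-disk-0 pve Vwi-aotz-- 100.00g data        80.00'

# Grab the Data% of the pool LV named "data". On its line the empty
# Pool/Origin columns collapse, so Data% lands in awk field 5.
usage=$(printf '%s\n' "$sample" | awk '$1 == "data" {print $5}')
echo "thin pool usage: ${usage}%"
```

On a real host you would simply run `lvs` (or `lvs pve/data`) and read the Data% column directly.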
In one situation, when I had no way to migrate or free up space, I had to start a broken server, clean up its swap files, and only then could I continue with other maintenance. Getting the server running again was partly luck, and I had to switch off other servers to do it. Not a good day at the office.
When does it trim?
Even after you delete data, the disk space might not become available immediately. It's not Proxmox's job to trim the freed-up space; that falls to the guest operating system. You can see when the operating system will next do its job like so:
# systemctl list-timers --all | grep fstrim
Mon 2025-03-03 00:00:00 SAST 6 days left Mon 2025-02-24 00:00:00 SAST 21min ago fstrim.timer fstrim.service
The above is from Ubuntu. You can see this particular Ubuntu 20.04 machine trims once a week at 00:00:00 on a Monday. Interestingly, another Ubuntu 22.04 machine trims once a week at 01:05:21 on a Monday.
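The differing times are likely down to the timer unit itself, which on Ubuntu typically runs weekly with a randomized delay. You can inspect your own with `systemctl cat fstrim.timer`; the sketch below parses a hypothetical copy of that unit (the exact contents shipped by your distro may differ):

```shell
# Hypothetical fstrim.timer contents, as commonly shipped by util-linux.
# Verify yours with: systemctl cat fstrim.timer
unit='[Timer]
OnCalendar=weekly
AccuracySec=1h
RandomizedDelaySec=6000
Persistent=true'

# Extract the schedule and the randomized delay; the delay is why two
# machines can trim at slightly different times on the same Monday.
schedule=$(printf '%s\n' "$unit" | sed -n 's/^OnCalendar=//p')
delay=$(printf '%s\n' "$unit" | sed -n 's/^RandomizedDelaySec=//p')
echo "runs: $schedule, random delay up to ${delay}s"
```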
How to manually trim on Ubuntu
On Ubuntu you can manually trim like this:
sudo fstrim -av
-a = all mounted file systems that support trim
-v = verbose
Other ways of seeing impending failure
See here: https://kb.vander.host/disk-management/oversubscribing-lvm-thin-on-proxmox-warnings/
pvesm status | grep lvmthin | awk ' $7 >=70 {print $1,$7}'
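To see what that filter does, here is the same pipeline run against a hypothetical `pvesm status` output (the storage names and numbers are made up; column 7 is the usage percentage):

```shell
# Hypothetical `pvesm status` output; real values come from your host.
sample='Name          Type     Status    Total      Used       Available  %
local         dir      active    98559220   12724464   80785208   12.91%
local-lvm     lvmthin  active    832888832  624666624  208222208  75.00%'

# Keep lvmthin storages at or above 70% usage, print name and percentage.
# awk reads the leading digits of "75.00%" for the numeric comparison.
out=$(printf '%s\n' "$sample" | grep lvmthin | awk '$7 >= 70 {print $1, $7}')
echo "$out"
```

Storages printed by this filter are the ones to clean up or migrate away from before they hit 100%.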
What else can you expect in the UI log
You will also see these messages being output when doing a migration:
2025-02-23 08:04:41 WARNING: You have not turned on protection against thin pools running out of space.
2025-02-23 08:04:41 WARNING: Set activation/thin_pool_autoextend_threshold below 100 to trigger automatic extension of thin pools before they get full.
2025-02-23 08:04:41 Logical volume "vm-xxx-disk-0" created.
2025-02-23 08:04:41 WARNING: Sum of all thin volume sizes (310.00 GiB) exceeds the size of thin pool pve/data and the amount of free space in volume group (16.00 GiB).
When I had this problem, the migration would just stop dead at around 5%. I tried twice and both times it froze in the same spot.
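The first warning in that migration log points at LVM's autoextend mechanism. A hedged sketch of the relevant settings in /etc/lvm/lvm.conf follows; the values are illustrative, and autoextend only helps if the volume group actually has free space to grow into (the log above shows only 16 GiB free):

```
# /etc/lvm/lvm.conf -- illustrative values, inside the activation { } section
activation {
    # Start growing the thin pool automatically once it reaches 80% full...
    thin_pool_autoextend_threshold = 80
    # ...extending it by 20% of its current size each time.
    thin_pool_autoextend_percent = 20
}
```

With the threshold left at the default of 100, autoextend is effectively disabled, which is exactly what the migration warning is complaining about.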