How to monitor disk performance iowait on Linux

Description

When your hard disk runs slowly, your entire system slows down, so monitoring performance on any server means monitoring the disk. Linux comes with a number of tools to assist with this, and this article presents some of the most common utilities and some common use cases.

The Definition of IO Wait Time

To understand disk performance in Linux, you have to understand what is called IO wait time. The quickest way to see it is with the top utility. Referring to the diagram below, you will notice 1.3 wa. This is the IO wait time, shown as a percentage of CPU time. Although it seems a bit obscure as it refers to IO, it is really just saying “how much of the time an idle CPU was waiting for I/O to complete”. The caveat is that it is not only waiting for the disk: the entire IO subsystem might be playing a role. As a rule of thumb though, you ideally don’t want to see more than 1.0.

top is one of the first tools to reach for when checking whether a disk is running at its limit or at degraded performance. It is available on virtually every system, so learn to use it.
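The wa figure in top is derived from the kernel's CPU counters in /proc/stat. As a rough sketch (the function name iowait_pct is our own, not part of any tool, and the counter values are made up for illustration), the same percentage can be computed from two samples of the aggregate cpu line, where the fifth counter is iowait:

```shell
#!/bin/bash
# Hypothetical helper: percentage of CPU time spent in iowait between two
# samples of the aggregate "cpu" line from /proc/stat.
iowait_pct() {
    awk -v a="$1" -v b="$2" 'BEGIN {
        split(a, x, " "); split(b, y, " ")
        # x[1] is the label "cpu"; x[2]..x[9] are user, nice, system, idle,
        # iowait, irq, softirq and steal, so x[6] is iowait.
        total = 0
        for (i = 2; i <= 9; i++) total += y[i] - x[i]
        waited = y[6] - x[6]
        if (total > 0) { printf "%.1f\n", 100 * waited / total }
        else { print "0.0" }
    }'
}

# Illustrative counter values, not real measurements
before="cpu 100 0 100 700 100 0 0 0 0 0"
after="cpu 200 0 200 1300 300 0 0 0 0 0"
iowait_pct "$before" "$after"   # prints 20.0
```

On a live system the two samples could be taken with head -n1 /proc/stat a few seconds apart and passed to the function in the same way.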

Historical Statistics

It’s all fine and dandy seeing what’s happening right now, but what if you need to see historical statistics? In this article we provide a few ways of testing, some utilities, and the SNMP method.

FIO

Fio is our favourite as of July 2023. It has many options, but a simple 1 gigabyte file test is enough to show you how the disk is performing right now.

This is the command we prefer:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75

This will write a 1 GB file. Remember to delete it.

Benchmarks? Let’s show you some that we got:

Oldish HP laptop with local SSD: 7 seconds
Host in SA: 3 seconds
Dedicated server in SA: 7 seconds
Mirror 1 server in SA: 39 seconds
Random AWS server with HyperV and bare metal: 16 minutes
Low powered Supermicro Proxmox with ZFS over iSCSI via TrueNAS: 7 minutes
Random host in SA: 7 minutes
Random host in Germany: 3 seconds

As you can see, times range from 3 seconds to 16 minutes. Does this mean everything else is slow? It’s quite possible, but sometimes perception is worse (or better) than reality.

Script

Fio, for all its power, doesn’t clearly show the total time spent, so we’ve created the ultimate script to keep track of disk speed tests.

The script relies on fio:

sudo apt install fio

Below is the ultimate disk speed test Bash script, last updated 28 March 2025. See the end of this article for some good results.

#!/bin/bash

# Capture the optional input parameter, if provided
input_param=$1

# Start timer to measure disk speed (in nanoseconds)
start=$(date +%s%N)

# Run fio and capture the output
fio_output=$(fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75 2>&1 | tee /dev/tty)

# End the timer (in nanoseconds) before cleanup so file removal is not counted
end=$(date +%s%N)

# Remove the temporary test file
rm -f random_read_write.fio

# Calculate the time difference in nanoseconds
duration_ns=$((end - start))

# Convert duration to seconds and milliseconds
duration_s=$((duration_ns / 1000000000))
duration_ms=$((duration_ns / 1000000))

# Get current date and time in the format YYYY-MM-DD h:i:s
current_datetime=$(date '+%Y-%m-%d %H:%M:%S')

# Extract the read and write lines from fio output
read_line=$(echo "$fio_output" | grep "^ *read:")
write_line=$(echo "$fio_output" | grep "^ *write:")

# Extract the relevant values from the read line
read_iops=$(echo "$read_line" | grep -oP "IOPS=\K[^\s,]+")
read_bw=$(echo "$read_line" | grep -oP "\(\K[^\)]+MB/s")

# Extract the relevant values from the write line
write_iops=$(echo "$write_line" | grep -oP "IOPS=\K[^\s,]+")
write_bw=$(echo "$write_line" | grep -oP "\(\K[^\)]+MB/s")

# Prepare the result string
if [ -n "$input_param" ]; then
result="$current_datetime The disk speed test took $duration_ms milliseconds on $input_param to complete / read: IOPS=$read_iops ($read_bw) / write: IOPS=$write_iops ($write_bw)"
else
result="$current_datetime The disk speed test took $duration_ms milliseconds to complete / read: IOPS=$read_iops ($read_bw) / write: IOPS=$write_iops ($write_bw)"
fi

# Output a blank line
echo

# Append the result so that a history is kept
echo "$result" >> speed-results.txt

# Display the results
cat speed-results.txt

On FreeBSD skip the --ioengine parameter.
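
Because every run appends one line to speed-results.txt, the history can be summarised later. The following is a minimal sketch, assuming the line format produced by the script above; the function name summarise_results is hypothetical:

```shell
#!/bin/bash
# Hypothetical helper: print run count, fastest, slowest and average duration
# (in milliseconds) from a results file written by the script above.
summarise_results() {
    grep -oE 'took [0-9]+ milliseconds' "$1" | awk '
        {
            ms = $2 + 0
            if (NR == 1 || ms < min) min = ms
            if (NR == 1 || ms > max) max = ms
            sum += ms
        }
        END {
            # The average is truncated to whole milliseconds
            if (NR > 0) printf "runs=%d min=%d max=%d avg=%d\n", NR, min, max, sum / NR
        }'
}
```

Run it as summarise_results speed-results.txt from the directory where the test script keeps its history.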

SAR

SAR stands for System Activity Report and keeps track of historical system data, including CPU and disk I/O. To use the utility, just type sar. When you run sar, you get historical statistics for your system at 10-minute intervals, going back to the start of the day. In the screenshot below, you will see sar output. What’s notable about the output are the spikes of 11, 14, 12, and 10. Then at 2AM an actual backup kicks off, and you see a dramatic increase in the disk I/O wait time.

At this point you might ask what a normal range for disk I/O wait time is. In our experience, anything from 1 to 5 is normal, 10 starts getting slow, 20 is really slow, and anything above 20 is very slow indeed. These values are relative though, and we recommend checking your system on a regular basis to determine baselines, and experimenting with backups or the du command to test some limits. Leave us a comment to tell us what you think is normal for your system.
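
To pick out the busy intervals without reading the whole report, the %iowait column of sar output can be filtered. A minimal sketch, assuming the default column headings of the sar CPU report and a threshold of 10 taken from the rule of thumb above; the function name flag_iowait is our own:

```shell
#!/bin/bash
# Hypothetical helper: read sar CPU report output on stdin and print the rows
# whose %iowait is at or above the threshold given as the first argument.
flag_iowait() {
    awk -v limit="${1:-10}" '
        /%iowait/ {                      # header row: locate the %iowait column
            for (i = 1; i <= NF; i++) if ($i == "%iowait") col = i
            next
        }
        /^Average/ || NF == 0 { next }   # skip the summary row and blank lines
        col && $col + 0 >= limit { print }'
}
```

Typical use would be sar | flag_iowait 10. Finding the column from the header row, rather than hard-coding a position, keeps the filter working whether or not the timestamps carry an AM/PM field.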

Installing SAR

If your system doesn’t have sar, install it on Ubuntu/Debian with:

apt install sysstat

Next, change ENABLED="false" to ENABLED="true" in /etc/default/sysstat

Then

service sysstat restart
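
The edit to /etc/default/sysstat can also be scripted. A small sketch, assuming the file contains the ENABLED line with plain double quotes at the start of a line; enable_sysstat is a name of our own choosing:

```shell
#!/bin/bash
# Hypothetical helper: switch ENABLED="false" to ENABLED="true" in the given
# file (normally /etc/default/sysstat). Keeps a .bak copy of the original.
enable_sysstat() {
    sed -i.bak 's/^ENABLED="false"/ENABLED="true"/' "$1"
}
```

Run as root, this would be enable_sysstat /etc/default/sysstat followed by the service restart shown above.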

The SNMP Method

It turns out SNMP can also monitor system IO stats. To monitor iowait time specifically, use this OID, but be sure to configure your monitoring tool to use delta values instead of absolute values, because the OID returns a cumulative counter.

.1.3.6.1.4.1.2021.11.54.0

To test:

snmpwalk -v 1 -c your_community localhost 1.3.6.1.4.1.2021.11.54.0
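
Since the value only ever increases, a meaningful figure comes from subtracting one reading from the next. A rough sketch of that arithmetic, assuming a 32-bit counter that may wrap around between readings; counter_delta is a hypothetical helper name:

```shell
#!/bin/bash
# Hypothetical helper: difference between two readings of a 32-bit SNMP
# counter, allowing for one wrap-around past 4294967295.
counter_delta() {
    local prev=$1 cur=$2
    if [ "$cur" -ge "$prev" ]; then
        echo $((cur - prev))
    else
        echo $((cur + 4294967296 - prev))
    fi
}
```

For example, counter_delta 1000 1500 gives 500. The two readings would come from polling the OID above a fixed number of seconds apart, which is what a monitoring tool set to delta mode does for you.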

Example of PRTG Configuration specifying Delta instead of Absolute.

Other Utilities and more SNMP

Two other notable tools that cover disk performance monitoring are iostat and the cat /proc/diskstats command. If your CentOS system doesn’t have iostat installed, install the sysstat package, which provides it: yum install sysstat

iostat

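iostat has the handy -d flag, which restricts the output to the per-device report, and a trailing number sets the refresh interval in seconds. For example, to monitor continuously every two seconds:

iostat -d 2

The %iowait figure belongs to the CPU report, which you get with iostat -c 2.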

If you don’t have iostat on your Ubuntu rig, do this:

sudo apt install sysstat -y

cat /proc/diskstats

/proc/diskstats is used by the handy Perl script for Webmin, called Webminstats, which draws fairly comprehensive RRD data of disk operation. Here is a snippet from that Perl code:

my $module_name;
my $info  = '/proc/diskstats';
my $EMPTY = EMPTY();

###############################################################################
# ask the system info on file system
sub read_data() {

	my $r_tab = read_full_file($info);
	my @res   = @{$r_tab};
	return @res;
}
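
For a quick check without Webmin, the same file can be sampled directly from the shell. In /proc/diskstats the third field is the device name and the thirteenth is the cumulative time, in milliseconds, that the device has spent doing I/O. A minimal sketch, assuming that field layout; disk_busy_pct is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: percentage of the interval a device spent doing I/O,
# from two /proc/diskstats lines for the same device and the interval in ms.
disk_busy_pct() {
    awk -v a="$1" -v b="$2" -v interval_ms="$3" 'BEGIN {
        if (interval_ms <= 0) exit 1
        split(a, x, " "); split(b, y, " ")
        # Field 3 is the device name, field 13 the time spent doing I/O (ms)
        printf "%s %.1f\n", y[3], 100 * (y[13] - x[13]) / interval_ms
    }'
}
```

On a live system the two lines could be captured with grep -w sda /proc/diskstats two seconds apart (the device name sda is only an example) and passed in with an interval of 2000.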

More Disk SNMP Monitoring

If you’re looking for more general SNMP monitoring of disk activity, use the following OID:

snmpwalk -v 1 -c your_community localhost 1.3.6.1.4.1.2021.13.15.1

So What’s Causing the Slow Disk

The aim of this article is just to help you determine whether your disk is slow. Finding out what is actually slowing it down takes more work. As a starting point we generally recommend top, looking at the top processes by CPU to see what is busy. If you are running a web server, this only paints part of the picture; you might have to go deeper under the hood with netstat to see how many connections are actually being made to the web server. You could then start gracefully terminating the processes one by one to see if wa recovers.
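
Counting web server connections from netstat output can be sketched as follows. This assumes the usual netstat -tn column layout (local address in the fourth column, state in the sixth) and a web server on ports 80 and 443; count_web_conns is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: read `netstat -tn` output on stdin and count the
# ESTABLISHED connections whose local port is 80 or 443.
count_web_conns() {
    awk '$1 ~ /^tcp/ && $6 == "ESTABLISHED" && $4 ~ /:(80|443)$/ { n++ }
         END { print n + 0 }'
}
```

Typical use would be netstat -tn | count_web_conns, run a few times while watching wa in top to see whether the two rise together.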

Conclusion

Disk I/O monitoring is key to performance. Be sure to know what you’re dealing with. If you are working with many disks, graph the data to compare workload surges, and move workloads elsewhere if they affect other areas.

History of Good Disk Speed Results

Additional statistics provided by one of my all-time favourite pieces of software, HDSentinel!

https://www.hdsentinel.com/hard_disk_sentinel_linux.php

3881 ms

2025-03-28 13:03:02 The disk speed test took 3881 milliseconds to complete / read: IOPS=142k (582MB/s) / write: IOPS=47.5k (195MB/s)

Operating parameters

Config: Single drive (no RAID), EXT4 (df -T)
Host: Dell R530 (PCIe 3)
Drive bus type: PCIe-4
OS: Proxmox VE 8.3, no applications running
HDD Device 2: /dev/nvme2
NVME type: TLC
HDD Model ID : Samsung SSD 990 PRO 2TB
HDD Revision : 4B2QJXD7
Interface : NVMe
Temperature : 30 °C
Highest Temp.: 30 °C
Health : 100 %
Performance : 100 %
Total written: 40.14 GB
Comments: This was a fresh drive imported from China via buydig Store