How to monitor disk performance iowait on Linux


Description

When your hard disk runs slowly, your entire system slows down, so monitoring the disk is key to monitoring performance on any server. Linux comes with a number of tools to help with this, and this article presents some of the most common utilities along with some common use cases.

The Definition of IO Wait Time

To understand disk performance in Linux, you have to understand what’s called IO wait time. The quickest way to see it is with the top utility. In the screenshot below, notice the value 1.3 wa: this is the IO wait time. Although the name is a bit obscure, it really means “what percentage of the time was the CPU idle while waiting for disk I/O to complete?” The caveat is that it’s not only waiting for the disk; the entire IO subsystem might be playing a role. As a rule of thumb, though, you don’t really want more than 1.0.

[Screenshot: top output, with 1.3 wa in the CPU summary line]
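top derives the wa figure from the kernel’s cumulative CPU counters in /proc/stat. As a rough sketch (iowait_pct is a hypothetical helper name, not part of any tool), the percentage can be computed from two samples of the aggregate cpu line: the iowait ticks that accumulated between the samples, divided by all ticks that accumulated. It assumes the usual field order user, nice, system, idle, iowait, irq, softirq, steal, guest, guest_nice, and leaves guest and guest_nice out of the total because the kernel already counts them in user and nice.

```shell
# iowait_pct BEFORE AFTER  (hypothetical helper)
# BEFORE and AFTER are two samples of the "cpu" line from /proc/stat.
# Prints the share of CPU time spent in iowait between the samples.
iowait_pct() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x); split(b, y)       # x[1] and y[1] are the "cpu" label
    total = 0
    for (i = 2; i <= 9; i++)       # user .. steal (guest fields excluded)
      total += y[i] - x[i]
    wait = y[6] - x[6]             # field 6 is iowait
    if (total <= 0) { print "0.0"; exit }
    printf "%.1f\n", 100 * wait / total
  }'
}

# Example (Linux only), sampling one second apart:
#   s1=$(grep '^cpu ' /proc/stat); sleep 1; s2=$(grep '^cpu ' /proc/stat)
#   echo "iowait: $(iowait_pct "$s1" "$s2")%"
```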

top is one of the first tools to reach for when checking whether a disk is running at full or degraded performance. It is available practically everywhere, so learn to use it.
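To log the wa value from a script instead of reading it off the screen, top can run non-interactively in batch mode. A minimal sketch, assuming the procps-ng summary format (%Cpu(s): ... 1.3 wa, ...); older versions print 1.3%wa and would need a different pattern, and extract_wa is a hypothetical helper name. Because the first frame top prints reflects averages since boot, the usage line asks for two frames one second apart and keeps the last.

```shell
# extract_wa (hypothetical helper): read a top "Cpu(s)" summary line on
# stdin and print the number that precedes the "wa" label.
extract_wa() {
  awk '{
    for (i = 2; i <= NF; i++)
      if ($i ~ /^wa,?$/) { print $(i - 1); exit }
  }'
}

# Example: two batch-mode frames one second apart, keep the second.
#   top -b -n 2 -d 1 | grep 'Cpu(s)' | tail -n 1 | extract_wa
```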

Historical Statistics

It’s all fine and dandy seeing what’s happening right now, but what if you need historical statistics? Below we cover a few ways of testing, some utilities, and the SNMP method.

FIO

Fio is our favourite as of July 2023. Fio has many options, but one of its most useful tricks is a 1 GB random read/write file test that shows what the disk is capable of right now.

This is the command we prefer:

fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75

This writes a 1 GB test file called random_read_write.fio in the current directory. Remember to delete it afterwards.
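If you run the test by hand often, a small wrapper can take care of the cleanup. This is a sketch: run_fio_test is a hypothetical helper that reuses the fio options above with a configurable file name. Its body runs in a subshell so that the EXIT trap removes the test file when fio finishes, fails, or is interrupted with Ctrl-C.

```shell
# run_fio_test [FILE] (hypothetical helper): run the 1 GB 75/25 random
# read/write test against FILE (default random_read_write.fio) and always
# delete FILE afterwards. The ( ) body runs in a subshell.
run_fio_test() (
  file=${1:-random_read_write.fio}
  trap 'rm -f -- "$file"' EXIT   # runs when this subshell exits
  trap 'exit 130' INT TERM       # turn Ctrl-C/TERM into a normal exit
  fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 \
      --name=test --filename="$file" --bs=4k --iodepth=64 --size=1G \
      --readwrite=randrw --rwmixread=75
)

# Example (the path is only an illustration):
#   run_fio_test /mnt/data/random_read_write.fio
```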

Here are some of the benchmark times we got:

  • Oldish HP laptop with local SSD: 7 seconds
  • Host in SA: 3 seconds
  • Dedicated server in SA: 7 seconds
  • Mirror 1 server in SA: 39 seconds
  • Random AWS server with HyperV and bare metal: 16 minutes
  • Low-powered Supermicro Proxmox with ZFS over iSCSI via TrueNAS: 7 minutes
  • Random host in SA: 7 minutes
  • Random host in Germany: 3 seconds

As you can see, times range from 3 seconds to 16 minutes. Does this mean everything else on the slower hosts is slow too? Quite possibly, but sometimes perception is worse (or better) than reality.
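Some back-of-the-envelope arithmetic helps explain the spread. With fio’s default kb_base of 1024, --size=1G is 1 GiB and --bs=4k is 4 KiB, so the job issues 262,144 I/Os, roughly 75% of them reads. The sketch below (fio_io_estimate is a hypothetical helper) divides that by an assumed sustained IOPS figure; it ignores queueing effects and the time fio spends laying out the test file, and the read/write split is only approximate.

```shell
# fio_io_estimate SIZE_BYTES BLOCK_BYTES READ_PCT IOPS (hypothetical helper)
# Prints the number of I/Os in the job, the read/write split and the
# approximate run time in whole seconds at the given sustained IOPS.
fio_io_estimate() {
  local size=$1 block=$2 read_pct=$3 iops=$4
  if [ "$iops" -le 0 ]; then
    echo "IOPS must be greater than zero" >&2
    return 1
  fi
  local total=$(( size / block ))
  local reads=$(( total * read_pct / 100 ))
  local writes=$(( total - reads ))
  echo "total=$total reads=$reads writes=$writes seconds=$(( total / iops ))"
}

# Example: 1 GiB of 4 KiB I/Os, 75% reads, on a disk assumed to sustain 300 IOPS
#   fio_io_estimate 1073741824 4096 75 300
#   -> total=262144 reads=196608 writes=65536 seconds=873 (about 14.5 minutes)
```

In other words, a disk that only manages a few hundred random IOPS needs minutes rather than seconds for this job, which is the same order of magnitude as the slowest results above.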

Script

For all its power, fio doesn’t clearly show the total time spent, so we’ve created the ultimate script to keep track of disk speed tests.

The script relies on fio:

sudo apt install fio

Here is the ultimate disk speed test Bash script:

#!/bin/bash

# Capture the optional input parameter (for example a host name), if provided
input_param=$1

# Stop early if fio is not installed
if ! command -v fio >/dev/null 2>&1; then
  echo "fio is not installed" >&2
  exit 1
fi

# Start timer to measure disk speed
start=$(date +%s)

# Do heavy reads at 75% and writes at 25% on a temporary 1 GB test file
fio --randrepeat=1 --ioengine=libaio --direct=1 --gtod_reduce=1 --name=test --filename=random_read_write.fio --bs=4k --iodepth=64 --size=1G --readwrite=randrw --rwmixread=75

# Remove the temporary test file (no error if it is already gone)
rm -f random_read_write.fio

# End the timer
end=$(date +%s)

# Calculate the time difference in seconds
duration=$((end - start))

# Get current date and time in the format YYYY-MM-DD HH:MM:SS
current_datetime=$(date '+%Y-%m-%d %H:%M:%S')

# Prepare a result string based on the optional parameter
if [ -n "$input_param" ]; then
  result="$current_datetime The disk speed test took $duration seconds on $input_param to complete."
else
  result="$current_datetime The disk speed test took $duration seconds to complete."
fi

# Output a blank line
echo

# Append the result so that a history is kept
echo "$result" >> speed-results.txt

# Display the results
cat speed-results.txt

On FreeBSD, skip the --ioengine parameter, since libaio is Linux-specific.
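Because the script appends one line per run to speed-results.txt, that history can be summarised with a few lines of awk. A sketch, assuming the exact sentence format the script writes (the duration is the word after “took”); summarise_results is a hypothetical helper name.

```shell
# summarise_results [FILE] (hypothetical helper): count the runs in FILE
# (default speed-results.txt) and print the fastest, slowest and mean
# duration in seconds.
summarise_results() {
  awk '{
    # The duration is the word that follows "took" in each result line.
    for (i = 1; i < NF; i++) if ($i == "took") break
    if (i >= NF) next                 # not a result line
    d = $(i + 1) + 0
    n++; sum += d
    if (n == 1 || d < min) min = d
    if (n == 1 || d > max) max = d
  }
  END {
    if (n == 0) { print "no results"; exit }
    printf "runs=%d min=%ds max=%ds mean=%.1fs\n", n, min, max, sum / n
  }' "${1:-speed-results.txt}"
}

# Example: summarise_results speed-results.txt
```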

Sample outputs

Description                  Read       Write      Time (s)
Single NVMe                  101MB/s    33.6MB/s   19
Host SSD RAID                115MB/s    38.4MB/s   17
RAID 5 + Cache on TrueNAS    -          -          20
RAID 5 Magnetic              6028kB/s   2014kB/s   148

SAR

SAR stands for System Activity Report and keeps track of historical system data, including CPU and disk I/O. To use it, just type sar. Run on its own, sar shows your system’s statistics in 10-minute intervals going back to the start of the day. In the screenshot below, notice the IO wait spikes of 11, 14, 12, and 10. Then at 2AM an actual backup kicks off, and you see a dramatic increase in the disk I/O wait time.

[Screenshot: sar output showing IO wait spikes and the 2AM backup]

At this point you might ask what is a normal range for Disk I/O wait time? In our experience, anything from 1 to 5 is normal, 10 starts getting slow, 20 is really slow, and anywhere above 20 is really very slow. These values are a bit relative though and we recommend checking your system on a regular basis to determine baselines, and experimenting with backups or the du command to test some limits. Leave us a comment to tell us what you think is normal for your system.
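To pick such spikes out of a long sar report automatically, a short filter can compare the %iowait column against a threshold. This sketch (flag_iowait is a hypothetical helper) defaults to 10, the point at which the figures above say things start getting slow. It finds %iowait by name in the header rather than by position. The usage line sets LC_ALL=C, which should make sar print 24-hour timestamps (one column instead of two) and decimal points rather than commas.

```shell
# flag_iowait [THRESHOLD] (hypothetical helper): read "sar -u" output on
# stdin and print the timestamp and %iowait of every sample above
# THRESHOLD (default 10).
flag_iowait() {
  awk -v t="${1:-10}" '
    # Header lines: remember which column holds %iowait.
    /%iowait/ { for (i = 1; i <= NF; i++) if ($i == "%iowait") col = i; next }
    # Data lines: numeric value in that column and above the threshold.
    col && $col ~ /^[0-9.]+$/ && $col + 0 > t + 0 { print $1, $col }
  '
}

# Example: check today's samples (the Average: line is checked too)
#   LC_ALL=C sar -u | flag_iowait 10
```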

Installing SAR

If your system doesn’t have sar, then do this for Ubuntu/Debian:

apt install sysstat

Next, change ENABLED="false" to ENABLED="true" in /etc/default/sysstat

Then restart the service:

service sysstat restart
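The edit to /etc/default/sysstat can also be scripted. A minimal sketch using GNU sed’s in-place -i option (as found on Debian and Ubuntu); enable_sysstat is a hypothetical helper, and its optional argument exists only so the change can be tried on a copy of the file first.

```shell
# enable_sysstat [FILE] (hypothetical helper): switch ENABLED="false" to
# ENABLED="true". Safe to run more than once; other lines are untouched.
enable_sysstat() {
  local conf=${1:-/etc/default/sysstat}
  sed -i 's/^ENABLED="false"/ENABLED="true"/' "$conf"
}

# Example (as root):
#   enable_sysstat && service sysstat restart
```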

The SNMP Method

It turns out SNMP can also monitor system IO stats. To monitor iowait time specifically, use the OID below, but be sure to record delta values instead of absolute values: the OID is a counter that only ever increases.

.1.3.6.1.4.1.2021.11.54.0

To test:

snmpwalk -v 1 -c your_community localhost 1.3.6.1.4.1.2021.11.54.0
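If your monitoring tool cannot compute deltas itself, the arithmetic is simple: subtract the previous reading from the current one. This OID is ssCpuRawWait from UCD-SNMP-MIB, a Counter32, so it wraps back to zero after 2^32 - 1, which the sketch below allows for. counter_delta is a hypothetical helper; snmpget’s -Oqv output options should print just the bare number. The result is in raw CPU ticks, not a percentage.

```shell
# counter_delta PREV CURR (hypothetical helper): difference between two
# Counter32 readings, allowing for one wrap-around at 2^32.
counter_delta() {
  if [ "$2" -ge "$1" ]; then
    echo $(( $2 - $1 ))
  else
    echo $(( $2 + 4294967296 - $1 ))
  fi
}

# Example: two readings 60 seconds apart
#   oid=.1.3.6.1.4.1.2021.11.54.0
#   a=$(snmpget -v 1 -c your_community -Oqv localhost "$oid"); sleep 60
#   b=$(snmpget -v 1 -c your_community -Oqv localhost "$oid")
#   counter_delta "$a" "$b"
```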

[Screenshot: example of PRTG configuration specifying Delta instead of Absolute]

Other Utilities and more SNMP

Two other notable sources of disk performance data are the iostat utility and the /proc/diskstats file. If your CentOS system doesn’t have iostat, install the sysstat package, which provides it:

yum install sysstat

iostat

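iostat takes an optional interval in seconds, after which it keeps printing new reports until you stop it. The -c flag limits the output to the CPU report, which contains %iowait, while -d shows only the per-device report. For example, to print the CPU report every two seconds:

iostat -c 2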
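In iostat’s CPU report, the avg-cpu: label appears only on the header line, so the %iowait value on the data line sits one column to the left of its header. The sketch below (iostat_iowait is a hypothetical helper) handles that offset. Note that iostat’s first report covers the time since boot; only later reports cover each interval.

```shell
# iostat_iowait (hypothetical helper): read "iostat -c" output on stdin and
# print each %iowait value.
iostat_iowait() {
  awk '
    # The header starts with "avg-cpu:", which the data line lacks,
    # so the matching data column is one to the left.
    /avg-cpu:/ { for (i = 1; i <= NF; i++) if ($i == "%iowait") col = i - 1; next }
    col { print $col; col = 0 }
  '
}

# Example: skip the since-boot report by keeping the second of two reports
#   iostat -c 2 2 | iostat_iowait | tail -n 1
```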

If you don’t have iostat on your Ubuntu rig, do this:

sudo apt install sysstat -y

cat /proc/diskstats

/proc/diskstats is used by Webminstats, a handy Perl script for Webmin that graphs fairly comprehensive RRD data on disk operations. Here is a snippet from that Perl code (read_full_file and EMPTY are defined elsewhere in Webminstats):

my $module_name;
my $info  = '/proc/diskstats';
my $EMPTY = EMPTY();

###############################################################################
# ask the system info on file system
sub read_data() {

	my $r_tab = read_full_file($info);
	my @res   = @{$r_tab};
	return @res;
}
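Reading the file directly is straightforward. Each line starts with the major number, minor number and device name, followed by the kernel’s counters; the tenth counter (the 13th whitespace-separated field) is the total milliseconds the device spent doing I/O. Sampling it twice gives a busy percentage similar to iostat’s %util. This is a sketch: disk_util is a hypothetical helper, and the field position follows the kernel’s iostats documentation.

```shell
# disk_util BEFORE AFTER MS (hypothetical helper)
# BEFORE and AFTER are one device's line from /proc/diskstats, MS is the
# number of milliseconds between the two samples. Prints percent busy.
disk_util() {
  awk -v a="$1" -v b="$2" -v ms="$3" 'BEGIN {
    split(a, x); split(b, y)
    if (ms + 0 <= 0) { print "0.0"; exit }
    # Field 13 = milliseconds spent doing I/Os (counter 10 after the name)
    printf "%.1f\n", 100 * (y[13] - x[13]) / ms
  }'
}

# Example for sda, one second apart:
#   a=$(awk '$3 == "sda"' /proc/diskstats); sleep 1
#   b=$(awk '$3 == "sda"' /proc/diskstats)
#   disk_util "$a" "$b" 1000
```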

More Disk SNMP Monitoring

If you’re looking for more general SNMP monitoring of disk activity, use the following OID:

snmpwalk -v 1 -c your_community localhost 1.3.6.1.4.1.2021.13.15.1

So What’s Causing the Slow Disk

The aim of this article is just to help you determine whether your disk is slow. Finding out what’s actually slowing it down takes more work. As a starting point, we generally recommend top, looking at the top processes by CPU to see what is busy. If you are running a web server, this only paints part of the picture; you might have to go deeper under the hood with netstat to see how many actual connections are being made to the web server. You could also start gracefully terminating processes one by one to see if wa recovers.
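Another place to look is processes in uninterruptible sleep (state D), which on Linux usually means they are blocked waiting for I/O; a process that keeps showing up there is a good candidate for what is driving wa up. A sketch that filters ps output for them: blocked_on_io is a hypothetical helper, and it expects the columns pid,stat,pcpu,comm in that order.

```shell
# blocked_on_io (hypothetical helper): read "ps -eo pid,stat,pcpu,comm"
# output on stdin and print the PID and command of every process whose
# state starts with D.
blocked_on_io() {
  awk 'NR > 1 && $2 ~ /^D/ {
    pid = $1
    # Strip PID, STAT and %CPU so command names with spaces survive.
    sub(/^[ \t]*[^ \t]+[ \t]+[^ \t]+[ \t]+[^ \t]+[ \t]+/, "")
    print pid, $0
  }'
}

# Example: sample a few times, since D states are often brief
#   for i in 1 2 3; do ps -eo pid,stat,pcpu,comm | blocked_on_io; sleep 1; done
```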

Conclusion

Disk I/O monitoring is key to performance, so be sure you know what you’re dealing with. If you are working with many disks, graph the data to compare workload surges, and move heavy workloads elsewhere if they affect other services.
