How to Monitor Disk Performance (iowait) on Linux

Description

When your hard disk runs slowly, your entire system slows down. So it stands to reason that monitoring performance on any server means monitoring the disk. Linux comes with a number of tools to assist with this, and this article presents some of the most common utilities along with some common use cases.

The Definition of IO Wait Time

To understand disk performance on Linux, you first have to understand what’s called I/O wait time. The quickest way to see it is with the top utility. In the CPU summary line of top’s output you will notice a value such as 1.3 wa. This is the I/O wait time. Although it seems a bit obscure, it’s really just answering the question “What percentage of the time was the CPU idle while it waited for disk I/O to complete?”

top is one of the first tools that you reach for when checking to see if a disk is running at maximum or degraded performance.
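As a rough illustration of where the wa figure comes from, the sketch below computes an iowait percentage from two snapshots of the aggregate cpu line in /proc/stat. It assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are our own, and this is only an approximation of what tools such as top report, not their actual code.

```python
import time


def parse_cpu_line(stat_text):
    """Return the numeric fields of the aggregate 'cpu' line in /proc/stat text."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return [int(value) for value in parts[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_percent(before_text, after_text):
    """Share of CPU time spent in iowait between two /proc/stat snapshots."""
    before = parse_cpu_line(before_text)
    after = parse_cpu_line(after_text)
    deltas = [b - a for a, b in zip(before, after)]
    # Assumed field order: user nice system idle iowait irq softirq steal ...
    # guest time is already counted inside user/nice, so only the first
    # eight fields are summed for the total.
    total = sum(deltas[:8])
    if total <= 0:
        return 0.0
    return 100.0 * deltas[4] / total


def sample_iowait(interval_seconds=1.0, path="/proc/stat"):
    """Take two snapshots of /proc/stat and return the iowait percentage."""
    with open(path) as handle:
        before = handle.read()
    time.sleep(interval_seconds)
    with open(path) as handle:
        after = handle.read()
    return iowait_percent(before, after)
```

Calling sample_iowait() on a Linux machine should give a number comparable to the wa value in top over the same interval.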

Historical Statistics

It’s all fine and dandy seeing what’s happening right now, but what if you needed to see historical statistics? In this article we present one utility, and the SNMP method.

SAR

SAR stands for System Activity Report and keeps track of historical system data, including CPU and disk I/O. To use the utility, just type sar. When you run sar, you get statistics for your system at 10-minute intervals going back to the start of the day. In the sample sar output for this article, what’s notable are the I/O wait spikes of 11, 14, 12, and 10. Then at 2 AM a backup kicks off, and you see a dramatic increase in the disk I/O wait time.

At this point you might ask what a normal range for disk I/O wait time is. In our experience, anything from 1% to 5% is normal, 10% starts getting slow, 20% is really slow, and anything above 20% is slower still. These values are relative, though, so we recommend checking your system on a regular basis to establish baselines, and experimenting with backups or the du command to test some limits. Leave us a comment to tell us what you think is normal for your system.
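If you want to flag values automatically, the rule of thumb above can be sketched as a small function. The band names, the handling of the gap between 5 and 10, and the merging of "20" and "above 20" into one band are our own assumptions for illustration; adjust the cut-offs to the baselines you measure on your own system.

```python
def classify_iowait(percent):
    """Map an iowait percentage to a rough label using the article's rule of thumb."""
    if percent < 0 or percent > 100:
        raise ValueError("iowait must be a percentage between 0 and 100")
    if percent <= 5:
        return "normal"
    if percent < 10:
        # Assumption: the article gives no label for values between 5 and 10.
        return "elevated"
    if percent < 20:
        return "getting slow"
    # Assumption: 20 and everything above it share a single label here.
    return "really slow"
```

For example, classify_iowait(1.3) falls in the normal band, while the spikes of 11 to 14 mentioned earlier fall in the "getting slow" band.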

Installing SAR

If your system doesn’t have sar, install it as follows on Ubuntu/Debian:

apt install sysstat

Next, change ENABLED="false" to ENABLED="true" in /etc/default/sysstat

Then restart the service:

service sysstat restart

The SNMP Method

It turns out SNMP can also monitor system I/O stats. To monitor iowait specifically, use the OID below (ssCpuRawWait in the UCD-SNMP-MIB). It is a raw counter that keeps increasing, so be sure your monitoring tool uses delta values instead of absolute values.

.1.3.6.1.4.1.2021.11.54.0

To test:

snmpwalk -v 1 -c your_community localhost 1.3.6.1.4.1.2021.11.54.0

In PRTG, for example, the sensor is configured with Delta instead of Absolute values.
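To make the delta idea concrete, here is a minimal sketch of what a monitoring tool does with two successive readings of the counter. It assumes the value is a 32-bit SNMP counter measured in ticks and that it wraps at most once between polls; the function names are hypothetical and not part of any SNMP library.

```python
def counter_delta(previous, current, bits=32):
    """Difference between two counter readings, allowing for one wrap-around."""
    modulus = 1 << bits
    return (current - previous) % modulus


def wait_ticks_per_second(previous, current, interval_seconds, bits=32):
    """Average number of iowait ticks per second between two polls."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return counter_delta(previous, current, bits) / interval_seconds
```

With readings of 1000 and 1600 taken 60 seconds apart, the delta is 600 ticks, or 10 ticks per second. Graphing the absolute value instead would only show a line that climbs forever.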

Other Utilities and more SNMP

Two other notable tools that cover disk performance are iostat and the /proc/diskstats file, which you can read with cat /proc/diskstats.

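iostat takes an interval argument that lets you monitor the output continuously. The -d flag selects the per-device report, while the %iowait column appears in the CPU report (-c). For example, to print the device report every two seconds:

iostat -d 2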

/proc/diskstats is used by the handy Perl script for Webmin, called Webminstats, which draws fairly comprehensive RRD data of disk operation. Here is a snippet from that Perl code:

my $module_name;
my $info  = '/proc/diskstats';
my $EMPTY = EMPTY();

###############################################################################
# ask the system info on file system
sub read_data() {

	my $r_tab = read_full_file($info);
	my @res   = @{$r_tab};
	return @res;
}
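To show what is inside that file, here is a short Python sketch (not Webminstats code) that parses /proc/diskstats text into named fields. It assumes the standard layout of major, minor, device name, then at least eleven counters per line; the DiskStats name and the choice of fields are our own.

```python
from collections import namedtuple

DiskStats = namedtuple(
    "DiskStats",
    "device reads_completed sectors_read ms_reading "
    "writes_completed sectors_written ms_writing ios_in_progress ms_doing_io",
)


def parse_diskstats(text):
    """Parse /proc/diskstats text into a dict of device name -> DiskStats."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 14:
            continue  # skip blank or unexpectedly short lines
        stats[fields[2]] = DiskStats(
            device=fields[2],
            reads_completed=int(fields[3]),
            sectors_read=int(fields[5]),
            ms_reading=int(fields[6]),
            writes_completed=int(fields[7]),
            sectors_written=int(fields[9]),
            ms_writing=int(fields[10]),
            ios_in_progress=int(fields[11]),
            ms_doing_io=int(fields[12]),
        )
    return stats


def utilisation_percent(before, after, interval_seconds):
    """Percentage of the interval during which the device had I/O in flight."""
    busy_ms = after.ms_doing_io - before.ms_doing_io
    return 100.0 * busy_ms / (interval_seconds * 1000.0)
```

Reading the file twice, a known number of seconds apart, and passing the two DiskStats entries for the same device to utilisation_percent gives a busy percentage for that device over the interval.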

More Disk SNMP Monitoring

If you’re looking for more general SNMP monitoring of disk activity, walk the following OID (the diskIOTable in the UCD-DISKIO-MIB):

snmpwalk -v 1 -c your_community localhost 1.3.6.1.4.1.2021.13.15.1

So What’s Causing the Slow Disk?

The aim of this article is just to help you determine whether your disk is slow. Finding out what is actually slowing it down takes more work. As a starting point we generally recommend top, looking at the top processes by CPU to see what is busy. If you are running a web server, this only paints part of the picture, and you might have to go deeper under the hood with netstat to see how many connections are actually being made to the web server. You could also gracefully terminate the processes one by one to see whether the wa value recovers.

Conclusion

Disk I/O monitoring is key to performance. Be sure to know what you’re dealing with. If you are working with many disks, graph the data so you can compare workload surges, and move those workloads elsewhere if they affect other areas.
