The precise meaning of I/O wait time in Linux

Some time ago I had a discussion with some systems guys about the exact meaning of the I/O wait time which is displayed by top as a percentage of total CPU time. Their answer was that it is the time spent by the CPU(s) while waiting for outstanding I/O operations to complete. Indeed, the man page for the top command defines this as the “time waiting for I/O completion”.

However, this definition is obviously not correct (or at least not complete), because a CPU never spends clock cycles waiting for an I/O operation to complete. Instead, if a task running on a given CPU blocks on a synchronous I/O operation, the kernel will suspend that task and allow other tasks to be scheduled on that CPU.

So what is the exact definition then? There is an interesting Server Fault question that discussed this. Somebody came up with the following definition that describes I/O wait time as a sub-category of idle time:

iowait is time that the processor/processors are waiting (i.e. is in an idle state and does nothing), during which there in fact was outstanding disk I/O requests.

That makes perfect sense for uniprocessor systems, but there is still a problem with that definition when applied to multiprocessor systems. In fact, “idle” is a state of a CPU, while “waiting for I/O completion” is a state of a task. However, as pointed out earlier, a task waiting for outstanding I/O operations is not running on any CPU. So how can the I/O wait time be accounted for on a per-CPU basis?

For example, let’s assume that on an otherwise idle system with 4 CPUs, a single, completely I/O bound task is running. Will the overall I/O wait time be 100% or 25%? That is, will the I/O wait time be 100% on a single CPU (and 0% on the others), or 25% on each of the 4 CPUs? This can easily be checked with a simple experiment. One can simulate an I/O bound process using the following command, which will simply read data from the hard disk as fast as it can:

dd if=/dev/sda of=/dev/null bs=1MB

Note that you need to execute this as root and, if necessary, change the input file to the appropriate block device for your hard disk.
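As a quick sanity check while dd is running, you can confirm that the task spends most of its time blocked on I/O rather than executing: its state in ps should mostly show as D (uninterruptible sleep, typically waiting for I/O). One minimal way to check this, assuming the dd command above is the only dd process running:

ps -C dd -o pid,stat,psr,comm

The psr column shows the CPU the task was last scheduled on, which will become relevant below.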

Looking at the CPU stats in top (press 1 to get per-CPU statistics), you should see something like this:

%Cpu0 : 3,1 us, 10,7 sy, 0,0 ni, 3,5 id, 82,4 wa, 0,0 hi, 0,3 si, 0,0 st
%Cpu1 : 3,6 us, 2,0 sy, 0,0 ni, 90,7 id, 3,3 wa, 0,0 hi, 0,3 si, 0,0 st
%Cpu2 : 1,0 us, 0,3 sy, 0,0 ni, 96,3 id, 2,3 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu3 : 3,0 us, 0,3 sy, 0,0 ni, 96,3 id, 0,3 wa, 0,0 hi, 0,0 si, 0,0 st

This output indicates that a single I/O bound task only increases the I/O wait time on a single CPU. Note that you may see the task occasionally “switch” from one CPU to another. That is because the Linux kernel tries to schedule a task on the CPU it ran on last (in order to improve CPU cache hit rates), but this is not always possible, and the task is moved to another CPU. On some systems, this may occur so frequently that the I/O wait time appears to be distributed over multiple CPUs, as in the following example:

%Cpu0 : 5.7 us, 5.7 sy, 0.0 ni, 50.5 id, 34.8 wa, 3.3 hi, 0.0 si, 0.0 st
%Cpu1 : 3.0 us, 3.3 sy, 0.0 ni, 72.5 id, 20.9 wa, 0.3 hi, 0.0 si, 0.0 st
%Cpu2 : 7.0 us, 4.3 sy, 0.0 ni, 62.0 id, 26.7 wa, 0.0 hi, 0.0 si, 0.0 st
%Cpu3 : 4.3 us, 2.3 sy, 0.0 ni, 89.6 id, 3.7 wa, 0.0 hi, 0.0 si, 0.0 st

Nevertheless, assuming that dd is the only task doing I/O on the system, there can be at most one CPU in state I/O wait at any given point in time. Indeed, 34.8 + 20.9 + 26.7 + 3.7 = 86.1, which is close to but lower than 100.
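If you want to observe these migrations directly, one simple way (assuming the dd command from above is still running) is to poll the psr field, which reports the CPU the process is currently assigned to:

watch -n 1 ps -C dd -o pid,psr,comm

Whenever the value in the psr column changes, the scheduler has moved the task to another CPU.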

To make the experiment more reproducible, we can use the taskset command to “pin” a process to a given CPU (note that the first command line argument is not the CPU number, but a mask):

taskset 1 dd if=/dev/sda of=/dev/null bs=1MB
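The mask is a bitmap in which bit n corresponds to CPU n, so the value 1 (bit 0 set) pins the process to CPU 0, while e.g. 3 would allow CPUs 0 and 1. If you find the mask notation error-prone, taskset also accepts a plain CPU list with the -c option; the following command is equivalent to the one above:

taskset -c 0 dd if=/dev/sda of=/dev/null bs=1MB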

Another interesting experiment is to run a CPU bound task at the same time on the same CPU:

taskset 1 sh -c "while true; do true; done"

The I/O wait time now drops to 0 on that CPU (and also remains 0 on the other CPUs), while user and system time account for 100% CPU usage:

%Cpu0 : 80,3 us, 15,5 sy, 0,0 ni, 0,0 id, 0,0 wa, 0,0 hi, 4,3 si, 0,0 st
%Cpu1 : 4,7 us, 3,4 sy, 0,0 ni, 91,3 id, 0,0 wa, 0,0 hi, 0,7 si, 0,0 st
%Cpu2 : 2,3 us, 0,3 sy, 0,0 ni, 97,3 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st
%Cpu3 : 2,7 us, 4,3 sy, 0,0 ni, 93,0 id, 0,0 wa, 0,0 hi, 0,0 si, 0,0 st

That is expected because I/O wait time is a sub-category of idle time, and the CPU to which we pinned both tasks is never idle.

These findings allow us to deduce the exact definition of I/O wait time:

For a given CPU, the I/O wait time is the time during which that CPU was idle (i.e. didn’t execute any tasks) and there was at least one outstanding disk I/O operation requested by a task scheduled on that CPU (at the time it generated that I/O request).
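This per-CPU accounting is visible in /proc/stat, which is where top gets its numbers from: each cpuN line contains cumulative counters (in clock ticks, typically 1/100 s) in the order user, nice, system, idle, iowait, irq, softirq, steal. A minimal sketch that extracts the iowait counter for each CPU (field 6, counting the cpuN label as field 1):

awk '/^cpu[0-9]/ { print $1, $6 }' /proc/stat

Sampling this twice and dividing the difference in iowait ticks by the difference in total ticks gives the percentage that top reports in the wa column.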

Note that this nuance is not innocent and has practical consequences. For example, on a system with many CPUs, even if there is a problem with I/O performance, the observed overall I/O wait time may still be small if the problem only affects a single task: on a 64-CPU system, a single completely I/O bound task can contribute at most 100%/64 ≈ 1.6% to the overall I/O wait time. It also means that while it is generally correct to say that faster CPUs tend to increase I/O wait time (simply because a faster CPU tends to be idle more often), that statement is no longer true if one replaces “faster” by “more”.
