7

I keep reading that asynchronous IO is better than synchronous IO, the reason being that with async IO your program can keep running, while with sync IO it is blocked until the operation finishes.
I do not understand this claim, because with sync IO (such as write()) the kernel writes the data to the disk - it doesn't happen by itself. The kernel does need CPU time to do it.
So with async IO the kernel needs that CPU time as well, which might result in a context switch from my application to the kernel. So even though async IO isn't "blocking", CPU cycles still have to be spent to run the operation.

  • Is that correct?
  • Is the difference between the two that we assume disk access is slow, so compared to sync IO, where you wait for the data to be written to disk, with async IO the time you wait for it to be written to disk can be used to continue doing application processing, and the kernel's part of writing it to disk is small?
  • Let's say I have an application whose only job is to get info and write it to files. Is there any benefit to using async IO instead of sync IO?

Examples of sync IO:

  • write()

Examples of async IO:

  • io_uring (as I understand it, it also supports zero copy, which is a benefit)
  • spdk (supposedly the best, though I don't understand how to use it)
  • aio
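
For concreteness, this is roughly how a single asynchronous write would be submitted with io_uring (via liburing) - a trimmed, untested sketch rather than production code (link with -luring; error handling omitted):

    /* Submit one async write, do other work, reap the completion later. */
    #include <liburing.h>
    #include <fcntl.h>
    #include <unistd.h>

    int main(void)
    {
        int fd = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        static char buf[4096] = "hello";

        struct io_uring ring;
        io_uring_queue_init(8, &ring, 0);

        struct io_uring_sqe *sqe = io_uring_get_sqe(&ring);
        io_uring_prep_write(sqe, fd, buf, sizeof buf, 0);
        io_uring_submit(&ring);            /* returns immediately */

        /* ... the program is free to do other work here ... */

        struct io_uring_cqe *cqe;
        io_uring_wait_cqe(&ring, &cqe);    /* block only when the result is needed */
        /* cqe->res is what write() would have returned */
        io_uring_cqe_seen(&ring, cqe);

        io_uring_queue_exit(&ring);
        close(fd);
        return 0;
    }
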
4
  • 4
    "the kernel writes the data to the disk - it doesn't happen by itself" Actually, the kernel formulates a message to the disk controller, and then the disk controller does do the actual writing by itself without help from the processor or the OS. Modern disk interfaces that support DMA will actually do the fetching of the data from system RAM by themselves -- the kernel will just give the disk controller the address of the data (like a pointer, but a physical address not a virtual address).
    – Ben Voigt
    Commented Jan 4, 2021 at 22:05
  • 2
    Yes, the total amount of work done by the system with async output is usually at least that of synchronous. However, users tend to complain more about a user interface pausing for a fraction of a second than they do about CPU load. If a user interface regularly writes a file in response to some minor user action, synchronous output will mean the user interface keeps pausing or hanging while a file is being written. Async output allows the program to continue responding to the user, even if the output isn't complete.
    – Peter
    Commented Jan 4, 2021 at 22:08
  • 2
    The disk is the slow part, not the CPU. So if you have something else to do, use async I/O. If you're going to wait anyway, then there's no reason to do it.
    – stark
    Commented Jan 4, 2021 at 22:08
  • 3
    when you do a write, you just have to wait for the data to be written to the kernel's buffer cache, not until it is written to the disk. Writing to the disk will go on asynchronously after the write returns. If you want a fully synchronous write, you'll need to use fsync as well.
    – Chris Dodd
    Commented Jan 4, 2021 at 22:27

5 Answers

3

I do not understand this claim, because with sync IO (such as write()) the kernel writes the data to the disk - it doesn't happen by itself. The kernel does need CPU time to do it.

No. Most modern devices are able to transfer data to/from RAM by themselves (using DMA or bus mastering).

For example, the CPU might tell a disk controller "read 4 sectors into RAM at address 0x12345000", and then the CPU can do anything else it likes while the disk controller does the transfer (the CPU will be interrupted by an IRQ from the disk controller when it has finished transferring the data).

However, on modern systems (where any number of processes may want to use the same device at the same time) the device driver has to maintain a list of pending operations. In this case (under load), when the device generates an IRQ to say it has finished an operation, the device driver responds by telling the device to start the next pending operation. That way the device spends almost no time idle waiting to be asked to start the next operation (much better device utilization) and the CPU spends almost all of its time doing something else (between IRQs).

Of course hardware is often more advanced (e.g. it has an internal queue of operations, so the driver can tell it to do multiple things and it can start the next operation as soon as it finishes the previous one); and drivers are often more advanced too (e.g. they have "IO priorities" to ensure that more important work is done first, rather than a simple FIFO queue of pending operations).

Let's say I have an application whose only job is to get info and write it to files. Is there any benefit to using async IO instead of sync IO?

Let's say that you get info from deviceA (while the CPU and deviceB are idle); then process that info a little (while deviceA and deviceB are idle); then write the result to deviceB (while deviceA and the CPU are idle). You can see that most of the hardware is doing nothing most of the time (poor utilization).

With asynchronous IO, while deviceA is fetching the next piece of info, the CPU can be processing the current piece while deviceB is writing the previous piece. Under ideal conditions (no speed mismatches) you can achieve 100% utilization (deviceA, the CPU and deviceB are never idle); and even if there are speed mismatches (e.g. deviceB needs to wait for the CPU to finish processing the current piece) the time anything spends idle is minimized (and utilization maximized as much as possible).
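
As a rough illustration of that overlap (an untested sketch, not this answer's code): treat read() as "deviceA", an output file as "deviceB", and a hypothetical process() as the CPU work; POSIX aio is used for the output side only, so while the previous block is being written the next block is read and processed. Error handling is omitted; on Linux, link with -lrt.

    #include <aio.h>
    #include <errno.h>
    #include <fcntl.h>
    #include <string.h>
    #include <unistd.h>

    #define BLK 65536

    static void process(char *buf, ssize_t n) { (void)buf; (void)n; /* CPU work */ }

    /* Wait for an outstanding aio request to finish. */
    static void wait_for(struct aiocb *cb)
    {
        const struct aiocb *list[1] = { cb };
        while (aio_error(cb) == EINPROGRESS)
            aio_suspend(list, 1, NULL);
        aio_return(cb);
    }

    int main(void)
    {
        int out_fd = open("out.dat", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        static char bufs[2][BLK];       /* one buffer being written, one being filled */
        struct aiocb wr;
        int cur = 0, write_pending = 0;
        off_t off = 0;

        for (;;) {
            ssize_t n = read(STDIN_FILENO, bufs[cur], BLK);  /* "deviceA" */
            if (n <= 0) break;
            process(bufs[cur], n);                           /* CPU */

            if (write_pending)   /* one write in flight: wait before reusing the aiocb */
                wait_for(&wr);

            memset(&wr, 0, sizeof wr);
            wr.aio_fildes = out_fd;
            wr.aio_buf    = bufs[cur];
            wr.aio_nbytes = (size_t)n;
            wr.aio_offset = off;
            aio_write(&wr);                                  /* "deviceB" writes while we loop */
            write_pending = 1;

            off += n;
            cur ^= 1;                                        /* fill the other buffer next */
        }
        if (write_pending)
            wait_for(&wr);
        close(out_fd);
        return 0;
    }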

The other alternative is to use multiple tasks - e.g. one task that fetches data from deviceA synchronously and notifies another task when the data has been read; a second task that waits until data arrives, processes it, and notifies another task when the data has been processed; then a third task that waits until data has been processed and writes it to deviceB synchronously. For utilization this is effectively identical to using asynchronous IO (in fact it can be considered "emulation of asynchronous IO"). The problem is that you've added a bunch of extra overhead for managing and synchronizing multiple tasks (more RAM spent on state and stacks, task switches, lock contention, ...), and made the code more complex and harder to maintain.

8
  • Thanks. The thing is, I see this in top: %Cpu10 : 0.0 us, 99.7 sy, 0.0 ni, 0.3 id, 0.0 wa, 0.0 hi, 0.0 si, 0.0 st. I have an NVMe disk. I don't see any wa (IO wait). Does that mean I write inefficiently? I remember perf'ing my application, and I saw most of the cycles go to the kernel, in functions such as ~ copy 64 (I don't remember the exact names)
    – hudac
    Commented Jan 5, 2021 at 13:41
  • According to what you say, if fetching the info is CPU intensive, using async IO will help, because I can fetch the next info while the disk writes. If fetching isn't CPU intensive, then this thread just wastes time on writing the data, and there's no point in switching to async IO.
    – hudac
    Commented Jan 5, 2021 at 15:52
  • 1
    @hudac: For high level file IO it's even more complicated because there's caching going on (mostly using "otherwise free RAM"). Writes are typically just buffered so that they seem fast/instant (because the data isn't actually written to the device until the kernel gets around to it later), and reads can also seem fast/instant when the files/data being read is already cached in RAM. But...
    – Brendan
    Commented Jan 5, 2021 at 23:18
  • 1
    @hudac: "Do A, then do B, then do C" is never better (for performance, utilization) than "Do A and B and C in parallel"; and therefore synchronous IO is never better (for performance, utilization) than asynchronous IO. They are only "equal" under extremely unlikely scenarios (e.g. very simple scenarios involving devices that don't exist and/or small amounts of data, with a single-CPU, while no other process does anything). For example; if fetching is CPU intensive and processing the fetched data is CPU intensive, then (in practice) you probably have 4 or more CPUs and can do both in parallel.
    – Brendan
    Commented Jan 5, 2021 at 23:39
  • 1
    @hudac: The main benefit of synchronous IO is that it's easier for programmers because it matches "procedural thinking". Specifically, "A happens, then B happens, then C happens" is easier to work with than "all these things can happen at the same time and I can't know which order anything will finish in".
    – Brendan
    Commented Jan 5, 2021 at 23:46
3

Your understanding is partly right, but which tools you use is a matter of which programming model you prefer; it doesn't determine whether your program will freeze waiting for I/O operations to finish. For certain specialized, very-high-load applications, some models are marginally to moderately more efficient, but unless you're in such a situation, you should pick the model that makes it easy to write and maintain your program and have it be portable to the systems you and your users care about, not the one someone is marketing as high-performance.

Traditionally, there were two ways to do I/O without blocking:

  1. Structure your program as an event loop performing select (nowadays poll; select is outdated and has critical flaws) on a set of file descriptors that might be ready for reading input or accepting output. This requires keeping some sort of explicit state for partial input that you're not ready to process yet and for pending output that you haven't been able to write out yet (a skeleton of such a loop is sketched below).

  2. Separate I/O into separate execution contexts. Historically the unixy approach to this was separate processes, and that can still make sense when you have other reasons to want separate processes anyway (privilege isolation, etc.) but the more modern way to do this is with threads. With a separate execution context for each I/O channel you can just use normal blocking read/write (or even buffered stdio functions) and any partial input or unfinished output state is kept for you implicitly in the call frame stack/local variables of its execution context.

Note that, of the above two options, only the latter helps with stalls from disk access being slow, as regular files are always "ready" for input and output according to select/poll.
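
For concreteness, a skeleton of option 1 might look like the following (a minimal, untested sketch): a poll() loop that copies stdin to stdout while keeping explicit state for output that couldn't be written yet. It assumes the descriptors are pipes, ttys or sockets; per the note above, poll() would report regular files as always ready.

    #include <poll.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
        char pending[4096];              /* explicit state: not-yet-written output */
        size_t pending_len = 0;
        int eof = 0;

        while (!eof || pending_len > 0) {
            struct pollfd fds[2] = {
                { .fd = STDIN_FILENO,  .events = POLLIN  },
                { .fd = STDOUT_FILENO, .events = POLLOUT },
            };
            /* Only ask for readiness we can act on. */
            if (eof || pending_len == sizeof pending) fds[0].events = 0;
            if (pending_len == 0)                     fds[1].events = 0;

            if (poll(fds, 2, -1) < 0)
                break;

            if (!eof && pending_len < sizeof pending &&
                (fds[0].revents & (POLLIN | POLLHUP))) {
                ssize_t n = read(STDIN_FILENO, pending + pending_len,
                                 sizeof pending - pending_len);
                if (n <= 0) eof = 1; else pending_len += (size_t)n;
            }
            if (pending_len > 0 && (fds[1].revents & (POLLOUT | POLLERR))) {
                ssize_t n = write(STDOUT_FILENO, pending, pending_len);
                if (n < 0) break;
                memmove(pending, pending + n, pending_len - (size_t)n);
                pending_len -= (size_t)n;
            }
        }
        return 0;
    }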

Nowadays there's a trend, probably owing largely to languages like JavaScript, towards a third approach, the "async model", with event handler callbacks. I find it harder to work with, requiring more boilerplate code, and harder to reason about than either of the above methods, but plenty of people like it. If you want to use it, it's probably preferable to do so with a library that abstracts the Linuxisms you mentioned (io_uring, etc.) so your program can run on other systems and doesn't depend on the latest Linux fads.

Now to your particular question:

Let's say I have an application whose only job is to get info and write it to files. Is there any benefit to using async IO instead of sync IO?

If your application has a single input source (no interactivity) and a single output, like most unix commands, there is absolutely no benefit to any kind of async I/O, regardless of which programming model you use (event loop, threads, async callbacks, whatever). The simplest and most efficient thing to do is just read and write.
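
To illustrate, the single-input/single-output case really is just a synchronous copy loop (essentially what a unix filter does); a minimal sketch:

    #include <unistd.h>

    int main(void)
    {
        char buf[65536];
        ssize_t n;

        while ((n = read(STDIN_FILENO, buf, sizeof buf)) > 0) {
            ssize_t off = 0;
            while (off < n) {                    /* handle short writes */
                ssize_t w = write(STDOUT_FILENO, buf + off, (size_t)(n - off));
                if (w < 0) return 1;
                off += w;
            }
        }
        return n < 0 ? 1 : 0;
    }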

1
  • I don't think the trend is caused by Javascript. Win32 had async I/O way before Javascript, I assume it got it from VMS.
    – MSalters
    Commented Jan 5, 2021 at 15:15
2

The kernel does need CPU time in order to do it.

Is that correct?

Pretty much, yes.

Is the difference between the two that we assume disk access is slow ... with async IO the time you wait for it to be written to disk can be used to continue doing application processing, and the kernel's part of writing it to disk is small?

Exactly.

Let's say I have an application whose only job is to get info and write it to files. Is there any benefit to using async IO instead of sync IO?

Depends on many factors. How does the application "get info"? Is that CPU intensive? Does it use the same IO as the writing? Is it a service that processes multiple requests concurrently? How many simultaneous connections are there? Is performance important in the first place? In some cases, yes, there may be significant benefit in using async IO. In other cases you may get most of the benefits by doing sync IO in a separate thread. And in yet other cases single-threaded sync IO can be sufficient.
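
As a rough sketch of the "sync IO in a separate thread" option (the names enqueue and writer_main are made up for illustration; shutdown and error handling are omitted; compile with -pthread): the application thread hands each buffer to a dedicated writer thread through a one-slot queue, so it only blocks if the writer falls behind.

    #include <pthread.h>
    #include <string.h>
    #include <unistd.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static char   slot[65536];
    static size_t slot_len = 0;              /* 0 means "slot empty" */

    static void *writer_main(void *arg)      /* does plain blocking write()s */
    {
        (void)arg;
        for (;;) {
            char   local[sizeof slot];
            size_t n;
            pthread_mutex_lock(&lock);
            while (slot_len == 0)
                pthread_cond_wait(&cond, &lock);
            n = slot_len;
            memcpy(local, slot, n);
            slot_len = 0;                    /* free the slot for the producer */
            pthread_cond_signal(&cond);
            pthread_mutex_unlock(&lock);
            write(STDOUT_FILENO, local, n);  /* sync IO, off the producer's path */
        }
        return NULL;
    }

    static void enqueue(const char *data, size_t n)
    {
        pthread_mutex_lock(&lock);
        while (slot_len != 0)                /* block only if the writer is behind */
            pthread_cond_wait(&cond, &lock);
        memcpy(slot, data, n);
        slot_len = n;
        pthread_cond_signal(&cond);
        pthread_mutex_unlock(&lock);
    }

    int main(void)
    {
        pthread_t tid;
        pthread_create(&tid, NULL, writer_main, NULL);

        const char msg[] = "some data produced by the application\n";
        for (int i = 0; i < 3; i++)
            enqueue(msg, sizeof msg - 1);

        sleep(1);    /* crude: let the writer drain; real code would join cleanly */
        return 0;
    }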

1

Context switching is necessary in any case; the kernel always works in its own context. So synchronous access doesn't save processor time. Usually, writing doesn't require a lot of processor work; the limiting factor is the disk response. The question is whether we will wait for that response or do our own work in the meantime.

Let's say I have an application whose only job is to get info and write it to files. Is there any benefit to using async IO instead of sync IO?

If you implement synchronous access, your sequence is the following:

  1. get information
  2. write information
  3. goto 1.

So you can't get information until write() completes. Suppose the information supplier is as slow as the disk you write to. In that case the program will be twice as slow as the asynchronous one. And if the information supplier can't wait and buffer the information while you are writing, you will lose portions of it during each write. Examples of such information sources are sensors for fast processes. In that case, you should read the sensors synchronously and save the obtained values asynchronously.
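
To put numbers on the "twice as slow" claim (figures assumed purely for illustration): if getting one block of information takes 10 ms and writing it takes 10 ms, the synchronous loop above needs 20 ms per block, because step 2 must finish before the next step 1 can start. If the write is issued asynchronously instead, the next "get" overlaps the previous write, so in steady state a block completes roughly every 10 ms (twice the throughput), as long as neither side consistently outruns the other.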

1

Asynchronous IO is not better than synchronous IO. Nor vice versa.

The question is which one is better for your use case.

Synchronous IO is generally simpler to code, but asynchronous IO can lead to better throughput and responsiveness at the expense of more complicated code.

I never had any benefit from asynchronous IO just for file access, but some applications may benefit from it.

Applications accessing "slow" IO like the network or a terminal have the most benefit. Using asynchronous IO allows them to do useful work while waiting for IO to complete. This can mean the ability to serve more clients or to keep the application responsive for the user.

(and "slow" just means that the time for an IO operation to finish is unbounded, it may ever never finish, eg when waiting for a user to press enter or a network client to send a command)

In the end, asynchronous IO doesn't do less work, it's just distributed differently in time to reduce idle waiting.
