And so it’s happened. After a long time of doubts, opposition, and preparation, WG21 has agreed on what coroutines should look like, and coroutines will likely land in C++20. Since it’s a significant feature, I think it’s good to start preparing and learning it now (remember there are also modules, concepts, and ranges on the to-learn list).
There were lots of people opposing this feature. The main complaints concerned how hard it is to understand, the large number of customisation points, and possibly suboptimal performance due to possibly unoptimised dynamic memory allocations (possibly 😉).
There were even attempts, parallel to the accepted TS (an officially published Technical Specification), to create another coroutines mechanism. The coroutines we will be discussing here are the ones described in the TS, and this is the document to be merged into the IS (International Standard). The alternative was created by Google. Google’s approach, in the end, turned out to suffer from numerous issues of its own, which were not trivial to solve and often required strange additional C++ features.
The final decision was to accept the coroutines TS authored by Microsoft, and this is what we are going to talk about in this post. So let’s start with…
What are coroutines?
Coroutines already exist in many programming languages, be it Python or C#. They provide one more way to write asynchronous code. How this way differs from threads, why we need a dedicated language feature for coroutines, and finally how we can benefit from them will be explained in this section.
There is a lot of misunderstanding regarding what a coroutine is. Depending on the environment in which they are used, they might be called:
- stackless coroutines
- stackful coroutines
- green threads
- fibers
- goroutines
The good news is that stackful coroutines, green threads, fibres, and goroutines are essentially the same thing (though the terms are used in different contexts). We will refer to them later on as fibres or stackful coroutines. But there is something special about stackless coroutines, which are the main subject of this series of posts (you can expect more posts soon about coroutines and how to use them).
To understand coroutines and get some intuition about them, we will first have a brief look at functions and (what we might call) “their API”. The standard way of using a function is to simply call it and wait until it finishes:
void foo() {
    return; // here we exit the function
}

foo(); // here we call/start the function
Once we have called the function, there is no way to suspend or resume it. The only operations we can perform on a function are start and finish. Once the function is started, we must wait until it’s finished. If we call the function again, it begins execution from its beginning.
The situation with coroutines is different. You can not only start and stop one but also suspend and resume it. It is still different from a kernel thread because coroutines are not preemptive by themselves (a coroutine, on the other hand, usually belongs to a thread, which is preemptive). To understand this, let’s have a look at a generator defined in Python. Even though the Python world calls this a generator, in C++ it would be called a coroutine. The example is taken from this website:
def generate_nums():
    num = 0
    while True:
        yield num
        num = num + 1

nums = generate_nums()

for x in nums:
    print(x)
    if x > 9:
        break
The way this code works is that the call to the generate_nums function creates a coroutine object. Each time we iterate over the coroutine object, the coroutine is resumed, and it suspends itself once the yield keyword is encountered, returning the next integer in the sequence (the for loop is syntactic sugar for calls to the next function, which resumes the coroutine). The code finishes the loop on encountering the break statement. In this case, the coroutine never ends, but it’s easy to imagine a situation in which the coroutine reaches its end and finishes. So now we can see that this kind of coroutine can be started, suspended, resumed, and finally finished. [Note: in C++ there are also create and destroy operations, but they are not needed to get the intuition of a coroutine.]
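To give a taste of how the same idea could be expressed with the TS coroutines, below is a minimal, hand-rolled sketch of a generator type. The IntGenerator class and its member names are invented for this illustration (the TS only provides the low-level building blocks: the co_yield keyword, the promise_type protocol, and the coroutine handle); I use the C++20 spelling <coroutine> and std::coroutine_handle here, while in the TS these live in <experimental/coroutine> and namespace std::experimental.

#include <coroutine>   // in the TS: <experimental/coroutine>
#include <iostream>

// Hypothetical generator type; not part of the standard library.
struct IntGenerator {
    struct promise_type {
        int current_value = 0;

        IntGenerator get_return_object() {
            return IntGenerator{std::coroutine_handle<promise_type>::from_promise(*this)};
        }
        std::suspend_always initial_suspend() noexcept { return {}; }
        std::suspend_always final_suspend() noexcept { return {}; }
        std::suspend_always yield_value(int value) noexcept {
            current_value = value;   // store the yielded value for the caller
            return {};               // and suspend the coroutine
        }
        void return_void() noexcept {}
        void unhandled_exception() { std::terminate(); }
    };

    explicit IntGenerator(std::coroutine_handle<promise_type> h) : handle(h) {}
    IntGenerator(IntGenerator&& other) noexcept : handle(other.handle) { other.handle = {}; }
    ~IntGenerator() { if (handle) handle.destroy(); }

    bool next() {                    // resume the coroutine; false once it has finished
        handle.resume();
        return !handle.done();
    }
    int value() const { return handle.promise().current_value; }

private:
    std::coroutine_handle<promise_type> handle;
};

IntGenerator generate_nums() {
    int num = 0;
    while (true) {
        co_yield num;                // suspend here, handing num back to the caller
        num = num + 1;
    }
}

int main() {
    auto nums = generate_nums();     // creates the coroutine; nothing runs yet
    while (nums.next()) {            // resume until the next co_yield
        int x = nums.value();
        std::cout << x << '\n';
        if (x > 9)
            break;
    }
}

Note that generate_nums itself reads almost exactly like the Python version; all the boilerplate lives in the generator type, which is the kind of thing libraries (or a future standard library) are expected to provide rather than everyone writing it by hand.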
Coroutines as a library
So you now have an intuition of what coroutines are. You also know there already are libraries for creating fibre objects. The question, then, is why we need a dedicated language feature and not just a library that would enable the use of coroutines.
This section tries to answer that question and show you the difference between stackful and stackless coroutines. That difference is the key to understanding the coroutine language feature.
Stackful coroutines
So first let’s talk about what stackful coroutines are, how they work, and why they can be implemented as a library. They might be easier to explain, since they are built similarly to threads.
A fibre, or stackful coroutine, is essentially a separate stack that can be used to process function calls. To understand how exactly this kind of coroutine works, we will have a brief look into function frames and function calls from the low-level point of view. But first, let’s have a look at the properties of fibres:
- they have their own stack,
- the lifetime of a fibre is independent of the code that created it (usually there is a user-defined scheduler),
- fibers can be detached from one thread and attached to another,
- cooperative scheduling (fiber must decide to switch to another fiber/scheduler),
- two fibres cannot run simultaneously on the same thread.
The implications of these properties are the following:
- a fibre’s context switching must be performed by the fibre’s user, not the OS (the OS can still preempt the fibre by preempting the thread it runs on),
- no real data races occur between two fibres running on the same thread, since only one of them can be active at a time,
- the fibre developer must know the proper place and time to give computation power back to the scheduler or the caller,
- I/O operations in fibres should be asynchronous so that other fibres can do their job without blocking one another.
Let’s now explain how fibres work in detail, starting with what the stack does for function calls.
The stack is a contiguous block of memory that is needed to store local variables and function arguments. But what is even more important, after each function call (with a few exceptions) additional information is put on the stack that lets the called function know how to return to the caller and restore the processor registers.
Some of the registers have a particular purpose and are saved on the stack during function calls. Those registers (in the case of the ARM architecture) are:
- SP – stack pointer
- LR – link register
- PC – program counter
The stack pointer is a register that holds the address of the part of the stack that belongs to the current function call (its stack frame). Thanks to this value, it’s easy to refer to the arguments and local variables that are saved on the stack.
The link register is very important during function calls. It stores the return address (an address in the caller) of the code to be executed after the current function’s execution is over. When a function is called, the PC is saved into the LR. When the function returns, the PC is restored from the LR.
The program counter holds the address of the currently executed instruction.
Every time a function is called, the link register is saved so that the function knows where to return once it’s finished.
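To make this a bit more concrete, here is a small C++ sketch with comments describing, purely conceptually (this is not real ARM code, and the function names are made up), what happens to these registers around a call:

int callee(int x) {
    int doubled = 2 * x;   // x and doubled live in callee's stack frame, addressed via SP
    return doubled;
}

int caller() {
    int local = 21;        // stored in caller's stack frame
    // The call below saves the return address (the old PC) into LR;
    // callee would itself save LR on the stack if it called further functions.
    int result = callee(local);
    // On return, the PC is restored from LR and execution continues here.
    return result;
}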

When a stackful coroutine gets executed, the called functions use the previously allocated stack to store their arguments and local variables. Because function calls store all of their information on the stackful coroutine’s own stack, a fibre can suspend its execution in any function that gets called within the coroutine.

Let’s now have a look at what is going on in the picture above. First of all, threads and fibres have their own separate stacks. The green numbers indicate the order in which the actions happen:
1. A regular function call inside the thread. It allocates a frame on the thread’s stack.
2. The function creates the fibre object. As a result, the stack for the fibre gets allocated. Creating the fibre does not necessarily mean that it gets executed immediately. The activation frame also gets allocated. The data in the activation frame is set in such a way that loading its content into the processor’s registers will cause a context switch to the fibre’s stack.
3. A regular function call.
4. The coroutine call. The processor’s registers are set to the content of the activation frame.
5. A regular function call inside the coroutine.
6. A regular function call inside the coroutine.
7. The coroutine suspends. The activation frame’s content gets updated, and the processor’s registers are set so that the context returns to the thread’s stack.
8. A regular function call inside the thread.
9. A regular function call inside the thread.
10. Resuming the coroutine – a similar thing happens as during the coroutine call. The activation frame remembers the state of the processor’s registers inside the coroutine, which was saved during the coroutine’s suspension.
11. A regular function call inside the coroutine. The function frame is allocated on the coroutine’s stack.
12. The image is somewhat simplified here. What happens now is that the coroutine ends and its stack is unwound, but the return from the coroutine in fact happens from the bottom (not the top) function.
13. A regular function return, as above.
14. A regular function return.
15. The coroutine returns. The coroutine’s stack is empty. The context is switched back to the thread. From now on, the fibre cannot be resumed.
16. A regular function call in the thread’s context.
17. Later on, the functions can continue their work or finish, so that finally the stack gets unwound.
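To make the activation frame mentioned in steps 2, 7, and 10 a bit more tangible, here is a purely conceptual sketch of the kind of bookkeeping a fibre library might keep per fibre. The struct and field names are invented for illustration; real libraries (for example Boost.Context) implement the actual register save/restore in architecture-specific assembly.

#include <cstddef>

// Hypothetical per-fibre bookkeeping; not a real library type.
struct FibreActivationFrame {
    void*       stack_base;   // start of the separately allocated fibre stack
    std::size_t stack_size;   // how big that stack is
    void*       saved_sp;     // saved stack pointer
    void*       saved_lr;     // saved link register (return address)
    void*       saved_pc;     // saved program counter (where to resume)
    // ... plus the remaining callee-saved registers
};

// Switching to a fibre conceptually means: save the current registers into the
// current frame, then load this frame's saved values into the CPU registers.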
In the case of stackful coroutines, there is no need for a dedicated language feature. Stackful coroutines can be implemented entirely as a library, and there are already libraries designed to do this:
- https://swtch.com/libtask/
- https://code.google.com/archive/p/libconcurrency/
- https://www.boost.org Boost.Fiber
- https://www.boost.org Boost.Coroutine
Of those mentioned, only the Boost ones are C++ libraries; the other ones are C libraries.
The details of how these libraries work can be found in their documentation. But basically, all of them are able to create a separate stack for the fibre and provide the ability to resume it (from the caller) and suspend it (from inside the coroutine).
Let’s have a look at a Boost.Fiber example:
#include <cstdlib>
#include <iostream>
#include <memory>
#include <string>
#include <thread>
#include <boost/intrusive_ptr.hpp>
#include <boost/fiber/all.hpp>

inline
void fn( std::string const& str, int n) {
    for ( int i = 0; i < n; ++i) {
        std::cout << i << ": " << str << std::endl;
        boost::this_fiber::yield();
    }
}

int main() {
    try {
        boost::fibers::fiber f1( fn, "abc", 5);
        std::cerr << "f1 : " << f1.get_id() << std::endl;
        f1.join();
        std::cout << "done." << std::endl;
        return EXIT_SUCCESS;
    } catch ( std::exception const& e) {
        std::cerr << "exception: " << e.what() << std::endl;
    } catch (...) {
        std::cerr << "unhandled exception" << std::endl;
    }
    return EXIT_FAILURE;
}
In the case of Boost.Fiber, the library has a built-in scheduler for the coroutines. All fibres get executed on the same thread. Because coroutine scheduling is cooperative, a fibre needs to decide when to give control back to the scheduler. In the example, this happens on the call to the yield function, which suspends the coroutine.
Since there is no other fibre to run, the scheduler always decides to resume our coroutine.
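To see the cooperative scheduling with more than one fibre, here is a small variation of the example above. The worker function and the ping/pong fibre names are made up for illustration; boost::fibers::fiber, join, and boost::this_fiber::yield are the same library calls as before.

#include <iostream>
#include <boost/fiber/all.hpp>

void worker(const char* name) {
    for (int i = 0; i < 3; ++i) {
        std::cout << name << " " << i << std::endl;
        boost::this_fiber::yield();   // give control back to the scheduler,
                                      // which can now resume the other fibre
    }
}

int main() {
    boost::fibers::fiber ping(worker, "ping");
    boost::fibers::fiber pong(worker, "pong");
    ping.join();   // both fibres run interleaved on this single thread
    pong.join();
    return 0;
}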
Stackless coroutines
Stackless coroutines have slightly different properties than the stackful ones. The main characteristics, however, remain: stackless coroutines can still be started, and after they suspend themselves, they can be resumed. From now on, we will call stackless coroutines simply coroutines. This is the type of coroutine we are likely to find in C++20.
To make a similar list of properties for stackless coroutines:
- Coroutines are strongly connected to their callers – a call to the coroutine transfers execution to the coroutine, and yielding from the coroutine returns execution to its caller.
- Stackful coroutines live as long as their stack. Stackless coroutines live as long as their object.
In the case of stackless coroutines, however, there is no need to allocate a whole stack. They consume far less memory, but they have some limitations because of that.
So first of all, if they do not allocate memory for a stack, then how do they work? Where does all the data go that, in the case of stackful coroutines, would be stored on the stack? The answer is: on the caller’s stack.
The secret of stackless coroutines is that they can suspend themselves only from their top-level function. For all other functions, the data is allocated on the regular stack (the caller’s stack), so all functions called from the coroutine must finish before the coroutine is suspended. All the data that the coroutine needs to preserve its state is allocated dynamically on the heap. This is usually a couple of local variables and arguments, which is far smaller in size than a whole stack allocated in advance.
Let’s have a look at how stackless coroutines work:

Now, as we can see, there is only one stack – the thread’s main stack. Let’s follow step by step what is going on in the picture (the coroutine’s activation frame is shown in two colours – black is what is stored on the stack, and blue is what is stored on the heap):
1. A regular function call, whose frame is stored on the stack.
2. The function creates the coroutine. This means allocating the activation frame somewhere on the heap.
3. A regular function call.
4. The coroutine call. The coroutine’s frame gets allocated on the usual stack, and the program flow is the same as in the case of a normal function.
5. A regular function call from the coroutine. Again, everything is still on the stack. [Note: the coroutine could not be suspended at this point, as this is not the top-level function of the coroutine.]
6. The function returns to the coroutine’s top-level function. [Note: the coroutine can now suspend itself.]
7. The coroutine suspends – all the data that needs to be preserved across the coroutine’s suspensions is put into the activation frame.
8. A regular function call.
9. The coroutine is resumed – this happens like a regular function call, but with a jump to the previous suspension point and restoration of the variables’ state from the activation frame.
10. A function call, as in point 5.
11. A function return, as in point 6.
12. The coroutine returns. The coroutine cannot be resumed from now on.
So, as we can see, far less data needs to be remembered across the coroutine’s suspensions and resumptions, but the coroutine can suspend itself and return only from its top-level function. All the function and coroutine calls happen in the same way; in the case of the coroutine, however, some additional data needs to be preserved across the calls so that it is known how to jump to the suspension point and restore the state of the local variables. Other than that, there is no difference between a function frame and a coroutine frame.
A coroutine can also call other coroutines (which is not shown in the example). In the case of stackless coroutines, each such call will end up allocating new space for the new coroutine’s data (so multiple calls to coroutines might cause multiple dynamic memory allocations).
The reason why coroutines need a dedicated language feature is that the compiler needs to decide which variables describe the state of the coroutine and generate the boilerplate code for the jumps to the suspension points.
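To build some intuition for what that boilerplate could look like, here is a very rough, hand-written sketch in the spirit of the transformation. The real compiler-generated code is considerably more involved, and all the names here (CoroFrame, resume) are invented for illustration.

#include <iostream>

// Hypothetical coroutine frame: what must survive between suspensions.
struct CoroFrame {
    int state = 0;   // which suspension point to jump back to on resume
    int num   = 0;   // local variable that lives across suspension points
};

// Hand-written equivalent of resuming generate_nums: jump to the last
// suspension point, run until the next one, and report the yielded value.
bool resume(CoroFrame& frame, int& out_value) {
    switch (frame.state) {
    case 0:
        for (;;) {
            out_value   = frame.num;  // corresponds to "co_yield num"
            frame.state = 1;          // remember where to continue
            return true;              // suspend: give control back to the caller
    case 1:
            frame.num = frame.num + 1;
        }
    }
    return false;                     // a finite coroutine would end up here
}

int main() {
    CoroFrame frame;                  // with the real feature this lives on the heap
    int x = 0;
    while (resume(frame, x)) {
        std::cout << x << '\n';
        if (x > 9) break;
    }
}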
Coroutine use cases
Coroutines in C++ can be used in the same way as coroutines in other languages. They will simplify writing:
- generators
- asynchronous I/O code
- lazy computations
- event driven applications
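As a quick illustration of the generator and lazy computation cases, the hypothetical IntGenerator type sketched earlier in this post could be reused to produce a lazily evaluated Fibonacci sequence; each value is computed only when the caller resumes the coroutine.

// Assumes the hypothetical IntGenerator type sketched earlier.
IntGenerator fibonacci() {
    int a = 0;
    int b = 1;
    while (true) {
        co_yield a;          // suspend until the caller asks for the next value
        int next = a + b;
        a = b;
        b = next;
    }
}

// Usage: prints the first ten Fibonacci numbers.
// auto fib = fibonacci();
// for (int i = 0; i < 10 && fib.next(); ++i)
//     std::cout << fib.value() << ' ';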
Summary
What I would like you to know after reading this article is:
- why we need a dedicated language feature for coroutines,
- what the difference is between stackful and stackless coroutines,
- what we need coroutines for.
I hope that this article helped you understand those topics and was interesting enough to make you wait for more posts on coroutines – this time with code examples in C++!
Bibliography
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4024.pdf
- http://masnun.com/2015/11/13/python-generators-coroutines-native-coroutines-and-async-await.html
- http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2018/n4760.pdf
- http://www.davespace.co.uk/arm/introduction-to-arm/registers.html