The Concurrency Ambition
Software is forced to take on hardware's responsibilities in a post-Moore world
Something confusing about how credit cards work is that authorization is both asking the cardholder’s bank whether he or she can pay, and freezing the funds. Shouldn’t they be two different processes?
The answer is no: the only way the bank can guarantee that its answer is valid is to make those funds unavailable at the moment the answer is sent to the merchant. Otherwise, the cardholder would be able to use the same funds more than once, artificially multiplying their purchasing power.
This is called double spending, and is the financial manifestation of a tricky computer problem that is forcing engineers to shift the way they think about the code they write.
The Limits of Redundant Hardware
People used to fear the immigrant stealing their jobs; now they're scared of the AI that could steal all of society's. That's because we are past the point where we merely expect computers to do the work of humans: most of our technological effort nowadays is driven by the desire to make computers do multiple things at the same time.
For the last 50 years, our growing computational ambitions have mostly been met by increasingly powerful hardware, the trend described by Moore's Law. But that trend is coming to an end, and concurrency (dealing with lots of things at once) is becoming a responsibility that software needs to pick up from hardware. As our ambitions keep growing and CPU power fails to keep up, software has to take over.
Precisely because hardware was so powerful, programming languages that have been around since the 90s were built on the assumption that most programmers need just one unit of execution, called a thread, and a single chunk of memory to hold its data. When code is one damned thing after another, programs are far easier to write and to reason about.
This, though, is very inefficient. Imagine if a Starbucks barista could not switch tasks until the current one was completed: eyes fixed on the coffee machine, standing idle, unable to attend to the next customer because the machine is running. Wouldn't it be great if the barista moved on while the coffee machine is doing its thing, and picked up the cup once it's filled?
This barista is an apt metaphor for how most software operates nowadays. Most systems are built redundantly: rather than have one program request data from the database and serve the next request while the database is doing its thing, multiple copies of the program run inside the server's operating system, each one waiting for its data to come through and serving a single request until completion.
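In code, that blocking pattern looks something like the sketch below. It's a minimal Python illustration, not a real server: the function names and the half-second delay are made up, but the shape is the familiar one, with the whole program parked on a slow call.

```python
import time

# Hypothetical stand-in for a database round trip: the program can do
# nothing else while it waits for the answer.
def fetch_user(user_id):
    time.sleep(0.5)                      # pretend the database takes 500 ms
    return {"id": user_id, "name": "Ada"}

def handle_request(user_id):
    user = fetch_user(user_id)           # the whole thread sits idle right here
    return f"Hello, {user['name']}!"

# Serving many customers this way means running many copies of the program,
# each one blocked on its own fetch_user() call.
for request in range(3):
    print(handle_request(request))
```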
Hardware isn’t getting better at the rate it used to, and the cloud bill isn’t getting any cheaper, and as a result engineers are incentivized to write software that doesn’t just sit idle for something to happen, wasting valuable resources.
Thread Safety, or “you’re in my spot”
One possible way of achieving concurrency is to have one program schedule tasks that are performed independently. Rather than going for the default single-thread option, a program can open several of those threads with the underlying operating system, oversee their status, handle the results of their execution, assign priorities, and retry if needed. When those threads actually run at the same time on different CPU cores, this is called parallelism.
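Here is a minimal sketch of that pattern in Python, using the standard library's thread pool. The `work()` function is a made-up placeholder, and the priorities and retries are left out; the point is the shape: open threads, hand them tasks, oversee them, and collect their results.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

# Placeholder task; in a real system this would be actual work.
def work(task_id):
    return f"task {task_id} done"

# Open a handful of threads with the operating system...
with ThreadPoolExecutor(max_workers=4) as pool:
    futures = [pool.submit(work, i) for i in range(8)]
    # ...and oversee them, handling each result as it arrives.
    for future in as_completed(futures):
        print(future.result())
```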
In AI, parallelism is now all the rage, because training a model is such an enormous task that it benefits greatly from breaking the problem down into multiple subtasks running independently of each other, while a central program controls the whole process.
However, AI use cases aside, parallelism is a bad idea. Because software is (for now) written by humans, programs whose threads can be paused and resumed from the outside at arbitrary points are very hard to reason about, and prone to errors.
This manifests as seemingly impossible situations that happen silently. In the Starbucks example, let's say the barista gets an order for a latte. She gets a cup, puts it in the coffee machine, presses the coffee button, and the machine starts. She could now fetch the milk and steam it to complete the latte. Instead, she decides to attend to the next customer, and gets an order for an espresso.
Now something weird happens. The barista makes a mistake: instead of grabbing a new cup, she reaches for the one already in the coffee machine, where the latte is being prepared. With complete disregard for what's inside, she takes the cup, puts it back in the coffee machine, presses the coffee button, and the machine starts.
Instead of two cups of coffee, one with a latte and the other with an espresso, she has one cup of absolute disaster!
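If you want to see the same mix-up without the coffee, here is a deliberately contrived Python sketch. The names and the artificial pause are only there to make the bad interleaving easy to reproduce: two threads check the same shared value, get distracted, and write back as if nothing had changed in between.

```python
import threading
import time

cup = ""  # the shared cup both orders end up using

def pour(drink):
    global cup
    seen = cup          # look at the cup (it seems empty)...
    time.sleep(0.01)    # ...get distracted while another task runs...
    cup = seen + drink  # ...and pour as if nothing had changed

latte = threading.Thread(target=pour, args=("latte",))
espresso = threading.Thread(target=pour, args=("espresso",))
latte.start()
espresso.start()
latte.join()
espresso.join()

print(cup)  # prints just "latte" or just "espresso": one order vanished
```

Both pours happened, yet only one survives, and no error is raised anywhere.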
Does this sound absurd? It happens all the time. When I went to the cinema to watch Avengers: Infinity War, someone was sitting in my spot, with the same ticket for the same session. The system had probably booked the same seat for both of us: I was selecting the seat while the other guy was in the process of paying for it. When two tasks step on each other's toes and the outcome is incorrect, or even nonsensical, a concurrency problem is usually involved.
Some programming languages, because of this, go to great lengths to ensure that only one thread can execute at a time; Python's Global Interpreter Lock is the best-known example. Languages with this kind of guardrail are often described as thread-safe.
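Continuing the contrived cup sketch from above, the guardrail can be as simple as a lock: a rule that says only one thread may touch the cup at a time, and everyone else has to wait.

```python
import threading
import time

cup = ""
cup_lock = threading.Lock()  # "you're in my spot": one thread at a time

def pour(drink):
    global cup
    with cup_lock:           # wait here until nobody else is using the cup
        seen = cup
        time.sleep(0.01)
        cup = seen + drink

latte = threading.Thread(target=pour, args=("latte",))
espresso = threading.Thread(target=pour, args=("espresso",))
latte.start()
espresso.start()
latte.join()
espresso.join()

print(cup)  # "latteespresso" or "espressolatte": both orders survive
```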
Async to the Rescue
Some engineers complain that these errors are simply the fault of bad programmers. And so the guardrails that make programming languages thread-safe have a bad reputation, seen as a barrier to squeezing more performance out of programs.
In practice, these guardrails are often behind the popularity of the languages that implement them, because they ease adoption among people for whom writing software is just a means of getting their actual job done, like scientists.
Which brings us back to today: over time, we've learned that juggling multiple threads is a bad idea. However, concurrency is becoming an absolute must-have for software systems. So which one has to go?
The answer, luckily, is neither. During the 2010s, mainstream programming languages built the toolkits to do what the barista should have done all along: leave the machine alone and pick something else to do in the meantime. This paradigm is called asynchronous programming: rather than trying to do everything at the same time, tasks take turns on the same thread, and attention is put on one thing at a time, and one thing only.
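In Python, that toolkit is asyncio. The sketch below is a toy version of the barista, with sleeps standing in for the coffee machine and the milk steamer: everything runs on a single thread, and each await is an explicit checkpoint where one task steps aside so another can run.

```python
import asyncio

async def brew_coffee():
    await asyncio.sleep(2)   # the coffee machine runs on its own
    return "coffee"

async def steam_milk():
    await asyncio.sleep(1)   # so does the milk steamer
    return "milk"

async def make_latte():
    # Start both, then wait for both; the barista never stands idle.
    coffee, milk = await asyncio.gather(brew_coffee(), steam_milk())
    return f"latte = {coffee} + {milk}"

print(asyncio.run(make_latte()))  # ready in about 2 seconds, not 3
```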
In doing so, engineers can reason more easily about tasks that run concurrently, because the checkpoints are explicit. The barista waits for the coffee to be done, but in the meantime she moves on to handle the milk, keeping an eye on the coffee. The latte comes out correctly, and she is free to move on to the next customer, ready to take that espresso order and spared the embarrassment of serving two orders in the same cup. Humans, sometimes, inspire robots.