NoDB: Processing Payments Without a Database
Making payments simpler and more reliable by getting databases out of the way.
I had the weirdest realization a few days ago.
Over the last few months, I’ve been studying threads and asynchronous programming. I’m not the only one: the software engineering industry is slowly coming to terms with the fact that hardware power is starting to plateau. The future of more efficient applications is going to come from the code, and not from the machine.
But async has a built-in assumption that I hadn’t really given much thought. Until now.
Here’s the thing: async programming isn’t about making your program run any faster. It’s about not wasting time—the time the CPU wastes waiting for a reply to a network request.
This is a well-known problem. In 2009, Ryan Dahl introduced Node.js by stating that we’re doing I/O completely wrong (I/O being roughly equivalent to “accessing what’s out there”). In most programming languages, calling a function and making an external request look similar, but while the former executes immediately, the latter blocks, taking orders of magnitude longer to complete.
Async is an execution model that accepts this divide: it introduces a separate set of keywords and evaluation rules for dealing with blocking operations, even if the resulting code is harder to write, read, and change.
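To make the divide concrete, here’s a minimal Python sketch. The URL, the payload, and the use of the `requests` and `aiohttp` libraries are my illustration, nothing prescribed: the blocking version parks the whole thread on every network call, while the async version lets the event loop interleave the waits.

```python
import asyncio

import aiohttp   # assumed third-party async HTTP client
import requests  # assumed third-party blocking HTTP client

PSP_URL = "https://psp.example/charge"  # hypothetical endpoint

def charge_blocking(urls):
    # Each call parks the whole thread until the PSP replies.
    return [requests.post(u, json={"amount": 100}).status_code for u in urls]

async def charge_async(urls):
    # The event loop moves on to other requests while each one waits.
    async with aiohttp.ClientSession() as session:
        async def post(u):
            async with session.post(u, json={"amount": 100}) as resp:
                return resp.status
        return await asyncio.gather(*(post(u) for u in urls))

# asyncio.run(charge_async([PSP_URL] * 10))  # roughly 1 round trip of wall time
# charge_blocking([PSP_URL] * 10)            # roughly 10 round trips of wall time
```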
But this is not the only option. Async accepts this divide; what if we didn’t? What would we need to do if we rejected blocking operations altogether, the ones that have to wait for an external system to respond?
You might be wondering how that is even possible. Can we do away with third-party systems? At least in payments, we can’t. But some providers offer webhooks: instead of you waiting on them, they make a request to an endpoint of your choice when a pending operation completes.
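A webhook endpoint is nothing exotic: it’s just an HTTP handler that the provider calls when the work is done. Here’s a minimal sketch using Python’s standard library; the port, the path, and the payload shape are hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # The provider calls us when the pending operation completes,
        # so our code never sits blocked waiting for an answer.
        length = int(self.headers.get("Content-Length", 0))
        event = json.loads(self.rfile.read(length) or b"{}")
        print("received:", event.get("type"))  # e.g. "completed_authorization"
        self.send_response(200)  # acknowledge quickly; many providers retry on non-2xx
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("0.0.0.0", 8000), WebhookHandler).serve_forever()
```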
Still, that wouldn’t be enough. There’s the database, right? Every payment system relies on persistent storage to do what needs to be done. And we can’t have payment systems that don’t sit on top of databases. Right?
That’s when it hit me. We’re forced to deal with asynchronous programming because we believe databases are indispensable. We have a choice in what kind of database to use (PostgreSQL, MongoDB, etc.), but we never questioned whether we had to choose one at all.
What if we didn’t have to?
I’m Alvaro Duran, and this is The Payments Engineer Playbook. If you scroll for 5 minutes on YouTube, you’ll find many tutorials on designing payment systems. But if you want to build one for real users, and real money, you’re on your own.
I know this because I’ve built and maintained payment systems for almost ten years. I’ve been able to see all kinds of private conversations about what works, and what doesn’t.
Lately, I’ve decided to make these conversations public. This is how The Payments Engineer Playbook was born.
One reader said that The Playbook is about “deep dives on the stack behind the magic”. We investigate the technology to transfer money, so that you become a smarter, more skillful, and more successful payments engineer.
And we do that by cutting off one sliver of it, and extracting tactics from it. Today, we’re looking at event sourcing, and how it eliminates the need for persistent storage when we process payments.
If I didn’t know anything about how data systems work, I would have never guessed that we had to store data about payments.
Think about it. Most payment systems are message brokers: they get data from the customer, and use it to message their bank that both the customer and the merchant have agreed on a purchase.
Yes, some data is going to be persisted. Accounting needs it to balance the ledger, Finance needs it for reconciliation, and to keep records in the case of a chargeback dispute. Analytics needs it for business intelligence. But almost none of that data is required by the payment system itself—it’s just produced.
Storing payment data is an afterthought. It doesn’t make the payment system better.
Therefore, in theory, we could process payments using data kept, temporarily, in RAM.
If there’s data, but no database, there has to be a fundamental shift in the way the system processes, and reacts to, information. This shift is, for the most part, treating changes in state, not the state itself, as first-class citizens.
This is known as event sourcing.
Seeing Like an Event Stream
You can always rebuild the status of a payment by looking at its history.
Collecting facts, therefore, is equivalent to storing the most up-to-date version of a customer’s payment.
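In code, that equivalence is just a fold over the event list: replay the history, oldest first, and the last fact wins. Here’s a minimal sketch; the event names and statuses are illustrative (they match the example flow later in this article), not any PSP’s real schema.

```python
# The status is derived, never stored: fold the payment's history.
EVENT_TO_STATUS = {
    "payment_created": "created",
    "3DS_required": "awaiting_3ds",
    "3DS_completed": "authorizing",
    "completed_authorization": "authorized",
    "completed_capture": "completed",
}

def current_status(history):
    status = None
    for event in history:  # replay oldest-first; the last fact wins
        status = EVENT_TO_STATUS.get(event["type"], status)
    return status

history = [{"type": "payment_created"}, {"type": "3DS_required"}]
assert current_status(history) == "awaiting_3ds"
```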
You’re already familiar with events. Event notification, or pub/sub, is a design pattern whose goal is to decouple the producer of information from the systems that consume it.
This is great for PSPs, because it wouldn’t make sense to change code every time a new client wants to integrate with their systems. In practice, a message queue sits between the PSP and the end client: the PSP emits events, and the message queue makes sure the appropriate clients receive them.
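To see the decoupling in miniature, here’s a toy in-process version. A real PSP would put Kafka, SQS, or a similar queue in the middle; the topic name and payload here are made up.

```python
from collections import defaultdict

class Bus:
    """Toy in-process stand-in for the message queue between PSP and clients."""
    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, event):
        # The producer has no idea who is listening: that's the decoupling.
        for handler in self._subscribers[topic]:
            handler(event)

bus = Bus()
bus.subscribe("payments", lambda e: print("client A got", e["type"]))
bus.subscribe("payments", lambda e: print("client B got", e["type"]))
bus.publish("payments", {"type": "completed_capture", "id": "pay_123"})
```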
Event sourcing is the ultimate conclusion of this approach: why not make even the payment system itself deal in terms of events?
Pub/sub decouples producers from consumers. Event sourcing decouples the system from any persistent storage whatsoever.
In the end, the payment system is constantly reacting to new facts, in the expectation that, whenever a new event arrives, the status of the affected payment can be reconstructed.
I know I’m being too abstract, so let me clarify this with an example. A payment system collects events about a given payment:
- A first event, `payment_created`, initiates the process.
- Another event, `3DS_required`, makes the system send the customer a 3DS challenge.
- The `3DS_completed` event will make the system attempt authorization.
- A `pending_authorization` event will trigger no action, until a `completed_authorization` event is received. Some other part of the system will continue the fulfillment of the order as a result.
- When `order_fulfilled` is received, the payment system will request the capturing of the payment to collect the funds.
- And when the `completed_capture` event is received, the payment will be finished.
I want you to notice two things. First, there’s no need to persist any of these events immediately. They can be held in memory while another process fetches them in the background.
The second thing is that, at each step of the way, whether the step is possible is determined by the events that preceded it.
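That second point is worth sketching. Whether an event is accepted depends only on the events that preceded it; the transition table below is my reading of the flow above, not a spec.

```python
# Which event types may follow the most recent one: a transition table
# inferred from the flow above (hypothetical, not a PSP's real rules).
ALLOWED_NEXT = {
    None: {"payment_created"},
    "payment_created": {"3DS_required", "pending_authorization"},
    "3DS_required": {"3DS_completed"},
    "3DS_completed": {"pending_authorization"},
    "pending_authorization": {"completed_authorization"},
    "completed_authorization": {"order_fulfilled"},
    "order_fulfilled": {"completed_capture"},
}

def apply(history, event_type):
    last = history[-1] if history else None
    if event_type not in ALLOWED_NEXT.get(last, set()):
        raise ValueError(f"{event_type!r} cannot follow {last!r}")
    return history + [event_type]  # events live in a plain in-memory list

history = []
for e in ("payment_created", "3DS_required", "3DS_completed"):
    history = apply(history, e)
# apply(history, "completed_capture") would raise: capture can't happen yet.
```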
As a result, we can treat these events as temporary records within the payment system, removing the necessity of a database. The moment we complete a payment, we no longer need them. But if the payment system goes down, we have to make sure that no event gets lost, and that we can resume operations instantly.
There’s a solution for this: hot backups.
One of the advantages of events is you can have multiple systems running off the same event stream. So they had a very good way of having hot backups. They basically ran two systems all the time with one of them being the lead one and if it went down the second one would instantly take over.
— Martin Fowler, The Many Meanings of Event-Driven Architecture
That’s why event sourcing is so prevalent in the high-frequency trading (HFT) industry. Systems like LMAX deal with millions of transactions per second in a single-threaded program by keeping all operations in memory, with hot backups and periodic snapshots in the background.
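Here’s a minimal sketch of that hot-backup idea: two replicas fold the same event stream into identical in-memory state, so if the lead dies, the follower can take over instantly. The snapshotting is reduced to a periodic copy, and the state shape is made up for illustration.

```python
import copy

class Replica:
    """Folds the event stream into purely in-memory payment state."""
    def __init__(self):
        self.state = {}     # payment_id -> latest event type
        self.snapshot = {}  # periodic copy, so restarts replay less history

    def consume(self, event):
        self.state[event["payment_id"]] = event["type"]

    def take_snapshot(self):
        self.snapshot = copy.deepcopy(self.state)

stream = [
    {"payment_id": "pay_1", "type": "payment_created"},
    {"payment_id": "pay_1", "type": "completed_authorization"},
]

lead, follower = Replica(), Replica()
for event in stream:  # both replicas see every event in the stream
    lead.consume(event)
    follower.consume(event)

# The follower's state is already identical; failover is instantaneous.
assert lead.state == follower.state
```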
This, by the way, shouldn’t surprise you. This was what triggered this exploration all along. HFT firms deal with millions of financial transactions per second, so I asked myself, “what’s their secret?”, “do they deal with threads and async in a special way?”.
They do; they do away with them altogether.
I’ve drawn from many references to put this article together. The video that lit the fire within me was this talk on mechanical sympathy and LMAX high-frequency trading. I’d already heard the concept of mechanical sympathy on Jane Street’s podcast (they’re not an HFT firm, but they make markets, which is in many ways the same thing). That led me to a video by Martin Fowler on the many meanings of “event-driven”, and to an article on his website on memory image, which, shockingly, mentions LMAX at the end. What were the odds?
If you want to know more about event sourcing, Martin Fowler has a great primer on the topic.
This has been The Payments Engineer Playbook. I’ll be off next week, and will publish my next article on the 2nd of January, for paid subscribers only.
If you’ve been living under a rock for the past couple of months, this newsletter is going paid in 2025. You can find all the info here:
Have a Merry Christmas, and a Happy New Year.