Consistency, Availability, Prowess: How Stripe and Shopify push the limits of distributed payments
Rethinking the infrastructure, understanding what the CAP theorem really means, and scaling like crazy.
Most engineers believe it’s impossible for a database to remain both strongly consistent and available when the network is partitioned.
Let me be clear: that belief is incorrect.
The confusion stems from a misunderstanding of what Eric Brewer really meant by Availability when he conjectured the CAP theorem. A system that keeps only some nodes available is not available in the CAP sense.
Even if clients individually don’t notice it.
Here’s the thing: I can get away with a system that’s both strongly consistent and highly available, as long as each client individually is guaranteed a high degree of availability, at the expense of making the system less available as a whole.
This is the little secret behind Stripe’s 99.999% availability metric.
Stripe’s engineers believe something you don’t.
I’m Alvaro Duran, and this is The Payments Engineer Playbook. If you scroll for 5 minutes on YouTube, you’ll find many tutorials on designing payment systems. But if you want to build one for real users, and real money, you’re on your own.
I know this because I’ve built and maintained payment systems for almost ten years. I’ve been able to see all kinds of private conversations about what works, and what doesn’t.
Lately, I’ve decided to make these conversations public. This is how The Payments Engineer Playbook was born.
One reader said that The Playbook is about “deep dives on the stack behind the magic”. We investigate the technology to transfer money, so that you become a smarter, more skillful and more successful payments engineer.
And we do that by carving out one sliver of it, and extracting tactics from it. Today, we’re looking at Stripe’s Data Movement Platform, and how it navigates the paradox of the CAP theorem.
I finally decided to find out what Stripe’s secret sauce was for achieving less than 5 minutes of downtime in 2023:
In 2023, Stripe processed $1 trillion in total payments volume, all while maintaining an uptime of 99.999%. We obsess over reliability. As engineers on the database infrastructure team, we provide a database-as-a-service (DBaaS) called DocDB as a foundation layer for our APIs.
Stripe’s DocDB is an extension of MongoDB Community—a popular open-source database—and consists of a set of services that we built in-house. It serves over five million queries per second from Stripe’s product applications.
— How Stripe’s document databases supported 99.999% uptime with zero-downtime data migrations
If you believe that the CAP theorem is some sort of Heisenberg uncertainty principle, where more availability means less consistency, Stripe’s claim is terrifying. With an availability rate that high, consistency must be too low to handle something as critical as money.
But if you don’t, you quickly notice that the article keeps coming back to one key concept.
And that concept is client transparency.
Now, the article never clarifies what it means by that. But it can only mean one thing: that the mind-blowingly high availability is measured not for the system as a whole, but for each client.
The trade-off is that Stripe is almost never available, in the CAP sense, so that they can be highly available for each of their clients.
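To see how different those two measurements can be, here’s a back-of-the-envelope sketch. Every number in it is an assumption of mine (shard count, migration frequency, durations), not a Stripe figure; the point is only how differently per-client and whole-system availability behave.

```python
# Toy numbers, not Stripe's: the shard count, migration frequency and
# durations below are assumptions for illustration only.
SECONDS_PER_YEAR = 365 * 24 * 3600

num_shards = 2_000                    # hypothetical
migrations_per_shard_per_year = 50    # hypothetical
migration_duration_s = 6 * 3600       # bulk copy + catch-up replication (hypothetical)
client_visible_pause_s = 2            # only the traffic switch is visible to a client

# A client only notices the brief traffic switches on its own shard.
client_downtime_s = migrations_per_shard_per_year * client_visible_pause_s
print(f"per-client availability: {1 - client_downtime_s / SECONDS_PER_YEAR:.5%}")
# -> 99.99968%: comfortably five nines

# The system as a whole, though, has *some* migration in flight almost always.
in_flight_s = num_shards * migrations_per_shard_per_year * migration_duration_s
print(f"average concurrent migrations: {in_flight_s / SECONDS_PER_YEAR:.0f}")
# -> ~68: in the strict CAP sense, the whole system is practically never quiescent
```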
And funnily enough, they’re doing it in a way that reminds me a lot of Shopify.
Available is a Good-to-Have; Consistent is a Must
Shopify runs miniature versions of its gigantic Ruby on Rails codebase for its merchants.
The company probably started like most SaaS companies do, with a multi-tenant architecture and shared schemas. In time, they realized that, rather than redoing their software, they were going to be way better off redoing their infrastructure.
Now, even though multi-tenant relational databases are difficult to scale, Shopify has simply built the infra muscle to rearrange merchants, and their data, across an elastic pool of pods.
When Stripe was founded, however, its engineers went a different way. They needed to handle astronomical volumes of financial data in real time, and relational databases were quickly ruled out.
That’s why they chose MongoDB.
MongoDB is a document database that splits data into shards, or partitions, based on some aspect of that data. It’s the equivalent of the encyclopedia tomes your grandpa keeps on his bookshelf. The tomes organize the content alphabetically, but in a way that balances the length of each tome, even if that means some of the words starting with A “spill into” the next tome.
Sharding is great, because it lets you scale the database horizontally.
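If you’ve never seen sharding configured, here’s what the vanilla version looks like with pymongo. This is the stock MongoDB mechanism that DocDB builds on, not Stripe’s actual setup; the database name, collection and shard key are made up.

```python
# Vanilla MongoDB sharding via pymongo -- a minimal sketch, not Stripe's DocDB.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumes a mongos router here

# Enable sharding on a hypothetical "payments" database...
client.admin.command("enableSharding", "payments")

# ...and split the "charges" collection across shards by a hashed merchant id,
# so a merchant's documents stay together while load spreads evenly: the
# balanced encyclopedia tomes from the analogy above.
client.admin.command(
    "shardCollection",
    "payments.charges",
    key={"merchant_id": "hashed"},
)
```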
That said, the problem with MongoDB is that it places more responsibility on the application, and on those who develop it, than on the database. There are those who think you should never use it at all, and Stripe’s engineers were aware that choosing a document database meant entering uncharted territory.
MongoDB was probably the worst form of database for Stripe, except for all the others.
That’s why they built the Data Movement Platform.
DMP was initially designed as the horizontal scaling infrastructure for MongoDB: the layer that decided how big each partition was, and what went inside it.
But in time, it became much more. Nowadays, DMP is used to maximize client-transparent availability while delivering strong consistency, as any money software should.
How Data Movement Platform works
A recurring theme of The Payments Engineer Playbook is that you get better at handling the difficult parts of your system by doing them more often.
DMP leans into the idea that the best way to prevent client-facing availability issues is to keep the system as a whole continually unavailable, in the CAP sense.
What I mean by that is that DMP is constantly moving chunks of data from one partition to another, in an effort to:
Merge underutilized shards
Upgrade the database engine
Keep data that belongs to large users closer
DMP performs each of these chunk migrations in six steps (a rough sketch in code follows the list):
Registration: It signals the start of the process
Bulk data import: It copies all the data from the source shard’s last snapshot into the target shard (or shards)
Async replication: Using the collected Change Data Capture logs, it replicates the updates on the source shard into the target shard(s), as close to real time as possible.
Correctness check: It verifies the completeness and accuracy of the data.
Traffic switch: It reroutes all production reads and writes to the target shard(s).
Deregistration: It deletes the data from the source shard and signals the end of the process.
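Here’s the rough sketch promised above. It’s written against a hypothetical `dmp` client object; none of these method names come from Stripe, they simply paraphrase the six published steps.

```python
# A schematic paraphrase of the six steps above. Every method on `dmp` is a
# hypothetical placeholder, not Stripe's Data Movement Platform code.

def migrate_chunk(dmp, chunk, source_shard, target_shards):
    # 1. Registration: record that this chunk migration is in flight.
    migration = dmp.register(chunk, source_shard, target_shards)

    # 2. Bulk data import: copy the chunk from the source shard's last snapshot.
    snapshot = dmp.latest_snapshot(source_shard)
    dmp.bulk_copy(snapshot, chunk, target_shards)

    # 3. Async replication: replay Change Data Capture events recorded since
    #    the snapshot until the target shard(s) are caught up.
    dmp.replay_cdc(source_shard, target_shards, since=snapshot.timestamp)

    # 4. Correctness check: verify completeness and accuracy before cutover.
    if not dmp.chunks_match(chunk, source_shard, target_shards):
        raise RuntimeError("target data does not match source; aborting cutover")

    # 5. Traffic switch: reroute all production reads and writes to the
    #    target shard(s). This is the only step a client can notice.
    dmp.switch_traffic(chunk, source_shard, target_shards)

    # 6. Deregistration: delete the chunk from the source shard and close out.
    dmp.delete_from_source(chunk, source_shard)
    dmp.deregister(migration)
```

Only the traffic switch touches the production request path; everything else happens in the background, which is what makes the whole process client-transparent.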
The New is often the Rediscovery of the Old
You know what was most interesting to me? How eerily similar this process is to Shopify’s pod rebalancing:
In order to accommodate the growing number of shops powered by Shopify, the infrastructure scales the number of pods horizontally: more shops, more pods.
But in order to accommodate the growth of each merchant, they rebalance the pods: merchants with a lot of traffic get paired with those with less traffic.
And the way they rebalance looks a lot like restoring from a backup.
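To make the pairing idea concrete, here’s a toy heuristic: sort shops by traffic and match the heaviest with the lightest. It is emphatically not Shopify’s balancer; the function name and the traffic numbers are mine.

```python
# A toy pairing heuristic, not Shopify's actual balancer: match the heaviest
# shops with the lightest so every pod carries roughly the same load.
def pair_shops_into_pods(shop_traffic: dict[str, int]) -> list[tuple[str, str]]:
    ranked = sorted(shop_traffic, key=shop_traffic.get, reverse=True)
    # zip the heaviest-first ranking with its reverse, keep the first half
    return list(zip(ranked, reversed(ranked)))[: len(ranked) // 2]

pods = pair_shops_into_pods(
    {"mega-store": 9_000, "big-store": 4_000, "mid-store": 900, "tiny-store": 50}
)
print(pods)  # [('mega-store', 'tiny-store'), ('big-store', 'mid-store')]
```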
A few weeks ago, I talked about discoveries made independently and almost simultaneously.
This DMP design looks like another example: the same idea, arrived at by both Stripe and Shopify, on top of different data systems.
I think this is not a coincidence. Both companies grew very fast from the very beginning, and had to build the systems they needed to cope with that growth.
Both companies also concluded that, in order to do that, they had to build their systems in two layers: one scale-aware, the other scale-agnostic.
And you know what? There’s something about independently discovered ideas that makes me think they’re right. The likelihood of two successful companies arriving at very similar conclusions about database infrastructure for scale, and those conclusions being wrong, is probably very low.
My takeaway, then, isn’t that, to achieve 99.999% availability, one has to blindly follow Stripe and build payment systems with MongoDB.
Such a mode of thinking is what gets you to believe all the wrong things about what the CAP theorem really means.
What I take away from this is that, perhaps, there is really no excuse for payments engineers not to build money software that’s both highly available and strongly consistent.
But we must first stop installing roadblocks in our way, and start building.
Feel free to bookmark this article and use it as a handbook when you’re exploring how to horizontally scale your payments system storage. It should give you enough to keep you going.
I’ve drawn from many references to put this article together. Both Stripe’s article on DocDB and Shopify’s video on rebalancing pods are must-sees. The team at FoundationDB has a great discussion on the CAP theorem confusion that you should definitely check out too.
If you’re unfamiliar with a few of the concepts I used, check out good resources on sharding, change data capture, and multi-tenancy.
But most important of all, don’t ever use document databases without reading why you shouldn’t use them.
And last, I’ve already covered Stripe and Shopify before. Feel free to check those out.
This has been The Payments Engineer Playbook. I’ll see you next week.
PS: Here’s something I’ve learned about software projects over the years.
The biggest reason software projects fail is not a lack of technical knowledge, but a lack of domain knowledge.
But most companies hire for technical skills. Engineers who have a deep understanding of the domain they’re in (read: they’ve made the same mistakes before) are seen as better, indispensable. They’re promoted faster. They get listened to more often. They have influence and leverage.
There’s a disconnect between what gets you hired, and what gets you promoted.
Now, I think there’s enough content out there on tech interviews. However silly and baseless they are, there’s a process to get good at them.
But on building payment systems specifically? I don’t think there’s anything out there like The Playbook.
Building payment systems, I’ve already nailed some things and made some mistakes. Thanks to that, I have a clearer picture of what payment systems need and what is superfluous.
And I’m offering you access.
Every Wednesday, you’ll get an article, just like this one, full of insight, lessons learned and common mistakes, about a particular aspect of payment systems. All tailored to engineers who build money software. Payments engineers.
In 2025, The Playbook is going paid. And for $15 a month, or $149 a year, you can read every article that’s been published, and every one that will be.
If being able to avoid the mistakes your competitors are making when building payment systems is worth it to you, I suggest you pledge a subscription for 2025.
Because at the beginning of next year, the price is going up. And as an early subscriber, I don’t want you to miss the chance.
And if someone you respect shared this article with you, do me a favor and subscribe. Every week I feel I’m getting better at this. That means that my best articles on how to build payment systems are probably yet to be written.
You can only find out if you subscribe to The Payments Engineer Playbook. I’ll see you around.