What Makes DynamoDB a Good Database Choice for Payments
A database engine for the always-on payments era.
Every second counts.
Accepting money from customers is the last of a series of steps between your customers and your product. At small scale, there isn’t enough traffic on your site, and your payment system doesn’t need to be built to be always ready.
But given enough scale, it needs to be.
For large enterprises, and for those not that large but with peaks of traffic, a second of downtime means that there’s a customer somewhere who has decided to pay, and is left not with a confirmation, but with an error screen. “Something went wrong”, it says, “try again later”.
They often don’t. For these companies, every second counts.
But building an always-on, never-down payment system at scale puts a lot of demands on the database that supports it, namely:
Zero-downtime migrations
Low latency for a global clientele
Replication and failover strategies
Which is why, at large scale, relational databases like PostgreSQL aren’t cut out for payments.
This doesn’t mean that PostgreSQL is a bad database choice. Let me be clear: at small scale, Postgres is perhaps the best choice out there.
But the truth is that PostgreSQL is a database designed in an era where people just turned off the machine at the end of the day, and turned it on the next morning. It is a database engine built for consistency, not availability. And though engineers have figured out ways to squeeze as much from these databases as they can, there are situations in which relational databases are no longer worth it.
Migrations are a good example. Relational databases expect all data, both already stored and incoming, to conform to an explicit set of rules known as a schema. You have the ability to make changes to that schema with migrations, but they’re not free.
In the process of applying a migration, the database engine has to check that the existing data already conforms to the new schema, and it will often block parts of the database from incoming reads or writes to make sure that there are no inconsistencies.
And while this is happening, your system is not fully available.
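To make the cost concrete, here is a minimal sketch of a writer being locked out while a migration runs. It uses SQLite from Python's standard library, not PostgreSQL, because it runs anywhere without a server. That is an assumption of convenience: SQLite locks the whole database file, which is coarser than the table-level locks PostgreSQL takes. The `payments` table, its columns, and the `run_demo` function are made up for illustration.

```python
import os
import sqlite3
import tempfile


def run_demo() -> dict:
    """Return what a concurrent writer observes during and after a migration."""
    observed = {}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "payments.db")

        # isolation_level=None means autocommit: the only transaction
        # boundaries are the BEGIN/COMMIT statements issued explicitly below.
        migrator = sqlite3.connect(path, isolation_level=None)
        # timeout=0 makes the writer fail immediately instead of waiting.
        writer = sqlite3.connect(path, timeout=0, isolation_level=None)
        try:
            migrator.execute(
                "CREATE TABLE payments (id INTEGER PRIMARY KEY, amount INTEGER)"
            )
            migrator.execute("INSERT INTO payments (amount) VALUES (1000), (2500)")

            # The migration: rebuild the table under a stricter schema.
            migrator.execute("BEGIN EXCLUSIVE")
            migrator.execute(
                "CREATE TABLE payments_new ("
                "id INTEGER PRIMARY KEY, "
                "amount INTEGER NOT NULL CHECK (amount > 0))"
            )
            # Copying the rows re-validates each one against the new rules.
            migrator.execute(
                "INSERT INTO payments_new SELECT id, amount FROM payments"
            )

            # A customer tries to pay while the migration is in flight.
            try:
                writer.execute("INSERT INTO payments (amount) VALUES (4200)")
                observed["during"] = "ok"
            except sqlite3.OperationalError as exc:
                observed["during"] = str(exc)

            migrator.execute("DROP TABLE payments")
            migrator.execute("ALTER TABLE payments_new RENAME TO payments")
            migrator.execute("COMMIT")

            # Once the migration commits, the same write goes through.
            writer.execute("INSERT INTO payments (amount) VALUES (4200)")
            observed["after"] = "ok"
            observed["rows"] = writer.execute(
                "SELECT COUNT(*) FROM payments"
            ).fetchone()[0]
        finally:
            migrator.close()
            writer.close()
    return observed


if __name__ == "__main__":
    print(run_demo())
```

The write attempted in the middle of the migration is the customer staring at the error screen. Real systems retry or wait rather than fail instantly, but the waiting is still time in which the payment isn't being accepted.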
Replication lag is also a problem.
Relational databases are built on the concept of transactions, the ability to group a bunch of operations as a single unit that succeeds or fails as a whole. This guarantee is very hard to maintain in a distributed setup, which is why most distributed relational systems are configured as a single Primary instance, responsible for all the incoming write operations, and a bunch of read-only Replicas.
Primary + Replicas is completely fine if payments happen in one particular region, and everywhere else you just need to be able to read that data. But that is rarely the case.
In this configuration, you end up having two kinds of clients: those who live close enough to the Primary, and experience low latency and strong consistency; and those who live closer to one of the Replicas, and experience higher write latencies and eventually consistent reads.
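A toy model shows what replication lag looks like from the client's side. This is a sketch, not how any particular database implements replication: the `Primary` and `Replica` classes, the `lag` helper, and the `pay_1` key are all invented for illustration, and the "network delay" is simply the gap between a write and the next call to `catch_up`.

```python
from dataclasses import dataclass, field


@dataclass
class Primary:
    """Accepts every write and records it in an ordered log."""
    data: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))

    def read(self, key):
        return self.data.get(key)


@dataclass
class Replica:
    """Read-only copy that replays the Primary's log some time later."""
    data: dict = field(default_factory=dict)
    applied: int = 0  # how many log entries have been replayed so far

    def read(self, key):
        return self.data.get(key)

    def catch_up(self, primary):
        # Replay only the entries this replica has not seen yet.
        for key, value in primary.log[self.applied:]:
            self.data[key] = value
        self.applied = len(primary.log)


def lag(primary, replica):
    """Number of writes the replica has not applied yet."""
    return len(primary.log) - replica.applied


if __name__ == "__main__":
    primary, replica = Primary(), Replica()
    primary.write("pay_1", "captured")

    print(primary.read("pay_1"))  # captured
    print(replica.read("pay_1"))  # None: the replica hasn't caught up
    print(lag(primary, replica))  # 1

    replica.catch_up(primary)
    print(replica.read("pay_1"))  # captured
```

The client reading from the Replica sees a payment that doesn't exist yet, even though the Primary has already captured it. That window is short most of the time, but in payments "most of the time" is where double charges and confused customers come from.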
And finally, moving data around becomes much more complex. Shopify had to implement an in-house system where tenant data could be moved from one instance to another in order to keep using relational databases. And on top of that, there are a few tricks that you have to keep in mind in order to guarantee the uniqueness of the ids.
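One commonly used trick for id uniqueness, and I'm not claiming it is the one Shopify uses, is to stop relying on a per-instance auto-increment counter and bake the instance's identity into the id itself. The sketch below packs a millisecond timestamp, a shard number, and a per-millisecond sequence into a single integer. The `ShardIdGenerator` class, the `shard_of` helper, and the bit widths are assumptions chosen for illustration.

```python
class ShardIdGenerator:
    """Generates ids that cannot collide across shards (hypothetical layout).

    Layout, from most to least significant bits:
    timestamp in ms | shard id (10 bits) | sequence (12 bits)
    """

    SHARD_BITS = 10
    SEQUENCE_BITS = 12

    def __init__(self, shard_id, clock_ms):
        if not 0 <= shard_id < (1 << self.SHARD_BITS):
            raise ValueError("shard_id out of range")
        self.shard_id = shard_id
        self._clock_ms = clock_ms  # callable returning the current time in ms
        self._last_ms = -1
        self._sequence = 0

    def next_id(self):
        now = self._clock_ms()
        if now < self._last_ms:
            # Reusing an older timestamp could repeat an id already issued.
            raise RuntimeError("clock moved backwards")
        if now == self._last_ms:
            self._sequence += 1
            if self._sequence >= (1 << self.SEQUENCE_BITS):
                raise RuntimeError("sequence exhausted for this millisecond")
        else:
            self._last_ms = now
            self._sequence = 0
        return (
            (now << (self.SHARD_BITS + self.SEQUENCE_BITS))
            | (self.shard_id << self.SEQUENCE_BITS)
            | self._sequence
        )


def shard_of(identifier):
    """Recover the shard number embedded in an id."""
    mask = (1 << ShardIdGenerator.SHARD_BITS) - 1
    return (identifier >> ShardIdGenerator.SEQUENCE_BITS) & mask


if __name__ == "__main__":
    import time

    gen = ShardIdGenerator(shard_id=7, clock_ms=lambda: int(time.time() * 1000))
    new_id = gen.next_id()
    print(new_id, shard_of(new_id))
```

Because the shard number is part of every id, two instances can generate ids in the same millisecond without coordinating, and a row keeps its id when it moves to another instance. The price is that the clock and the shard assignment are now things you have to get right.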
If relational databases are your thing, you can check what Shopify is doing.
But if every second counts, let me show you what Amazon does.
Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust.
I’m Alvaro Duran, and this is The Payments Engineer Playbook. Scroll for five minutes on YouTube and you’ll find tons of tutorials that show you payment system designs that’ll help you pass interviews. But there’s not much that teaches you how to build this critical software for real users and real money.
I know this because I’ve built and maintained payment systems for close to a decade. And I’ve been able to see all types of interesting conversations about what works and what doesn’t for payment systems behind closed doors.
These conversations are what inspired this newsletter.
In The Payments Engineer Playbook, we investigate the technology that transfers money. And we do that by cutting off one sliver of it and extracting tactics from it.
Today’s article is about DynamoDB: Amazon’s solution to their “every second counts” problem. It’s a database engine that you can use today on AWS (comparable options include GCP’s Bigtable, Azure’s Cosmos DB, and the open-source ScyllaDB).
But often, moving from SQL to NoSQL is a scary jump. This article will give you the mental model you need to shift from one to the other, by focusing on:
What DynamoDB gives up from a relational database to get high availability
A clear mental model of the basic building blocks
Common mistakes that people used to relational databases make and how to avoid them (some are irreversible)
Enough intro, let’s dive in.