The Payments Engineer Playbook

The Joy of Being On Call For Payment Systems

How to be at ease when every second counts

Alvaro Duran
Jul 16, 2025

This is a chapter from Code-First Reliability.


There’s a certain kind of joy that you can experience when you’re on call.

It’s the feeling of “palms are sweaty, knees weak, arms heavy,” but without the vomit on your sweater already. It’s the joy that comes from proving that you’re competent and have poise under pressure. And I’m fully aware of how rapacious some companies can be when it comes to unpaid overtime. But still.

Being on call can be a joyful experience.

No matter how good your payment systems are, they depend on external providers. Being “always-on” isn’t just a matter of writing defensive code or running elastic infrastructure. Accepting payments 24/7 is often the result of code that’s always running, but also of people who are always vigilant.

As a result, and unlike in other domains, money software requires you to be on call not only because your code can fail, but because it depends on systems that are beyond your control. And the decision on what to do can’t easily be delegated to a machine, because whether a provider is “down” isn’t a rules-based call.

When nuance is required, the decision maker has to be human.
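To make that division of labor concrete, here is a minimal Python sketch, not taken from the chapter, of code that deliberately refuses to make the call itself. It tracks the share of technical errors (timeouts, 5xx responses, as opposed to ordinary card declines) for one provider over a sliding window and, past a threshold, asks the on-call engineer to look instead of failing over automatically. `ProviderWatch`, `Verdict`, and the window and threshold values are hypothetical and chosen only for illustration.

```python
from collections import deque
from dataclasses import dataclass, field
from enum import Enum


class Verdict(Enum):
    HEALTHY = "healthy"          # nothing unusual, no one gets paged
    ASK_A_HUMAN = "ask_a_human"  # page on call with context; do NOT auto-decide "down"


@dataclass
class ProviderWatch:
    """Hypothetical watcher for a single payment provider.

    It never concludes that a provider is down. It only narrows the
    question down to "a human should look at this now".
    """
    name: str
    window: int = 200              # recent attempts to remember (assumed value)
    error_threshold: float = 0.05  # share of technical errors worth a page (assumed value)
    outcomes: deque = field(default_factory=deque)

    def record(self, technical_error: bool) -> None:
        # Only technical failures count; a declined card is not the provider's fault.
        self.outcomes.append(technical_error)
        if len(self.outcomes) > self.window:
            self.outcomes.popleft()  # keep only the most recent `window` attempts

    def error_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return sum(self.outcomes) / len(self.outcomes)

    def verdict(self) -> Verdict:
        if self.error_rate() >= self.error_threshold:
            return Verdict.ASK_A_HUMAN
        return Verdict.HEALTHY


if __name__ == "__main__":
    watch = ProviderWatch("provider-a", window=10, error_threshold=0.3)
    for _ in range(10):
        watch.record(technical_error=False)
    print(watch.verdict())  # Verdict.HEALTHY
    for _ in range(3):
        watch.record(technical_error=True)
    print(watch.verdict())  # Verdict.ASK_A_HUMAN
```

The notable design choice is what is missing: there is no `DOWN` verdict. The code gathers evidence and decides *when* to interrupt someone; whether to route traffic elsewhere stays with the engineer on call.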

I’m Alvaro Duran, and this is The Payments Engineer Playbook. Over the last month, I’ve expanded on the engineering practices that make payment systems reliable. For many, it is a shock to realize that these practices have little to do with infrastructure. To have reliable money software, you have to engage in practices that are code-first.

That’s because payment systems can’t avoid depending on payment providers. These are the Stripes, Adyens and PayPals of the world: companies authorized by the payment networks to process payments “for the rest of us”.

We’ve covered why payment systems get tested in production, why retrying on a different provider often works, and how tokenization is the key to seamless retries and agentic commerce.

But none of this matters when providers are down. And that’s when the engineers on call have to jump in. Because, when it comes to payments, every second counts.

This article will cover:

  • What a reasonable on call arrangement looks like

  • Why automated incident response seldom works

  • Why being good at on call is the mark of a great payments engineer

  • How to write code to make being on call easy (and what’s paradoxical about that)

  • What being on call really is all about (hint: not heroics)

  • And some mental tricks to cope with on call induced burnout

Enough intro, let’s dive in.

© 2025 Alvaro Duran Barata