Bad API design costs a lot of money. In payments, it's clear why.

Timeouts are the preventable cause of chargebacks that no one bothers to fix

Mar 05, 2024

When it comes to payments, amateurs talk about money. Professionals talk about risk and liability.

The infrastructure of payments is, at heart, layers of responsibility. Every payment goes from one intermediary to another, each assuming a portion of that responsibility, and charging a fee for it. This design reduces the total amount of risk, and helps allocate it more effectively.

Rather than simplifying the problem for simplification's sake, engineers should design payment applications so that the representation of a payment expresses the concerns of everyone involved more clearly.

However, there is a technical glitch, built into most payment providers’ systems, that is costing the industry millions in chargeback fees.

And somehow, no one is paying attention to it.

Welcome to Money In Transit, the newsletter bridging the gap between payments strategy and execution. I’m Alvaro Duran.

Recently, we’ve looked at an actionable way to classify payment methods, how shifting just a little bit away from mainstream payments can generate millions of dollars for online marketplaces, and the absurdity of thinking of a payment application as a monolith. They’re all free to read.


Coordination is critical in payments. It guarantees that the shift of liability is clear, especially when things go wrong, and the customer disputes the transaction.

But what if there is a communication breakdown when that shift happens?

The technological landscape of payments is so old that there are tech companies whose sole purpose is to sit between the merchant and the payment providers. These payment orchestrators reduce the engineering effort needed to accept payments.

Orchestrators are a facade. But not all orchestrators do their job accurately.

See, the moment we started sending payment information across the Internet, something that was implicit offline became difficult: making sure that the shift of liability was complete. Online, multiple entities stack metaphorically, one on top of another, and every payment becomes a trail of API requests. In order to retrace your steps, you have to use a tracking ID, breadcrumbs that tell you which entity to ask for the data next.

This is also how computers deal internally with data: each record holds a pointer that says where to find the next one.

This trail is a risk stack, and it consists of many entities: if the customer pays with a credit card, there is at least a merchant, the PayFac, the scheme and the issuing bank. Crucially, everyone in the stack has a way to point to the next entity, and a way to be pointed to by the entity placed one step before.
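To make the trail concrete, here is a minimal sketch of a risk stack as a chain of tracking IDs. The `Hop` class, the `RECORDS` table and every ID in it are made up for illustration; in reality each entity keeps its own records, and you would have to ask each one in turn.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Hop:
    entity: str                             # who holds this record
    tracking_id: str                        # the ID this entity gave the payment
    next_tracking_id: Optional[str] = None  # the ID to ask the next entity about


# Hypothetical records, keyed by tracking ID.
RECORDS = {
    "m-001": Hop("merchant", "m-001", "pf-77"),
    "pf-77": Hop("PayFac", "pf-77", "sc-42"),
    "sc-42": Hop("scheme", "sc-42", "ib-9"),
    "ib-9": Hop("issuing bank", "ib-9", None),
}


def trace(tracking_id):
    """Follow the breadcrumbs from one entity to the next."""
    trail = []
    current = tracking_id
    while current is not None:
        hop = RECORDS[current]
        trail.append(hop.entity)
        current = hop.next_tracking_id
    return trail


print(trace("m-001"))
```

If any hop is missing its pointer to the next one, the trail stops there, and so does your ability to say who was liable.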

Tracking IDs are therefore the most important piece of the puzzle. Offline, who to blame was straightforward. Online, the blame is lost in a sea of computers.

Timeouts exist because the Internet is unreliable

People like to think that the Internet is based on connections, but that’s one step removed from reality. Online, everything rests on individual packets of data, sent and received independently of each other.

That moment when the movie you were streaming was a bit blurry? Those were packets that didn’t arrive on time.

On the Internet, even payment data has to be chopped up and sent as packets. And bad connectivity can happen. Both the client sending a request and the server sending its response are unaware of the fate of each packet.

To mitigate this problem, the client sends a confirmation every time it receives a packet, and the server re-sends any packet for which it hasn't received one. That is enough to sidestep a temporary problem.

But if the breakdown is more permanent, the server shouldn’t be sending and re-sending needlessly. At some point, the server is going to “time out”. Timeouts free the server from the responsibility of making sure that the response was sent to the client.
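The retry-then-give-up behavior fits in a few lines. This is a toy model, not how TCP is actually implemented (real network stacks rely on timers and backoff), and `send_with_retransmission`, `flaky_link`, `deliver` and `max_attempts` are names invented for this sketch.

```python
def send_with_retransmission(deliver, packet, max_attempts=3):
    """Re-send `packet` until it is confirmed, or give up after `max_attempts`.

    `deliver` is a hypothetical callable that returns True when the receiver
    confirmed the packet and False when no confirmation came back.
    Returns the number of attempts it took.
    """
    for attempt in range(1, max_attempts + 1):
        if deliver(packet):
            return attempt
    # No confirmation after every attempt: stop re-sending and time out.
    raise TimeoutError(f"no confirmation after {max_attempts} attempts")


def flaky_link(failures):
    """Build a `deliver` function that drops the first `failures` packets."""
    state = {"left": failures}

    def deliver(packet):
        if state["left"] > 0:
            state["left"] -= 1
            return False
        return True

    return deliver


# A temporary problem is sidestepped by re-sending.
print(send_with_retransmission(flaky_link(2), b"payment data"))

# A more permanent breakdown ends in a timeout.
try:
    send_with_retransmission(flaky_link(10), b"payment data")
except TimeoutError as error:
    print("timed out:", error)
```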

But when a timeout happens, the risk stack can crumble.

Most providers, when a merchant sends a payment request, are designed to send the outcome of that request and the acknowledgement at the same time.

It’s easy to see why: people want to know immediately whether the payment went through, and asynchronous processes can be complex to manage.

See where I am going with this? Bundling the acknowledgement and the outcome of the payment hides a fatal flaw: if processing the payment takes too long, the server will time out, and the merchant will never get any acknowledgement of its request.

Importantly, this doesn’t mean that the payment never happened. It means that the merchant never learns about it. The shift of liability gets broken when a timeout happens, and bundling everything in the same response increases the chance that it will.

Timeouts make reconciliation impossible.

A customer may get a charge on their credit card for a payment that the merchant never received confirmation about. The merchant will try again, of course, and the customer will get charged twice for the same service.
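Here is a rough simulation of that double charge. Everything in it is an assumption made for illustration: `BundledProvider` is not any real provider's API, the timeout is modeled simply as a response that never reaches the merchant, and no real time passes.

```python
class BundledProvider:
    """Toy provider that sends acknowledgement and outcome in one response."""

    def __init__(self, processing_seconds):
        self.processing_seconds = processing_seconds
        self.charges = []  # what actually lands on the customer's card

    def pay(self, amount, timeout_seconds):
        # The charge goes through on the provider's side either way.
        self.charges.append(amount)
        if self.processing_seconds > timeout_seconds:
            # Processing took too long: the merchant never gets the response.
            raise TimeoutError("no response from provider")
        return {"status": "succeeded", "amount": amount}


def naive_checkout(provider, amount, timeout_seconds=5, attempts=2):
    """A merchant with no tracking ID can only retry blindly."""
    for _ in range(attempts):
        try:
            return provider.pay(amount, timeout_seconds)
        except TimeoutError:
            continue
    return None


provider = BundledProvider(processing_seconds=8)
result = naive_checkout(provider, amount=20)
print("merchant sees:", result)
print("customer was charged:", provider.charges)
```

The merchant ends up with nothing to show for either attempt, while the customer's statement shows two charges.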

What would you do if you were the customer? You’d dispute the payment. And the merchant would have no way to provide evidence that it did its job right, because the merchant would have made a mistake.

100% of the time, chargebacks in these kinds of situations are won by the customer. At $15 a pop, that amounts to a lot of money lost to bad software design.

That scenario is not only bad, it’s preventable: we should ban timeouts in payments.

Hold Tight: A Different User Experience

Did I say already that payments are a promise?

A Payment is a Promise Made by An Authorized Party about a Transfer.
This definition highlights why many don’t quite get what payments really are. There are dozens of dimensions by which you could categorize payment methods. But the two most important ones have to do with concepts that belong to computer science: promises, and identity/access management.
— A Taxonomy of Payment Methods

In software, the immediate result you get from a promise is not the actual data, but a placeholder. A way to check the status of the process, and a way to retrieve the outcome if the process succeeds.
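If you have never worked with promises, this is roughly what one looks like in Python, where the standard library calls them futures. The `settle_payment` function and the `release` event are stand-ins invented for this sketch; the event only exists to simulate slow processing in a predictable way.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def demo():
    release = threading.Event()

    def settle_payment():
        release.wait()  # stands in for slow payment processing
        return "succeeded"

    with ThreadPoolExecutor(max_workers=1) as pool:
        # submit() returns right away with a placeholder, not the outcome.
        promise = pool.submit(settle_payment)
        done_before = promise.done()         # a way to check the status
        release.set()                        # let the processing finish
        outcome = promise.result(timeout=5)  # a way to retrieve the outcome
        done_after = promise.done()
    return done_before, outcome, done_after


print(demo())
```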

The idea of promises can be ported completely to the design of payment applications. The result is a design that separates the acknowledgement of the payment request from its processing.

In this design, the server first issues a tracking ID, and sends it back in the response to the merchant. Then, and only then, the payment process can start safely.

The downside is that the response sent to the merchant doesn’t by itself confirm that the payment went through. The merchant’s system must now wait for a confirmation before moving on to the delivery.
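A minimal sketch of this design could look like the following. `AckFirstProvider` and its methods are hypothetical names, not a real provider's API, and the in-memory dictionary stands in for whatever storage the provider actually uses.

```python
import uuid


class AckFirstProvider:
    """Toy provider that acknowledges first and processes afterwards."""

    def __init__(self):
        self.payments = {}  # tracking ID -> payment record

    def create_payment(self, amount):
        # Fast step: record the request and hand back a tracking ID.
        tracking_id = str(uuid.uuid4())
        self.payments[tracking_id] = {"amount": amount, "status": "pending"}
        return tracking_id

    def process(self, tracking_id):
        # Slow step: runs only after the merchant holds the tracking ID.
        self.payments[tracking_id]["status"] = "succeeded"

    def status(self, tracking_id):
        # The merchant can ask again at any time without creating a new charge.
        return self.payments[tracking_id]["status"]


provider = AckFirstProvider()
tracking_id = provider.create_payment(amount=20)
print(provider.status(tracking_id))  # still pending: hold delivery
provider.process(tracking_id)
print(provider.status(tracking_id))  # confirmed: safe to deliver
```

If a status check is slow or fails, the merchant asks again with the same tracking ID. Nothing is charged twice, because checking the status never creates a payment.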

But your customer is already familiar with that experience. Haven’t you bought something on Amazon recently? This design is baked into how Amazon collects payments.

Customers hold tight and wait for the confirmation email. So should you.

Merchants don’t have to get very large before it starts making sense to have in-house expertise in payments. As payments technology becomes a critical component of the whole customer experience, it is clear that even small companies can benefit from building that expertise internally, rather than delegating it to external entities.

Technology has become easy to develop. The problem is clear: bad software designs, in payments, cost a lot of money. We should be more demanding of the APIs that our systems integrate with.

Why should we settle for anything less?
