Tokenization, or How I Learned to Stop Worrying and Love the Vault
A handbook for using tokenization for card payments
Welcome to Money In Transit, the newsletter for startup founders who find themselves dragged into payments technology. I’m Alvaro Duran.
Today’s post deals with a widespread but not well-understood technology for securing card payments. You may have heard that tokens are “a placeholder for the actual card number”, but have never delved deeper into how it all works.
This is your chance to learn more.
Consider sharing this post with a founder who wants to start doing some analytics on their payment data.
Right now, somewhere, there is a database with millions of rows that contain social security numbers in plain text, unscrambled. For everyone to see.
A jackpot for hackers.
Early stage startup founders are often dismissive when it comes to security. They are so laser-focused on Priority Number One that they even “forget” about keeping things moderately safe.
If I had a dollar for every time I heard the phrase “Yeah, we’re not rotating our API keys yet, but we will, someday”...
Scale ups treat security a little bit better, because they see it as a necessary evil. But they almost never invest enough in it. That’s because when a data breach happens, it is often the users, and not the company, who feel the pain.
This is Cybersecurity’s Elephant in the Room: it seldom makes economic sense for companies to secure their data. That’s why regulators have had to step in and gradually build a comprehensive set of rules and restrictions that companies have to follow when it comes to their users’ data.
Or else.
This has happened in card payments, with a twist: the regulators didn’t come from the public sphere. No, they came from Visa, Mastercard, and the rest of the payment schemes.
Sick of credit card fraud, but sicker of the idea that card payments could become so wild-west-y that people reverted to cash, they ended up forming a regulatory council, the PCI Security Standards Council, and a standard commonly known as PCI DSS.
This standard is designed to guarantee the security of the card payment ecosystem. Every participant has to adhere to a set of strict rules and audits, or be subject to fees. Or worse: being unable to accept card payments.
That, online, is synonymous with death.
One of the most recent techniques to protect cardholder data is tokenization. However, even the most important book on payments nowadays devotes only a paltry paragraph to it:
PCI compliance is an important step, but it is becoming evident that attacks are still possible. Several other initiatives are underway to further protect card data, including tokenization (which substitutes a placeholder number for the actual card number) and end-to-end encryption (to protect card data from being entered into point-of-sale acceptance locations).
That is an understatement. In my opinion, tokenization has become one of the most powerful ways to reduce the associated costs of PCI compliance.
Why? Because, unlike any other security measure, it isolates the parts of your payment applications that are “in scope” for PCI.
Without tokenization, your whole payment application is subject to strict controls, specialized hardware, having-a-colonoscopy-performed-on-you levels of audits, etc.
With tokenization, startups can direct their auditors to look at those isolated parts more closely, and leave everything else alone.
Curious to learn more? Let’s dive in.
Under the Assumption of Breach
Sensitive data can be protected in two non-mutually-exclusive ways. One way is to secure how the application handles the data. This is the domain of encryption, or how to prevent a breach from happening. This means protecting communications (called encryption in transit) or protecting the database (called encryption at rest).
Another way is securing the data itself. That means assuming that the breach has indeed happened, and figuring out ways to make the information revealed to attackers the least valuable possible. This is the domain of tokenization.
By the way, if you’re reading this and you come from Web3, I’m sorry to break it to you, but for card payments the word tokenization means something completely different from what you’re familiar with. Get used to it.
Anyway, securing the data itself requires some form of obfuscation, the deliberate use of misleading information to interfere with surveillance, according to Helen Nissenbaum. Instead of storing the actual values, you have a “stand-in” piece of data, meaningless on its own, that references the actual data, which is secured in another, better protected environment, acting as the source of truth.
That scrambled piece of nonsense is called a token, and this process, tokenization.
Unlike encryption, there is no mathematical relationship between the token and the data it obfuscates. You may think that is a disadvantage, but in fact it isn't. It is what makes them non-exploitable.
Think about it. Say that an attacker collects millions of these tokens by figuring out how to break into your VC-funded pet project. If there were a mathematical relationship, that amount of data would be precisely the raw material the attacker could play around with to figure out the relationship. Tokens are non-exploitable by virtue of being generated out of thin air.
Because of that, tokens require a database that maps the sensitive data to the token. A fortified environment, sitting outside the application, which usually goes by the name of vault.
Inside that vault, you can have tokens that are multi-use, which are active until explicitly revoked, and can be used again and again (subscriptions make a handy use of these). Or you can have tokens that are single-use, which after a preconfigured lifespan (usually a day) are automatically removed from the database and void of any meaning whatsoever.
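The lifecycle described above can be sketched in a few lines. Everything here is illustrative: a real vault is a hardened, audited service, not a Python dict.

```python
import secrets
import time

class Vault:
    """A toy vault: maps random tokens to the sensitive values they stand in for."""

    def __init__(self, single_use_ttl=86_400):
        self._store = {}              # token -> (value, expires_at or None)
        self._single_use_ttl = single_use_ttl

    def tokenize(self, value, single_use=False):
        # The token is generated out of thin air: no mathematical
        # relationship to the value it references.
        token = secrets.token_hex(16)
        expires_at = time.time() + self._single_use_ttl if single_use else None
        self._store[token] = (value, expires_at)
        return token

    def detokenize(self, token):
        value, expires_at = self._store[token]   # raises KeyError if revoked
        if expires_at is not None and time.time() > expires_at:
            del self._store[token]               # single-use token past its lifespan
            raise KeyError("token expired")
        return value

    def revoke(self, token):
        self._store.pop(token, None)
```

A multi-use token lives until `revoke` is called; a single-use one dies on its own once its preconfigured lifespan runs out.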
Tokenization Trade-off
In an ideal world, tokens can simply be UUIDs. You swap your data for a foreign key to the table inside the vault, and that’s the end of the story.
Unfortunately, the world we live in puts some real-world requirements on the tokenization process and as a result things are a bit more complicated.
Say, for instance, that you want to tokenize a phone number. Database engineers have the annoying habit of expecting phone numbers to be, well, numbers. Good luck submitting a UUID as a phone number to other systems that expect your data to be formatted according to E.164.
Even if you drop that constraint, Carol from HR is going to be baffled when she sees a 56321cb7-e4dd-4958-a571-61d9fc1022d2 where she was expecting a +1-202-555-0176.
In that case, you want your token to be format preserving. Even if it is a token, you want it to look like a phone number. Other types of tokenization include length preserving (with the same length, or a predefined maximum), deterministic (where the same piece of sensitive data is always represented by the same token), and so on.
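A format-preserving token for a phone number can be sketched like this. It is a toy approach: real format-preserving schemes (such as NIST’s FF1 mode) are keyed, reversible algorithms, while this one is one-way noise that only keeps the shape of the input.

```python
import secrets
import string

def format_preserving_token(value: str) -> str:
    """Swap every digit for a random one, keep everything else in place."""
    return "".join(
        secrets.choice(string.digits) if ch.isdigit() else ch
        for ch in value
    )

token = format_preserving_token("+1-202-555-0176")
# token keeps the +D-DDD-DDD-DDDD shape, so Carol from HR is none the wiser
```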
One clever trick that many tokenization providers use is to have credit card tokens end in the same four digits as the PAN they reference, which makes showing card information to the user much easier.
However, tokens that relax the “no-mathematical-relationship” rule should do so carefully.
Making your tokens fall into one of these categories involves an important trade-off. Having the ability to perform operations such as equality, aggregation and search comes at the expense of leaving some room for inferring the data from the token.
If the attacker knows that your token shares the last 4 digits with the actual sensitive data, it becomes a little bit easier than it was before to figure out the rest of it.
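Here is a back-of-the-envelope sketch of why. For a 16-digit PAN where the attacker already knows the 6-digit BIN (public information) and the last 4 digits leaked by the token, only the 6 middle digits remain unknown, and the Luhn check digit built into every PAN rules out nine candidates in ten (the card numbers below are hypothetical):

```python
def luhn_valid(pan: str) -> bool:
    """Standard Luhn checksum: double every second digit from the right."""
    digits = [int(d) for d in pan]
    for i in range(len(digits) - 2, -1, -2):
        doubled = digits[i] * 2
        digits[i] = doubled - 9 if doubled > 9 else doubled
    return sum(digits) % 10 == 0

bin_prefix, last4 = "424242", "4242"   # hypothetical BIN and leaked last four
total = 10**6                          # 6 unknown middle digits
valid = sum(
    luhn_valid(f"{bin_prefix}{mid:06d}{last4}") for mid in range(total)
)
print(f"{valid:,} of {total:,} candidates pass the Luhn check")
# exactly 100,000 survive; with the last four still hidden, the attacker
# would instead face 10 unknown digits, i.e. a billion Luhn-valid candidates
```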
How Do Tokens Make A Breach Pointless?
“Wait”, you may say. “If credit card tokens can be used more than once, if an attacker steals one, can they use it to make an API request to pay themselves?”
This was one of the biggest sources of fraud when tokenization was initially deployed for online card payments.
The way it was solved is as follows: tokenizing a credit card no longer involves the PAN only, but some context data as well, such as the cardholder name and, most importantly, the merchant for which the card is expected to be used.
That way, subscription payments can go through, while any other use of that token is prevented. An attacker can only pay the very merchant they stole the token from.
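A minimal sketch of that merchant binding (the names and structure are made up for illustration):

```python
import secrets

class MerchantBoundVault:
    """Tokens reference the PAN plus context, including the merchant
    they were issued for; a stolen token cannot pay anyone else."""

    def __init__(self):
        self._store = {}   # token -> (pan, cardholder, merchant_id)

    def tokenize(self, pan: str, cardholder: str, merchant_id: str) -> str:
        token = secrets.token_hex(16)
        self._store[token] = (pan, cardholder, merchant_id)
        return token

    def detokenize(self, token: str, merchant_id: str):
        pan, cardholder, bound_to = self._store[token]
        if merchant_id != bound_to:
            raise PermissionError("token was not issued for this merchant")
        return pan, cardholder
```

The subscription flow keeps working because the legitimate merchant always presents its own id; the attacker’s request fails the comparison.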
I’m sold. What do I have to do?
Ready to use tokenization in your payment applications? Here are some things you have to consider:
Network segmentation: Your tokenization system must be separated from the rest of your payment applications. It must also run on dedicated, specific hardware, which goes by different names depending on your cloud provider of choice.
Access control: The system must configure very granularly who has access to which information. Not only for security, but also compliance with things like data-residency or any other GDPR shenanigans.
Monitoring: Is someone trying to brute-force a token? You should be alerted about it. You should also rate-limit it.
Distinguishability: A word that I always need autocorrect to fix, which means that you must be able to differentiate tokens from sensitive data. It may sound silly, but going back to the phone number examples it should make sense why it is not always an easy task.
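One common convention is an unmistakable prefix on every token (the `tok_` format below is hypothetical, and note that format-preserving tokens deliberately give this property up):

```python
import re

# Hypothetical token format: a "tok_" prefix followed by 32 hex characters.
TOKEN_RE = re.compile(r"^tok_[0-9a-f]{32}$")

def is_token(value: str) -> bool:
    """A value matching the prefix scheme can never be confused
    with a real PAN or phone number."""
    return TOKEN_RE.fullmatch(value) is not None
```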
That said, tokenization done right is much more than having a safe database. What startups need is a combination of a good token generation process, sensible token mapping, the vault itself, and key management that secures access to those services.
I recommend reading Visa’s bulletin on best practices for card tokenization if you want to learn more.
Tasing TaaS
This is just too much for a small startup to do. But that doesn’t mean you should call your payment platform of choice and hand them over your tokenization needs!
As I said before, relying on them for just about everything payment-wise is probably a mistake.
Payment platforms accumulate power over their customers by courting them into an ecosystem from which they can’t escape, with walls built with proprietary systems and inaccessible talent.
By integrating with payment platforms, startups mindlessly walk into a prison they cannot smell, taste, or touch. Startups’ technical debt becomes the provider’s competitive advantage. Founders may complain, but will always abide by the platform’s rules, whatever those may be.
Giving these platforms the keys to your customers’ card data is playing with fire. As your startup grows, your needs become more sophisticated. The payment platform, smelling desperation, will overcharge you for that sophistication, keeping the business and its switching costs hostage to that early “let’s outsource this” decision.
But this is still too much for a small startup to do!
Funnily enough, there have been a few startups already that started building tokenization in-house only to find out that other companies needed it too, and pivoted to offering tokenization as a service.
You should probably rely on those services. Dedicated Tokenization as a Service (TaaS?) providers are just another example of giving the responsibility of specialized components to specialized companies. Vendors like Skyflow and Basis Theory have already proven to be valid alternatives to payment platforms, generating significant value outside your startup’s core competency.
Not affiliated, by the way. There are more out there. These are the ones I know.
Adding some overhead to deal with multiple vendors, compared to your One Stop Payment Shop of choice, sounds bad. But it pays off when you realize that it puts you in a stronger position when it comes to pricing power and feature prioritization.
A behemoth provider can shut you down unannounced. What would you do if that happened? Ask reddit?
Mitigating Vendor Risk
That said, not every provider under the sun is a good choice. Some perks your selected tokenization provider should have are:
Authentication and Authorization mechanisms to properly restrict access to stored data.
Good latency and throughput to prevent any negative impact on the payment experience.
High availability, being able to meet the business’s demands at scale, which is almost a given nowadays in the age of the cloud, but still.
Startups that seek to understand the payment experience, and figure out ways to leverage that understanding and leave the undifferentiated heavy lifting to specialized providers, will prevail over those that “outsource” it to big industry names.
Tokenization can be the stepping stone. Maybe, after all, you’ll begin to realize that your startup is a payments company?