Stripe Deploys 1,000 Times A Day And You Don't Deploy On Fridays
If you don't know, the thing to do is not to get scared, but to learn
Does your pipeline grind to a halt on Fridays? Good.
I don’t want to preach to the choir here; I’m looking for those who are afraid that a bad release will ruin their weekend. And for those afraid to come back on Monday and find the system on fire.
I’m looking for them, because they’re right.
You must protect your time off, and you must never miss anything important during a release.
But those are poor reasons to avoid deploys on Friday.
There is a recent video of the CEO of Stripe at Stripe Sessions in which he claimed that, in 2024, the company shipped 1,145 pull requests into production per day.
With 8,500 employees, and 40 percent of those working as engineers, that means that each engineer deploys, on average, roughly once every 3 days.
All while having 99.999 percent availability for the entire year.
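To make that arithmetic concrete, here is a minimal sketch that uses only the figures quoted above. The variable names are mine, and the downtime line assumes a 365-day year and reads "availability" as the fraction of minutes the service is up, which may not match how Stripe measures it.

```python
# Figures quoted in the text above.
employees = 8_500
engineer_share = 0.40
deploys_per_day = 1_145
availability = 0.99999

# 40 percent of 8,500 employees.
engineers = employees * engineer_share
deploys_per_engineer_per_day = deploys_per_day / engineers
days_between_deploys = 1 / deploys_per_engineer_per_day

# Downtime budget implied by 99.999 percent (assumption: 365-day year).
minutes_per_year = 365 * 24 * 60
downtime_minutes = minutes_per_year * (1 - availability)

print(f"{engineers:.0f} engineers")
print(f"one deploy per engineer every {days_between_deploys:.1f} days")
print(f"about {downtime_minutes:.1f} minutes of downtime per year")
```

In other words, roughly 3,400 engineers, one deploy each every three days or so, and a downtime budget of only a few minutes for the whole year.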
Neither your company nor mine deploys a thousand times a day. But having each engineer deploy once every 3 days? That’s not an unreasonable target, right?
That is, unless you’re banned from deploying on Fridays.
But first, why does deploying one extra day even matter?
Server-based businesses are built on speed
If you write software that’s bought and used online, as a service, you can do anything.
Your only constraint, really, is that others can do it too. You’re in competition with companies offering similar products, and every customer who buys from you is regularly pushed to switch to your competitors.
The only thing that has kept them from switching so far is your reputation for giving them what they’re looking for.
Psychologically, that translates into this: only an alternative that’s 10x better will make your customers overcome the inertia.
The bad news is that the reason your competitors’ customers don’t switch to buy from you is exactly the same. Every day, you’re either keeping up with the competition or trying to become 10x better than them.
And assuming that the other companies have competent engineers who can develop the same features you can, your only advantage is to deliver those features first.
If you write software that’s bought and sold online, the only competitive advantage is speed.
It was obvious that rapid development would be important in this market. We were all starting from scratch, so a company that could get new features done before its competitors would have a big advantage. We knew Lisp was a really good language for writing software quickly, and server-based applications magnify the effect of rapid development, because you can release software the minute it's done.
— Paul Graham, Beating The Averages
But speed is at odds with quality. If you’re biased for action, you may be biased for rushed action.
You only get one shot at a first impression, and if users see a buggy implementation, there’s no recovering from that. Even the tiniest error is amplified by the many customers already using your product.
So the answer, for some engineers, is to avoid the situations where deploys are extra risky. Like Fridays.
As your company gets successful, your track record, your brand, becomes a pseudo-advantage. You’re slower, but reliable. At least for now.
And you hope that delivering the highest quality software on the first deploy will make up for the slowness.
Quality, or Speed. Pick one.
Or so the thinking goes.
The Velocity Initiative
The flawed assumption in the quality vs speed dilemma is the belief that systems are monolithic, and are changed linearly.
That is, the system is comprehensible, and we can arrive at increasingly high quality by testing it more in a safe environment and fixing the bugs before we go live with the changes.
Such is the underlying thinking of that famous Microsoft memo called Zero Defects:
Zero defects has actually been achieved on software projects; it is not an impossible goal. Zero defects must be the new performance standard for development. A “defect” occurs when something that is labeled “done” does not conform to the requirements. We need to understand our methods, and strive to improve them in order to prevent defects from happening, or recover from them if they do happen. You’ll be able to measure your success by the reduced time from code complete to shipping.
— Chris Mason, Zero Defects memo
Didn’t I say you were right in not deploying on Fridays? You’re not alone. Zero Defects is “The Holy Grail” of programming, and the hardest part about it is “to decide that you want to write perfect code”.
But haven’t you noticed that Chris Mason is talking about 1990s Microsoft, when the company’s most important product was an operating system? Isn’t Zero Defects a perfectly reasonable approach to code as long as you’re delivering a software product on a CD-ROM?
Your company’s systems aren’t like that. They’re not comprehensible. They’re not monolithic. They’re run by many engineers, each pursuing independent goals, often with very little communication with other teams.
That’s why Zero Defects doesn’t work nowadays. In most companies, a single tester can’t check all use cases for all your systems before every release. Tests run at the service level, maybe covering 2 or 3 services at once.
But not all.
Contrast that with eBay’s Velocity Initiative:
What I came back to eBay to do about two and a half years ago was to introduce this kind of continuous delivery capabilities to the company. Again, eBay’s been around for a long time and so I led this cross-cut company initiative, we called it a Velocity Initiative, to improve software delivery and our idea was to, again, Think Big, Start Small, but Learn Fast. So we iteratively identify a whole bunch of bottlenecks and issues that teams were running into in terms of building and testing and deploying, and we would ask “what would it take for you to deploy your application every day?”
— Randy Shoup, Large-Scale Architecture: The Unreasonable Effectiveness of Simplicity
Notice the absence of a “getting things right the first time” mentality in such an approach.
And yet, the results were:
5x faster deployment frequency
5x faster lead time
3x lower change failure rate
3x lower mean-time-to-recover
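If you want to track the same four numbers for your own team, here is a minimal sketch of how they could be computed from a deploy log. This is not how eBay measured them: the `Deploy` record, its field names, and the sample data are all hypothetical, and the definitions used (lead time as commit-to-production, recovery as deploy-to-fix) are one common reading among several.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Deploy:
    """Hypothetical record of one production deploy."""
    committed_at: datetime
    deployed_at: datetime
    failed: bool = False
    recovered_at: Optional[datetime] = None


def hours(delta: timedelta) -> float:
    return delta.total_seconds() / 3600


def delivery_metrics(deploys: list[Deploy], period_days: int) -> dict[str, float]:
    """Compute the four delivery metrics over a period of `period_days`."""
    if not deploys or period_days <= 0:
        raise ValueError("need at least one deploy and a positive period")
    failures = [d for d in deploys if d.failed]
    lead_times = [hours(d.deployed_at - d.committed_at) for d in deploys]
    recoveries = [
        hours(d.recovered_at - d.deployed_at)
        for d in failures
        if d.recovered_at is not None
    ]
    return {
        "deploys_per_day": len(deploys) / period_days,
        "mean_lead_time_hours": sum(lead_times) / len(lead_times),
        "change_failure_rate": len(failures) / len(deploys),
        "mean_time_to_recover_hours": (
            sum(recoveries) / len(recoveries) if recoveries else 0.0
        ),
    }


# Made-up working week: four deploys, one of which fails on Friday
# and is fixed an hour later.
monday = datetime(2025, 1, 6, 9, 0)
friday = monday + timedelta(days=4)
deploys = [
    Deploy(monday, monday + timedelta(hours=4)),
    Deploy(monday + timedelta(days=1), monday + timedelta(days=1, hours=2)),
    Deploy(friday, friday + timedelta(hours=6), failed=True,
           recovered_at=friday + timedelta(hours=7)),
    Deploy(friday + timedelta(hours=8), friday + timedelta(hours=12)),
]

metrics = delivery_metrics(deploys, period_days=5)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

The absolute values matter less than the trend: all four numbers should improve together as the friction around deploying goes away.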
I’m not surprised by these findings. In How Uber Tests Payments In Production, I said that the only way I know to produce high quality software is to deploy a reasonably good version, and fix all the bugs that appear along the way as fast as you can.
Fast deploys, and fast fixes. Even for something as scary as payments.
eBay’s Velocity Initiative is a powerful signal: quality and speed feed on each other.
Not Quality or Speed, but both.
A Deploy A Day, Even on Fridays
The goal isn’t to deploy for the sake of deploying, just like testing in order to achieve 100 percent code coverage is pointless.
The goal is to be able to deploy fast, so that:
We can check that our assumptions are valid
We can deploy fixes where they matter the most
We can have code that makes a business difference in front of the users faster than the competition
The goal is not 1,145 deployments per day. It's removing the friction that makes that pace impossible.
You won’t succeed with a no-release-Friday attitude.