Loved it. Especially the emphasis on resilience vs attempting to "never fail".
I’m not in payments and yet, after reading this brilliant article, I subscribed to your channel. You touch on bedrock engineering principles which transcend your vital niche, methodically demonstrating how those principles apply to payments, but also hinting at how they apply throughout well-engineered systems. Well done.
Stephen, this made my day. Thank you so much for your kind words, I'll do my best to live up to the expectations in future articles :D
This is one of the better eng articles I’ve read lately. Keep it up!
Hope you stay tuned for more!
It is an amazing article, Thanks
A colleague shared this article with me and I loved reading it. Your work on writing it was much appreciated. Thanks for that!
Great article! Super insightful! Keep it going!
Great article, super informative!
We have a similar situation at my work. Our system handles card processing. We can gather real-time data and use it in a staging environment but nothing, I mean nothing, is the same as real-time. We found a bug and have stressed over how best to test the fix. I keep saying we just need to release it and watch the results, roll back quickly if it fails. Your article was very timely. Thank you!
Good luck and have fun with your rollout!
super interesting
Love your work. I think Charity Majors is a legend, but your impact, making these points from your position within Uber, is huge. Even more so given you're speaking about payments, which is where s**t gets real. Never underestimate that impact.
I was pivoting to public cloud back when Netflix was scaling. People like Adrian Cockcroft talking about their very real problems and unique solutions have been instrumental to the way I think about resilient, scalable distributed systems. You continue that fine tradition & it's awesome. Thanks :)
Hey Andrew, thanks for this! Watch out for next week's post on Airbnb, I'm sure you'll like it :)
This was a great read, subscribed!
Awesome article. It is helpful and it increases my curiosity around getting that playbook.
This is total vindication for all the times I argued against spending vast engineering effort on the 'perfect' staging system. This approach makes more sense, especially when dealing with 3rd party APIs which often have garbage dev/staging environments.
I would add some nuance to what I said about third party sandboxes. They're not going to improve, but that's because they're built so that you can do API integration testing.
Which is a great first step, and much needed. But there's more to it than just making sure that schemas conform to the API.
I've worked on financial transaction systems that are way lower traffic than Uber, and it was still essential to test in production. It almost seems self-evident - the more critical the software is, the more important it is to test the final, live product properly.
I think the message of how your operations support this is the real takeaway here.
Thanks for writing!
I couldn't agree more, Dave. And yet, testing in prod makes most people cringe. What do you think is the reason?
I worked in payments (not in this depth, but similarly, for an online game) about 15 years ago. Now I am working on developing a college course about the difference between prototyping and production. Thank you so much for providing such a useful reference on this topic as it pertains to payments!
Let me shamelessly plug my recent talk on the topic of prototype vs production: https://news.alvaroduran.com/p/enterprise-python
Hope it's useful!
Found this article through Hacker News and followed instantly, since it talks about a field I’ve always been fearful to engage in. Handling payments sounds terrifying, having to deal with both legacy software and behemoth institutions when money is so visibly on the line. This article helped taper my fears though. If we design with failure as an expectation, we can fail more gracefully. There is no need to fear because there is no risk. The stakes are only high if a failure has no backup plan.
Hi Nate, I appreciate your comment, especially the last sentence. It captures the zeitgeist of what I'm going for with this post beautifully!
I believe this post resonated with a lot of people precisely because of its emotional component. We're just afraid to test stuff in prod. It's trite, but the solution is precisely leaning into that fear.