Run Your Engineering Team Like a Hedge Fund
Embracing Asymmetric Risk
What is the common economic theme between:
- Lockheed Skunk Works (1943) (see my analysis here)
- Netflix's migration to AWS and Amazon S3 storage (late 2000s onward)
- High-Performance Software Engineering (my previous post)
All three embrace engineering as a high-risk, high-return investment: risk is asymmetric, bad things can happen, and nothing less than exponential growth is expected!
Some history, three dimensions
Let's start with the following claims:
- Google invented the foundational math and architectural blueprints (GFS, MapReduce, Borg) that made massive distributed computing possible.
- AWS commoditized that architecture, transforming physical hardware into infinitely scalable, programmable infrastructure.
- Netflix pioneered the operational culture to survive it, mathematically engineering application resilience through intentional chaos.
Netflix's AWS migration is a great example because it is simple and yet a mathematical crowbar:
- The story is that Netflix had the great insight to intentionally and randomly kill their production servers during their migration to AWS.
- A simple interpretation is that a system that can survive constant self-inflicted harm will be robust to natural hardware failures in production.
- Yet the more profound point is that, by doing so, they purposely prevented their engineers from taking a classical development approach. The teams were "forced" into a different and better way of thinking about their solution!
A classical development approach would have started by assuming that the server failure rate was BELOW some value, which would have allowed the engineers to build a solution bottom-up, as they had done before.
Instead, by guaranteeing that the failure rate was always ABOVE some value (as a result of the self-inflicted shutdowns), Netflix made the (then) classical approach impossible, and the engineers were forced to build their system top-down!
Much of the math they used had been invented at Google, and much of the architecture at Amazon. The genius of Netflix's chaos engineering approach of random server shutdowns "everywhere" was that it enabled Netflix to instill a single "embrace risk or die" culture across all of its engineers.
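A toy simulation can make the "failure rate is always ABOVE a value" framing concrete. It is a minimal sketch under assumptions that do not come from Netflix's actual setup: a hypothetical chaos monkey kills each live replica with probability kill_prob every tick, a killed replica comes back after a fixed restart_ticks, and the service counts as up whenever at least one replica is up.

```python
import random

def simulate_availability(replicas, kill_prob, restart_ticks, ticks, seed=0):
    """Fraction of ticks with at least one live replica (toy chaos model, assumed parameters)."""
    rng = random.Random(seed)
    down_until = [0] * replicas  # replica i is alive at tick t when down_until[i] <= t
    served = 0
    for t in range(ticks):
        # Chaos step: every live replica may be killed this tick (a guaranteed failure floor).
        for i in range(replicas):
            if down_until[i] <= t and rng.random() < kill_prob:
                down_until[i] = t + restart_ticks  # down for restart_ticks ticks, starting now
        # The service is up this tick if any replica survived.
        if any(d <= t for d in down_until):
            served += 1
    return served / ticks

for n in (1, 2, 3):
    print(n, simulate_availability(n, kill_prob=0.05, restart_ticks=5, ticks=100_000))
```

Because kill_prob is a floor the engineers cannot negotiate down, the only lever left is the architecture, here the replica count: availability improves with redundancy, not with hoping that servers stay up.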
Embracing mathematics, embracing less than perfect
The key concept is that big systems need math. Yet because these systems are big, exact math is too complicated. Instead, big systems are built to be less than perfect, and with the help of math you can squeeze that "less" to be as small as you want!
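A minimal sketch of that squeezing, assuming each replica is independently unavailable with probability q: the service is down only when all N replicas are down, with probability q^N, so any target epsilon can be reached by adding replicas. The function name is hypothetical, and correlated failures (for example a whole data center going down) break the independence assumption.

```python
def replicas_for_target(q, epsilon):
    """Smallest N with q**N <= epsilon, assuming independent replica failures."""
    if not (0 < q < 1 and 0 < epsilon < 1):
        raise ValueError("q and epsilon must lie strictly between 0 and 1")
    n = 1
    # Each extra replica multiplies the all-down probability by q.
    while q ** n > epsilon:
        n += 1
    return n

print(replicas_for_target(0.2, 1e-3))  # replicas down 20% of the time, target below 0.1%
```

With q = 0.2, five replicas bring the all-down probability to 0.2^5 = 0.00032, below the 0.1% target.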
Having embraced the less than perfect, we can also embrace non-linear models! This is in part why I used the term exponential above.
To highlight again: Netflix built on math and architectures from Google and AWS. Long ago, for the math, I would have recommended books like The Design of Approximation Algorithms by David P. Williamson and David B. Shmoys, and especially Approximation Algorithms by Vijay V. Vazirani, for which I left the following review on Amazon back in 2010:
I have always been an operational research aficionado, but when I skimmed through this book the first time I did not really "get it". Then, I was led back to this book and now it has become one of those books I like to pick up just to bring the remembrance of the concepts it talks about. One main topic is duality and how for different problems we can squeeze the gap between the solution built in primal and dual space. I have known the theory for years, yet it is only more recently and with help among others from this book, that I have started to get a glimpse of the whole depth and magic in this area of "applied" mathematics.
(Trying not to digress: I started a PhD in the above type of math, got distracted, and instead wrote a semiconductor device simulator for my PhD that is still the number one in the business after 30 years; later I was in charge of the math for a derivatives market-making system.)
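To make the primal/dual "gap squeezing" concrete, here is a sketch of the classic primal-dual 2-approximation for weighted vertex cover, a standard example of the primal-dual method taught in approximation-algorithm textbooks. The function name and input format are illustrative. Each uncovered edge raises its dual variable until one endpoint's weight is used up; that endpoint joins the cover. By weak duality the dual total is a lower bound on the optimal cover cost, so the cover cost divided by the dual total certifies how far from optimal the answer can be, here at most 2.

```python
def primal_dual_vertex_cover(weights, edges):
    """Weighted vertex cover via the primal-dual method.

    Returns (cover, primal cost, dual value); cost <= 2 * dual value <= 2 * OPT.
    """
    slack = dict(weights)  # remaining room in each vertex's dual constraint
    dual_value = 0
    cover = set()
    for u, v in edges:
        if u in cover or v in cover:
            continue  # edge already covered; its dual variable stays at 0
        # Raise this edge's dual variable until an endpoint becomes tight.
        y = min(slack[u], slack[v])
        slack[u] -= y
        slack[v] -= y
        dual_value += y
        for x in (u, v):
            if slack[x] == 0:  # tight dual constraint -> vertex enters the primal solution
                cover.add(x)
    cost = sum(weights[x] for x in cover)
    return cover, cost, dual_value

weights = {"a": 1, "b": 2, "c": 3, "d": 1}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("a", "c")]
cover, cost, dual = primal_dual_vertex_cover(weights, edges)
print(sorted(cover), cost, dual)
```

The printed dual value is a certificate that no cover can be cheaper than it, so the gap between cost and dual bounds the suboptimality without ever computing the exact optimum.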
Mechanizing good enough math
As an example of how applied math has changed, consider the computational geometry code I wrote earlier this year to "flex" my vibe-coding skills. Pretty quickly I was proving geometric properties of specific configurations (e.g., Napoleon's theorem). Proving specific properties is nice, but what about demonstrating a global invariant? I asked my LLM to reformulate the code as Monte Carlo-style testing of polynomial expressions extracted from the geometric properties. Within a few hours I was producing approximate proofs of some famous geometric theorems.
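The original code is not shown, so this is only a sketch of the idea under stated assumptions. Points are complex numbers, and the "polynomial expression" is the standard identity that z1 + ωz2 + ω²z3 = 0 (with ω = e^{2πi/3}) when z1, z2, z3 form an equilateral triangle of one orientation. Napoleon's theorem says the centroids of equilateral triangles erected on the sides of any triangle (all outward, or all inward) form an equilateral triangle. The check is numerical with a tolerance, so a pass is strong evidence rather than a formal proof; all function names are hypothetical.

```python
import cmath
import random

OMEGA = cmath.exp(2j * cmath.pi / 3)   # primitive cube root of unity
ROT60 = cmath.exp(-1j * cmath.pi / 3)  # rotation by -60 degrees

def napoleon_centroids(a, b, c, rot=ROT60):
    """Centroids of the triangles erected on sides ab, bc, ca.

    The apex on side pq is p + (q - p) * rot; with rot = ROT60 the erected triangle is equilateral.
    """
    def centroid_on(p, q):
        apex = p + (q - p) * rot
        return (p + q + apex) / 3
    return centroid_on(a, b), centroid_on(b, c), centroid_on(c, a)

def equilateral_residual(z1, z2, z3):
    """z1 + w*z2 + w^2*z3 with w = OMEGA: zero for an equilateral triangle of one orientation."""
    return z1 + OMEGA * z2 + OMEGA ** 2 * z3

def monte_carlo_check(trials=10_000, rot=ROT60, tol=1e-9, seed=1):
    """Evaluate the residual on random triangles; True means no counterexample was found."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b, c = (complex(rng.uniform(-10, 10), rng.uniform(-10, 10)) for _ in range(3))
        residual = equilateral_residual(*napoleon_centroids(a, b, c, rot))
        if abs(residual) > tol * (abs(a) + abs(b) + abs(c)):
            return False  # counterexample: the claimed invariant fails for this triangle
    return True  # numerical evidence over `trials` random triangles, not a formal proof

print(monte_carlo_check())
print(monte_carlo_check(rot=cmath.exp(-1j * cmath.pi * 50 / 180)))
```

The second call is a negative control: erecting 50° triangles instead of equilateral ones breaks the theorem, and the check reports the failure, which shows the test can actually detect a false claim.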
The reality in 2026 is that advanced, powerful abstractions can now be used efficiently to drive engineering development. For optimization and proof problems, this is largely because we never try to get an absolute answer: we are happy to state the precision we need and work with that.
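One standard way to "state the precision we need" is a Hoeffding bound. To estimate a probability to within ±ε with confidence at least 1 − δ from independent samples, n ≥ ln(2/δ) / (2ε²) samples suffice. A minimal sketch with hypothetical function names, sizing a Monte Carlo run from (ε, δ) and applying it to a toy target, the fraction of the unit square inside the quarter disc (π/4):

```python
import math
import random

def samples_needed(epsilon, delta):
    """Hoeffding: n >= ln(2/delta) / (2 epsilon^2) gives P(|estimate - p| >= epsilon) <= delta
    for the mean of n independent samples with values in [0, 1]."""
    return math.ceil(math.log(2 / delta) / (2 * epsilon ** 2))

def estimate_probability(event, epsilon, delta, seed=0):
    """Monte Carlo estimate of P(event) with the sample count chosen from (epsilon, delta)."""
    rng = random.Random(seed)
    n = samples_needed(epsilon, delta)
    hits = sum(1 for _ in range(n) if event(rng))
    return hits / n, n

def in_quarter_disc(rng):
    x, y = rng.random(), rng.random()
    return x * x + y * y <= 1.0

estimate, n = estimate_probability(in_quarter_disc, epsilon=0.01, delta=0.05)
print(n, estimate, math.pi / 4)
```

Here ε = 0.01 and δ = 0.05 call for 18,445 samples; tightening ε by a factor of 10 multiplies the cost by 100, which is exactly the trade-off one chooses explicitly instead of chasing an exact answer.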