No More Hustleporn: Decisions and Pivots After 5 Years


Tweet by Soumith Chintala

https://twitter.com/soumithchintala

I ❤️ research -- embodied models, robots, ML systems. I think deeply about lowering the barrier to use A.I.

Co-creator and lead of @PyTorch at Meta A.I.


It’s been 5 years since we launched @pytorch. It’s much bigger than we expected -- usage, contributors, funding. We’re blessed with success, but not perfect. A thread (mirrored at soumith.ch/posts/2022/01/… ) about some of the interesting decisions and pivots we’ve had to make 👇

https://soumith.ch/posts/2022/01/pytorch-retro/

In the first year, we started small and focused on researchers, and we did that well. We had a good product, a strong support structure that we carefully curated, and a Twitter account that we set on fire. It worked a little too well :) /

We were extremely blessed to add in folks like @ptrblck, @ezyang, @t-vi early on, and countless others that joined in on the PyTorch party and have completely changed the way we build things! /

I remember that at that time, we still hoped we could close all GitHub issues -- we tried hard. When we had the 150th open issue, I remember sitting with @zou3519 and @TongzhouWang saying that if we worked faster and smarter, we could go back to Inbox Zero. Oh, how naive we were! /

After the 0.3 release, we knew that at the rate at which the hardware was getting faster, we absolutely needed a compiler to be able to drive hardware optimally. Fun fact, @colesbury (who built the Python nogil work) opposed that view :-) /

Building an ML compiler was and is a research problem, for two reasons:
1. We didn’t know how to codegen efficient code for dynamic shapes
2. It's hard to slice Python in the right way to make it small, yet still keep the flexibility that users expect from Python / PyTorch

So, we bet on TorchScript (more specifically jit.script). This has been a rough ride, because Python is large and people like using most or all of Python. It wasn’t obvious then, though it is somewhat obvious in retrospect. We’re unbundling it. /
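To make the "Python is large" point concrete, here is a toy sketch, not how TorchScript actually works. It assumes a hypothetical whitelist of syntax nodes (`ALLOWED_NODES`) standing in for a "compilable subset", and reports which constructs in a function fall outside it. The names `ALLOWED_NODES`, `unsupported_constructs`, `SIMPLE` and `DYNAMIC` are made up for illustration.

```python
import ast

# Hypothetical whitelist: syntax our toy "compilable subset" accepts.
# A real compiler frontend is far more involved; this only illustrates the idea.
ALLOWED_NODES = (
    ast.Module, ast.FunctionDef, ast.arguments, ast.arg, ast.Return,
    ast.Assign, ast.AugAssign, ast.If, ast.For, ast.While, ast.Expr,
    ast.BinOp, ast.UnaryOp, ast.Compare, ast.BoolOp,
    ast.Call, ast.Name, ast.Constant, ast.Attribute,
    ast.Load, ast.Store,
    ast.operator, ast.unaryop, ast.cmpop, ast.boolop,
)

def unsupported_constructs(source):
    """Return sorted names of syntax node types outside the toy subset."""
    tree = ast.parse(source)
    return sorted(
        {type(node).__name__
         for node in ast.walk(tree)
         if not isinstance(node, ALLOWED_NODES)}
    )

# Plain loops and arithmetic: fits the subset.
SIMPLE = '''
def relu_sum(xs):
    total = 0.0
    for x in xs:
        if x > 0:
            total += x
    return total
'''

# Everyday Python (comprehension, try/except): falls outside it.
DYNAMIC = '''
def summarize(batch, **options):
    sizes = [len(item) for item in batch]
    try:
        return max(sizes)
    except ValueError:
        return 0
'''

if __name__ == "__main__":
    print(unsupported_constructs(SIMPLE))
    print(unsupported_constructs(DYNAMIC))
```

Even ordinary helper code like `summarize` trips the whitelist, and every node type you add to fix that makes the subset bigger and harder to compile well.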

We could’ve made TorchScript more appealing if we had focused it on performance and shown 10x better perf than eager mode, taking a Numba-like approach: a limited but powerful subset. /

We tried that minimally with optimizing small RNNs, but the world had moved on by the time we got there. That would’ve given people strong incentives to port to TorchScript. /

But we didn’t focus too much on performance, and focused instead on exporting PyTorch programs to C++ for productionisation. Here’s why: /

We were bold, competent but very underfunded. For example, the first version of PyTorch-Distributed was built by @apaszke and three of his friends as a class project. Additionally, the biggest criticism (+demand) at that time was that PyTorch was a playtoy not ready for prod. /

So, around the time we were building our compiler, we got the opportunity to merge with the Caffe2 project, for a significantly larger team and much more sustained and growing funding. We took it, and I made the call. /

This is also where we baked in the “commits have to be landed by FB engineers” requirement, which was a huge trade-off. I made the call knowing that the downside was increased friction in open-source for a few years until we could streamline this aspect. Life is not perfect. /

For about two years, we pivoted our compiler stack to handle prod, silently seeing XLA and TVM breeze past. We also had to integrate two teams (PyTorch and Caffe2) – which was more of a social engineering problem than a technical problem. It took time – between 2018 and 2020. /

We also cleaned up internals that were a bubble-gum-wrapped house of cards. That enabled PyTorch Mobile, enabled hardware vendors to build out-of-tree backends, helped start a strong collab with @GoogleAI on TPUs, and enabled many ongoing projects (fx, functorch, dynamo, dispatch, lazy). /

Also, our CPU perf was horrendous (but the researchers didn’t notice), and prod workloads cared about CPU a lot – we fixed it. /

All of this had a massive impact: libtorch enabled us to enter many new markets – self-driving, mobile apps, recommendation. We are running in prod at many of the top companies across the world. /

PyTorch's super-strong emphasis on backward-compatibility is universally appreciated to this day, and I’m proud of that. /

Anyways, none of this big production push helped researchers in a significant way, so many jokingly say that not much has changed since PyTorch 0.4, and in a simplistic, squinty view, they are somewhat right. /

While we have seen massive success in research, it is heavily lifted by our core initial product + strong backward-compatibility guarantees. /

With the additional funding due to the prod pivot, we made great improvements to distributed and perf, and added new layers and functions (some of them with bad design, sorry!) and complex numbers. This has been massive time+effort, and it makes people’s day-to-day usage easier and better. /

But what researchers mean when they say “not much has changed” is that there hasn’t been a step-change in their day-to-day user-experience. /

Product exploration and product design are generally handled the same way research is – we explore, and exploit what we find. /

With the various threads tying up, I am pretty bullish that today, we have the right infra, the right set of leaders and maintainers, the right attitude and priorities to have PyTorch lead significant product disruption again (unless all our bets are uniformly bad by luck). /

We are blessed to have the trust of the research and open-source community even as I made certain calls that were not ideal from their perspective (at that time). /

I am really excited to see prototypes like dynamo, lazy, fx, functorch, nvfuser etc. and I’m pretty confident that they will consolidate into a disruptive next set of experiences. I’m proud that we’ve built prototypes like named-tensor, even though they didn’t work out.