
Lex Fridman Podcast

Conversations about science, technology, history, philosophy and the nature of intelligence, consciousness, love, and power. Lex is an AI researcher at MIT and beyond.



The following is a conversation with Chris Lattner.
Currently, he's a senior director at Google,
working on several projects, including
CPU, GPU, TPU accelerators for TensorFlow,
Swift for TensorFlow, and all kinds of machine learning
compiler magic going on behind the scenes.
He's one of the top experts in the world on compiler
technologies, which means he deeply
understands the intricacies of how hardware and software come
together to create efficient code.
He created the LLVM Compiler Infrastructure Project
and the Clang Compiler.
He led major engineering efforts at Apple,
including the creation of the Swift programming language.
He also briefly spent time at Tesla
as vice president of Autopilot software
during the transition from Autopilot Hardware 1
to Hardware 2, when Tesla essentially started from scratch
to build an in-house software infrastructure for Autopilot.
I could have easily talked to Chris for many more hours.
Compiling code down across levels of abstraction
is one of the most fundamental and fascinating aspects
of what computers do.
And he is one of the world experts in this process.
It's rigorous science, and it's messy, beautiful art.
This conversation is part of the Artificial Intelligence
podcast.
If you enjoy it, subscribe on YouTube, iTunes,
or simply connect with me on Twitter
at Lex Fridman, spelled F-R-I-D.
And now, here's my conversation with Chris Lattner.
What was the first program you've ever written?
Well, the first program back.
And when was it?
I think I started as a kid, and my parents
got a basic programming book.
And so when I started, it was typing out programs
from a book, and seeing how they worked,
and then typing them in wrong, and trying
to figure out why they were not working right,
and that kind of stuff.
So BASIC, what was the first language
that you remember yourself maybe falling in love with,
like really connecting with?
I don't know, I mean, I feel like I've learned a lot
along the way, and each of them have a different
special thing about them.
So I started in BASIC, and then went to GW-BASIC,
which was the thing back in the DOS days,
and then upgraded to QBasic, and eventually QuickBASIC,
which are all slightly more fancy versions
of Microsoft BASIC.
Made the jump to Pascal, and started doing
machine language programming and assembly in Pascal,
which was really cool.
Turbo Pascal was amazing for its day.
Eventually got into C, C++, and then kind of
did lots of other weird things.
I feel like you took the dark path, which is the,
you could have gone Lisp.
Yeah, yeah.
You could have gone higher level,
sort of functional, philosophical, hippie route.
Instead you went into like the dark arts of the C.
It was straight into the machine.
Straight to the machine.
So it started with BASIC, Pascal, and then assembly,
and then wrote a lot of assembly, and.
Why, how?
Eventually did Smalltalk, and other things like that.
That was not the starting point.
But so what is this journey to C?
Is that in high school, is that in college?
That was in high school, yeah.
So, and then that was,
it was really about trying to be able to do
more powerful things than what Pascal could do,
and also to learn a different world.
So C was really confusing to me with pointers,
and the syntax, and everything, and it took a while.
But Pascal's much more principled in various ways.
C is more, I mean, it has its historical roots,
but it's not as easy to learn.
With pointers, there's this memory management thing
that you have to become conscious of.
Is that the first time you start to understand
that there's resources that you're supposed to manage?
Well, so you have that in Pascal as well,
but in Pascal, it's the caret instead of the star,
there's some small differences like that,
but it's not about pointer arithmetic.
In C, you end up thinking about
how things get laid out in memory a lot more.
And so in Pascal, you have allocating and deallocating
and owning the memory, but just the programs are simpler,
and you don't have to, well, for example,
Pascal has a string type.
And so you can think about a string
instead of an array of characters
which are consecutive in memory.
So it's a little bit of a higher level abstraction.
So let's get into it.
Let's talk about LLVM, Clang, and compilers.
Sure.
So can you tell me first what LLVM and Clang are,
and how is it that you find yourself the creator
and lead developer of one of the most powerful
compiler optimization systems in use today?
Sure, so I guess they're different things.
So let's start with what is a compiler?
Is that a good place to start?
What are the phases of a compiler?
What are the parts, yeah, what is it?
So what is even a compiler used for?
So the way I look at this is you have a two-sided problem
of you have humans that need to write code,
and then you have machines that need to run the program
that the human wrote.
And for lots of reasons, the humans don't want to be writing
in binary and want to think about every piece of hardware.
And so at the same time that you have lots of humans,
you also have lots of kinds of hardware.
And so compilers are the art of allowing humans
to think at a level of abstraction
that they want to think about,
and then get that program, get the thing that they wrote
to run on a specific piece of hardware.
And the interesting and exciting part of all this
is that there's now lots of different kinds of hardware,
chips like x86 and PowerPC and ARM and things like that,
but also high-performance accelerators for machine learning
and other things like that,
or also just different kinds of hardware, GPUs.
These are new kinds of hardware.
And at the same time, on the programming side of it,
you have your basic, you have C, you have JavaScript,
you have Python, you have Swift,
you have lots of other languages
that are all trying to talk to the human in a different way
to make them more expressive and capable and powerful.
And so compilers are the thing
that goes from one to the other?
End to end, from the very beginning to the very end.
End to end.
And so you go from what the human wrote,
and programming languages end up being about
expressing intent, not just for the compiler and the hardware,
but the programming language's job
is really to capture an expression
of what the programmer wanted
that then can be maintained and adapted
and evolved by other humans,
as well as interpreted by the compiler.
So when you look at this problem,
you have, on the one hand, humans, which are complicated,
and you have hardware, which is complicated.
And so compilers typically work in multiple phases.
And so the software engineering challenge
that you have here is try to get maximum reuse
out of the amount of code that you write,
because these compilers are very complicated.
And so the way it typically works out
is that you have something called a front end or a parser
that is language specific.
And so you'll have a C parser, and that's what Clang is,
or C++ or JavaScript or Python or whatever,
that's the front end.
Then you'll have a middle part,
which is often the optimizer.
And then you'll have a late part,
which is hardware specific.
And so compilers end up,
there's many different layers often,
but these three big groups are very common in compilers.
And what LLVM is trying to do
is trying to standardize that middle and last part.
And so one of the cool things about LLVM
is that there are a lot of different languages
that compile through to it.
And so things like Swift, but also Julia, Rust,
Clang for C, C++, Objective-C,
like these are all very different languages
and they can all use the same optimization infrastructure,
which gets better performance,
and the same code generation infrastructure
for hardware support.
And so LLVM is really that layer that is common
that all these different specific compilers can use.
And is it a standard, like a specification,
or is it literally an implementation?
It's an implementation.
And so I think there's a couple of different ways
of looking at it, right?
Because it depends on which angle you're looking at it from.
LLVM ends up being a bunch of code, okay?
So it's a bunch of code that people reuse
and they build compilers with.
We call it a compiler infrastructure
because it's kind of the underlying platform
that you build a concrete compiler on top of.
But it's also a community.
And the LLVM community is hundreds of people
that all collaborate.
And one of the most fascinating things about LLVM
over the course of time is that we've managed somehow
to successfully get harsh competitors
in the commercial space to collaborate
on shared infrastructure.
And so you have Google and Apple.
You have AMD and Intel.
You have Nvidia and AMD on the graphics side.
You have Cray and everybody else doing these things.
And like all these companies are collaborating together
to make that shared infrastructure really, really great.
And they do this not out of the goodness of their heart,
but they do it because it's in their commercial interest
of having really great infrastructure
that they can build on top of
and facing the reality that it's so expensive
that no one company, even the big companies,
no one company really wants to implement it all themselves.
Expensive or difficult?
Both.
That's a great point because it's also about the skill sets.
And the skill sets are very hard to find.
How big is the LLVM?
It always seems like with open source projects,
the kind, LLVM is open source?
Yes, it's open source.
It's about, it's 19 years old now.
So it's fairly old.
It seems like the magic often happens
within a very small circle of people.
Yes.
At like at least the early birth and whatever.
Yes.
So the LLVM came from a university project.
And so I was at the University of Illinois
and there it was myself, my advisor,
and then a team of two or three research students
in the research group.
And we built many of the core pieces initially.
I then graduated and went to Apple
and Apple brought it to the products,
first in the OpenGL graphics stack,
but eventually to the C compiler realm
and eventually built Clang
and eventually built Swift and these things
along the way, building a team of people
that are really amazing compiler engineers
that helped build a lot of that.
And so as it was gaining momentum
and as Apple was using it, being open source in public
and encouraging contribution, many others,
for example, at Google came in and started contributing.
And in some cases, Google effectively owns Clang now
because it cares so much about C++
and the evolution of that ecosystem.
And so it's investing a lot in the C++ world
and the tooling and things like that.
And so likewise, Nvidia cares a lot about CUDA.
And so CUDA uses Clang and uses LLVM
for graphics and GPU.
And so when you first started as a master's project, I guess,
did you think it's gonna go as far as it went?
Were you crazy ambitious about it?
No.
It seems like a really difficult undertaking, a brave one.
Yeah, no, no, no, it was nothing like that.
So I mean, my goal when I went to the University of Illinois
was to get in and out with the non-thesis masters in a year
and get back to work.
So I was not planning to stay for five years
and build this massive infrastructure.
I got nerd sniped into staying.
And a lot of it was because LLVM was fun
and I was building cool stuff
and learning really interesting things
and facing both software engineering challenges,
but also learning how to work in a team
and things like that.
I had worked at many companies as interns before that,
but it was really a different thing
to have a team of people that were working together
and trying to collaborate in version control
and it was just a little bit different.
Like I said, I just talked to Don Knuth
and he believes that 2% of the world population
have something weird with their brain, that they're geeks,
they understand computers, they're connected with computers.
He put it at exactly 2%.
Okay, so-
He's a specific guy.
It's very specific.
Well, he says, I can't prove it,
but it's very empirically there.
Is there something that attracted you
to the idea of optimizing code?
I mean, it seems like that's one of the biggest,
coolest things about LLVM.
Yeah, that's one of the major things it does.
So I got into that because of a person actually.
So when I was in my undergraduate,
I had an advisor or a professor named Steve Vegdahl
and I went to this little tiny private school.
There were like seven or nine people
in my computer science department,
students in my class.
So it was a very tiny, very small school.
It was kind of a work on the side of the math department
kind of a thing at the time.
I think it's evolved a lot in the many years since then,
but Steve Vegdahl was a compiler guy
and he was super passionate
and his passion rubbed off on me
and one of the things I like about compilers
is that they're large, complicated software pieces.
And so one of the culminating classes that many
computer science departments at least at the time did
was to say that you would take algorithms
and data structures in all these core classes,
but then the compilers class
was one of the last classes you take
because it pulls everything together
and then you work on one piece of code
over the entire semester.
And so you keep building on your own work,
which is really interesting.
It's also very challenging because in many classes,
if you don't get a project done,
you just forget about it and move on to the next one
and get your B or whatever it is,
but here you have to live with the decisions you make
and continue to reinvest in it and I really like that.
And so I did an extra study project with him
the following semester and he was just really great.
And he was also a great mentor in a lot of ways.
And so from him and from his advice,
he encouraged me to go to graduate school.
I wasn't super excited about going to grad school.
I wanted the master's degree,
but I didn't want to be an academic.
And, but like I said,
I kind of got tricked into saying
and was having a lot of fun
and I definitely do not regret it.
What aspects of compilers
were the things you connected with?
So LLVM, there's also the other part
that's just really interesting.
If you're interested in languages,
is parsing and just analyzing,
like yeah, analyzing the language,
breaking it down, parsing and so on.
Was that interesting to you
or were you more interested in optimization?
For me, it was more,
so I'm not really a math person.
I could do math.
I understand some bits of it when I get into it,
but math is never the thing that attracted me.
And so a lot of the parser part of the compiler
has a lot of good formal theories
that Don, for example, knows quite well.
Still waiting for his book on that.
But the, but I just like building a thing
and seeing what it could do
and exploring and getting to do more things
and then setting new goals and reaching for them
and in the case of LLVM,
when I started working on that,
my research advisor that I was working for
was a compiler guy.
And so he and I specifically found each other
because we were both interested in compilers
and so I started working with them and taking his class.
And a lot of LLVM initially was,
it's fun implementing all the standard algorithms
and all the things that people had been talking about
and were well known and they were in the curricula
for advanced studies and compilers.
And so just being able to build that was really fun.
And I was learning a lot by,
instead of reading about it, just building.
And so I enjoyed that.
So you said compilers are these complicated systems.
Can you even just, with language,
try to describe how you turn a C++ program into code?
Like what are the hard parts?
Why is it so hard?
So I'll give you examples of the hard parts along the way.
So C++ is a very complicated programming language.
It's something like 1,400 pages in the spec.
So C++ by itself is crazy complicated.
Can we just, sorry, pause.
What makes the language complicated
in terms of what's syntactically, like?
So it's what they call syntax.
So the actual, how the characters are arranged, yes.
It's also semantics, how it behaves.
It's also, in the case of C++,
there's a huge amount of history.
C++ built on top of C.
You play that forward and then a bunch of suboptimal,
in some cases, decisions were made
and they compound.
And then more and more and more things
keep getting added to C++
and it will probably never stop.
But the language is very complicated from that perspective.
And so the interactions between subsystems
is very complicated.
There's just a lot there.
And when you talk about the front end,
one of the major challenges, which Clang, as a project,
the C, C++ compiler that I built, I and many people built,
one of the challenges we took on was we looked at GCC.
Okay, GCC at the time was like a really good
industry standardized compiler
that had really consolidated a lot of the other compilers
in the world and was a standard.
But it wasn't really great for research.
The design was very difficult to work with.
And it was full of global variables and other things
that made it very difficult to reuse
in ways that it wasn't originally designed for.
And so with Clang, one of the things that we wanted to do
is push forward on better user interface.
So make error messages that are just better than GCC's.
And that's actually hard because you have to do
a lot of bookkeeping in an efficient way to be able to do that.
We want to make compile time better.
And so compile time is about making it efficient,
which is also really hard when you're keeping track
of extra information.
We wanted to make new tools available.
So refactoring tools and other analysis tools
that GCC never supported,
also leveraging the extra information we kept,
but enabling those new classes of tools
that then get built into IDEs.
And so that's been one of the areas
that Clang has really helped push the world forward in
is in the tooling for C and C++ and things like that.
But C++ and the front end piece is complicated
and you have to build syntax trees
and you have to check every rule in the spec
and you have to turn that back into an error message
to the human that the human can understand
when they do something wrong.
But then you start doing what's called lowering.
So going from C++ and the way that it represents code
down to the machine.
And when you do that,
there's many different phases you go through.
Often there are, I think,
LLVM has something like 150 different
what are called passes in the compiler,
that the code passes through
and these get organized in very complicated ways,
which affect the generated code and the performance
and compile time and many of the things.
What are they passing through?
So after you do the Clang parsing,
what's the graph?
What does it look like?
What's the data structure here?
Yeah, so in the parser, it's usually a tree
and it's called an abstract syntax tree.
And so the idea is you have a node for the plus
that the human wrote in their code
or the function call, you'll have a node for call
with the function that they call
and the arguments they pass, things like that.
This then gets lowered into what's called
an intermediate representation
and intermediate representations are, like, LLVM has one.
And there it's what's called a control flow graph.
And so you represent each operation in the program
as a very simple, like this is gonna add two numbers,
this is gonna multiply two things,
maybe we'll do a call,
but then they get put in what are called blocks.
And so you get blocks of these straight line operations
so instead of being nested like in a tree,
it's straight line operations.
And so there's a sequence
in ordering to these operations.
And then in the block or outside the block?
That's within the block.
And so it's a straight line sequence of operations
within the block.
And then you have branches,
like conditional branches between blocks.
And so when you write a loop, for example,
in a syntax tree, you would have a for node,
like for a for statement in a C like language,
you'd have a for node
and you have a pointer to the expression
for the initializer, a pointer to the expression
for the increment, a pointer to the expression
for the comparison, a pointer to the body.
And these are all nested underneath it.
In a control flow graph, you get a block
for the code that runs before the loop.
So the initializer code,
then you have a block for the body of the loop.
And so the body of the loop code goes in there,
but also the increment and other things like that.
And then you have a branch that goes back to the top
and a comparison and a branch that goes out.
And so it's more of an assembly-level kind of representation.
But the nice thing about this level of representation
is it's much more language independent.
And so there's lots of different kinds of languages
with different kinds of, you know,
JavaScript has a lot of different ideas
of what is false, for example,
and all that can stay in the front end,
but then that middle part can be shared across all those.
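To make the lowering described above concrete, here is a minimal Swift sketch (illustrative only, not actual compiler output) of how the pieces of a simple loop map onto the basic blocks of a control flow graph:

```swift
// Illustrative only: the conceptual basic blocks a compiler would form
// for this loop in its control flow graph.
func sumFirst(_ n: Int) -> Int {
    var total = 0
    // Block 1 (before the loop): the initializer code.
    var i = 0
    // Block 2 (loop header): the comparison, ending in a conditional
    // branch either into the body or out of the loop.
    while i < n {
        // Block 3 (loop body): the body plus the increment, ending in a
        // branch back up to the comparison.
        total += i
        i += 1
    }
    // Block 4 (exit): straight-line code that runs after the loop.
    return total
}
```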
How close is that intermediate representation
to a neural network, for example?
Are they, because everything you describe
is kind of echoes of a neural network graph.
Are they neighbors or what?
They're quite different in details,
but they're very similar in idea.
So one of the things that neural networks do
is they learn representations for data
at different levels of abstraction, right?
And then they transform those through layers, right?
So the compiler does very similar things,
but one of the things the compiler does
is it has relatively few different representations.
Whereas a neural network often, as you get deeper,
for example, you get many different representations
and each, you know, layer or set of ops
is transforming between these different representations.
In a compiler, often you get one representation
and they do many transformations to it.
And these transformations are often applied iteratively.
And for programmers, they're familiar types of things.
For example, trying to find expressions inside of a loop
and pulling them out of a loop.
so that they execute fewer times,
sort of find redundant computation or find constant folding
or other simplifications turning, you know,
two times X into X shift left by one
and things like this are all the examples
of the things that happen.
But compilers end up getting a lot of theorem proving
and other kinds of algorithms that try to find
higher level properties of the program
that then can be used by the optimizer.
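A few of the transformations mentioned above, sketched in Swift; whether the optimizer actually applies them in a given build depends on types, overflow rules, and optimization level, so treat these as illustrations rather than guarantees:

```swift
func doubled(_ x: Int) -> Int {
    // Strength reduction: 2 * x can become x << 1.
    return 2 * x
}

func folded() -> Int {
    // Constant folding: 3 + 4 can be computed at compile time.
    return 3 + 4
}

func hoisted(_ values: [Int], _ a: Int, _ b: Int) -> Int {
    var total = 0
    for v in values {
        // Loop-invariant code motion: a * b doesn't change across
        // iterations, so it can be computed once outside the loop.
        total += v + a * b
    }
    return total
}
```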
Cool, so what's like the biggest bang for the buck
with optimization?
What's a...
Today, yeah.
Well, no, not even today.
At the very beginning, the 80s, I don't know.
Oh yeah, so for the 80s,
a lot of it was things like register allocation.
So the idea of in a modern, like a microprocessor,
what you'll end up having is you'll end up having memory,
which is relatively slow,
and then you have registers relatively fast.
But registers, you don't have very many of them, okay?
And so when you're writing a bunch of code,
you're just saying like, compute this,
put in temporary variable, compute this,
compute this, compute this,
put in temporary variable, I have a loop,
I have some other stuff going on.
Well, now you're running on an x86,
like a desktop PC or something.
Well, it only has, in some cases, some modes,
eight registers, right?
And so now the compiler has to choose
what values get put in what registers
at what points in the program.
And this is actually a really big deal.
So if you think about, you have a loop,
and then an inner loop that executes
millions of times maybe.
If you're doing loads and stores inside that loop,
then it's gonna be really slow.
But if you can somehow fit all the values
inside that loop in registers, now it's really fast.
And so getting that right requires a lot of work,
because there's many different ways to do that.
And often what the compiler ends up doing
is it ends up thinking about things
in a different representation than what the human wrote.
All right, you wrote int x.
Well, the compiler thinks about that
as four different values,
each which have different lifetimes
across the function that it's in.
And each of those could be put in a register
or memory or different memory
or maybe in some parts of the code,
re-computed instead of stored and reloaded.
And there are many of these different kinds of techniques
that can be used.
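A small illustration of that "one variable, several values" idea, using a hypothetical Swift function:

```swift
// From the compiler's point of view, `x` is not one thing: each assignment
// starts a new value with its own live range, and each range can
// independently land in a register, spill to memory, or be recomputed.
func liveRanges(_ a: Int, _ b: Int) -> Int {
    var x = a + b   // value #1 of x, live until the next assignment
    x = x * 2       // value #2, live until the next assignment
    x = x - a       // value #3, live until the return
    return x
}
```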
So it's adding almost like a time dimension
to, it's trying to optimize across time.
So it's considering when you're programming,
you're not thinking in that way.
Yeah, absolutely.
And so the RISC era made things.
RISC?
So RISC chips, R-I-S-C.
The RISC chips as opposed to CISC chips.
The RISC chips made things more complicated
for the compiler because what they ended up doing
is ending up adding pipelines to the processor
where the processor can do more than one thing at a time.
But this means that the order of operations matters a lot.
And so one of the classical compiler techniques
that you use is called scheduling.
And so moving the instructions around
so that the processor can like keep its pipelines full
instead of stalling and getting blocked.
And so there's a lot of things like that
that are kind of bread and butter compiler techniques
that have been studied a lot over the course of decades now.
But the engineering side of making them real
is also still quite hard.
And you talk about machine learning.
This is a huge opportunity for machine learning
because many of these algorithms
are full of these like hokey hand rolled heuristics
which work well on specific benchmarks
but don't generalize and full of magic numbers.
And I hear there's some techniques
that are good at handling that.
So what would be the,
if you were to apply machine learning to this,
what's the thing you're trying to optimize?
Is it ultimately the running time?
Yeah, you can pick your metric
and there's running time, there's memory use,
there's lots of different things that you can optimize
where code size is another one
that some people care about in the embedded space.
Is this like the thinking into the future
or has somebody actually been crazy enough
to try to have machine learning based parameter tuning
for optimization of compilers?
So this is something that is I would say research right now.
There are a lot of research systems
that have been applying search in various forms
and using reinforcement learning is one form
but also brute force search has been tried for quite a while
and usually these are in small problem spaces.
So find the optimal way to code generate
a matrix multiply for a GPU, right?
Something like that where you say,
there there's a lot of design space of
do you unroll loops a lot?
Do you execute multiple things in parallel?
And there's many different confounding factors here
because graphics cards have different numbers
of threads and registers and execution ports
and memory bandwidth and many different constraints
to interact in nonlinear ways.
And so search is very powerful for that
and it gets used in certain ways
but it's not very structured.
This is something that we need,
we as an industry need to fix.
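As a toy illustration of one axis of that design space, here is what manually unrolling a loop by four looks like in Swift; whether it helps at all depends on the target hardware, which is exactly why searching over such choices is attractive:

```swift
func unrolledSum(_ values: [Int]) -> Int {
    var total = 0
    var i = 0
    // Main loop, unrolled by 4: more independent work per iteration.
    while i + 4 <= values.count {
        total += values[i] + values[i + 1] + values[i + 2] + values[i + 3]
        i += 4
    }
    // Remainder loop for any leftover elements.
    while i < values.count {
        total += values[i]
        i += 1
    }
    return total
}
```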
So you said 80s but like so,
have there been like big jumps in improvement
in optimization?
Yeah.
Yeah, since then, what are the coolest things?
It's largely been driven by hardware.
So hardware and software.
So in the mid 90s Java totally changed the world, right?
And I'm still amazed by how much change
was introduced by Java.
In a good way or?
In a good way.
So like reflecting back Java introduced things like,
all at once introduced things like JIT compilation.
None of these were novel
but it pulled it together and made it mainstream
and made people invest in it.
JIT compilation, garbage collection, portable code,
safe code, like memory safe code.
Like a very dynamic dispatch execution model.
Like many of these things which had been done
in research systems and had been done
in small ways in various places
really came to the forefront
and really changed how things worked
and therefore changed the way people thought
about the problem.
JavaScript was another major world change
based on the way it works.
But also on the hardware side of things,
multicore and vector instructions
really change the problem space
and are very, they don't remove any of the problems
the compiler's faced in the past
but they add new kinds of problems
of how do you find enough work
to keep a four wide vector busy, right?
Or if you're doing a matrix multiplication,
how do you do different columns out of that matrix
at the same time and how do you maximally utilize
the arithmetic compute that one core has
and then how do you take it to multiple cores?
How did the whole virtual machine thing change
the compilation pipeline?
Yeah, so what the Java virtual machine does
is it splits, just like I was talking about before
where you have a front end that parses the code
and then you have an intermediate representation
that gets transformed.
What Java did was they said we will parse the code
and then compile to what's known as Java bytecode
and that bytecode is now a portable code representation
that is industry standard and locked down and can't change
and then the back part of the compiler
that does optimization and code generation
can now be built by different vendors.
Okay.
And Java bytecode can be shipped around across the wire,
it's memory safe and relatively trusted.
And because of that it can run in the browser.
And that's why it runs in the browser, right?
And so that way you can be in, you know,
again, back in the day you would write a Java applet
and you, as a web developer,
build this mini app that would run in a web page.
Well, a user of that is running a web browser
on their computer, you download that Java bytecode
which can be trusted and then you do all the compiler stuff
on your machine so that you know that you trust that.
That was a good idea or a bad idea?
It's a great idea.
I mean, it's a great idea for certain problems
and I'm very much a believer that technology is itself
neither good nor bad, it's how you apply it.
You know, this would be a very, very bad thing
for very low levels of the software stack,
but in terms of solving some of these software
portability and transparency or portability problems,
I think it's been really good.
Now Java ultimately didn't win out on the desktop
and like there are good reasons for that,
but it's been very successful on servers
and in many places it's been a very successful thing
over decades.
So what have been LLVM's and Clang's improvements
in optimization throughout its history?
What are some moments where you sat back
and were really proud of what's been accomplished?
Yeah, I think that the interesting thing about LLVM
is not the innovations in compiler research.
It has very good implementations
of various important algorithms, no doubt.
And a lot of really smart people have worked on it,
but I think that the thing that's most profound
about LLVM is that through standardization,
it made things possible that otherwise
wouldn't have happened, okay?
And so interesting things that have happened with LLVM,
for example, Sony has picked up LLVM
and used it to do all the graphics compilation
in their movie production pipeline.
And so now they're able to have better special effects
because of LLVM.
That's kind of cool, that's not what it was designed for,
right, but that's the sign of good infrastructure
when it can be used in ways it was never designed for
because it has good layering and software engineering
and it's composable and things like that.
Just where, as you said, it differs from GCC.
Yes, GCC is also great in various ways,
but it's not as good as an infrastructure technology.
It's really a C compiler, or it's a Fortran compiler.
It's not infrastructure in the same way.
Is it, now you can tell, I don't know what I'm talking about
because I keep saying C lang.
You could always tell when a person is close,
by the way they pronounce something.
I don't think, have I ever used Clang?
Entirely possible, have you?
Well, so you've used code it's generated, probably.
So Clang is in LLVM or used to compile
all the apps on the iPhone effectively and the OSs.
It compiles Google's production server applications.
So it's used to build GameCube games
and PlayStation 4 and things like that.
So as a user I have, but just everything I've done
that I experienced with Linux has been,
I believe, always GCC.
Yeah, I think Linux still defaults to GCC.
And is there a reason for that?
Or is it, I mean, is there a reason for that?
It's a combination of technical and social reasons.
Many Linux developers do use Clang,
but the distributions for lots of reasons
have used GCC historically and they've not switched.
Yeah.
Because it's just anecdotally online,
it seems that LLVM has either reached the level of GCC
or superseded on different features or whatever.
The way I would say it is that they're so close
it doesn't matter.
Yeah, exactly.
Like they're slightly better in some ways,
slightly worse in other ways,
but it doesn't actually really matter anymore at that level.
So in terms of optimization, breakthroughs,
it's just been solid incremental work.
Yeah, yeah.
Which describes a lot of compilers.
The hard thing about compilers, in my experience,
is the engineering, the software engineering,
making it so that you can have hundreds of people
collaborating on really detailed low-level work
and scaling that.
And that's really hard.
And that's one of the things I think LLVM has done well.
And that kind of goes back to the original design goals
with it to be modular and things like that.
And incidentally, I don't want to take all the credit
for this, right?
I mean, some of the best parts about LLVM
is that it was designed to be modular.
And when I started, I would write, for example,
a register allocator and then something much smarter
than me would come in and pull it out
and replace it with something else
that they would come up with.
And because it's modular, they were able to do that.
And that's one of the challenges with GCC, for example,
is replacing subsystems is incredibly difficult.
It can be done, but it wasn't designed for that.
And that's one of the reasons that LLVM has been very
successful in the research world as well.
But in a community sense, Guido van Rossum, right,
from Python, just retired from, what is it,
Benevolent Dictator for Life, right?
So in managing this community of brilliant compiler folks,
is there, did it, for a time, at least,
fall on you to approve things?
Oh yeah, so I mean, I still have something
like an order of magnitude more patches
in LLVM than anybody else.
And many of those I wrote myself.
But you still write, I mean, you still close to the,
I don't know what the expression is, to the metal,
you still write code.
Yeah, I still write code.
Not as much as I was able to in grad school,
but that's an important part of my identity.
But the way that LLVM has worked over time is that
when I was a grad student, I could do all the work
and steer everything and review every patch
and make sure everything was done exactly
the way my opinionated sense felt like it should be done.
And that was fine.
But as things scale, you can't do that, right?
And so what ends up happening is LLVM has a hierarchical
system of what's called code owners.
These code owners are given the responsibility
not to do all the work, not necessarily to review
all the patches, but to make sure that the patches
do get reviewed and make sure that the right things
are happening architecturally in their area.
And so what you'll see is you'll see that,
for example, hardware manufacturers end up owning
the hardware specific parts of their hardware.
That's very common.
Leaders in the community that have done really good work
naturally become the de facto owner of something.
And then usually somebody else is like,
how about we make them the official code owner
and then we'll have somebody to make sure
that all the patches get reviewed in a timely manner.
And then everybody's like, yes, that's obvious.
And then it happens, right?
And usually this is a very organic thing, which is great.
And so I'm nominally the top of that stack still,
but I don't spend a lot of time reviewing patches.
What I do is I help negotiate a lot of the technical
disagreements that end up happening and making sure
that the community as a whole makes progress
and is moving in the right direction and doing that.
So we also started a nonprofit six years ago,
seven years ago, time's gone away.
And the nonprofit, the LLVM Foundation nonprofit
helps oversee all the business sides of things
and make sure that the events that the LLVM community
has are funded and set up and run correctly
and stuff like that.
But the foundation is very much stays out
of the technical side of where the project is going.
All right, so it sounds like a lot of it is just organic.
Yeah, well, and this is, LLVM is almost 20 years old,
which is hard to believe.
Somebody pointed out to me recently
that LLVM is now older than GCC was when LLVM started.
So time has a way of getting away from you.
But the good thing about that is it has a really robust,
really amazing community of people
that are in their professional lives,
spread across lots of different companies,
but it's a community of people
that are interested in similar kinds of problems
and have been working together effectively for years
and have a lot of trust and respect for each other.
And even if they don't always agree that,
we're able to find a path forward.
So then in a slightly different flavor of effort,
you started at Apple in 2005 with the task of making,
I guess, LLVM production-ready.
And then eventually 2013 through 2017,
leading the entire developer tools department.
We're talking about LLVM, Xcode, Objective-C to Swift.
So in a quick overview of your time there,
what were the challenges?
First of all, leading such a huge group of developers.
What was the big motivator dream mission
behind creating Swift, the early birth of it
from Objective-C and so on and Xcode?
Yeah, so these are different questions.
Yeah, I know.
Okay, so I'll stay on the technical side,
then we could talk about the big team pieces,
if that's okay.
So it's to really oversimplify many years of hard work.
LLVM started, joined Apple, became a thing,
became successful and became deployed.
But then there's a question about
how do we actually parse the source code?
So LLVM is that back part,
the optimizer and the code generator.
And LLVM is really good for Apple
as it went through a couple of harder transitions.
I joined right at the time of the Intel transition,
for example, and 64-bit transitions
and then the transition to ARM with the iPhone.
And so LLVM was very useful
for some of these kinds of things.
But at the same time, there's a lot of questions
around developer experience.
And so if you're a programmer pounding out
at the time Objective-C code,
the error message you get, the compile time,
the turnaround cycle, the tooling in the IDE,
were not great, were not as good as they could be.
And so, you know, as I occasionally do,
I'm like, well, okay, how hard is it
to write a C compiler, right?
And so I'm not gonna commit to anybody,
I'm not gonna tell anybody,
I'm just gonna just do it on nights and weekends
and start working on it.
And then, you know, I built it up, and in C
there's this thing called the preprocessor
which people don't like,
but it's actually really hard and complicated
and includes a bunch of really weird things
like trigraphs and other stuff like that
that are really nasty and it's the crux
of a bunch of the performance issues in the compiler.
Started working on the parser
and kind of got to the point where I'm like,
oh, you know what, we could actually do this.
Everybody's saying that this is impossible to do,
but it's actually just hard, it's not impossible.
And eventually told my manager about it
and he's like, oh, wow, this is great,
we do need to solve this problem.
Oh, this is great, we can get you one other person
to work with you on this, you know?
And slowly a team is formed and it starts taking off
and C++, for example, a huge complicated language.
People always assume that it's impossible to implement
and it's very nearly impossible,
but it's just really, really hard
and the way to get there is to build it
one piece at a time, incrementally.
And there, that was only possible
because we were lucky to hire
some really exceptional engineers
that knew various parts of it very well
and could do great things.
Swift was kind of a similar thing.
So Swift came from, we were just finishing off
the first version of C++ support in Clang
and C++ is a very formidable and very important language,
but it's also ugly in lots of ways
and you can't influence C++ without thinking
there has to be a better thing, right?
And so I started working on Swift again
with no hope or ambition that it would go anywhere,
just let's see what could be done,
let's play around with this thing.
It was me in my spare time,
not telling anybody about it, kind of a thing
and it made some good progress.
I'm like, actually it would make sense to do this.
At the same time, I started talking with the senior VP
of software at the time, a guy named Bertrand Serlet
and Bertrand was very encouraging.
He was like, well, let's have fun, let's talk about this.
And he was a little bit of a language guy
and so he helped guide some of the early work
and encouraged me and got things off the ground
and eventually told my manager and told other people
and it started making progress.
The complicating thing with Swift
was that the idea of doing a new language
is not obvious to anybody, including myself.
And the tone at the time was that the iPhone
was successful because of Objective-C, right?
Oh, interesting, not despite of or just because of, yeah.
And you have to understand that at the time Apple
was hiring software people that loved Objective-C, right?
And it wasn't that they came despite Objective-C,
they loved Objective-C and that's why they got hired.
And so you had a software team that the leadership
in many cases went all the way back to Next
where Objective-C really became real.
And so they, quote unquote, grew up writing Objective-C
and many of the individual engineers
all were hired because they loved Objective-C.
And so this notion of, okay, let's do new language
was kind of heretical in many ways, right?
Meanwhile, my sense was that the outside community
wasn't really in love with Objective-C.
Some people were and some of the most outspoken people were
but other people were hitting challenges
because it has very sharp corners and it's difficult to learn.
And so one of the challenges of making Swift happen
that was totally non-technical is the social part
of what do we do?
Like if we do a new language, which at Apple
many things happen that don't ship, right?
So if we ship it, what is the metrics of success?
Why would we do this?
Why wouldn't we make Objective-C better?
If Objective-C has problems,
let's file off those rough corners and edges.
And one of the major things that became the reason
to do this was this notion of safety, memory safety.
And the way Objective-C works is that a lot
of the object system and everything else is built
on top of pointers in C.
Objective-C is an extension on top of C.
And so pointers are unsafe.
And if you get rid of the pointers
it's not Objective-C anymore.
And so fundamentally that was an issue
that you could not fix safety or memory safety
without fundamentally changing the language.
And so once we got through that part
of the mental process and the thought process
it became a design process of saying,
okay, well if we're gonna do something new, what is good?
Like, how do we think about this?
And what do we like?
And what are we looking for?
And that was a very different phase of it.
So what are some design choices early on in Swift?
Like we're talking about braces.
Are you making a typed language or not?
All those kinds of things.
Yeah, so some of those were obvious given the context.
So a typed language, for example,
Objective-C is a typed language
and going with an untyped language
wasn't really seriously considered.
We wanted the performance and we wanted refactoring tools
and other things like that to go with typed languages.
Quick dumb question.
Was it obvious, I think this would be a dumb question,
but was it obvious that the language
has to be a compiled language?
Not an...
Yes, that's not a dumb question.
Earlier, I think late 90s,
Apple had seriously considered moving
its development experience to Java.
But Swift started in 2010,
which was several years after the iPhone
or when the iPhone was definitely on an upper trajectory
and the iPhone was still extremely
and is still a bit memory constrained, right?
And so being able to compile the code
and then ship it and then having standalone code
that is not JIT compiled is a very big deal
and it's very much part of the Apple value system.
Now, JavaScript's also a thing, right?
I mean, it's not that this is exclusive
and technologies are good,
depending on how they're applied, right?
But in the design of Swift,
saying like, how can we make Objective-C better, right?
Objective-C is statically compiled
and that was the contiguous natural thing to do.
Just to skip ahead a little bit, and I will come right back,
just as a question.
As you think about today in 2019,
in your work at Google, TensorFlow and so on
is, again, compilation, static compilation
the right thing?
It's still the right thing.
Yeah, so the funny thing
after working on compilers for a really long time
is that, and this is one of the things
that LLVM has helped with,
is that I don't look at compilation as being static
or dynamic or interpreted or not.
This is a spectrum, okay?
And one of the cool things about Swift
is that Swift is not just statically compiled.
It's actually dynamically compiled as well
and it can also be interpreted
though nobody's actually done that.
And so what ends up happening
when you use Swift in a workbook,
for example, in Colab or in Jupyter,
is it's actually dynamically compiling the statements
as you execute them.
And so let's get back to the software engineering problems,
right, where if you layer the stack properly,
you can actually completely change
how and when things get compiled
because you have the right abstractions there.
And so the way that a Colab workbook works with Swift
is that when you start typing into it,
it creates a process, a Unix process,
and then each line of code you type in,
it compiles it through the Swift compiler,
the front-end part, and then sends it through the optimizer,
JIT compiles machine code,
and then injects it into that process.
And so as you're typing new stuff,
it's like squirting in new code
and overwriting and replacing and updating code in place.
And the fact that it can do this is not an accident.
Like Swift was designed for this,
but it's an important part of how the language was set up
and how it's layered, and this is a non-obvious piece.
And one of the things with Swift that was,
for me, a very strong design point
is to make it so that you can learn it very quickly.
And so from a language design perspective,
the thing that I always come back to is this UI principle
of progressive disclosure of complexity.
And so in Swift, you can start by saying print,
quote, hello world, quote, right?
And there's no slash n, just like Python,
one line of code, no main, no header files,
no public static class void, blah, blah, blah, string,
like Java has, right?
So one line of code, right?
And you can teach that and it works great.
Then you can say, well, let's introduce variables.
And so you can declare a variable with var,
so var x equals four, what is a variable?
You can use x, x plus one, this is what it means.
Then you can say, well, how about control flow?
Well, this is what an if statement is,
this is what a for statement is,
this is what a while statement is.
And then you can say, let's introduce functions, right?
And many languages like Python have had this kind of notion
of let's introduce small things
and then you can add complexity,
then you can introduce classes,
and then you can add generics in the case of Swift,
and then you can build in modules and build out
in terms of the things that you're expressing,
but this is not very typical for compiled languages.
And so this was a very strong design point.
And one of the reasons that Swift in general is designed
with this factoring of complexity in mind
so that the language can express powerful things.
You can write firmware in Swift if you want to,
but it has a very high level feel,
which is really this perfect blend
because often you have very advanced library writers
that want to be able to use the nitty gritty details,
but then other people just want to use the libraries
and work at a higher abstraction level.
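The progression described above, written out as Swift (each step is itself a complete, runnable program when placed in a main.swift file):

```swift
print("Hello, world!")                 // one line, no main, no headers

var x = 4                              // then introduce variables
x = x + 1

if x > 4 {                             // then control flow
    print("x is now \(x)")
}
for i in 0..<3 {
    print(i)
}

func greet(_ name: String) -> String { // then functions
    return "Hello, \(name)!"
}
print(greet("compiler"))

struct Box<Value> {                    // and later, types and generics
    var value: Value
}
print(Box(value: x).value)
```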
It's kind of cool that I saw
that you can just, interoperability.
I don't think I pronounced that word enough,
but you can just drag in Python.
It's just straight, you can import like,
I saw this in the demo, import NumPy,
how do you make that happen?
Yeah, well-
What's up with-
Yeah.
Is that as easy as it looks or is it-
Yeah, it's as easy as it looks.
That's not a stage magic hack or anything like that.
No, no, I don't mean from the user perspective,
I mean from the implementation perspective to make it happen.
So it's easy once all the pieces are in place,
the way it works.
So if you think about a dynamically typed language
like Python, right?
You can think about it in two different ways.
You can say it has no types, right?
Which is what most people would say.
Or you can say it has one type, right?
And you can say it has one type
and it's like the Python object, right?
And the Python object is passed around
and because there's only one type, it's implicit, okay?
And so what happens with Swift
and Python talking to each other,
Swift has lots of types, right?
It has arrays and it has strings
and all like classes and that kind of stuff.
But it now has a Python object type, right?
So there is one Python object type.
And so when you say import NumPy,
what you get is a Python object,
which is the NumPy module.
And then you say np.array and it says, okay,
hey, Python object, I have no idea what you are.
Give me your array member, right?
Okay, cool.
And it just uses dynamic stuff,
talks to the Python interpreter and says,
hey, Python, what's the.array member in that Python object?
It gives you back another Python object.
And now you say parentheses for the call
and the arguments you're gonna pass.
And so then it says, hey, a Python object
that is the result of np.array call with these arguments,
right?
And again, calling into the Python interpreter
to do that work.
And so right now this is all really simple.
And if you dive into the code,
what you'll see is that the Python module in Swift
is something like 1200 lines of code or something.
It's written in pure Swift, it's super simple
and it's built on top of the C interoperability
because it just talks to the Python interpreter.
But making that possible required us
to add two major language features to Swift
to be able to express these dynamic calls
and the dynamic member lookups.
And so what we've done over the last year
is we've proposed, implemented, standardized,
and contributed new language features to Swift language
in order to make it so it is really trivial, right?
And this is one of the things about Swift
that is critical to the Swift for TensorFlow work,
which is that we can actually add new language features
and the bar for adding those is high,
but it's what makes it possible.
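A minimal sketch of what that looks like from the Swift side, assuming the PythonKit package (the open source home of this interop layer; in the Swift for TensorFlow toolchains the module was simply called Python) and a Python installation with NumPy available:

```swift
import PythonKit

// Importing a Python module gives back a PythonObject wrapping it.
let np = Python.import("numpy")

// Dynamic member lookup: ".array" is resolved by asking the Python runtime.
// Dynamic callable: the parentheses become a call into the interpreter.
let a = np.array([1, 2, 3])
let b = np.array([10, 20, 30])

print(a + b)            // delegates to NumPy's __add__: [11 22 33]
print(Int(b.sum())!)    // convert a PythonObject back into a Swift Int: 60
```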
So you're now at Google doing incredible work
on several things, including TensorFlow.
So TensorFlow 2.0 or whatever leading up to 2.0
has by default in 2.0 has eager execution.
And yet, in order to make code optimized for GPU or TPU
or some of these systems, computation
needs to be converted to a graph.
So what's that process like?
What are the challenges there?
Yeah, so I am tangentially involved in this.
But the way that it works with autograph
is that you mark your function with a decorator.
And when Python calls it, that decorator is invoked.
And then it says, before I call this function,
you can transform it.
And so the way autograph works is, as far as I understand,
is it actually uses the Python parser
to go parse that, turn it into a syntax tree,
and now apply compiler techniques to, again, transform this
down into TensorFlow graphs.
And so you can think of it as saying, hey,
I have an if statement.
I'm going to create an if node in the graph,
like you say, tf.cond.
You have a multiply.
Well, I'll turn that into a multiply node in the graph.
And it becomes this tree transformation.
So where does the Swift for TensorFlow come in?
Which is, for one, Swift is an interface,
like Python is an interface to TensorFlow.
But it seems like there's a lot more going on
than just a different language interface.
There's optimization methodology.
So the TensorFlow world has a couple of different what
I'd call front-end technologies.
And so Swift and Python and Go and Rust and Julia
and all these things share the TensorFlow graphs and all
the runtime and everything that's later.
And so Swift for TensorFlow is merely another front-end
for TensorFlow, just like any of these other systems are.
There's a major difference between, I would say,
three camps of technologies here.
There's Python, which is a special case,
because the vast majority of the community efforts
go into the Python interface.
And Python has its own approaches
for automatic differentiation, has its own APIs,
and all this kind of stuff.
There's Swift, which I'll talk about in a second.
And then there's kind of everything else.
And so the everything else are effectively language bindings.
So they call into the TensorFlow runtime.
But they usually don't have automatic differentiation,
or they usually don't provide anything other than APIs
that call the C APIs in TensorFlow.
And so they're kind of wrappers for that.
Swift is really kind of special.
And it's a very different approach.
Swift for TensorFlow, that is, is a very different approach.
Because there we're saying, let's
look at all the problems that need
to be solved in the full stack of the TensorFlow compilation
process, if you think about it that way.
Because TensorFlow is fundamentally a compiler.
It takes models, and then it makes them go fast on hardware.
That's what a compiler does.
And it has a front-end, it has an optimizer,
and it has many back-ends.
And so if you think about it the right way,
if you look at it in a particular way, it is a compiler.
And so Swift is merely another front-end.
But it's saying, and the design principle
is saying, let's look at all the problems
that we face as machine learning practitioners
and what is the best possible way we can do that,
given the fact that we can change literally anything
in this entire stack.
And Python, for example, where the vast majority
of the engineering and effort has gone into,
is constrained by being the best possible thing you
can do with the Python library.
There are no Python language features
that are added because of machine learning
that I'm aware of.
They added a matrix multiplication operator
with @, but that's as close as you get.
And so with Swift, it's hard, but you
can add language features to the language.
And there's a community process for that.
And so we look at these things and say, well,
what is the right division of labor
between the human programmer and the compiler?
And Swift has a number of things that shift that balance.
So because it has a type system, for example,
it makes certain things possible for analysis of the code.
And the compiler can automatically
build graphs for you without you thinking about them.
That's a big deal for a programmer.
You just get free performance.
You get clustering and fusion and optimization,
things like that, without you as a programmer
having to manually do it because the compiler can do it for you.
Automatic differentiation is another big deal.
And I think one of the key contributions of the Swift
for TensorFlow project is that there's
this entire body of work on automatic differentiation that
dates back to the Fortran days.
People doing a tremendous amount of numerical computing
in Fortran used to write these what they call source
to source translators, where you take a bunch of code,
shove it into a mini compiler, and it would push out
more Fortran code.
But it would generate the backwards passes
for your functions for you, the derivatives.
And so in that work in the 70s, a tremendous number
of optimizations, a tremendous number of techniques
for fixing numerical instability and other kinds of problems
were developed.
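A toy version of that source-to-source idea, written in Python rather than Fortran; the second function is what such a translator might emit, and the names are purely illustrative.

```python
import math

# The original "source" function.
def f(x):
    return x * x + math.sin(x)

# What a source-to-source differentiator could emit for f: new source code
# that computes the value together with its derivative.
def f_with_derivative(x):
    value = x * x + math.sin(x)
    derivative = 2.0 * x + math.cos(x)  # d/dx(x*x) + d/dx(sin x)
    return value, derivative

print(f_with_derivative(1.0))  # (approximately 1.8415, 2.5403)
```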
But they're very difficult to port into a world
where in eager execution, you get an op by op at a time.
Like, you need to be able to look at an entire function
and be able to reason about what's going on.
And so when you have a language integrated
automatic differentiation, which is one of the things
that the Swift project is focusing on,
you can open up all these techniques
and reuse them in familiar ways.
But the language integration piece
has a bunch of design room in it, and it's also complicated.
The other piece of the puzzle here that's kind of interesting
is TPUs at Google.
Yes.
So we're in a new world with deep learning.
It's constantly changing.
And I imagine, without disclosing anything,
I imagine you're still innovating on the TPU front, too.
Indeed.
So how much interplay is there between software and hardware
in trying to figure out how to get or move
towards an optimized solution?
There's an incredible amount.
So we're on our third generation of TPUs, which are now
100 petaflops in a very large liquid-cooled box,
and virtual box with no cover.
And as you might imagine, we're not out of ideas yet.
The great thing about TPUs is that they're
a perfect example of hardware software co-design.
And so it's about saying, what hardware
do we build to solve certain classes of machine learning
problems?
Well, the algorithms are changing.
The hardware takes, in some cases, years to produce.
And so you have to make bets and decide
what is going to happen.
And so what is the best way to spend the transistors
to get the maximum performance per watt, or area per cost,
or whatever it is that you're optimizing for?
And so one of the amazing things about TPUs
is this numeric format called bfloat16.
bfloat16 is a compressed 16-bit floating point format,
but it puts the bits in different places.
In numeric terms, it has a smaller mantissa
and a larger exponent.
That means that it's less precise,
but it can represent larger ranges of values,
which in the machine learning context
is really important and useful because sometimes you
have very small gradients you want to accumulate
and very, very small numbers that
are important to move things as you're learning.
But sometimes you have very large magnitude numbers
as well.
And bfloat16 is not as precise.
The mantissa is small.
But it turns out the machine learning algorithms actually
want to generalize.
And so there's theories that this actually
increases the ability for the network
to generalize across data sets.
And regardless of whether it's good or bad,
it's much cheaper at the hardware level
to implement because the area and time of a multiplier
is n squared in the number of bits in the mantissa,
but it's linear in the size of the exponent.
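As a rough sketch of the format itself (not of any particular hardware), bfloat16 keeps float32's sign bit and 8 exponent bits but only 7 of the 23 mantissa bits, so it can be approximated by truncating a float32 to its top 16 bits.

```python
import struct

def float32_to_bfloat16_bits(x: float) -> int:
    # Keep the top 16 bits of the float32: 1 sign, 8 exponent, 7 mantissa.
    # Real hardware typically rounds rather than truncates.
    bits32 = struct.unpack("<I", struct.pack("<f", x))[0]
    return bits32 >> 16

def bfloat16_bits_to_float32(bits16: int) -> float:
    return struct.unpack("<f", struct.pack("<I", bits16 << 16))[0]

x = 3.14159
print(bfloat16_bits_to_float32(float32_to_bfloat16_bits(x)))  # 3.140625
```

Under the rough n-squared model above, a float32 multiplier with its 24-bit significand (counting the implicit leading bit) would cost on the order of 24² / 8² ≈ 9 times the area of a bfloat16 one; that arithmetic is my own back-of-envelope illustration, not a figure from the conversation.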
And you're connected to both efforts here,
both on the hardware and the software side.
And so that was a breakthrough that came from the research side,
from people who were originally working on optimizing
network transport of weights across a network
and trying to find ways to compress that.
But then it got burned into silicon.
And it's a key part of what makes
TPU performance so amazing and great.
Now TPUs have many different aspects that are important.
But the co-design between the low-level compiler bits
and the software bits and the algorithms
is all super important.
And this is an amazing trifecta that only Google can do.
Yeah, that's super exciting.
So can you tell me about the MLIR project,
previously the secretive one?
Yeah, so MLIR is a project that we announced
at a compiler conference three weeks ago or something
at the Compilers for Machine Learning conference.
Basically, again, if you look at TensorFlow as a compiler stack,
it has a number of compiler algorithms within it.
It also has a number of compilers
that get embedded into it.
And they're made by different vendors.
For example, Google has XLA, which
is a great compiler system.
NVIDIA has TensorRT.
Intel has NGRAPH.
There's a number of these different compiler systems.
And they're very hardware specific.
And they're trying to solve different parts of the problems.
But they're all kind of similar in a sense
of they want to integrate with TensorFlow.
Now TensorFlow has an optimizer.
And it has these different code generation technologies built in.
The idea of MLIR is to build a common infrastructure
to support all these different subsystems.
And initially, it's to be able to make it so that they all
plug in together.
And they can share a lot more code and can be reusable.
But over time, we hope that the industry will start collaborating
and sharing code.
And instead of reinventing the same things over and over again,
we can actually foster some of that
working-together-to-solve-common-problems energy
that has been useful in the compiler field before.
Beyond that, some people have joked that MLIR is kind of LLVM 2.
It learns a lot from what LLVM has been good at
and what LLVM has done wrong.
And it's a chance to fix that.
And also, there are challenges in the LLVM ecosystem
as well, where LLVM is very good at the thing it was designed to do.
But 20 years later, the world has changed.
And people are trying to solve higher level problems.
And we need some new technology.
And what's the future of open source in this context?
Very soon.
So it is not yet open source.
But it will be, hopefully, in the next couple of months.
So you still believe in the value of open source
in these kinds of companies?
Oh, yeah, absolutely.
And I think that the TensorFlow community at large
fully believes in open source.
So I mean, there is a difference between Apple, where
you were previously, and Google now, in spirit and culture.
And I would say the open sourcing of TensorFlow
was a seminal moment in the history of software.
Because here's this large company
open sourcing a very large code base.
What are your thoughts on that?
Were you happy to see that kind
of degree of open sourcing?
So between the two, I prefer the Google approach,
if that's what you're saying.
The Apple approach makes sense given the historical context
that Apple came from.
But that was 35 years ago.
And I think that Apple is definitely adapting.
And the way I look at it is that there's
different kinds of concerns in the space.
It is very rational for business to care about making money.
That fundamentally is what a business is about.
But I think it's also incredibly realistic to say,
it's not your string library that's
the thing that's going to make you money.
It's going to be the amazing UI product
differentiating features and other things like that
that you build on top of your string library.
And so keeping your string library proprietary and secret
and things like that is maybe not the important thing
anymore, where before platforms were different.
And even 15 years ago, things were a little bit different.
But the world is changing.
So Google strikes a very good balance, I think.
And I think TensorFlow being open source really
changed the entire machine learning field
and it caused a revolution in its own right.
And so I think it's amazingly forward-looking
because I could have imagined, and I
wasn't at Google at the time, but I
could imagine a different context in a different world
where a company says, machine learning is
critical to what we're doing.
We're not going to give it to other people.
And so that decision is a profoundly brilliant insight
that I think has really led to the world being better,
and better for Google as well.
And has all kinds of ripple effects.
I think it is really, I mean, you
can't overstate how profound that decision by Google
is for software.
It's awesome.
Well, and again, I can understand the concern
about if we release our machine learning software,
our competitors could go faster.
But on the other hand, I think that open-sourcing TensorFlow
has been fantastic for Google.
And I'm sure that decision was very non-obvious at the time,
but I think it's worked out very well.
So let's try this real quick.
You were at Tesla for five months
as the VP of Autopilot Software.
You led the team during the transition
from hardware one to hardware two.
I have a couple questions.
So one, first of all, to me, that's
one of the bravest engineering decisions, really,
ever undertaken in the automotive industry, software-wise,
starting from scratch.
It's a really brave engineering decision.
So my one question there is, what was that like?
What was the challenge of that?
Do you mean the career decision of jumping
from a comfortable, good job into the unknown, or?
That combined, so at the individual level,
you making that decision.
And then when you show up, it's a really hard engineering
problem.
So you could have just stayed, maybe slowed down,
stuck with hardware one, those kinds of decisions.
Just taking it full on, let's do this from scratch.
What was that like?
Well, so I mean, I don't think Tesla
has a culture of taking things slow and seeing how it goes.
And one of the things that attracted me about Tesla
is it's very much a gung-ho, let's change the world,
let's figure it out kind of a place.
And so I have a huge amount of respect for that.
Tesla has done very smart things with hardware one
in particular.
And the hardware one design was originally
designed to be very simple automation features in the car
for traffic aware cruise control and things like that.
And the fact that they were able to effectively feature creep
into lane holding and a very useful driver assistance
feature is pretty astounding, particularly given
the details of the hardware.
Hardware two built on that in a lot of ways.
And the challenge there was that they were transitioning
from a third party provided vision stack
to an in-house built vision stack.
And so for the first step, which I mostly helped with,
was getting onto that new vision stack.
And that was very challenging.
And it was time critical for various reasons.
And it was a big leap.
But it was fortunate that it built
on a lot of the knowledge and expertise in the team
that had built hardware one's driver assistance features.
So you spoke in a collected and kind way
about your time at Tesla.
It was ultimately not a good fit with Elon Musk.
We've talked about him with several guests on this podcast, of course.
Elon Musk continues to do some of the most bold
and innovative engineering work in the world
at times at a cost to some of the members
of the Tesla team.
What did you learn about this working
in this chaotic world with Elon?
Yeah, so I guess I would say that when I was at Tesla,
I experienced and saw the highest degree of turnover
I'd ever seen in a company, which was a bit of a shock.
But one of the things I learned and I came to respect
is that Elon's able to attract amazing talent
because he has a very clear vision of the future
and he can get people to buy into it
because they want that future to happen.
And the power of vision is something
that I have a tremendous amount of respect for.
And I think that Elon is fairly singular in the world
in terms of the things he's able to get people to believe in.
There are many people
that stand on the street corner and say,
ah, we're gonna go to Mars, right?
But then there are a few people that can get others
to buy into it and believe in and build the path
and make it happen.
And so I respect that.
I don't respect all of his methods,
but I have a huge amount of respect for that.
You've mentioned in a few places,
including in this context, working hard.
What does it mean to work hard?
And when you look back at your life,
what were some of the most brutal periods
of having to really sort of put everything you have
into something?
Yeah, good question.
So working hard can be defined a lot of different ways,
so a lot of hours.
And so that is true.
The thing to me that's the hardest
is both being short-term focused on delivering
and executing and making a thing happen,
while also thinking about the longer term
and trying to balance that, right?
Because if you are myopically focused
on solving a task and getting that done,
and only think about that incremental next step,
you will miss the next big hill you should jump over to,
right?
And so I've been really fortunate
that I've been able to kind of oscillate between the two.
And historically at Apple, for example,
that was made possible because I was able to work
with some really amazing people
and build up teams and leadership structures
and allow them to grow in their careers
and take on responsibility,
thereby freeing up me to be a little bit crazy
and thinking about the next thing.
And so it's a lot of that,
but it's also about with experience,
you make connections that other people
don't necessarily make.
And so I think that's a big part as well.
But the bedrock is just a lot of hours
and that's okay with me.
There's different theories on work-life balance
and my theory for myself,
which I do not project onto the team,
but my theory for myself is that,
I want to love what I'm doing and work really hard.
And my purpose, I feel like in my goal
is to change the world and make it a better place.
And that's what I'm really motivated to do.
So last question, LLVM logo is a dragon.
Yeah.
You explained that this is because dragons
have connotations of power, speed, and intelligence.
They can also be sleek, elegant, and modular,
though you removed the modular part.
What is your favorite dragon-related character
from fiction, video, or movies?
So those are all very kind ways of explaining it.
Do you wanna know the real reason it's a dragon?
Well, yeah.
Is that better?
So there's a seminal book on compiler design
called The Dragon Book.
And so this is a really old now book on compilers.
And so the dragon logo for LLVM came about
because at Apple, we kept talking about
LLVM-related technologies and there's no logo
to put on a slide.
It's sort of like, what do we do?
And somebody's like, well, what kind of logo
should a compiler technology have?
And I'm like, I don't know.
I mean, the dragon is the best thing that we've got.
And Apple somehow magically came up with the logo.
And it was a great thing,
and the whole community rallied around it.
And then it got better as other graphic designers
got involved.
But that's originally where it came from.
The story.
Is there dragons from fiction that you connect with
that Game of Thrones, Lord of the Rings,
that kind of thing?
Lord of the Rings is great.
I also like role-playing games and things like
computer role-playing games,
and so dragons often show up in there.
But it really comes back to the book.
Oh no, we need a thing.
Yeah.
And hilariously, one of the funny things about LLVM
is that my wife, who's amazing, runs the LLVM foundation,
and she goes to Grace Hopper
and is trying to get more women involved in the,
she's also a compiler engineer,
so she's trying to get other women
to get interested in compilers and things like this.
And so she hands out the stickers,
and people like the LLVM sticker
because of Game of Thrones.
And so sometimes culture has this helpful effect, too.
Get the next generation of compiler engineers
engaged with the cause.
Okay, awesome.
Chris, thanks so much for talking to me.
It's been great talking with you.