Travis Oliphant: NumPy, SciPy, Anaconda, Python & Scientific Programming

Lex Fridman Podcast

Conversations about science, technology, history, philosophy and the nature of intelligence, consciousness, love, and power. Lex is an AI researcher at MIT and beyond.

The following is a conversation with Travis Oliphant, one of the most impactful programmers
and data scientists ever.
He created NumPy, SciPy, and Anaconda.
NumPy formed the foundation of tensor-based machine learning in Python.
SciPy formed the foundation of scientific programming in Python.
And Anaconda, specifically with Conda, made Python more accessible to a much larger audience.
Travis's life work across a large number of programming and entrepreneurial efforts
has and will continue to have immeasurable impact on millions of lives by empowering scientists
and engineers in big companies, small companies, and open-source communities to take on difficult
problems and solve them with the power of programming.
Plus, he's a truly kind human being, which is something that when combined with vision
and ambition makes for a great leader and a great person to chat with.
To support this podcast, please check out our sponsors in the description.
This is the Lex Fridman podcast, and here is my conversation with Travis Oliphant.
What was the first computer program you've ever written?
Do you remember?
Whoa, that's a good question.
I think it was in fourth grade.
Just a simple loop in BASIC.
BASIC.
Yeah, an Atari 400, I think, or maybe an Atari 800.
It was part of a class, and we were just writing basic loops to print things out.
Did you use GOTO statements?
Yes.
Yes, we used GOTO statements.
I remember in the early days, that's when I first realized there's principles to programming
when I was told, don't use GOTO statements, those are bad software engineering.
It goes against what great, beautiful code is, and I was like, oh, okay, there's rules
to this game.
I didn't see that until high school when I took an AP computer science course.
I did a lot of other kinds of just programming on TI, but finally, when I took an AP computer
science course in Pascal, that was Pascal.
That's when I thought, oh, there are these principles.
Not C or C++?
No, I didn't take C until the next year in college, I had a course in C, but I haven't
done much in Pascal, just that AP computer science course.
Now, sorry for the romanticized question, but when did you first fall in love with programming?
Oh, man, good question.
I think actually when I was 10, my dad got us a Timex Sinclair, and he was excited
about the spreadsheet capability, but I made him get the BASIC, the add-ons so we could
actually program in BASIC, and just being able to write instructions and have the computer
do something.
Then we got a TI-99/4A when I was about 12, and I would just, it had sprites and graphics
and music you could actually program to do music.
That's when I really fell in love with programming.
This is a full, a real computer with memory and storage and processor, so we're not going
to say TI.
Yeah, the Timex Sinclair was one of the very first, it was cheap, cheap.
I think it was, well, it was still expensive, but it was 2K of memory.
We got the 16K add-on pack, but yeah, it had memory and you could program it.
You had the, in order to store your programs, you had to attach a tape drive.
Remember the old sound a modem would make, converting
digital bits to audio? It was like that, but with a tape drive.
Still remember that sound, but that was the storage.
And what was the programming language, do you remember?
It was BASIC.
It was BASIC.
And then they had a VisiCalc, and so a little bit of spreadsheet program in VisiCalc, but
mostly just some BASIC.
Do you remember what kind of things drew you to programming?
Was it working with data?
Was it video games?
Math.
Mathy stuff.
Yeah, I've always loved math, and a lot of people think they don't like math because,
I think, when they're exposed to it early, it's about memory.
When you're exposed to math early, if you have a good short-term memory, you remember times tables.
And I do have a reasonably, I mean, not perfect, but a reasonably long little short-term memory
buffer.
And so I did great at times tables. I said, oh, I'm good at math. But I started to really like
math for just the problem-solving aspect.
And so computing was problem-solving applied.
And so that's always kind of been the draw, kind of coupled with the mathematics.
Did you ever see the computer as like an extension of your mind, like something able
to achieve?
Not till later.
Okay.
Yeah, not then.
Like a little set of puzzles that you can play with, and you can play with math puzzles.
Yeah, it was too rudimentary early on, like it was sort of, yeah, it was a lot of work
to actually take a thought you'd have and actually get it implemented.
And that's still work, but it's getting easier.
And so yeah, I would say that's definitely what attracted me to Python, is that that
was more real, right?
I could think in Python.
Speaking of foreign language, I only speak another language fluently besides English,
which is Spanish.
I remember the day when I would dream in Spanish, and you start to think in that language.
And then you actually, I do definitely believe that language limits or expands your thinking.
There's some languages that actually lead you to certain thought processes.
Yeah, like, so I speak Russian fluently, and that's certainly a language that leads
you down certain thought processes.
Is that right?
Well, yeah.
There's a history of the two world wars, of the millions of people starving to death
or near to death throughout its history, of suffering, of injustice, like this promise
sold to the people, and then the carpet or whatever swept from under them, it's like
broken promises, and all of that pain and melancholy is in the language, the sad songs,
the sad hopeful songs, the over romanticized, like, I love you, I hate you, the sort of
the swings between all the various spectrums of emotion.
So that's all within the language, the way it's twisted, poetry, there's a strong culture
of rhyming poetry, so like the Bards, there's a musicality to the language too.
Does Dostoevsky write in Russian?
Yeah.
Like, Dostoevsky, Tolstoy, all the, all the...
The ones that I know about, which are translated, and I'm curious how the translations...
So Dostoevsky did not use the musicality of the language too much, so he actually translates
pretty well because it's so philosophically dense that the story does a lot of the work,
but there's a bunch of things that are untranslatable.
Certainly the poetry is not translatable.
I actually have a few conversations coming up offline and also in this podcast with people
who've translated Dostoevsky, and people who've worked in this field know how difficult
that is.
Sometimes you can spend, you know, months thinking about a single sentence in context,
like, because there's just a magic captured by that sentence, and how do you translate
just in the right way?
Because those words can be really powerful.
There's a famous line, "beauty will save the world," from Dostoevsky.
You know, there's so many ways to translate that.
And you're right.
The language gives you the tools with which to tell the story, but it also leads your
mind down certain trajectories and paths to where over time, as you think in that language,
you become a different human being.
Yes.
Yeah.
Yeah, that's a fascinating reality, I think.
I know people have explored that, but it's, I guess, rediscovered.
Well, we don't...
So, given our own little pockets, like, this is the sad thing, is I feel like, unfortunately,
given time and getting older, I'll never know the China, the Chinese world, because I don't
truly know the language.
Same with Japanese.
I don't truly know Japanese and Portuguese and Brazil, that whole South American continent.
Like, yeah, I'll go to Brazil and Argentina, but will I truly understand the people?
If I don't understand the language, it's sad because I wonder how many geniuses we're
missing because so much of the scientific world, so much of the technical world is in
English, and so much of it might be lost because they just, we don't have the common language.
I completely agree.
I'm very much in that vein of, there's a lot of genius out there that we miss, and it's
sort of, we're sort of fortunate when it bubbles up into something that we can understand
or process.
There's a lot we miss.
So, I tend to lean towards really loving democratization or things that empower people,
or, you know, very resistant to sort of authoritarian structures.
Fundamentally, for that reason, well, several reasons, but it just hurts us.
We're soft.
So, speaking of languages that empower you, so, Python was the first language for me
that I could, I really enjoyed thinking in, as you said.
Sounds like you shared my experience too.
So, when did you first, do you remember when you first kind of connected with Python?
Maybe you even fell in love with Python?
It's a good question.
It was a process that took about a year.
I first encountered Python in 1997.
I was a graduate student studying biomedical engineering at the Mayo Clinic, and I had
previously, I'd been involved in taking information from satellites.
I was an electrical engineering student.
Used to taking information and trying to get something out of it, doing some data processing
and getting information out of it.
And I'd done that in MATLAB.
I'd done that in Perl.
I'd done that in, you know, scripting on a VMS.
There's actually a VAX VMS system, and they had their own little scripting tools around
Fortran.
I'd done a lot of that.
And then, as a graduate student, I was looking for something and encountered Python.
And because Python had an array, it had two things that made me not filter it away.
Because I was filtering a bunch of stuff. Yorick, I looked at Yorick, I looked at
a few other languages that were out there at the time, in 1997.
But it had arrays.
There's a library called Numeric that had just been written in 95, not too much earlier,
by an MIT alum, Jim Hugunin, and I went back and read the mailing list to see the history
of how it grew.
And it was very interesting, fascinating to do that, actually, to see how this emergent
cooperation, unstructured cooperation, happens in the open source world that led to a lot
of this collective programming, which is something maybe we might get into a little later, but
what that looks like.
What gap did Numeric fill?
Numeric filled the gap of having an array object.
There was no array object.
There was a one-dimensional byte concept, but there was no n-dimensional, two, three, four-dimensional
tensor, they call it now.
I'm still in the category that a tensor is another thing, and it's just an n-d array,
we should call it, but I kind of lost that battle.
There's many battles in this world, some which we'll win, some we lose.
That's exactly right.
But it had no math to it.
So Numeric had math and a basic way to think in arrays.
So I was looking for that, and it had complex numbers. A lot of programming languages don't.
You can see why, because if you're just a computer scientist, you think, a complex number is just
two floats, so people can build that on.
But in practice, a complex number is one of the significant algebras that helps connect
a lot of physical and mathematical ideas, particularly the FFT for an electrical engineer.
It's a really important concept, and not having it means you have to develop it several times,
and those times may not share an approach.
One of the common things in programming, one of the things programming enables is abstractions.
But when you have shared abstractions, it's even better.
It sort of gets to the level of language of actually we all think of this the same way,
which is both powerful and dangerous, because powerful in that we now can quickly make bigger
and higher level things on top of those abstractions dangerous because it also limits us as to
the things we maybe left behind in producing an abstraction, which is at the heart of programming
today and actually building around the programming world.
I think it's a fascinating philosophical topic.
Yeah, they will continue for many years, I think, as we build more and more and more
abstractions.
Yes.
I often think about we have a world that's built on these abstractions that were they
the only ones possible, certainly not, but they led to, it's very hard to do it differently.
There's an inertia that's very hard to push out, push away from that has implications
for things like the Julia language, which you have heard of, I'm sure, and I've met
the creators and I like Julia.
It's a really cool language, but they've struggled kind of against just the tide
of this inertia of people using Python.
There's strategies to approach that, but nonetheless, it's a phenomenon. And so, I love
complex numbers and I love arrays, so I looked at Python.
I had the experience, I did some stuff in Python and I was just doing my PhD, so I was
out, my focus was on, I was actually doing a combination of MRI and ultrasound and looking
at a phenomenon called elastography, which is you push waves into the body and observe
those waves, like you can actually measure them, and then you do mathematical inversion
to see what the elasticity is, and so that's the problem I was solving is how to do that
with both ultrasound and MRI.
I needed some tool to do that with, so I was starting to use Python in 97. In 98, I went
back, looked at what I'd written and realized I could still understand it, which is not
the experience I'd had when doing Perl in 95, I'd done the same thing and then I looked
back and I'd forgotten what I was even saying.
Now I'm not saying it, so that made me, hey, this may work, I like this, this is something
I can retain without becoming an expert per se, and so that led me to go, I'm gonna push
more into this, and then that 98 was kind of the, when I started to fall in love with
Python, I would say.
A few peculiar things about Python, so maybe compared to Perl, compared to some of the
other languages, so there's no braces.
So space is used, indentation, I should say, is used as part of the language.
So did you, I mean, that's quite a leap, were you comfortable with that leap, or were you
just very open-minded?
It's a good question, I was open-minded, so I was cognizant of the concern, and it
definitely has specific challenges, you know, cutting and pasting, for example, you're cutting and
pasting code, and if your editors aren't supportive of that, if you're putting it into a terminal,
and particularly in the past when terminals didn't necessarily have the intelligence to
manage it.
Now, IPython and Jupyter Notebooks handle it just fine, so there's really no problem,
but in the past, it created some challenges, formatting challenges, also mixed tabs and
spaces, if editors weren't, you weren't clear on what was happening, you would have these
issues.
So there were really concrete reasons about it that I heard and understood, I never really
encountered a problem with it personally, like it was occasional annoyances, but I really
liked the fact that it didn't have all these extra characters, right, that these extra
characters didn't show up in my visual field when I was just trying to process understanding
a snippet of code.
Yeah, there's a cleanness to it, but I mean, the idea is supposed to be that Perl also
has a cleanness to it because of the minimalism of how many characters it takes to express
a certain thing, so it's very compact, but what you realize with that compactness comes...
There's a culture that prizes compactness, and so the code gets more and more compact
and less and less readable to a point where it's like, to be a good programmer in Perl,
you write code that's basically unreadable.
There's a culture like...
Correct, and you're proud of it.
Yeah, you're proud of it.
Right, exactly, and it feels good, and it's really selective.
It means you have to be an expert in Perl to understand it, whereas Python allowed
you not to have to be an expert, you didn't have to take all this brain energy, you could
leverage what I say, you could leverage your English language center, which you're using
all the time.
I've wondered about other languages, particularly non-Latin-based languages. With Latin-based languages,
the characters are at least similar.
I think people have an easier time, but I don't know what it's like to be a Japanese
or a Chinese person trying to learn a different syntax, like what would computer programming
look like in that?
I haven't looked at that at all, but it certainly doesn't...
Using your Chinese language center, I'm not sure Python or any programming language does that,
but that was a big deal.
The fact that it was accessible, I could be a scientist.
What I really liked is many programming languages really demand a lot of you, and you can get
a lot, you do a lot if you learn it, but Python enables you to do a lot without demanding
a lot of you.
There's nuance to that statement, but it certainly is more accessible.
More people could actually, as a scientist, as somebody or an engineer who is trying to
solve another problem besides pure programming, I could still use this language and get things
done and be happy about it.
I was also comfortable in C at that time.
And MATLAB, you did a little bit of that.
And MATLAB I did a lot before that, exactly.
So I was comfortable in...
Those three languages were really the tools I used during my studies and schooling, but
to your point about language helping you think, one of the big things about MATLAB,
and it was APL before it.
I don't know if you remember APL.
Nope.
It was actually the predecessor of array-based programming, which I think is really an under-appreciated...
If I talk to people who are just steeped in computer programming, computer science, like
most of the people that Microsoft has hired in the past, for example, Microsoft as a company
generally did not understand array-based programming, like culturally, they didn't understand it.
So they kept missing the boat, kept missing the understanding of what this was.
They've gotten better, but there's still a whole culture of folks that doesn't...
Programming, that's systems programming, or web programming, or lists and maps, and what
about an n-dimensional array?
Oh, yeah, that's just an implementation detail.
Well, you can think that, but then actually, if you have that as a construct, you actually
think differently.
APL was the first language to understand that, and it was in the 60s.
The challenge of APL was APL had very dense, not only glyphs, like new characters, new
glyphs, but they even had a new keyboard because to produce those glyphs, this is back in the
early days of computing when the QWERTY keyboard maybe wasn't as established, like, well, we
can have a new keyboard, no big deal.
But it was a big deal, and it didn't catch on, and the language, APL, was very much like
Perl, in that people would pride themselves on how much they could do: could they write the Game of Life
in 30 characters of APL?
APL has characters that mean summation, and they have adverbs, they would have adjectives
and these things called adverbs, which are like methods, like reduction, reduction would
be an adverb on an add operator.
But using these tools, you could construct, and then you start to think at that level.
You think in n dimensions is something I like to say, and you start to think differently
about data at that point.
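
[Editor's illustration] The "adverb on an add operator" idea has a close analogue in present-day NumPy, where `reduce` is a method attached to the `add` function itself. The following is only a sketch in modern Python and NumPy, not APL or the original Numeric, and the array values are made up for the example.

```python
import operator
from functools import reduce

import numpy as np

# Plain Python: applying "reduce" to the "add" operator yields a summation.
total = reduce(operator.add, [1, 2, 3, 4])

# NumPy: reduce is a method of the add ufunc, so it reads like an adverb
# modifying the operator. The axis says which dimension gets collapsed.
grid = np.array([[1, 2, 3],
                 [4, 5, 6]])
column_sums = np.add.reduce(grid, axis=0)   # collapse rows, one sum per column
row_sums = np.add.reduce(grid, axis=1)      # collapse columns, one sum per row

# The same adverb works on a different operator, here element-wise maximum.
row_max = np.maximum.reduce(grid, axis=1)

print(total, column_sums, row_sums, row_max)
```

The point of the construct is that the reduction is expressed once, against a whole array and an axis, rather than as an explicit loop over individual numbers.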
Now, it really helps.
Yeah.
Outside of programming, if you really internalize linear algebra as a course, it philosophically
allows you to think of the world differently.
It's almost like liberating.
You don't have to think about the individual numbers in the n-dimensional array.
You could think of it as an object in itself, and all of a sudden this world can open up.
You're saying MATLAB and APL were like the early, I don't know if many languages got
that right ever.
No.
No.
They did it.
Even still.
Even still, I would say.
NumPy is an inheritor of the traditions.
I would say after APL, J was another version. What it did is not have the glyphs, just have
short characters, but still a Latin keyboard could type them, and then Numeric inherited
from that in terms of, let's add arrays plus broadcasting, plus methods, reduction.
Even some of the language like rank is a concept that was in Python, is still in Python, for
the number of dimensions.
That's different than, say, the rank of a matrix, which people think of as well.
It came from that tradition, but NumPy is a very pragmatic, practical tool.
NumPy inherited from Numeric, and we can get to where NumPy came from, which is the current
array, at least current as of 2015, 2017. Now there's a ton of them over the past two or
three years.
We can get into that, too.
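
[Editor's illustration] The two meanings of "rank" mentioned above can be seen side by side in a short sketch. This uses current NumPy names (`ndim` for the number of dimensions, `numpy.linalg.matrix_rank` for the linear-algebra rank); the arrays are invented for the example.

```python
import numpy as np

# A 3x3 array in which every row is identical.
a = np.ones((3, 3))

# "Rank" in the array-language sense: how many dimensions (axes) there are.
number_of_dimensions = a.ndim

# "Rank" in the linear-algebra sense: how many independent rows there are.
matrix_rank = int(np.linalg.matrix_rank(a))

# A three-axis array has three dimensions regardless of its contents.
cube_dimensions = np.zeros((2, 3, 4)).ndim

print(number_of_dimensions, matrix_rank, cube_dimensions)
```

Here the same array has two dimensions but a matrix rank of one, which is why the two uses of the word are easy to confuse.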
If we just linger on the early days, what was your favorite feature of Python?
Do you remember what, it's so interesting to linger on what really makes you connect
with a language.
I'm not sure it's obvious to introspect that.
No, it isn't.
I've thought about that at some length.
I think definitely the fact that I could read it later, that I could use it productively
without becoming an expert.
Other languages I had to put more effort into.
That's like an empirical observation.
You're not analyzing any one aspect of the language.
It just seems time after time, when you look back, it's somehow readable.
It's somehow readable.
I could take executable English and translate it to Python more easily.
There was no translation layer.
As an engineer or as a scientist, I could think about what I wanted to do, and then
the syntax wasn't that far behind it.
Now, there are some warts there still.
It wasn't perfect.
There are some areas where I'm like, it would be better if this were different or if this
were different.
Some of those things got out of the language, too.
I was really grateful for some of the early pioneers in the Python ecosystem back then, because
Python got written in 1991, that was when the first version came out.
Guido was very open to users.
One of the sets of users were people like Jim Hugunin and David Ascher and Paul Dubois
and Konrad Hinsen.
These were people that were on the mailing list, and they were just asking for things like,
hey, we really should have complex numbers in this language.
There's a j, there's a 1j, and the fact that they went the engineering route of j is interesting.
I don't think that's entirely to favor engineers.
I think it's because i is so often used as the index of a for loop.
I think that's actually why.
Probably.
I mean, there's a pragmatic aspect.
In fact, the complex numbers were there.
I love that.
The fact that I could write in the array constructs and that reduction was there, very simple
to write summations and broadcasting was there.
I could do addition of whole arrays.
So that was cool.
Those are some things I loved about it.
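
[Editor's illustration] The features listed here (the `1j` complex literal, reductions, broadcasting, and addition of whole arrays) look roughly like this today. This is a sketch using modern NumPy rather than the original Numeric, and all values are made up.

```python
import numpy as np

# Complex numbers are built into the language, with the engineering-style j.
z = 3 + 4j
magnitude = abs(z)

# Reduction: summing a whole array of complex samples in one call.
signal = np.array([1 + 1j, 2 - 1j, 0 + 2j])
total = signal.sum()

# Broadcasting: a length-3 row is added to every row of a 2x3 array.
a = np.array([[1, 2, 3],
              [4, 5, 6]])
b = np.array([10, 20, 30])
shifted = a + b

# Whole-array addition: no explicit loop over elements.
doubled = a + a

print(magnitude, total)
print(shifted)
print(doubled)
```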
I don't know what to start talking to you about because you've created so many incredible
projects that basically change the whole landscape of programming, but okay.
Let's start with, let's go chronologically with SciPy.
You created SciPy over two decades ago now?
Yes.
I love to talk about SciPy.
SciPy was really my baby.
What is it?
What was its goal?
What is its goal?
How does it work?
Yeah, fantastic.
So SciPy was effectively, here I am using Python to do stuff that I previously used MATLAB
to do, and I was using Numeric, which is an array library that made a lot of it possible,
but there's things that were missing.
Like I didn't have an ordinary differential equation solver I could just call.
I didn't have integration, I wanted to integrate this function, okay, well, I don't have just
a function I can call to do that.
These are things I remember being critical things that I was missing, optimization.
I just want to pass a function to an optimizer and have it tell me what the optimal value
is.
Those are things like, well, why don't we just write a library that adds these tools,
and I started to post on the mailing list, and previously people had discussed it,
I remember Konrad Hinsen saying, wouldn't it be great if we had this optimizer library,
or David Ascher would say this stuff, and I was, ambitious is the wrong word, eager, and probably
had more time than sense.
I was a poor graduate student.
My wife thinks I'm working on my PhD, and I am, but part of the PhD that I loved was
the fact that it's exploratory.
You're not just taking orders fulfilling a list of things to do, you're trying to figure
out what to do.
I thought, well, I'm writing tools for my own use in a PhD, so I'll just start this
project.
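
[Editor's illustration] The three missing pieces described above (integrating a function, solving an ordinary differential equation, and passing a function to an optimizer) each correspond to a single call in today's SciPy. This sketch uses the current API (`scipy.integrate.quad`, `scipy.integrate.solve_ivp`, `scipy.optimize.minimize`), not the 1998 packages, and the functions `decay` and `bowl` are hypothetical examples chosen because their exact answers are known.

```python
import numpy as np
from scipy import integrate, optimize

# Integration: the area under sin(x) from 0 to pi, which is exactly 2.
area, error_estimate = integrate.quad(np.sin, 0.0, np.pi)

# ODE solving: dy/dt = -y with y(0) = 1, whose exact solution is exp(-t).
def decay(t, y):
    return -y

solution = integrate.solve_ivp(decay, (0.0, 1.0), [1.0], rtol=1e-8, atol=1e-10)
y_at_one = solution.y[0, -1]   # value of y at the final time, t = 1

# Optimization: hand a function to the optimizer and read back the minimizer.
def bowl(x):
    return (x[0] - 2.0) ** 2 + (x[1] + 1.0) ** 2

result = optimize.minimize(bowl, x0=[0.0, 0.0])
best_point = result.x          # should be close to (2, -1)

print(area, y_at_one, best_point)
```

By default `solve_ivp` uses an explicit Runge-Kutta method, so the caller gets a tested integrator without writing the algorithm.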
In 1999, 1998 was when I first started to write libraries for Python.
In fact, when I fell in love with Python in 98, I thought, well, there's just a few things
missing.
Like, oh, I need a reader to read DICOM files.
I was in medical imaging, and DICOM was a format, and I wanted to be able to load that
into Python.
Okay, how do I write a reader for that?
I wrote something called, it was an IO package, and that was my very first extension module,
which is C.
I wrote C code to extend Python so that in Python, I could write things more easily.
That combination kind of hooked me.
It was the idea that I could, here's this powerful tool I can use as a scripting language
and a high-level language to think about, but that I can extend easily.
Easily in C, easily for me because I knew enough C, and then Guido had written a link.
I mean, the hard part of extending Python was the way memory management
works, and you have to do reference counting.
And so there's a tracking of reference counts you have to do manually.
And if you don't, you have memory leaks.
So that's hard.
Plus then C, you know, it's just much more, you have to put more effort into it.
It's not just, I have to now think about pointers and I have to think about stuff that is different.
I have to kind of, you're like putting a new cartridge in your brain.
Like, okay, I'm thinking about MRI, now I'm thinking about programming, and they're distinct
modules you end up having to think about.
So it's harder.
And when I was just in Python, I could just think about MRI and high-level writing.
But I could do that, and that kind of, I liked it.
I found that to be enjoyable and fun.
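
[Editor's illustration] The reference counts that a C extension has to manage by hand can be observed from Python itself. This is a CPython-specific sketch using `sys.getrefcount`; the variable names are invented, and only the differences between counts are meaningful, since the absolute numbers vary between interpreter versions.

```python
import sys

# Any object will do; this one has no other users.
payload = object()
baseline = sys.getrefcount(payload)

# Storing the object twice in a list creates two more references to it.
holders = [payload, payload]
with_holders = sys.getrefcount(payload)

# Dropping the list releases both references again.
del holders
after_release = sys.getrefcount(payload)

print(with_holders - baseline, after_release - baseline)
```

In Python these increments and decrements happen automatically. In C extension code the author issues them explicitly (with `Py_INCREF` and `Py_DECREF`), and a missed decrement is the kind of memory leak described above.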
And so I ended up, oh, let me just add a bunch of stuff to Python to do integration.
Well, and the cool thing is, you know, the power of the internet, I was just looking
around and I found, oh, there's this Netlib, which has hundreds of Fortran routines that
people had written in the 60s and the 70s and the 80s,
in Fortran 77. Fortunately, it wasn't Fortran 66, it had been ported to Fortran 77.
And Fortran 77 is actually a really great language.
Fortran 90 probably is my favorite Fortran.
Because it's got complex numbers, it's got arrays, and it's pretty high-level.
Now, the problem with it is you'd never want to write a program in Fortran 90 or Fortran
77.
But it's totally fine to write a subroutine in, right?
And so, and then Fortran kind of got a little off course when they tried to compete
with C++.
But at the time, I just wanted libraries that did something, like, oh, here's an ordinary
differential equation.
Here's integration.
Here's Runge-Kutta integration.
Already done.
I don't have to think about that algorithm.
You could, but it's nice to have somebody who's already done one and tested it.
And so, I sort of started this journey in 98, really, if you look back at the mailing list,
there's sort of this productive era of me writing an extension module to connect Runge-Kutta
integration to Python and making an ordinary differential equation solver.
And then releasing that as a package, so you could call ODEPACK, I think I called it,
then QUADPACK.
And then I just made these packages.
Eventually that became Multipack, because they were originally modular, you could install
them separately.
But a massive problem in Python was actually just getting your stuff installed.
At the time, releasing software for me, like, today, it's, people think, what does that
mean?
Well, then it meant some poorly written web page, I had some bad web page up and I put
a tarball, just a gzipped tarball of source code.
That was the release.
But okay, can we just stay on that, because the community aspect of creating the package
and sharing that, that's rare.
To have to both have the, at that time, so like, the raw,
Yeah, it was pretty early, yeah.
So, oh well, not rare.
Maybe you can correct me on this, but it seems like in the scientific community, so many
people, you were basically solving the problems you needed to solve to process the particular
application, the data that you need.
And to also have the mind that I'm going to make this usable for others, that's.
I would say I was inspired.
I'd been inspired by Linux, been inspired by, you know, Linus and him making his code
available.
And I was starting to use Linux at the time.
And I went, this is cool.
So, I'd kind of been previously primed that way.
And generally, I was into science, because I liked the sharing notion.
I liked the idea of, hey, let's, if collectively we build knowledge and share it, we can all
be better off.
Okay.
So, you were energized by that idea?
So, I was energized by that idea already.
Yeah.
Right?
And I can't deny that I was.
I'm sort of, had this very, I liked that part of science, that part of sharing.
And then all of a sudden, oh wait, here's something, and here's something I could do.
And then I slowly over years learned how to share better so that you could actually engage
more people faster.
One of the key things was actually giving people a binary they could install, right?
So it wasn't just, here's source code, good luck.
Compile this and then.
Compiled, ready to install, you just, you know, so, in fact, a lot of the journey from 98
even through 2012, when I started Anaconda, was about that.
Like it's why, you know, it's really the key as to why a scientist with dreams of doing
MRI research ended up starting a software company that installs software.
I work with a few folks now that don't program like on the creative side, the video side,
the audio side, and because my whole life is running on scripts, I have to try to get
them, I have now the task of teaching them how to do Python enough to run the scripts.
And so I've been actually facing this, whether it's on Anaconda or something, with the task of
how do I minimally explain, basically to my mom, how to write a Python script.
And it's an interesting challenge, actually. It's a to-do item for me to figure out, like, what
is the minimal amount of information I have to teach?
What are the tools you use?
Like, one, that you enjoy it, and two, that you're effective at it.
And they're related.
Those are two related questions.
And then the debugging, like the iterative process of running the script to figure out
what the error is, maybe even for some people to do the fix yourself.
So do you compile it?
Do you, like, how do you distribute that code to them?
And it's interesting because I think it's exactly what you're talking about.
If you increase the circle of empathy, the circle of people that are able to use your
programs, you increase its, like, effectiveness and its power.
And so you have to think, you know, can I write scripts?
Can I write programs that can be used by medical engineers, by all kinds of people that don't
know programming, and actually maybe plant a seed, have them catch the bug of programming
so that they start on their journey.
And that's a huge responsibility.
And ultimately, it has to do with the Amazon one-click buy, like, how frictionless can
you make the early steps?
Frictionless is actually really key.
To grow any community, at every friction point you're just going to lose, you're going
to lose some people.
Yeah.
Right?
Now, sometimes you may want to intentionally do that.
If you're early enough on, you need, you know, a lot of help.
We need people who have the skills.
You might actually, it's helpful.
You don't necessarily want too many users as opposed to contributors if you're
early on.
Anyway, SciPy started in '98, but it really emerged as this collection of modules
that I was just putting on the net.
People were downloading.
And, you know, I think I got a hundred users, right, by the end of that year.
But the fact that I got a hundred users and more than that, people started to email me
with fixes, like, and that was actually intoxicating, right?
That was the, that was the, you know, here I'm writing papers and I'm giving conference talks,
and people would say, hello, yeah, good job.
But mostly, when you're reviewed, it's competitive, right?
You publish a paper and people were like, oh, it wasn't my paper, you know.
I was starting to see that sense of academic life where it was so much, I thought it was
a cooperative effort, but it sounds like we're here just to one up each other.
And you know, it's not, that's not true across the board, but a lot of that's there.
But here in this world, I was getting responses from people all over the world.
You know, I remember Pearu Peterson in Estonia, right?
He was one of the first people, and he sent me back this makefile because, you know,
the first thing he said was, yeah, your build stinks, and here's a better makefile.
Now, it was a complex makefile, I don't think I ever understood that makefile actually,
but it worked and it did a lot more and so I was like, thanks, this is cool.
And that was my first kind of engagement with community development.
But you know, the process was he sent me a patch file, I had to upload a new tarball,
and I just found, I really loved that. And the style back then was a mailing list. It was
very, these certainly weren't the tools that are available today, it was
very early on, but I really started to, that whole year, I think I did about seven
packages that year, right?
And then by the end of the year, I collected them into a thing called Multipack.
So in '99, there was this thing called Multipack, and that's when a high school student, he was
a high school student at the time, a guy named Robert Kern, took that package and made a
Windows installer, right?
And then of course, a massive increase of usage.
So by the way, most of this development was under Linux.
Yes.
Yes, it was on Linux.
I was a Linux developer doing it on a new box.
I mean, at the time I was actually getting into, I had a new hard drive, and there was some
kernel programming to make the hard drive work, I mean, not programming, but modification
to the kernel so I could actually get the hard drive working.
I love that aspect of it.
I was also in, at school, I was building a cluster, I took Mac computers and you put
Yellow Dog Linux on them.
At the Mayo Clinic, all these Macs that were older, they were just getting rid of and so
I kind of got permission to go grab them together.
I put about 24 of them together in a cluster, in a cabinet and put Yellow Dog Linux on them
all and I wrote a C++ program to do MRI simulation.
That was what I was doing at the same time for my day job, so to speak.
So I was loving the whole process, and at the same time I was, oh, I need an ordinary differential
equation solver.
That's why ordinary differential equations were key: the heart of the Bloch
equation for simulating an MRI is an ODE solver. And so I actually did that, it just
happened at the same time.
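As a rough illustration of the kind of ODE solve described here, the sketch below integrates the Bloch equation for a single magnetization vector with today's `scipy.integrate.solve_ivp`. It is not the original MRI simulation code. The relaxation times, the 10 Hz off-resonance, and the names `bloch_rhs` and `omega` are assumptions chosen only for the example.

```python
import numpy as np
from scipy.integrate import solve_ivp

T1, T2 = 1.0, 0.1   # relaxation times in seconds (illustrative values)
M0 = 1.0            # equilibrium magnetization (arbitrary units)
# gamma * B in the rotating frame: a 10 Hz off-resonance about z (assumed)
omega = np.array([0.0, 0.0, 2 * np.pi * 10.0])


def bloch_rhs(t, M):
    """Right-hand side of the Bloch equation: precession plus relaxation."""
    precession = np.cross(M, omega)
    relaxation = np.array([-M[0] / T2, -M[1] / T2, (M0 - M[2]) / T1])
    return precession + relaxation


# Start with the magnetization tipped fully into the transverse plane.
sol = solve_ivp(bloch_rhs, (0.0, 0.3), [1.0, 0.0, 0.0], rtol=1e-8, atol=1e-10)
Mx, My, Mz = sol.y[:, -1]

# The transverse magnitude should decay as exp(-t/T2) and Mz should
# recover as M0 * (1 - exp(-t/T1)), which gives a simple sanity check.
print(np.hypot(Mx, My), np.exp(-0.3 / T2))
print(Mz, M0 * (1 - np.exp(-0.3 / T1)))
```

A full simulation would presumably repeat a solve like this for many spins with different local fields.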
That's why, kind of what you're working on and what you're interested in, they're coinciding.
I was definitely scratching my own itch in terms of building stuff and which helped in
the sense that I was using it for me, so at least I had one user, I had one person who
was like, well, no, this is better.
I like this interface better and I had the experience of MATLAB to guide some of what
those APIs might look like, but you're just doing it yourself, you're building all this stuff.
But the Windows installer was the first time I realized, oh, yeah, the binary installer
really helps people.
And so that led to spending more time on that side of things.
So around 2000, so I graduated my PhD in 2000, end of year, end of 2000.
So 99 doing a lot of work there, 98 doing a lot of work there, 99 kind of spending more
time on my PhD, helping people use the tools, thinking about where do I want to go from
here.
There was a company, there was a guy actually, Eric Jones and Travis Vaught.
They were two friends who founded a company called Enthought.
It's here in Austin, still here.
And they, Eric contacted me at the time when I was a graduate student still and he said,
hey, why don't you come down?
We want to build a company, we're thinking of a scientific company and we want to take
what you're doing and kind of add it to some stuff that he'd done, he'd written some tools.
And then Pearu Peterson had done f2py, let's come together and build, pull this all together
and call it SciPy.
So that's the origin of the SciPy brand.
It came from, you know, Multipack and a whole bunch of modules I'd written, plus a few
things from some other folks, and then pulled together in a single installer.
SciPy was really a distribution of Python masquerading as a library.
How did you think of SciPy in the context of Python, in the context of Numeric?
Like what?
So we saw SciPy as a way to make an R&D environment for Python, like use Python, dependent on Numeric.
So Numeric was the array library we depended on, and then from there, extend it with a
bunch of modules that allowed for, and at the time, the original vision of SciPy was
to have plotting, was to have, you know, a REPL, you know, the REPL environment and kind
of a whole, really a whole data environment that you could then install and get going
with.
And that was kind of the thinking, it didn't really evolve that way, right?
It sort of had a, but one, it's really hard to do massive scale projects in a, with open
source collectives, actually, there's a, there's sort of an intrinsic cooperation limit as
to which, you know, too many cooks in the kitchen, you know, you can do amazing infrastructure
work.
When it comes down to bringing it all together into a single deliverable, that actually requires
a little more, a little more product management that is not, that doesn't really emerge from
the same dynamic.
So it struggled, you know, struggled to get almost too many voices, it's hard to have
everybody agree, you know, consensus doesn't really work at that scale, you end up with
politics, you know, with the same kind of things that happen in large organizations trying
to decide on what to do together.
So consensus building was still, was challenging at scale as more people came in, right?
Early on, it's fine because there's nobody there, and so it works, but then as you get
more successful and more people use it, all of a sudden, oh, there's this, this, this scale
at which this doesn't work anymore and we have to come up with different approaches.
So SciPy came out officially in 2001, was the first release. Most of the time, I remember
the days of getting that release ready, it was the Windows installer and there was, there
were bugs on how, you know, the Windows compiler handled complex numbers and you were, you're
chasing segmentation faults and it was, it's a lot of work. There's a lot of effort that had
nothing to do with my area of study. And at the same time, I had just gotten an offer,
so Eric wondered if I wanted to come down and help him start that, you know, start that
company with his friend, and at the time, I was like, I was intrigued, but I was pursuing
a path, an academic path, and I just got an offer to go and teach at my alma mater.
So I took that tenure track position and SciPy, and kind of, then I started working on SciPy
as a professor too.
Okay.
So I left, I graduated from the Mayo Clinic, wrote my thesis using SciPy, wrote,
you know, there's, there's images that were created. Now the plotting tool I used was
something from Yorick, actually, it was a plotting, a PLT, kind of a plotting language
that I used.
Yorick is a programming language?
It was a programming language, it had a plotting tool, Dislin, it had integration to Dislin,
I ended up using Dislin, plus some, some of the plotting from Yorick, linked to from Python.
Anyway, people don't plot that way now, but this is before, and SciPy was trying
to add plotting, right?
It didn't have much success. Really the success of plotting came from John Hunter, who had
a similar experience to my experience, my kind of maverick experience as a person just
trying to get stuff done and kind of having more time than, than money maybe, right?
And John Hunter created what?
Matplotlib.
He's the creator of Matplotlib.
Yeah.
So John Hunter was a, you know, he wasn't a student at the time, but he was working in
the quant field, and he said, we need better plotting.
So he just went out and said, cool, I'll make a new project, and we'll call it Matplotlib,
and he released in 2001, about the same time that SciPy came out.
And it was a separate library, separate install. It used Numeric, SciPy used Numeric.
And so SciPy, you know, 2001, we released SciPy, and then Enthought created a conference
called SciPy, which brought people together to talk about the space, and that conference
is still ongoing.
It's one of the favorite conferences of a lot of people because it's, you know, it's
changed over the years, but early on it was, you know, a collection of 50 people who care
about scientists mostly, you know, practicing scientists who want to care about coding and
doing it well and not using MATLAB, and I remember being driven by, you know, I like
MATLAB, but I didn't like the fact that, like, so I'm not opposed to proprietary software.
I'm actually not an open source zealot.
I love open source for the, what it brings, but I also see the role for proprietary software.
But what I didn't like was the fact that I would develop code and publish it, and then
effectively telling somebody here to run my code, you have to have this proprietary software.
Right.
And there's also a culture around MATLAB, because I've talked to a few folks at MathWorks,
which creates MATLAB.
Yeah.
I mean, there's just a culture, they try really hard, but it's just this corporate IBM style
culture that's like, or whatever.
I don't want to say negative things about IBM or whatever, but there's a, no, it's really
that connection.
It's something I'm in the middle of right now is the business of open source and how
do you connect the ethos of cooperative development with the necessity of creating profits, right?
And like right now today, you know, I'm still in the middle of that.
It's actually the early days of me exploring this question, because as I was writing SciPy,
I mean, as an aside, I also had, so I had three kids at the time.
I have six kids now.
Yeah.
I got married early, wanted a family.
I had three kids, and I remember reading, I read Richard Stallman's post, and I was
a fan of Stallman.
I would read his work.
I liked these collective ideas he would have.
Certainly the ideas on IP law, I read a lot of his stuff.
But then he said, you know, okay, well, how do I make money with this?
How do I make a living?
How do I pay for my kids?
All this stuff was in my mind, young graduate student making no money, thinking I got to
get a job.
And he said, well, you know, I think just be like me and don't have kids, right?
That's just don't, don't.
That's his take on.
That's his take on.
That was what he said in that moment, right?
That's the thing I read and I went, okay, this is a train I can't get on.
There has to be a way to preserve the culture of open source and still be able to make sufficient
money to feed you.
Yes, exactly.
There's got to be.
Well, so that actually led me to a study of economics, because at the time I was ignorant
and it really was, and I'm actually, I'm embarrassed for the educational system that they
could let me, and I was valedictorian of my high school class and I did super well in
college and like academically, I did great, right?
But the fact that I could do that and then be clueless about this key part of life, it
led me to go, there's a problem.
Like I should have learned this in fifth grade.
I should have learned this in eighth grade.
Like everybody should come out with a basic knowledge of economics.
Another interesting example, because you've created tools that change the lives of probably
millions of people, and the fact that you don't understand at the time of the creation
of those tools, the basic economics of how to build up a giant system is a problem.
Yeah, it's a problem.
And so during my PhD at the same time, this is back in 98, 99 at the same time, I was
in the library, I was reading books on capitalism, I was reading books on Marxism, I was reading
books on what is this thing, what does it mean?
And I encountered a set of writings from people that said they were the inheritors of Adam
Smith, and I read Adam Smith for the first time, right?
Which is The Wealth of Nations and kind of this notion of emergent societies, and realized,
oh, there's this whole world out here of people.
And the challenge of economics is also political, because economics, people, different parties
running for office, they want their economic friends, they want their economists to back
them up, right?
Or to be their magicians, like the magicians in Pharaoh's court, right?
The people that are going to say, hey, this is, you should listen to me, because I've
got the expert who says this.
And so it gets really muddled, right?
I was looking at it as a scientist going, what is this space, what does this mean?
How does Paris get fed, what does money, how does it work?
And I found a lot of writings I really loved, I found some things I really loved, and I
learned from that.
It was writings from people like von Mises. He wrote a paper in 1920 that still should
be read more than it is. It was the economic calculation problem of the socialist
commonwealth.
It was basically in response to the Bolshevik Revolution in 1917.
And his basic argument was, it's not going to work to not have private property.
You're not going to be able to come up with prices.
The bureaucrats aren't going to be able to determine how to allocate resources without
a price system.
And a price system emerges from people making trades.
And they can only make trades if they have authority over the thing they're trading.
And that creates information flow that you just don't have if you try to top down it.
Right.
Right.
It's like, huh, that's a really good point.
Yeah, prices have a signal that's used, and it's important to have that signal when you're
trying to build a community of productive people like you would in the software engineering
space.
Yeah, the prices are actually an important signaling mechanism, right?
And that money is just a bartering tool, right?
So this is the first time I've encountered any of this concept, right?
And the fact that, oh, this is actually really critical, like it's so critical to our prosperity
and that we're dangerously not learning about this, not teaching our children about this.
So you had the three kids, you had to make some hard decisions.
Had to make some money, right?
Had to figure it out.
But I didn't really care.
I mean, I've never been driven by money, just need it.
Yeah, right.
You need to eat.
So how did that resolve itself in terms of SciPy?
So I would say it didn't really resolve itself.
It sort of started a journey that I'm continuing on.
I'm still on, I would say.
I don't think it resolved itself.
But I will say, I went in eyes wide open.
I knew that there were problems with giving stuff away and creating the market externalities.
The fact that, yeah, people might use it and I might not get paid for it and I'll have
to figure something else out to get paid.
At least I can say I'm not bitter that a lot of people have used stuff that I've written
and I haven't necessarily benefited economically from it.
I've heard other people be bitter about that when they write or they talk, like, oh, I
should have got more value out of this.
And I'm also, I want to create systems that let people like me who might have these desires
to do things, let them benefit.
So it actually creates more of the same.
Not to turn on your bitterness module, but there's some aspect, I wish there were mechanisms
for me to reward whoever created SciPy and NumPy because it brought so much joy
to my life.
I appreciate that.
You know what I mean?
The tip jar notion was there.
I appreciate that.
There should be a very frictionless mechanism.
There should be frictionless mechanism.
I totally agree.
I would love to talk about some of the ideas I have because I actually came across, I think
I've come up with some interesting notions that could work, but they'll require anything
that will work takes time to emerge, right?
Things don't just turn overnight.
That's definitely one thing I've also understood and learned is any fixes.
That's why it's kind of funny.
We often give credit to, you know, oh, this president gets elected and, oh, look how great
things have done.
I saw that when I had a transition at Anaconda when a new CEO came in, right, and it's like
the success that's happening, there's an inertia there.
Yeah.
Right.
And sometimes the decision made like 10 years before is the reason why the success is the
same.
Right.
Exactly.
So we're sort of just running around taking credit for stuff.
The credit assignment has like a delay to it that makes the credit assignment basically
wrong more than right.
Wrong more than right.
Exactly.
Because, you know, that's the stuff I would read a ton about, you know, early on.
So I don't, I feel like I'm with you.
Like I want the same thing.
I want to be able to, and honestly not for me personally, I've been happy.
I've been happy.
I feel like I don't have any, I mean, we've done reasonably okay, but I've had to
pursue it.
Like that's really what started my trajectory from academia is reading that stuff led me
to say, oh, entrepreneurship matters.
I love software, but we need more entrepreneurs and I want to understand that better.
So once I kind of had that, that virus infect my brain, it, even though I was on a trajectory
to go to a tenure track position at a university and I was there for six years, I was kind
of already out the door when I started and we can get into that.
Yeah.
Can I just ask a quick question, are there some design principles that were in your mind
around SciPy?
Like, is there some key ideas that were just like sticking to you that this is, this is
the fundamental ideas?
Yeah.
I would say so.
I would think it's basically accessibility to scientists, like give them, give scientists
and engineers tools that they don't have to think a lot about programming.
So give them really good building blocks.
Give them functions that they want to call and sort of just the right length of spelling.
You know, there's one tradition in programming where it's like, you know, make very, very
long names, right?
And you can see it in some programming languages where the names get, you know, tick off the
screen.
And in the Fortran world, names had to be six characters early on, right?
And that's way too little.
But I was like, I like to have names that were informative, but short.
So even though Python, well, this is a different conversation, but documentation is doing some
work there.
So when you look at great scientific libraries and functions, there's a richness of documentation
that helps you get into the details.
The first glance at a function gives you the intuition of all it needs to do by looking
at the headers and so on.
But to get the depths of all the complexities involved, all the options involved, documentation
does some of the work.
Documentation is essential.
Yeah.
So that was actually, so we thought about several things.
One is we wanted plotting.
We wanted interactive environment.
We wanted good documentation.
These were things we knew we wanted.
The reality is those took about 10 years to evolve, right?
Given the fact that we didn't have a big budget, it was all volunteer labor.
It was sort of, when Enthought got created and they started to try to find projects, people
would pay for pieces and they were able to fund some of it.
Not nearly enough to keep up with what was necessary and no criticism, just simply the
reality.
I mean, it's hard to start a business and then do consulting and then also promote an open
source project that's still fairly new.
SciPy was fairly niche.
We stayed connected all while I was a student, sorry, a professor.
I went to BYU and started to teach electrical engineering, all the applied math courses.
I loved teaching signal processing, probability theory, electromagnetism.
If you look at Rate My Professors, which my kids love to do, I got some bad reviews
because people-
What was the criticism?
I would speak at too high of a level.
I definitely had a calibration problem coming out of graduate work where I hate to be condescending
to people.
I really have a ton of respect for people fundamentally.
My fundamental thing is I respect people.
Sometimes that can lead to a-
I was thinking they had more knowledge than they did and so I would just speak at a very
high level, assume they got it.
But they need to rise to the standard that you set.
That's one of the-
Some of the greatest teachers do that.
I agree.
That was what was inspiring me, but you also have to-
I cannot say I was as articulate as some of the greatest teachers.
One classic example, when I first taught at BYU, my very first class, it was overheads.
Transparencies overheads before projectors were really that common.
Transparencies, I'm writing my notes out.
I go in, room's half dark, I'm just blaring through these transparencies.
Here it is.
Here it is.
Here it is.
Here it is.
Here it is.
And I gave a quiz after two weeks.
No one knew anything.
Nothing I had taught had gotten anywhere.
And I realized, okay, this is not working.
So I put away the transparencies and I turned around and just started using the chalkboard.
And what it did is it slowed me down.
The chalkboard just slowed me down and gave people time to process and to think and then
made me focus.
My writing wasn't great on the chalkboard, but I really love that part of the teaching.
So that entered SciPy's world in terms of we always understood that there's a didactic
aspect of SciPy, kind of how do you take the knowledge and then produce it.
The challenge we had was the scope.
Like, ultimately, SciPy was everything, right, and so 2001, when it first came out, people
were starting to use it.
No, this is cool.
This is a tool we actually use.
At the same time, 2001 timeframe, there was a little bit of the Hubble Space Telescope.
The folks at Hubble had started to say, hey, Python, we're going to use Python for processing
images from Hubble.
And so Perry Greenfield was a good friend and running that program.
And he had called me before I left for BYU and said, you know, we want to do this, but
Numeric actually has some challenges in terms of, you know, the array doesn't have
enough types.
We need more operations, you know, broadcasting needs to be a little more settled.
They wanted record arrays.
They wanted, you know, record arrays are like a data frame, but a little bit different.
But they wanted more structured data.
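To make the record array idea concrete, here is a minimal sketch using today's NumPy structured dtypes. The field names and catalog values are invented for illustration and are not taken from any Hubble pipeline.

```python
import numpy as np

# A structured dtype: each element is a record with named, typed fields.
# Field names and values below are hypothetical.
star_dtype = np.dtype([
    ("name", "U10"),     # up to 10 unicode characters
    ("ra_deg", "f8"),    # right ascension, 64-bit float
    ("dec_deg", "f8"),   # declination, 64-bit float
    ("flux", "f4"),      # brightness, 32-bit float
])

catalog = np.array(
    [("A", 10.5, -3.2, 1.5), ("B", 200.1, 45.0, 0.3), ("C", 83.8, -5.4, 7.9)],
    dtype=star_dtype,
)

# Select whole records by a condition on one field.
bright = catalog[catalog["flux"] > 1.0]
print(bright["name"])

# A recarray view exposes the same fields as attributes.
rec = catalog.view(np.recarray)
print(rec.ra_deg)
```

Unlike a data frame, every record here has one fixed binary layout, so the whole table lives in a single contiguous block of memory.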
So he had called me even early on then, and they said, you know, would you want to work
on something to make this work?
And I said, yeah, I'm interested, but I'm going here and we'll see if I have time.
So in the meantime, while I was teaching and SciPy was emerging, and I had a student,
I was constantly while I was teaching, trying to figure a way to fund this stuff.
So I had a graduate student, my only graduate student, a Chinese fellow, Lu Hongze is his
name, great guy.
He wrote a bunch of stuff for iterative linear algebra, like got into writing some
of the iterative linear algebra tools that are currently there in SciPy, and they've
gotten better since, but this is in 2005, kept working on SciPy.
But Perry had started working on a replacement to Numeric called numarray, and in 2004, a package
called ndimage came out.
It was an image processing library that was written for numarray, and it had in it a morphology
tool.
I don't know if you know what morphology is.
It's opening, dilation, you know, there's sort of this, as a medical imaging student, I knew
what it was because it was used in segmentation a lot.
And in fact, I'd wanted to do something like that in Python, in SciPy, but just had never
gotten around to it.
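For readers who have not met morphology before, the sketch below shows dilation and opening on a tiny binary mask using the present-day `scipy.ndimage` module, which descends from the ndimage package mentioned here. The 7x7 mask and the 3x3 structuring element are assumptions picked to keep the example small.

```python
import numpy as np
from scipy import ndimage

# A small binary "segmentation": one 3x3 blob plus one isolated noise pixel.
mask = np.zeros((7, 7), dtype=bool)
mask[2:5, 2:5] = True
mask[0, 6] = True

# 3x3 square structuring element (an assumed choice for this example).
square = np.ones((3, 3), dtype=bool)

# Dilation grows every foreground region by one pixel in each direction.
dilated = ndimage.binary_dilation(mask, structure=square)

# Opening is erosion followed by dilation: it removes specks smaller than
# the structuring element while restoring the shape of larger regions.
opened = ndimage.binary_opening(mask, structure=square)

print(mask.sum(), dilated.sum(), opened.sum())
```

In segmentation work, opening is a common way to clean speckle out of a thresholded image before measuring regions.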
So when it came out, it worked only on numarray, and SciPy needed Numeric, and so we
effectively had the beginning of this split.
And Numeric and numarray didn't share data, they were just two, so you could have a gigabyte
of numarray data and a gigabyte of Numeric data and they wouldn't share it.
And so you had these, then you had these scientific libraries written on top.
I got really bugged by that.
I got really like, oh man, this is not good.
We're not cooperating now.
We're sort of redoing each other's work and we're just this young community.
So that's what led me, even though I knew it was risky because my, you know, I was on
a tenure track position, 2004, I got reviewed.
They said, hey, things are going okay, you're doing well, paper's coming out, but you're
kind of spending a lot of time on this open source stuff, maybe do a little less of that
and a little more of the paper writing and grant writing, which was naive, but it was
definitely the type of, you know, the thinking that still goes on, still goes on.
We're basically creating a thing which enables science in the 21st century.
Maybe don't emphasize that so much for your tenure.
Right.
It illustrates some of the challenges.
It does.
And it's, people mean well, like, but we've gotten broken in a bunch of ways.
Certain things, a programming, understanding the role of software engineering, programming
in society is a little bit like, I guess.
Exactly.
Now I was in an electrical engineering position.
Right.
That's even worse.
Yeah.
Yeah.
They were very focused.
And so, you know, good people, and I had a great time.
I loved my time.
I loved my teaching.
I loved all the things I did there.
The problem was, this split was happening, this community I loved.
I saw people, and I go, oh my gosh, this is going to be, this is not great.
And so, I happened, you know, fate, I had a class I signed up for, I was trying to build
an MRI system.
So, I had a kind of a radio, a digital radio class, a digital MRI class.
And I had people sign up, two people signed up, then they dropped, and so, I had nobody
in this class.
So, and I didn't have any other courses to teach, and I thought, oh, I've got some time.
And I'll just write a merger of Numeric and numarray.
Like I'll basically take the Numeric code base, add the features numarray was adding,
and then kind of come up with a single array library that everybody can use.
So that's where NumPy came from, was my thinking, hey, I can do this, and who else is going
to?
And I'd been around the community long enough, and I'd written enough C code, I knew the
structures.
And in fact, my first contribution to Numeric had been writing the C API documentation that
went in the first documentation for NumPy, for Numeric, sorry.
This is Paul Dubois, David Ascher, Konrad Hinsen, and myself.
I got credit because I wrote this chapter, which is all the C API of Numeric, all the
C stuff.
So, I said, ah, I'm probably the one to do it, you know, nobody else is going to do this.
So it's sort of out of a sense of duty and passion, knowing that I don't think my academic,
I don't think the department here is going to appreciate this, but it's the right thing
to do.
Can we just linger on that moment because the importance of the way you thought and the
action you took, I feel is understated and is rare, and I would love to see so much more
of it because what happens as the tools become more popular, there's a split that happens.
And it's a truly heroic and impactful action to, in those early, in that early split, to
step up and you, it's like great leaders throughout history, like, what is it, Braveheart,
like get on a horse and, and rally the troops because I think that can have, make a big
difference.
We have TensorFlow versus PyTorch in the machine learning.
We have the same problem today.
Yeah.
It's, I wonder, it's actually bigger.
I wonder if it's possible to, in the early days to rally the troops.
It is possible, especially in the early days, the longer it goes, the harder, right?
And the more energy in the factions, the harder, but in the early days, it is possible and
it's extremely helpful and then there's a willingness there, but, but the, but the
challenge is there's usually not a willingness to fund it.
There's not a willingness to, you know, like I was literally walking into a field saying
I'm going to do this and you know, here I am, like, you know, I have five kids at home
now.
Pressure builds.
Sometimes my wife hears these stories and she's like, you did what?
I thought we were going to, I thought you were actually on a path to make sure we had
resources and money, but, but again, there's a, there's an aspect, I'm, I'm a very hopeful
person.
I'm an optimistic person.
By nature.
I love people.
I learned that about myself later on, uh, uh, part of my, uh, my religious beliefs actually
lead to that.
And it's why I hold them dear because it's actually how I feel about, it's what, it's
what leads me to this, to these attitudes, sort of this hopefulness and this sense of,
yeah, it may not, it may not work out for me financially or maybe, but that's not the
ultimate gain.
Like that's a thing, but it's not, you know, that's not the scorecard, uh, for me.
And so I just wanted to be helpful and I knew, and partly because of these SciPy conferences,
because of the mailing list conversations, I knew there was a lot of need for this.
Right.
And so I had this, it wasn't like I was alone in terms of no feedback.
I had these people who knew, but it was crazy.
Like people who, at the time, said, yeah, we didn't think you'd be able to do it.
Yeah.
We thought it was crazy.
And also instructive, like practically speaking, that you had a cool feature that you were
chasing in the morphology, like the, like it's, it's not just like, it's not some visionary
thing.
I'm going to unite the community.
You were like, you were actually practically, this is what one person actually could do,
uh, and actually build.
Cause that is important cause you can get over your skis.
Yeah.
You can definitely get over your skis.
And I had, in fact, this almost got me over my skis, right?
I would say, well, in retrospect, I hate looking back, we can, I can tell you all the flaws
with NumPy.
Right.
We want to go into it.
I would, there's lots of stuff that I'm like, oh man, that's embarrassing.
That was wrong.
I wish I had somebody slap me with a wet fish there.
Yeah.
Like I needed, like what I wished I'd had was somebody with more experience, and certainly
in library writing and array libraries.
There's like, I wish I had me, I could go back in time and go, do this, do that.
There's an important thing.
Cause there's things we did that are still there that are problematic that created challenges
for later.
And, and I didn't know it at the time, didn't understand how important that was.
And in many cases, didn't know what to do.
Like there was pieces of the design of NumPy.
I didn't know what to do until five years ago.
Now I know what they should have been, but I didn't know at the time and nobody, and
I couldn't get the help.
Anyway.
So I wrote it.
It took about, it took four months to write the first version, then about 14 months
to make it usable.
But it was, it wasn't, it was that first four months of intense writing, coding, getting
something out the door that worked.
That was, it was, it was definitely challenging.
And then the big thing I did was create a new type object called dtype.
That was probably the single, the, the contribution.
And then the fact that I added broad, not just broadcasting, but advanced indexing.
So that you could do, um, masked indexing and indirect indexing instead of just slicing
in.
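For readers who want to see the features named here, the following is a minimal sketch using the present-day NumPy API (not the original 2005 code). It shows a dtype attached to an array, broadcasting, plain slicing, masked (boolean) indexing, and indirect (integer-array) indexing. The variable names are illustrative only.

```python
import numpy as np

# A dtype describes the element type of the array (here 16-bit integers).
a = np.arange(6, dtype=np.int16).reshape(2, 3)   # [[0, 1, 2], [3, 4, 5]]

# Broadcasting: the 1-D row is applied to every row of `a`.
row = np.array([10, 20, 30], dtype=np.int16)
summed = a + row                                  # [[10, 21, 32], [13, 24, 35]]

# Plain slicing selects a regular sub-block.
first_two_cols = a[:, :2]                         # [[0, 1], [3, 4]]

# Masked indexing: a boolean array picks the elements where it is True.
evens = a[a % 2 == 0]                             # [0, 2, 4]

# Indirect indexing: integer arrays pick elements (1, 2) and (0, 0).
picked = a[[1, 0], [2, 0]]                        # [5, 0]
```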
So for people who don't know, and maybe you can elaborate: NumPy, I guess the vision in
the narrowest sense is to have this object that represents n-dimensional arrays.
And like at any level of abstraction you want, but basically it could be a black box that
you can investigate in ways that you would naturally want to investigate such objects.
Yes.
Exactly.
So you could do math on it easily.
Math on it easily.
Yeah.
So it had an associated library of math operations.
And effectively SciPy became an even larger set of math operations.
So the key for me was I was going to write NumPy and then move SciPy to depend on NumPy.
In fact, early on, one of the initial proposals was that we would just write SciPy and it
would have the Numeric object inside of it and it'd be scipy.array or something.
That turned out to be problematic because Numeric already had a little mini library
of linear algebra and some functions.
And it had enough momentum, enough users that nobody wanted to, they wanted the backward
compatibility.
One of the big challenges of NumPy was I had to be backward compatible with both Numeric
and Numarray in order to allow both of those communities to come together.
There was a ton of work in creating that backward compatibility that also created echoes in
today's object.
Like some of the complexity in today's object is actually from that goal of backward compatibility
with these other communities, which if you didn't have that, you'd do something different,
which is instructive because a lot of things are there, you know, what is that there for?
It's like, well, it's a remnant, it's an artifact of its historical existence.
By the way, I love the empathy and the lack of ego behind that because I feel, you see
that in the split, in the JavaScript frameworks, for example, the arbitrary branching, is,
I think in order to unite people, you have to kind of put your ego aside and truly listen
to others.
What do you do? What do you love about Numarray? What do you love about Numeric? Actually
get a sense, we're talking about languages earlier, sort of empathize with the culture,
the people that love something about this particular API, the naming style or the usage
patterns and truly understand them so that you can create that same draw in the united
thing.
And that you have to also have enough passion that you'll do it. It can't be just like a
perfunctory, oh yes, I'll listen to you and then I'm not really that excited about it.
So it really is an aspect, it's a philosophical, like there's a philia, there's a love and esteeming
of others that's actually at the heart of what, it's sort of a life philosophy for me,
right, that I'm constantly pursuing and that helped, absolutely helped.
It makes me wonder in a philosophical, like looking at human civilization as one object,
it makes me wonder how we can copy and paste Travises in this world.
Well, some aspects maybe.
Some aspects, right, right, exactly. Well, it's a good question, how do we teach this,
how do we encourage it, how do we lift it?
Because so much of the software world, it's giant communities, right, but it seems like
so much is moved by little individuals. You talk about Linus Torvalds, it's like, could
you have not, could you have had Linux without him?
Yeah, Guido and Python, Guido and Python, I mean, in the SciPy community particularly,
it's like I said, we wanted to build this big thing, but ultimately we didn't. What
happened is we had mavericks and champions like John Hunter who created Matplotlib,
we had Fernando Perez who created IPython. And so we sort of inspired each other, but
in that credit, there's sort of a culture of this selfless, giving, stewardship mentality
as opposed to ownership mentality, but stewardship and community focused, community focused but
intentional work, like not waiting for everybody else to do the work, but you're doing it for
the benefit of others and not worried about what you're going to get. You're not worried
about the credit, you're not worried about what you're going to get, you're worried
about, I later realized that I have to worry a little about credit, not because I want
the credit, because I want people to understand what led to the results. Like I don't, it's
not, it's not about me, is I want to understand this is what led to the result. So it's like,
I think doing, and this is what had no impact on the result, like let's, let's promote,
this is just like you said, I want to promote the attributes that led, that helped make
us better off. How do we make more of Wes, Wes McKinney? Like, Wes McKinney was critical
to the success of Python because of his creation of pandas, the roots of which were
all the way back in Numeric and Numarray and NumPy, where NumPy created an array
of records. Wes started to use that almost like a data frame, except it's an array of
records, and with a data frame, the challenge is, okay, if you want to augment it with another column,
you have to insert, you have to do all this memory movement to insert a column. Whereas
data frames became, oh, I'm going to have a loose collection of arrays. So it's a record
of arrays that is a part of a data frame. And we thought about that back in the Numarray
days, but Wes ended up doing the work to build it. And then, then also the operations
that were relevant for data processing. What I noticed is just that each of these little
things creates just another tick, another up. So NumPy ultimately took a little while,
six months in, people started joining me, you know, Francesc Alted, Robert Kern, Charles
Harris. And these, these people are many of the unsung heroes, I would say, people who
are, you know, they don't, they sometimes don't get the credit they deserve, because
they were critical both to support, like, you know, it's, it's hard and you want, you
need some support, people need support. And I needed just encouragement and they were
helping encourage by contributing. And the big thing for me was when John Hunter,
he had previously done kind of a simple thing called numerix to kind of, you know, between
Numeric and Numarray, he had a little high-level tool that would just select each one
for Matplotlib. In 2006, he finally said, we're going to just make NumPy the dependency
of Matplotlib. As soon as he did that, and I remember specifically when he did that,
I said, okay, now we've done it. Like, that was when I knew we had succeeded.
Before then it was still, you know, doing, sure, but that kind of started a roller coaster
and then 2006 to 2009. And then I've been floored by the, by what it's done. Like, I
hadn't, I knew it would help. I had no idea how much it would help. Right. So
and it has to do with, again, the language thing, it just people started to think in
terms of NumPy, like, yes. And that opened up a whole new way of thinking. And part of
the story that you kind of mentioned, but maybe you can elaborate is it seems like
at some point in the story, Python took over science and data science. Yes. And, bigger
than that, the scientific community started to think like programmers or started to utilize
the tools of computers to do, like at a scale that wasn't done with Fortran, like, at this
gigantic scale, they started opening their heart. And then Python was the thing. I mean,
there's a few other competitors, I guess, but Python, I think, really, really took
over.
I agree. There's a lot of stories here that are kind of during this journey because this
is sort of the start of this journey in 2005, 2006. So my tenure committee, I applied for
tenure in 2006, 2007, it came back, I split the department. I was very polarizing. I had
some huge fans and then some people said no way. Right. So it was very, I was a polarizing
figure in the department. It went all the way up to the university president. Ultimately,
my department chair had the sway. And they didn't say no. They said, come back in two
years and do it again. And I went, at that point, I was like, I said, I had this interest
in entrepreneurship, this interest in not the academic circles, not the, how do we make
industry work? So I do have to give credit to that exploration of economics because
that led me, oh, I had a lot of opinions. I was actually very libertarian at the time.
And I still have some libertarian trends, but I'm more of a, I'm more of a collectivist
libertarian.
So you value broadly, philosophically freedom.
I value broadly, philosophically freedom, but I also understand the power of communities,
like the power of collective behavior. And so what's that balance, right? That makes
sense. So by the time I was just, I got to go out and explore this entrepreneurial world.
So I left academia. I said, no thanks. Called my friend Eric here, who had, his company was
going, I said, Hey, could I join you and start this trend? And he, at that time, they were
using SciPy a lot. They were trying to get clients. And so I came down to Texas and in
Texas is where I sort of, it's my entrepreneur world, right? I left academia and went to
entrepreneur world in 2007. So I moved here in 2007, kind of took a leap, knew nothing
really about business, knew nothing about a lot of stuff there. There's, you know, for
a long time I've kept some connections to a lot of academics because I still value it.
I still love the scientific tradition. I still value the essence and the soul and the heart
of what is possible. Don't like a lot of the administration and the kind of, we can go
into detail about why and where and how this happens. What are the challenges?
I mean, I don't know, but I'm with you. So I'm still affiliated with MIT. I still love
MIT because there's magic there. And there's people I talk to, like researchers, faculty,
in those conversations and the white board and just the conversation, that's magic there.
All the other stuff, the administration, all that kind of stuff seems to, you don't want
to too harshly criticize bureaucracies, but there's a lag that seems to get in the
way of the magic. And I don't, I still have a lot of hope that that can change because
I don't often see that particular type of magic elsewhere in the industry. So like we
need that and we need that flame going. And it's the same thing as, exactly as you said,
it has the same kind of elements like the open source community does. And, but then if you,
like the reason I stepped away, the reason I'm here, just like you did in Austin is like,
if I want to build one robot, I'll stay at MIT. But if I want to build millions and make
money enough to work and explore the magic of that, then you can't. And I think that
dance is
The translational dance has been lost a bit.
Yeah.
Right. And there's a lot of reasons for that. I'm not, I'm certainly not an expert on this
stuff. I can opine like anybody else, but I realized that I wanted to explore entrepreneurship,
which I, and really figure out, and it's been a driving passion for 20 years, 20, 25 years.
How do we connect capital markets and company? Because again, I fell in love with the notion
of profit seeking on its own is not a bad thing. It's actually a coordination mechanism
for allocating resources that, you know, in an emergent way, right? That respects everybody's
opinions, right? So this is actually powerful. So I say all the time, when I make a company
and we do something that makes profit, what we're saying is, hey, we're collecting the
world's resources and voluntarily people are asking us to do something that they like.
And that's a huge deal. And so I really liked that energy. So that's why I came to do and
to learn and to try to figure out. And that's what I've been kind of stumbling through since
for the past 14 years.
And that's 2007.
2007.
And so you were still working on a robot.
So NumPy was just emerging.
Just emerging.
And one of the things I've done, it's worth mentioning because it emphasized the exploratory
nature of my thinking at the time. I said, well, I don't know how to fund this thing.
I've got a graduate student I'm paying for and I got no funding for him. And I had done
some fundraising from the public to try to get public fundraising from my lab. I didn't
really want to go out and just do the fundraising circuit the way it's traditionally done. So
I wrote a book and I said, I'm going to write a book and I'm going to charge for it. It
was called Guide to NumPy. And so ultimately NumPy became documentation driven development
because I basically wrote the book and made sure the stuff in the book would work.
So it really helped actually make NumPy become a thing. So writing that book and it was not
a, I mean, it's not a page turner. Guide to NumPy is not a book you pick up and go, oh,
this is great over the fire. But it was, it's where you could find the details. Like how
did all this work?
And a lot of people love that book.
And so a lot of people ended up, so I, but I said, look, I needed, so I'm going to charge
for it. And I got some flak for that. Not that much. Just, just probably five angry messages,
people yelling at me saying I was, you know, a bad guy for, for charging for this book.
It was one of them were just all, no, no, I haven't really had any interaction with him
personally. Like I said, but, but there were a few, but, but I just surprisingly not, there
was actually a lot of people like, you know, it's fine. You can charge for a book. That's
no big deal. We know that's a way you can, you can try to make money around open source.
So, so what I did, but I, I did an interesting way. I said, well, you know, kind of my ideas
around, around IP law and stuff. I love the idea. You can share something. You can spread
it. Like once it's, the fact that you have a thing and copying is free, but the creation
is not free. So how do we, how do you fund the creation and allow the copying, right?
And then software is a little more complicated than that because creation is actually a continuous
thing. You know, it's not like you build a widget that's done. It's sort of a process
of emerging and continue to create. But I wrote the book and had this market determined
price thing. I said, look, I need, I think I said 250,000. If I make 250,000 from this
book, it's, it'll, I'll make it free. So as soon as I get that much money, or I said
five years, right? So there's a time limit. Like forever.
That's cool. I didn't know this story. Yeah. So I released it on this. And it's actually
interesting because one of the people who also thought that was interesting ended up
being Chris White, who was the director of DARPA project that we got funding through
at Anaconda. And the reason he even called us back is because he remembered my name from
this book and he thought that was interesting. And so even though we hadn't gone to the demo
days, we applied and the people said, yeah, nobody ever gets this without coming to the
demo day first. This is the first time I've seen it. But it's because I knew, you know,
Chris had done this and had this interaction. So it did have impact. I was actually really,
really pleased by the result. I mean, I ended up, I ended up in three years, I made 90,000.
So sold 30,000 copies by myself. I just put it up on, you know, use PayPal and sold it.
And that was my first taste of kind of, okay, this can work to some degree. And I, you know,
all over the world, right? From Germany to Japan to, it was actually, it did work. And
so I appreciated the fact that PayPal existed and had a way to make, to get the money. The
distribution was simple. This is pre Amazon book stuff. So it was just publishing a website.
It was the popularity of SciPy emerging and getting company usage. I ended up not letting
it go the five years and not trying to make the full amount because, you know, a year or
a year and a half later, I was at Enthought. I had left academia, was at Enthought, and I kind of had
a full-time job. And then actually what happened is the documentation people, there's a group
that said, Hey, we want to do documentation for SciPy as a collective. And they were
essentially needing the stuff in the book, right? And so they kind of asked, Hey,
could we just use the stuff in your book? And at that point I said, Yeah, I'll just open it up.
So that's, but it has served its purpose. And the money that I made actually funded my grad
student. Like it was actually, you know, I paid him $25,000 a year out of that money.
The funny thing is if you do a very similar kind of experiment now with NumPy or something like
it, you could probably make a lot more. That's probably true. Because of the tooling and the
community building. Yeah, I agree. Like the, and social media, there's just a virality to that
kind of idea. I agree. There'd be things to do. I've thought about that. But, and really I thought
about a couple of books or a couple of things that could be done there. And I just haven't,
right? I even tried to hire a ghostwriter this, this year, to see if that could help, but it,
it didn't. But part of my problem is this, I've been so excited by a number of things that stemmed
from that. Like, so I came here, worked at Enthought for four years, graciously, you know,
Eric made me president and we started to work closely together. We actually helped him buy
out his partner. It didn't end great. Like, unfortunately, Eric and I aren't real, aren't
friends now. I still respect him. I have a lot, you know, I wish we were, but he didn't like the
fact that I, that Peter and I started Anaconda, right? That was not, I mean, so I'm, there's two
sides of that story. So I'm not going to go into it, right? Sure. But you, as human beings, and
you wish you still could be friends. I do. I do. It saddens me. I mean, that's a, that's a story
of great minds building great companies. Yeah. Somehow it's sad that when there's that kind of.
And I hold him in esteem. I'm grateful for him. I think he's, they're doing, you know,
Enthought still exists. They're doing great work helping scientists. They still run the
SciPy conference. They're in the, they have an R&D platform they're selling now that's a tool that
you can go get today, right? So they've been a, Enthought has played a role in SciPy,
in supporting the community around SciPy. I would say they ended up not being able to,
they ended up building a tool suite to write GUI applications. Like that's where they could
actually make the business work. And so supporting SciPy and NumPy itself
wasn't as possible. Like they didn't, they tried. I mean, it was not just because,
it was just because the business aspect. So, and then I wanted to build a company that could do,
that could get venture funding, right? For better or worse. I mean, that's a longer story. We
could talk a lot about that, but. And that's, that's where Anaconda came to be.
That's where Anaconda came to be. So let me, let me ask you, it's a little bit for fun because
you built this amazing thing. And so let's, let's talk about like an old warrior looking over old
battles. You've, you know, there's a sad sort of letter in 2012 that you wrote to the NumPy mailing
list announcing that you're leaving NumPy. And some of the things you've listed is some,
some of the things you regret or not regret necessarily, but some things to think about.
If you could go back and you could fix stuff about NumPy or both sort of in a personal level,
but also like looking forward, what kind of things would you like to see changed?
Good question. So I think there's technical questions and social questions right there.
First of all, you know, I wrote NumPy as a service and I spent a lot of time doing it and then other
people came to help make it happen. NumPy succeeded because of the work of a lot of people, right? So
it's, it's important to understand that. I'm grateful for the opportunity, the role I had, I
could play and I'm grateful that things I did had an impact, but they only had the impact they had
because of the other people that came to the story. And so they were essential. But the way data types
were handled, the way data types, we had array scalars, for example, that are really just
a substitute for a type concept, right? So we had array scalars that are actual Python objects,
so that for every, for a 32-bit float or a 16-bit float or a 16-bit integer, Python
doesn't have a natural one; there's just one integer, there's one float. Well, what about these lower
precision types, these larger precision types? So we had them in NumPy, so that you could have a
collection of them, but then have an object in Python that was one of them. And there's questions
about, like, in retrospect, I wouldn't have created those; I'd have improved the type system,
like made the type system actually a Python type system as opposed to currently, it's a Python 1
level type system. I don't know if you know the difference between Python 1 and Python 2, it's
kind of technical, kind of deep, but Python 2, one of its big things that Guido did, it was really
brilliant: in Python 1, all classes, new objects, were one type, so if a user
wrote a class, it was an instance of a single Python type called the class type, right? In Python
2, he used a meta-typing hook to actually go, oh, we can extend this and have users write classes
that are new types. So he was able to have your user classes be actual types, and the Python type
system got a lot more rich. I barely understood that at the time that NumPy was written. And so
essentially, in NumPy, I created a type system that was Python 1 era. It was every,
types with additional metadata. What's the cost of that? Is it efficiency? Is it usability?
It's usability primarily. The cost isn't really efficiency. It's, it's, it's the fact that it's
clumsy to create new types. It's hard. And then one of the challenges is you want to create new
types. You want a quaternion type, or you want to add a new, you know, posit type, or you want to,
so it's hard. Now, in the, and now, if we had done that well, when Numba came on the scene,
where we could actually compile Python code, it would integrate with that type system much cleaner.
And now all of a sudden, you could do gradual typing more easily. You could actually have Python
when you add Numba plus better typing could actually be a, you'd smooth out a lot of rough edges.
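To make the array scalar idea concrete, here is a small sketch using current NumPy. It only illustrates what the conversation describes: pulling one element out of a typed array yields a Python object of a NumPy-specific scalar type, because Python itself has only one built-in float and one built-in int. How dtype objects are classed internally has evolved across NumPy releases, so the sketch deliberately avoids asserting the exact class of a dtype object.

```python
import numpy as np

x = np.array([1.5, 2.5], dtype=np.float32)

# Indexing a single element returns an "array scalar", not a Python float.
elem = x[0]
is_float32 = type(elem) is np.float32           # NumPy's 32-bit float scalar type
is_numpy_scalar = isinstance(elem, np.generic)  # base class of all array scalars
is_python_float = isinstance(elem, float)       # float32 does not subclass float

# The scalar still carries the dtype of the array it came from.
same_dtype = elem.dtype == x.dtype

# dtype objects describe element types; these two are distinct descriptors.
f32 = np.dtype("float32")
i16 = np.dtype("int16")
sizes = (f32.itemsize, i16.itemsize)            # bytes per element
```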
But that's already, there's like, but are you talking about from the perspective of developers
within NumPy or users of NumPy? Because developers of new, not really users of NumPy so much,
it's the development of NumPy. You're thinking about like, how to design NumPy so that it's
contributors. Yeah, the contributors, it's easier. It's easier. It's less work to make it better
and to keep it maintained. And where that's impacted things, for example, is the GPU,
like all of a sudden GPUs start getting added, and we don't have them in NumPy. Like NumPy should
just work on GPUs. The fact that we have to download a whole other object called CuPy to have
arrays on GPUs is just an artifact of history. Like, there's no fundamental reason for it.
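One way library code can avoid hard-wiring a single array implementation is to ask the array which namespace of functions it belongs to. The sketch below assumes the `__array_namespace__` method from the Array API standard, which recent NumPy releases provide on their arrays, and falls back to NumPy when the method is missing. The helper names `get_namespace` and `normalize` are hypothetical and exist only for this illustration, which runs on a NumPy array.

```python
import numpy as np

def get_namespace(x):
    """Return the array namespace of `x`, falling back to NumPy (hypothetical helper)."""
    if hasattr(x, "__array_namespace__"):
        return x.__array_namespace__()
    return np

def normalize(x):
    """Shift to zero mean and scale to unit standard deviation,
    using only functions from the array's own namespace."""
    xp = get_namespace(x)
    return (x - xp.mean(x)) / xp.std(x)

result = normalize(np.asarray([1.0, 2.0, 3.0, 4.0]))
```

Because `normalize` never names NumPy directly, the same function body could in principle run on any array object that exposes a compatible namespace.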
Well, that's really interesting. If we could sort of go on that tangent briefly is you have
PyTorch and other libraries like TensorFlow that basically tried to mimic NumPy. Like, you've
created a sort of platonic form of multi-dimensional. Yeah, exactly. Well, the problem was they didn't
realize that. Yeah, platonic form has a lot of edges. They're like, well, we should cut those
out before we present it. So I mean, I wonder if you can comment, is there like a difference
between their implementations? Do you wish that they were all using NumPy over, like, an
abstraction of GPU, and sorry to interrupt, that there's GPUs, ASICs, there might be other
neuromorphic computing, there might be other kind of, or the aliens will come with a new kind of
computer, like an abstraction that NumPy should just operate nicely over the things that are
more and more and smarter and smarter with these multi-dimensional arrays.
Yeah, yeah. There's several comments there. We are working on something now called data-apis.org,
data-apis.org. You can go there today. And it's our answer. It's my answer. It's not just me.
It's me and Ralf and Athan and Aaron and a lot of companies are helping us at Quansight Labs.
It's not unifying all the arrays. It's creating an API that is unified. So we do care about this
and are trying to work through it. Actually, I had the chance to go and meet with the TensorFlow team
and the PyTorch team and talk to them after exiting Anaconda, just talking about because the
first year after leaving Anaconda in 2018, I became deeply aware of this and realized that,
oh, this split in the array community that exists today makes what I was concerned about in 2005
pretty parochial. It's a lot worse. Now, there's a lot more people. So perhaps the industry can
sustain more stacks. There's a lot of money, but it makes it a lot less efficient. But I've also
learned to appreciate it's okay to have some competition. It's okay to have different implementations,
but it's better if you can at least refactor some parts. I mean, you're going to be more efficient
if you can refactor parts. It's nice to have competition over things
over which it's nice to have competition. They're innovative. Yeah, innovative. And then maybe on
the infrastructure, whatever, however you define infrastructure, maybe it's nice to have
cooperation. Exactly. I agree. But it was interesting to hear the stories. I mean,
TensorFlow came out of a C++ library Jeff Dean wrote, I think, that was basically
how they were doing inference. And then they realized, oh, we could do this TensorFlow thing.
That C++ library, then what was interesting to me was the fact that both Google and Facebook
did not, it's not like they supported Python or NumPy initially. They just realized they had to.
They came to this world and then all the users were like, hey, where's the NumPy interface?
Oh, and then they kind of came late to it. And then they had these bolt-ons. TensorFlow's bolt-on,
I don't mean to offend, but it was so bad. It's the first time that I'm usually, I mean,
one of the challenges I have is I don't criticize enough. In the sense that I don't give people
input enough. I think it's universally agreed upon that the bolt-ons and TensorFlow work.
But I went to it. There's a talk given at Mallorca in Spain, and a great guy came and gave a talk.
I said, you should never show that API again at a Python conference. That's terrible. You're taking
this beautiful system you've created and you're corrupting all these poor Python people, forcing
them to write code like that or thinking they should. Fortunately, they adopted Keras as their,
and that Keras is better. And so Keras, TensorFlow is fine, is reasonable. But they bolted it on.
Facebook did too. Facebook had their own C++ library for doing inference, and they also had
the same reaction. They had to do this. One big difference is Facebook, maybe because the way
it's situated in part of FAIR, part of their research library, TensorFlow is definitely used
and they have to make, they couldn't just open it up and let the community change what that is,
because I guess they were worried about disrupting their operations. Facebook's been much more open
to having community input on the structure itself. Whereas Google and TensorFlow, they're really eager
to have community users. People use it and build the infrastructure, but it's much more walled.
Like it's harder to become a contributor to TensorFlow itself.
And it's also, this is a very difficult question to answer and don't need to be throwing shade at
anybody. But you have to wonder, it's the Microsoft question of when you have a tool like PyTorch
or TensorFlow, how much are you tending to the hackers and how much are you tending to the big
corporate clients? Correct. Do you tend to the millions of people that are giving you almost
no money, or do you tend to the few that are giving you a ton of money? I tend to
stand with the people. Because I feel like if you nurture the hackers, you will make the right
decisions in the long term that will make the companies happy. I lean that way, too. I totally
agree. But then you have to find the right dance. But it's a balance. Because you can lean to the
hackers and run out of money. Yeah, exactly. Which has been some of the challenge I've faced
in the sense that I would look at some of the experiments, like NumPy, the fact that we have
this split is a factor of I wasn't able to collect more money towards NumPy development.
Yeah. Right. I mean, it didn't succeed in the early days of getting enough financial contribution
in NumPy so they didn't really work on it. I couldn't work on it full-time. I had to just
catch an hour here, an hour there. And I basically didn't like that. I've wanted to be able to do
something about that for a long time and try to figure out how, well, there's lots of ways.
I mean, possibly one could say, we had an offer from Microsoft in the early days of Anaconda.
In 2014, they offered to come buy us. The problem was the right people at Microsoft
didn't offer to buy us. And we were like a second, they had really bought, they just bought R,
the R company called, it was not R Studio, but it was another R company that was emergent.
And it was kind of a, well, we should also get a Python play. But they were really doubling
down on R. And so it was like... It was where you would go to die. So it was before Satya was there.
Satya had just started. And the offer was coming from someone two levels down from him.
Got you.
Right. And if it had come from Scott Guthrie, so I got a chance to meet Scott Guthrie,
great guy, I like him. If the offer had come from him, I'd probably be at Microsoft right now.
That'd be fascinating. And that would be really nice actually, especially given what Microsoft
has since done for the open source community and all those.
Yes. I think they're doing well. I really like some of the stuff they've been doing.
They're still working, and they've hired Guido now and they've hired a lot of Python developers.
Guido's at Microsoft? I need to...
Which he retired, then he came out of retirement and he's working on...
So I was just talking to him and he didn't mention this part.
Well, I should get this further. Because I know he loved Dropbox,
but I wasn't sure what he was doing, what he was up to.
Well, he was kind of saying he'd retire, but it's literally been five years since I last
sat down and really talked to Guido. Guido is a technology expert. So I came, I was excited
because I'd finally figured out the type system for NumPy. I wanted to kind of talk about that
with him and I kind of overwhelmed him. Could you stay on that just for a brief moment because
he's a fascinating person in the history of programming. He is a fascinating person.
What have you learned from Guido about programming, about life?
Yeah, yeah, a lot actually. I've been a fan of Guido's. We've had a chance to talk
some. I wouldn't say we talk all the time, not really at all. But we've talked enough to...
I respect his... In fact, when I first started NumPy, one of the first things I did was I asked
Guido for a meeting with him and Paul Dubois in San Mateo and I went and met him for lunch.
And basically to say, maybe we can actually... Part of the strategy for NumPy was to get it
into Python 3 and maybe be part of Python. So we talked about that and about that approach.
I would have loved to be a fly on the wall. That was good. And over the years, from Guido,
I learned... So he was open. He was willing to listen to people's ideas over the years.
Now, generally, I'm not saying universally that's been true, but generally that's been true.
So he's willing to listen. He's willing to defer. On the scientific side, he would just
kind of defer. He didn't really always understand what we were doing and he'd defer.
One place where he didn't defer enough was that we missed a matrix multiply operator.
That finally got added to Python, but about 10 years later than it should have.
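
For readers who have not used it: the matrix multiply operator referred to here is `@`, which arrived in Python 3.5. The sketch below assumes NumPy is installed, and the array values are made up purely for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[0, 1], [1, 0]])

# Before the operator existed, a matrix product needed a function call.
old_style = np.dot(a, b)

# With @ the same product reads much closer to the math.
new_style = a @ b

# The * operator is still elementwise, which is why a separate
# operator for matrix products was wanted in the first place.
elementwise = a * b

print(new_style)
print(elementwise)
```
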
But the reason was because nobody... It takes a lot of effort. And I learned this while I was
writing NumPy. I also wrote tools to... I became a Python dev and I added some pieces to Python,
like the memory view object. I wanted the structure of NumPy into Python. So we didn't get
NumPy into Python, but we got the basic structure of it into Python so you could build on it.
Nobody did for a while, but eventually, database authors started to.
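
A small illustration of what "the basic structure of it in Python" can look like in practice: the standard-library `memoryview` object can describe a block of memory with a shape and strides, without NumPy installed. This is only a sketch; the buffer contents and the 2 x 4 shape are arbitrary choices for the example.

```python
import array

# A plain standard-library buffer: eight C doubles, no NumPy involved.
data = array.array("d", range(8))

# memoryview exposes that buffer without copying it.
view = memoryview(data)

# Reinterpret the same memory as a 2 x 4 grid. Going through a byte
# view first is required when casting from 1-D to N-D.
grid = view.cast("B").cast("d", shape=[2, 4])

# Writes through the view land in the original buffer.
grid[1, 0] = 99.0

print(grid.shape, grid.strides)
print(grid.tolist())
print(data[4])
```
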
And it's a lot better they did. And also, Antoine Pitrou and Stefan Krah actually fixed the memory
view object because I wrote the underlying infrastructure in C, but the Python exposure
was terrible until they came in and fixed it, partly because I was writing NumPy.
And NumPy was the Python exposure. I didn't really care about if you didn't have NumPy installed.
Anyway, Guido: open to ideas, technologically brilliant. I really got a lot of respect for him
when I saw what he did with this type/class merger thing. It was actually tricky, right?
And then willing to share. Willing to share his ideas. So the other thing, early on in 1998,
I wrote my first extension module. The reason I could is because he'd written this
blog post on how to do reference counting, right? And without it, I would have been lost, right?
But he was willing to at least try to write this post. And so he's been motivated, early on with
Python, there's a computer science for everybody, kind of have this early on desire to, oh, maybe
we should be pushing programming to more people. So he had this populist notion, I guess, or populist
sense. So I learned that there's a certain skill, and I've seen it in other people too, of engaging
with contributors sufficiently to, because when somebody engages with you and wants to contribute
to you, if you ignore them, they go away. So building that early contributor base requires
real engagement with other people. And he would do that. Can you also comment on this tragic
stepping down from his position as the benevolent dictator for life over the walrus,
you know, the walrus operator? The walrus operator was the last battle. I don't know if
that's the cause of it, but for people who don't know, you can look it up, there's
the walrus operator, which looks like a colon and an equal sign. Yeah, colon, equal sign. And
it actually does maybe the thing that you, that an equal sign should be doing. Yeah, maybe, right,
exactly. Yeah. But it's just historically, equal sign means something else. It just means assignment.
So he stepped down over this. What do you think about the pressure of leadership?
It's something. You mentioned the letter I wrote in NumPy at the time. That was a hard time,
actually. I mean, you know, there's been really hard times. It was hard. You get criticized,
right? And you get pushed and you get, not everybody loves what you do. Like anytime you
do anything that has impact at all, you're not universally loved, right? You get some real critics.
And that's an important energy because it's impossible for you to do everything right. You
need people to be pushing. But sometimes people can get mean. I prefer to give people
the benefit of the doubt. I don't immediately assume they have bad intentions. And maybe for other,
you know, maybe other, maybe that doesn't happen for everybody. They, for whatever reason,
their past, their experience of people, they, they sometimes have bad, they, so they immediately
attribute to you bad intentions. You're like, where did this come from? I mean, I'm definitely
open to criticism, but I think you're misinterpreting the whole point. Because I would get that,
you know, sort of when I started Anaconda, you know, I've been, sometimes I say to people,
I know I'm, I care enough about entrepreneurship to make some open source people uncomfortable.
And I care enough about open source to make investors uncomfortable. So I sort of, you know,
create, you create kind of doubters on both sides.
So when you have, and this is just a plea to the listener and the public, I've noticed this too
that there's a tendency in social media makes this worse. When you don't have perfect information
about the situation, you tend to fill the gaps with the worst possible, or at least a bad
story that fills those gaps. And I think it's good to live life, maybe not fully naively,
but filling in the gaps with the, with the, with the good, with the best, with the positive,
with the, with the hopeful explanation of why you see this. So if you see somebody like you
trying to make money on a book about NumPy, there's a million stories around that that are positive.
And those are good to think about, to project positive intent on other people. Because
for many reasons, usually because people are good and they do have good intent.
And also when you project that positive intent, people will step up to that too.
Yes. So like it's, it has a great point. It has this kind of viral nature to it. And of course,
what Twitter and Facebook figured out early on is that they can make a lot of money and engagement
from the negative. Yes. So like there's this, we're fighting this mechanism, which is challenging.
It's like easier. It's just easier to be, to be negative. And then for some reason,
something in our mind really enjoys sharing that and getting, getting all excited about the negativity.
We do. Yeah. Some protective mechanism perhaps, that we're going to get eaten if we don't.
Exactly. For us to be effective as a group of people in a software engineering project,
you have to project positive intent, I think. I totally agree. Totally agree. And I think that's
very, and so that happens in this, in the space, but Python has done a reasonable job in the past.
But here is a situation where I think it, it started to get this pressure where it didn't.
I was, I really didn't, I didn't know enough about what happened. I've, you know,
talked to several people about it. And I know most of the steering committee members today
one, one person nominated me for that role, but it's the wrong role for me right now, right?
I have a lot of respect for the Python developer space and the Python developers.
I also understand the gap between computer science Python developers
and array programming developers or science developers. And in fact,
Python succeeds in the array space, the more it has people in that boundary. And there's often
very few, like I was playing a role in that boundary and, you know, working like everything
to try to keep up with the, with the, what, even what Guido was saying, like I'm a C programmer,
but not a computer scientist. Like I was an engineer and physicist and mathematician.
And I don't, I didn't always understand what they were talking about
and why they would have opinions the way they did. So, you know, you have to listen
and try to understand, then you also have to explain your point of view in a way they can
understand. And that takes a lot of work. And that, that communication is always the challenge.
And it's just what we're describing here about the negativity is just another form of that.
How do we come together? And it does appear we're wired anyway to at least have a,
there's a part of us that will enemy, you know, friend, enemy. And, and we see, yeah, it's like,
why are we wiring on the enemy front? Yeah. So, so why are we pushing that? Why are we
promoting that so deeply? Assume friend until proven otherwise. Yes. Yes. So, because you have
such a fascinating mind and all this, let me just ask you these questions. So one interesting side
on the Python history is the move from Python two to Python three. You mentioned move from
Python one to Python two, but the move from Python two to Python three is a little bit
interesting because it took a very long time. It broke, you know, in quite a small way,
backward compatibility, but even that small way seemed to have been very painful for people.
Are there lessons you draw from how long it took and how painful it seemed to be?
Yeah, tons of lessons. Well, I mentioned here earlier that NumPy was written in 2005. It was
in 2005 that I actually went to Guido to talk about getting NumPy into Python three. Like my
strategy was to, oh, we were moving to Python three, let's have that be. And it seems funny
in retrospect because like, wait, Python three, that was in 2020, right, when we finally ended
support for Python two or at least 2017. The reason it took a long time, a lot of time,
I think it was because one of the things is there wasn't much to like about Python three.
3.0, 3.1, it really wasn't until 3.3, like I consider Python 3.3 to be Python 3.0.
It wasn't until Python 3.3 that I felt there was enough stuff in it to make it worth anybody
using it, right? And then 3.4 started to be, oh, yeah, I want that. And then 3.5 has the
matrix multiply operator, and now it's like, okay, we got to use that. Plus the libraries
that started leveraging some of the features of Python three. Exactly. So it really, the challenge
was, it was, but it also illustrated a truism that, you know, it's, when you have inertia,
when you have a group of people using something, it's really hard to move them away from it.
You can't just change the world on them. And Python three, you know, made some, I think it
fixed some things Guido had always hated. I don't think he liked the fact that print was a
statement. He wanted to make it a function. But in some sense, that's a bit of gratuitous change
to the language. And you could argue, and there's people have, but there was one of the challenges
was there wasn't enough features and too many just changes without features. And so the empathy for
the end user as to why they would switch wasn't there. I think also it illustrated just the
funding realities. Like Python wasn't funded. Like it was also a project with a bunch of volunteer
labor, right? It had more people, so more volunteer labor, but it was still, it was funded in the sense
that at least Guido had a job. And I've learned some of the behind the scenes on that now since
talking to people who lived through it. And maybe not on air, we can talk about some. But it's
interesting to see, but Guido had a job, but he, but his full-time job wasn't just work on Python.
Like he had other things to do. It is wild. It is wild, isn't it? It's wild how few people are
funded. Yes. And how much impact they have. Yes. Maybe that's a feature, not a bug. I don't know.
Maybe, yes, exactly. At least early on. Like it's sort of, I know. Yeah. It's like Olympic athletes
are often severely underfunded, but maybe that's what brings out the greatness.
Perhaps. Yes, correct. No, exactly. Maybe this is an essential part of it. Because I do think
about that in terms of I currently have an incubator for open source startups. Like what I'm
trying to do right now is create the environment I wish had existed when I was leaving academia with
NumPy and trying to figure out what to do. I'm trying to create those opportunities and environments.
So, and that's what drives me still, is how do I make the world easier for the open source
entrepreneur? So, let me stay, I mean, I could probably stay in NumPy for a long time, but
this is a fun question. So, Andrej Karpathy leads the Tesla Autopilot team. And he's also one of the
most like legit programmers I know. It's like he builds stuff from scratch a lot. And that's how
he builds intuition about how a problem works. He's built it from scratch and I always love that.
And the primary language he uses is Python for the intuition building. But he posted something
on Twitter saying that they got a significant improvement on some aspect of their like data
loading, I think, by switching away from np.sqrt. So, from NumPy's implementation of square
root to math.sqrt. And then somebody else commented that you can, you can get even
a much greater improvement by using the vanilla Python square root, which is like
power 0.5.
Power 0.5. And it's fascinating to me. I just wanted to, so that was some shade throwing at some
No, no. And, but also, we're talking about it's a good way to ask the trade off between usability
and efficiency, broadly in NumPy, but also in these like specific weird quirks of like a single
function.
Yep. So, on that point, if you use a NumPy math function on a scalar, it's going to be slower
than using a Python function on that scalar. Because the math object in NumPy is more complicated,
right? Because you can also call that math object on an array. And so effectively it goes
through similar machinery. There aren't enough of the checks and fast paths that you could do
for scalars. So, yeah, if you're basically doing a list, if you run over a list,
in fact, for problems that are less than a thousand, even maybe 10,000 is probably the,
if you're going more than 10,000, that's where you definitely need to be using arrays. But if
you're less than that, and for reading, if you're doing a reading process and essentially it's not
compute bound, it's IO bound. And so you're, you're really taking lists of a thousand at a time
and then doing work on it. Yeah, you could be faster just using Python, straight up Python.
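
The scalar-versus-array point above can be checked with a quick timing sketch. It assumes NumPy is installed; the input value, the array size, and the repeat counts are arbitrary, and the actual timings will differ by machine and library version, so none are quoted here.

```python
import math
import timeit

import numpy as np

x = 2.0

# Three ways to take the square root of one Python float.
candidates = {
    "np.sqrt(x)": lambda: np.sqrt(x),
    "math.sqrt(x)": lambda: math.sqrt(x),
    "x ** 0.5": lambda: x ** 0.5,
}

# All three agree on the value; only the per-call overhead differs.
values = {name: float(fn()) for name, fn in candidates.items()}

# Time each one on a scalar.
scalar_seconds = {
    name: timeit.timeit(fn, number=100_000) for name, fn in candidates.items()
}

# With a large array, one ufunc call replaces a whole Python-level loop.
big = np.arange(100_000, dtype=np.float64)
array_seconds = timeit.timeit(lambda: np.sqrt(big), number=10)
loop_seconds = timeit.timeit(
    lambda: [math.sqrt(v) for v in big.tolist()], number=10
)

for name, seconds in scalar_seconds.items():
    print(f"{name:>14}: {seconds:.4f} s for 100,000 scalar calls")
print(f"np.sqrt on array: {array_seconds:.4f} s, Python loop: {loop_seconds:.4f} s")
```
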
See, but also, and then this is the, sorry to interrupt, but there's the fundamental questions
when you look at the long arc of history. It's very possible that np.sqrt is much
faster. It could be. So like in terms of like, don't worry about it, it's the, the evils of
over optimization or whatever, all the different quotes around that is, is sometimes obsessing
about this particular little quirk is not, is not sufficient. For somebody like, if you're,
if you're trying to optimize your path, I mean, I agree, premature optimization creates all kinds
of challenges, right? Because now, but you may have to do it. I believe the quote is, it's the
root of all. It's the root of all evil, right? That's Don Knuth, I think, or somebody else.
He's the, well, Don Knuth is kind of like Mark Twain. Things just get attributed to him.
And it's fine because he's brilliant. So no, I was a LaTeX user myself. And so I have a lot
of respect and he did more than that, of course. But yeah, someone I really appreciate in the
computer science space. Yeah, I don't, I think that's appropriate. There's a lot of little
things like that where people actually, if you understood it, you go, yeah, of course, that's
the case. Like, and the other part, and the other part I didn't mention, and Numba was a thing we
wrote early on, and I was really excited by Numba because it's something we wanted, it was a compiler
for Python syntax. And I wanted it from the beginning of writing NumPy because of this function
question, like taking the power of arrays is really that you can write functions using all of it.
It has implicit looping, right? So you don't worry about, I write this n-dimensional for loop with,
you know, four loops, four for statements, you just say, oh, big four-dimensional array,
I'm going to do this operation, this plus, this minus, this reduction. And you get this, it's
called vectorization in other areas, but you can basically think at a high level and get massive
amounts of computation done with the added benefit of, oh, it can be parallelized easily, it can be put
in parallel, you don't have to think about that. In fact, it's worse to go decompose your, you
write the for loops and then try to infer parallelism from for loops. That's actually
harder problem than to take the array problem and just automatically parallelize that problem.
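
To make the "implicit looping" idea concrete, here is a sketch that writes the same addition and reduction twice: once with four nested for statements, once as whole-array operations. It assumes NumPy is installed; the shape and the random contents are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.random((3, 4, 5, 6))
b = rng.random((3, 4, 5, 6))

# Explicit version: four nested for statements over a 4-D array.
looped = np.empty_like(a)
for i in range(a.shape[0]):
    for j in range(a.shape[1]):
        for k in range(a.shape[2]):
            for m in range(a.shape[3]):
                looped[i, j, k, m] = a[i, j, k, m] + b[i, j, k, m]

# Array version: the looping is implicit and runs in compiled code.
vectorized = a + b

# A reduction is also a single high-level operation.
total = vectorized.sum()

print(np.array_equal(looped, vectorized), total)
```
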
That's what, and so functions in NumPy are called universal functions, ufuncs. So square root is
an example of a ufunc. There are others, sine, cosine, add, subtract. In fact, one of those
first libraries in SciPy was something called special where I added Bessel functions and
like all these special functions that come up in physics and I added them as ufuncs so they
could work on arrays. So I understood ufuncs very, very well from day one inside of Numeric.
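
A short sketch of what being a ufunc means from the user's side, using only NumPy (the special functions mentioned above live in SciPy and are left out so the example has a single dependency). The input values are made up.

```python
import numpy as np

# sqrt, sin, cos, add, and subtract are all instances of one ufunc type.
functions = [np.sqrt, np.sin, np.cos, np.add, np.subtract]
all_ufuncs = all(isinstance(f, np.ufunc) for f in functions)

x = np.array([1.0, 4.0, 9.0])

# The same object works elementwise on an array or on a single number.
roots = np.sqrt(x)
root_of_scalar = np.sqrt(16.0)

# Binary ufuncs also carry generic methods built on the same machinery.
total = np.add.reduce(x)            # sum of all elements
running = np.add.accumulate(x)      # running sum
table = np.subtract.outer(x, x)     # all pairwise differences

print(all_ufuncs, roots, root_of_scalar, total, running)
print(table)
```
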
That was one of the things we tried to make better in NumPy was how do they work? Can they do
broadcasting? What does broadcasting mean? But one of the problems is, okay, what do I do with a
Python scalar? So what happens? The Python scalar gets broadcast to a zero-dimensional array and then
it goes through the whole same machinery as if it were a 10,000-element array and then it kind
of unpacks the element and then does the addition. That's not to mention the function it calls in
the case of square root is just the C library square root. In some cases, like Python's power,
there's some optimizations they're doing that can be faster than just calling the C library square root.
In the interpreter or in the... No, in the C code, in the Python runtime.
In the Python. So they really optimize it and they have the freedom to do that because they don't
have to worry about... It's just a scalar. It's just a scalar. Right? They don't have to worry about
the fact that, oh, this could be an object with many pieces. The ufunc machinery is also generic
in the sense that it does typecasting and broadcasting. Broadcasting is the idea of, I'm going to go,
I have a zero-dimensional array, I have a scalar with a four-dimensional array and I add them.
Oh, I have to kind of coerce the shape of this guy to make it work against the whole four-dimensional
array. So it's the idea of I can do a one-dimensional array against a two-dimensional array and have
it make sense. Well, that's what NumPy does is it challenges you to reformulate, rethink your
problem as a multi-dimensional array problem versus like move away from scalars completely.
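
Broadcasting, as described here, can be seen in a few lines. This is a sketch assuming NumPy is installed; the shapes and values are chosen only to keep the output small.

```python
import numpy as np

grid = np.zeros((2, 3))
row = np.array([10.0, 20.0, 30.0])   # shape (3,)
col = np.array([[1.0], [2.0]])       # shape (2, 1)

# A scalar is stretched to match every element of the array.
plus_scalar = grid + 5.0

# A 1-D array against a 2-D array: the row is reused for each row of grid.
plus_row = grid + row

# Two arrays of different shapes are both stretched to a common shape.
col_plus_row = col + row
common_shape = np.broadcast(col, row).shape

# The same rule covers a scalar against a four-dimensional array.
four_d = np.ones((2, 3, 4, 5)) + 1.0

print(plus_row)
print(col_plus_row, common_shape, four_d.shape)
```
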
Right. Exactly. Exactly. In fact, that's where some of the edge cases boundaries are is that,
well, they're still there and this is where array scalars are particular. So array scalars are
particularly bad in the sense that they were written so that you could optimize the math on them,
but that hasn't happened. And so their default is to coerce the array scalar to a zero-dimensional
array and then use the NumPy machinery. That's what you could specialize, but it doesn't happen
all the time. So in fact, when we first wrote NumPy, we'd do comparisons and say, look, it gets a
thousand x speed up. We're lying a little bit in the sense that, well, first do the 40x slowdown
of using array scalars inside of a loop because if you just use Python scalars, you'd already be
10 times faster. But then we would get 100 times faster over that using just compilation.
And what we do is compile the loop from out of the interpreter to machine code.
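
For anyone unsure what an "array scalar" is: indexing a NumPy array returns a NumPy scalar object, not a built-in Python number, and a loop over the array does arithmetic on those objects. The sketch below, assuming NumPy is installed, only shows the types involved and makes no claim about the size of the slowdown.

```python
import numpy as np

arr = np.arange(5, dtype=np.float64)

# Indexing gives an array scalar (np.float64), not a plain Python float.
element = arr[2]
as_python = arr[2].item()

# Looping over the array does arithmetic on array scalars.
total_array_scalars = 0.0
for value in arr:
    total_array_scalars += value

# Converting once with tolist() keeps the loop on plain Python floats.
total_python = 0.0
for value in arr.tolist():
    total_python += value

print(type(element), type(as_python))
print(type(total_array_scalars), type(total_python))
```
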
And then that's always been the power of Python is this extensibility so you can,
because people say, oh, Python's so slow. Well, sure, if you do all your logic in the runtime
of the Python interpreter, yeah. But the power is that you don't have to. You write all the logic
which you do at the high level is just high-level logic. And the actual calls you're making could
be on gigabyte arrays of data. And that's all done at compile speeds. And the fact that integration
is one can happen, but two is separable. That's one of the, the language like Julia says,
we're going to be all in one. You can do all of it together. And then, the jury's out on whether
that's possible. I tend to think that you're going to, there's separate concerns there. You want to
pre-compile. In fact, generally, you will want to pre-compile your, some of your loops, like
SciPy has a compilation step to install SciPy. It takes about two hours. If you have many
machines, maybe you can get it down to one hour. But to compile those libraries takes about,
takes a while. You don't want to do that at runtime. You don't want to do that all the time. You
want to have this pre-compiled binary available that you're then just linking into. So there's
real questions about the whole, you know, source code. Running binary code is more than
source code. It's creating object code. It's the linker. It's the loader. It's how
that's interpreted inside of the virtual memory space. There's a lot of details there that actually,
I didn't understand for a long time until I read books on the topic. And it led to,
the more you know, the better off you are. And you can do more details, but
sometimes it helps with abstractions too. Well, the problem, as we mentioned earlier with
abstractions is you kind of sometimes assume that whoever implemented this thing had your
case in mind and found the optimal solution. Yes. Or like you assume certain things. I mean,
there's a lot of, one of the really powerful things to me early on, I mean, it sounds silly
to say, but with Python, probably one of the reasons I fell in love with it is dictionaries.
Yes. So obviously probably most languages have some mapping concept, some mapping concept,
but it felt like it was a first class citizen. And it was just my brain was able to think in
dictionaries. But then there is the thing that I guess I still use to this day, ordered dictionaries,
because that seems like a more natural way to construct dictionaries. And from a computer
science perspective, the running time cost is not that significant. But there's a lot of
things to understand about dictionaries that the abstraction doesn't necessarily incentivize you
to understand. Right. Do you really understand the notion of a hash map and how that dictionary
is implemented? But you're right. Dictionaries are a good example of an abstraction that's
powerful. And I agree with you. I love dictionaries too. It took me a while to understand that
once you do, you realize, oh, they're everywhere. And Python uses them everywhere too. It's actually
constructed that one of the foundational things is dictionaries and it does everything with dictionaries.
So it is, it's powerful. Ordered dictionaries came later, but it is very, very powerful.
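
A brief sketch of the dictionary points above, using only the standard library. The keys and values are invented for the example, and the small `Point` class is hypothetical, there only to show that object attributes are themselves stored in a dictionary.

```python
from collections import OrderedDict

# A dict is a hash map from keys to values.
settings = {"width": 640, "height": 480}
settings["depth"] = 3

# Since Python 3.7, a plain dict remembers insertion order.
key_order = list(settings)

# OrderedDict adds order-aware operations such as move_to_end.
ordered = OrderedDict(settings)
ordered.move_to_end("width")
reordered = list(ordered)

# Plain dicts ignore order when compared; two OrderedDicts do not.
dict_equal = {"height": 480, "width": 640, "depth": 3} == settings
ordered_equal = OrderedDict(width=640, height=480, depth=3) == ordered


class Point:
    """Hypothetical class used only to show attribute storage."""

    def __init__(self, x, y):
        self.x = x
        self.y = y


# Instance attributes live in a dictionary too.
attributes = Point(1, 2).__dict__

print(key_order, reordered, dict_equal, ordered_equal, attributes)
```
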
It took me a little while coming from just the array programming entirely to understand these
other objects like dictionaries and lists and tuples and binary trees. Like I said,
I wasn't a computer scientist. I studied arrays first. And so I was very array centric. And
you realize, oh, these others do have purposes and value actually. I agree.
There's a friendliness about like one way to think about arrays is arrays are just
like full of numbers. But to make them accessible to humans and make them less error prone to human
users, sometimes you want to attach names, human interpretable names that are sticky to those
arrays. So that's how you start to think about dictionaries. You start to convert numbers into
something that's human interpretable. And that's actually the tension I've had with NumPy because
I've built so much tooling around human interpretability and also protecting me from a year
later not making the mistakes by being, I wanted to force myself to use English versus numbers.
Yes. So there's a project called label arrays. Like very early, it was recognized that, oh,
we're indexing NumPy with just numbers, all the columns, and particularly the dimensions. I mean,
if you have an image, you don't necessarily need to label each column or row. But if you have a lot
of images, or you have another dimension, you'd at least like to label the dimension as this is x,
this is y, this is z, or give it some human meaning or some domain-specific meaning.
That was one of the impetuses for pandas, actually, was just, oh, we do need to label
these things. And label array was an attempt to add that lighter weight version of that. And there's
been, like that's an example of something I think NumPy could add, could be added to NumPy.
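
To illustrate the idea of labeling dimensions rather than indexing by bare numbers, here is a deliberately tiny sketch. The `LabeledArray` class and its methods are hypothetical, written for this example only; it is not the API of the project mentioned above, of pandas, or of any other library.

```python
import numpy as np


class LabeledArray:
    """Hypothetical wrapper that attaches a name to each dimension."""

    def __init__(self, data, dims):
        data = np.asarray(data)
        if data.ndim != len(dims):
            raise ValueError("need exactly one name per dimension")
        self.data = data
        self.dims = tuple(dims)

    def axis(self, name):
        # Translate a human-readable name into a positional axis number.
        return self.dims.index(name)

    def mean(self, dim):
        # Reduce over the named dimension and drop its label.
        axis = self.axis(dim)
        remaining = self.dims[:axis] + self.dims[axis + 1:]
        return LabeledArray(self.data.mean(axis=axis), remaining)


# Two 3 x 4 "images" with dimensions named instead of numbered.
images = LabeledArray(np.arange(24.0).reshape(2, 3, 4), dims=("image", "y", "x"))

# Average over y and x by name, leaving one value per image.
per_image = images.mean("y").mean("x")

print(per_image.dims, per_image.data)
```
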
But one of the challenges again, how do you fund this? Like I said, one of the tragedies I think
is that, so I never had the chance to, I was never paid to work on NumPy. So I've always just done
it in my spare time, always taken from one thing, taken from another thing to do it. And at the time,
I mean, today, it would be the wrong day of today, like paying me to work on NumPy now would not be
a good use of effort. But we are finally at Quansight Labs, I'm actually paying people to work
on NumPy and SciPy, which is I'm thrilled with, I'm excited by. I've wanted to do that. That's why
I wanted to do it from day one. It just took me a while to figure out a mechanism to do that.
Even like in the university setting, respecting that, like pushing students, young minds, the
young graduate students to contribute, and then figuring out financial mechanisms that enable
them to contribute. And then sort of reward them for their innovative scientific journey, that would
be nice. But then also, there's just a better allocation of resources. You know, it's 20 year
anniversary since 9-11. And I was just looking, we spent over $6 trillion in the Middle East
after 9-11 in the various efforts there. And sort of to put politics and all that aside is just,
you think about the education system, all the other ways we could have possibly allocated that
money. To me, to take it back, the amount of impact you would have by allocating a little bit of
money to the programmers that build the tools that run the world is fascinating. I don't know.
I think, again, there is some aspect to being broke as somewhat of a feature, not a bug, that
you make sure that you're valued. Well, you can still manage that.
Right. No, I know. But I don't think that's a big part. So it's like, I think you can have enough
money and actually be wealthy while maintaining your values.
Agreed. Agreed. There's an old adage that nations that trade together don't go to war
together. I've often thought about nations that code together.
Yeah, code together. Right. Because one of the things I love about open source is global. It's
multinational. There aren't national boundaries. One of the challenges with business and open
source is the fact that, well, business is national. Businesses are entities that are
recognized in legal jurisdictions and have laws that are respected in those jurisdictions and
hiring. And yet, the open source ecosystem is not there. Currently, one of the problems we're
solving is hiring people all over the world because it's a global effort. And I've had the
chance to work and I've loved the chance. I've never been to Iran, but I once had a
conference where I was able to talk to people there and talk to folks in Pakistan. I've never
been there, but we had a call where there were people there, like just scientists and normal
people. And there's a certain amount of humanizing that gets away from the... We often get the memes
of society that bubble up and get discussed, but the memes are not even an accurate reflection
of the reality of what people are. Well, if you look at the major power centers that are
leading to something like cyber war in the next few decades, it's the United States,
it's Russia, and China. And those three countries in particular have incredible developers.
So if they work together, I think that's one way the politicians can do their stupid bickering,
but there's a layer of infrastructure, of humanity, if they collaborate together that I
think can prevent major conflict, which would, I think, most likely happen at the cyber level
versus the actual hot war level. You're right. I think that's good prediction.
Nations that code together don't go to war together. That's a hope, right? That's one of the
philosophical hopes, but yeah. So you mentioned the project of Numba, which is
fascinating. So from the early days, there was kind of a pushback on Python that it's not
fast. If you want to write something that's fast, you use C/C++. If you want to write something
that's usable and friendly, but slow, you use Python. And so what is Numba? What is its goal?
How does it work? Great. Yeah. Yes, that was the argument. And the reality was people would
write a high-level code and use compiled code, but there's still a user story, use cases, where
you want to write Python, but then have it still be fast. You still need to write a for loop.
Like before Numba, it was always don't write a for loop. Write it in a vectorized way,
you put it in an array. And often that can make a memory trade off. Quite often you can do it,
but then you may use more memory because you have to build this array of data that you don't
necessarily need all the time. So Numba was, it started from a desire to have kind of a vectorize
that worked. Vectorize was a tool in NumPy. It was released, you give it a Python function,
and it gave you a universal function, a ufunc that would work on arrays. So you get a function
that just worked on a scalar. Like you could make a, like the classic case was a simple function
with an if-then statement in it. So sine x over x function, the sinc function. If x equals 0, return
1, otherwise do sine x over x. The challenge is, you don't want that loop going in Python,
so you want a compiled version of that. But the vectorize in NumPy would just give you a Python
function. So it would take the array of numbers and at every call do a loop back into Python.
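
The sinc example just described can be written out as follows. This is a sketch assuming NumPy is installed; `sinc_scalar` and `sinc_array` are names made up here. Note that this is the unnormalized sin(x)/x form from the conversation, which is not the same as NumPy's own `np.sinc`.

```python
import math

import numpy as np


def sinc_scalar(x):
    # Works on one number at a time, with an ordinary if statement.
    if x == 0:
        return 1.0
    return math.sin(x) / x


# np.vectorize makes the scalar function accept arrays, but it still
# calls back into Python once per element, so it is a convenience only.
sinc_vectorized = np.vectorize(sinc_scalar, otypes=[float])


def sinc_array(x):
    # Hand-written array version: the branch becomes np.where.
    x = np.asarray(x, dtype=float)
    safe = np.where(x == 0, 1.0, x)   # avoid dividing by zero
    return np.where(x == 0, 1.0, np.sin(safe) / safe)


xs = np.array([0.0, 0.5, 1.0, 2.0])
print(sinc_vectorized(xs))
print(sinc_array(xs))
```
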
So it was very slow. It gave you the appearance of a ufunc, but it was very slow. So I always wanted
a vectorize that would take that Python scalar function and produce a ufunc working on binary,
native code. So in fact, I had somebody work on that with PyPy and see if PyPy could be used to
produce a ufunc like that early on in 2009 or something like that, 2010. It didn't work
that well. It was kind of pretty bulky. But in 2012, Peter and I had just started Anaconda.
We had, I had just, I'd learned to raise money. That's a different topic, but I'd learned to,
you know, raise money from friends, family and fools, as they say. Oh, that's a good line.
But so we're trying to do something. We were trying to change the world. Peter and I are
super ambitious. We wanted to make array computing and we had ideas for really what's still the
energy right now. How do you do at scale data science? We had a bunch of ideas there, but one
of them, I had just talked to people about LLVM and I was like, there's a way to do this. I just,
I went, I heard about it from my friend Dave Beazley at a compiler course. So I was looking at compilers
like, and I realized, oh, this is what you do. And so I wrote a version of Numba that just basically
mapped Python bytecode to LLVM. Nice. Right. So, and the first version is like, this works,
and it produces code that's fast. This is cool for, you know, obviously a reduced subset of Python.
I didn't support all the Python language. There had been efforts to speed up Python in the past,
but those efforts were, I would say, not from the array computing perspective, not from the
perspective of wanting to produce a vectorize improvement. They were from a perspective of
speeding up the runtime of Python, which is fundamentally hard because Python allows for
some constructs that aren't, you can't speed up. Like it's, it's generic, you know, when it does
this variable. So I, from the start, did not try to replicate Python's semantics entirely. I said,
I'm going to take a subset of the Python syntax and let people write syntax in Python, but
it's kind of a new language, really. So it's almost like for loops, like focusing on for
loops, scalar arithmetic, you know, typed, you know, really typed language, a typed subset.
That was the key. So, but we wanted to add inference of types. So you didn't have to spell
all the types out because when you call a function, so Python is typed, it's just dynamically typed.
You don't tell it what the types are, but when it runs, every time an object runs,
there's a type for the variables. You know what it is. And so that was the design goals of Numba
were to make it possible to write functions that could be compiled and, and have them used
for NumPy arrays, like the need to support NumPy arrays.
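As a rough illustration of the difference described above, the sketch below writes the scalar function once (1 at zero, otherwise sine x over x), wraps it with `numpy.vectorize`, which still loops back into Python for every element, and, if Numba happens to be installed, also compiles it with `numba.vectorize` into a native ufunc. The names `sinc_scalar`, `sinc_np`, and `sinc_fast` and the test array are made up for this example, and the Numba branch is skipped when the import fails.

```python
import math

import numpy as np


def sinc_scalar(x):
    # Plain Python scalar function: 1 at x == 0, otherwise sin(x) / x.
    if x == 0.0:
        return 1.0
    return math.sin(x) / x


# numpy.vectorize gives ufunc-like broadcasting, but it calls back into
# Python once per element, so it is a convenience, not a speedup.
sinc_np = np.vectorize(sinc_scalar, otypes=[np.float64])

try:
    import numba

    # numba.vectorize compiles the scalar function to native code via LLVM
    # and returns a ufunc. With no signatures given, compilation is lazy and
    # the types are inferred from the arguments of the first call.
    sinc_fast = numba.vectorize(nopython=True)(sinc_scalar)
except ImportError:
    sinc_fast = None  # Numba not installed; only the slow path is available.

x = np.linspace(-5.0, 5.0, 11)  # includes x == 0 exactly
y_slow = sinc_np(x)
y_fast = sinc_fast(x) if sinc_fast is not None else y_slow
```

Both paths should agree on the values; the point of the compiled version is only that the per-element loop no longer runs in the Python interpreter.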
And so how does it work? Do you add a comment within Python that tells to do, like, how do you
help out a compiler? Yeah, so there isn't much, actually. You don't, it's kind of magical in
the sense that it just looks at the type of the objects and then does type inference to determine
any other variables it needs. And then it was also because we had a use case that, that could work
early, like one of the challenges of any kind of new, new, new development is if you have something
that, to make it work, is going to take you a long time, it's really hard to get it off the
ground. If you have a project where there's some incremental story, it can start working today
and solve a problem. Then you can start getting it out there, getting feedback, because Numba
today, now Numba is nine years old today, right? The first two, three versions were not great,
right? But they solved a problem and some people could try it and we could get some feedback on
it. Not great in that it was very focused. Very fragile, very, the subset, the subset
it would actually compile was small. And so if you wrote Python code and said,
the way, the way it worked is you write a function and you say @jit, using decorators.
So decorators are just these little constructs that let you decorate code with an at sign and then the name.
And the @jit would take your Python function and actually just compile it and replace the Python
function with another function that interacts with this compile function. And it would just
do that. And it would, you know, we went from Python bytecode, then we went to AST. I mean,
writing compilers is actually, I learned a lot about why computer science is taught the way it
is because compilers can be hard to write. They use tree structures. They use all the
concepts of computer science that are needed. And it's actually hard to,
you can, it's easy to write a compiler and then have it be spaghetti code.
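To make the decorator mechanism described above concrete, here is a minimal pure-Python sketch. It is not Numba's implementation: `toy_jit` is a hypothetical stand-in that only shows the shape of the idea, namely that the decorator replaces the function with a dispatcher, and the dispatcher keeps one specialization per argument-type signature seen at call time. Where a real JIT would run type inference and generate native code, this sketch simply reuses the original function.

```python
import functools


def toy_jit(func):
    """Hypothetical stand-in for a JIT decorator (not Numba's implementation)."""
    specializations = {}  # argument-type signature -> "compiled" callable

    @functools.wraps(func)
    def dispatcher(*args):
        # Python is dynamically typed, but at call time every argument has
        # a concrete type, which is what a JIT can specialize on.
        signature = tuple(type(a).__name__ for a in args)
        if signature not in specializations:
            # A real JIT would infer the remaining types and emit native
            # code here; this sketch records the signature and reuses func.
            specializations[signature] = func
        return specializations[signature](*args)

    dispatcher.specializations = specializations
    return dispatcher


@toy_jit
def total(values, scale):
    acc = 0.0
    for v in values:  # a for loop over scalars, the kind of code a subset targets
        acc += v * scale
    return acc


result_a = total([1.0, 2.0, 3.0], 2.0)  # first call: signature (list, float)
result_b = total([1.0, 2.0, 3.0], 2)    # new signature: (list, int)
```

After the two calls, `total.specializations` holds two entries, one per signature, which mirrors how a function can be compiled separately for each combination of input types without the author spelling the types out.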
Like the passes become challenging. And we ended up with three versions of Numba,
right? Numba got written three times. What, what programming language is Numba written in?
Python. Wait, okay. Yeah, Python. So. Really? That's fascinating. Yeah. So Python,
but then the whole goal of Numba is to translate Python byte code to LLVM. And so LLVM actually
does the code generation. In fact, a lot of times they'd say, yeah, it's super easy to write a
compiler if you're not writing the parser, nor the code generator, right? So for people who don't
know, LLVM is a compiler itself. So you're compiling. Yeah, it's really badly named: low level virtual
machine, which, that part of it is not used. It's really low level. He doesn't mean that.
Yeah. Love Chris. But the name implies that the virtual machine is what it's all about.
It's actually the IR and the library that does the code generation, that's the real beauty of it.
The fact that what I love about LLVM was the fact that it was a platform you could collaborate on,
right? Instead of the internals of GCC or the internals of the Intel compiler, like,
how do I extend that? And it was a place we could collaborate. And we were early. I mean,
people had started before. It's a slow compiler. Like it's not a fast compiler. So for some kind
of JITs, like JITs are common in languages because, one, every browser has a JavaScript
JIT. It does real time compilation of the JavaScript to machine code.
For people who don't know, JIT is just in time compilation.
Thank you. Yeah, just in time compilation. They're actually really sophisticated. In fact,
I got jealous of how much effort was put into the JavaScript JITs.
Yes. Well, it's kind of incredible what they've done with JavaScript JITs.
I completely agree. I'm very impressed. But, you know, Numba was an effort to make that
happen with Python. And so we used some of the money we raised from Anaconda to do it.
And then we also applied for this DARPA grant and used some of that money to continue the
development. And then we used proceeds from service projects we would do. We get consulting
projects that we would then use some of the profits to invest in Numba. So we ended up with
a team of two or three people working on Numba. It was fits and starts, right? And ultimately,
the fact that we had a commercial version of it, also we were writing. So part of the way I was
trying to fund Numba is say, well, let's do the free Numba. And then we'll have a commercial
version of Numba called NumbaPro. And what NumbaPro did is it targeted GPUs. So we had the very
first CUDA JIT and the very first @jit compiler that in 2013, you could run not just a ufunc
on CPU, but a ufunc on GPUs. And it would automatically parallelize it and get 1000x speed.
And that's an interesting funding mechanism because, you know, large companies or larger
companies care about speed in just this way. So it's exactly a really good way.
Yeah, there's been a couple of things you know people will pay for. One, they'll pay for really
good user interfaces, right? And so I'm always looking for what are the things people will
pay for that you could actually adapt to the open source infrastructure. One is definitely
user interfaces. The second is speed, like a better run time, faster run time.
And then when you say people, you mean like a small number of people pay a lot of money,
but then there's also this other mechanism that a ton of people pay a little bit. First,
we mentioned Anaconda, we mentioned friends, family and fools. So Anaconda is yet another.
So there's a company, but there's also a project that is exceptionally impactful
in terms of for many reasons, but one of which is bringing a lot more people into the
community of folks who use Python. So what is Anaconda? What are its goals? Maybe what is
conda versus Anaconda? Yeah, I'll tell you a little bit of the history of that because Anaconda,
we wanted to scale Python because Peter and I had the goal of when we started Anaconda. We
actually started as Continuum Analytics, was the name of the company that started. It got renamed
Anaconda in 2015, but we said we want to scale analytics. NumPy is great, pandas is emerging,
but these need to run at scale with lots of machines. The other thing we wanted to do was
make user interfaces that were web. We wanted to make sure the web did not pass by the Python
community, that we had ways to translate your data science to the web. So those are the two
kind of technical areas and we thought, oh, we'll build products in this space. And that was the
idea. Very quickly in, but of course the thing I knew how to do was to do consulting to make money
and to make sure my family and friends and the fools who'd invested didn't lose their money.
So it's a little different than if you take money from a venture fund. If you take money from a
venture fund, the venture fund, they want you to go big or go home. They're kind of like expecting
9 out of 10 to fail or 99 out of 100 to fail. It's different. I was after a barbell strategy.
I was like, I can't fail. I mean, I may not do super well, but I cannot lose their money. So I'm
going to do something I know can return a profit, but I want to have exposure to an upside. So that's
what happened in Anaconda. There were lots of things we did not do well in terms of that structure,
and I've learned since how to do it better. But we did a really good job of kind of attracting
the interest around the area to get good people working and then funnel some money to
some interesting projects. Super excited about what came out of our energy there. Like a lot did.
So what are some of the interesting projects? So Dask, Numba, Bokeh, conda, there was
Datashader, Panel, HoloViz. These are all tools that are extremely relevant in terms of helping you
build applications, build tools, build faster code. There's a couple I'm forgetting.
Oh, JupyterLab. JupyterLab came out of this too. Fascinating, actually. Okay, so Bokeh does
plotting. Bokeh does plotting. So Bokeh was one of the foundational things to say, I want to do
plotting in Python, but have the things show up on the web. Right. That's right. That's right. That's
right. And plotting to me still, with all due respect to Matplotlib and Bokeh, it feels like
still an unsolved problem. It is. It's a big problem. Right. Because you're, I mean, I don't
know, it's visualization broadly. Yes. I think we've got a pretty good API story around certain
use cases of plotting. But there's a difference between static plots versus interactive plots,
versus I'm an end user. I just want to write a simple, for, you know, pandas started the idea
of here's a DataFrame and a .plot. I'm just going to attach plot as a method to my object,
which was a little bit controversial, right? But works pretty well, actually, because
there's a lot less you have to pass in, right? You can just say, here's my object, you know what
you are, you tell the visualization what to do. So that, and there's things like that that have
not been, you know, super well developed entirely, but Bokeh was focused on interactive plotting.
So you could, it's a short path between interactive plotting and application dashboard
application. And there's some incredible work that got done there, right? And it was a hard
project because then you're basically doing JavaScript and Python. So we wanted to tackle
some of these hard problems and try to just go after them. We got some DARPA funding to help,
and it was super helpful. Funny story there, we actually did two DARPA proposals, but one,
we were five minutes late for. And DARPA has a very strict cutoff window. And so I, we had two
proposals, one for the Bokeh and one for actually Numba and the other work. Which one were you late
for? The foundational numerical work. So Bokeh got funded. Fortunately, Chris let us use some of
the money to fund still some of the other foundational work, but it wasn't as, yeah,
his hands were tied. He couldn't do anything about it. That was a whole interesting story.
So one of the incredible projects that you worked on is Conda. Yes. So what is Conda?
Yeah, Conda, it was early on, like I said, with SciPy. SciPy was a distribution,
masquerading as a library. And, like I said, you heard me talking about compiler issues and
trying to get the stuff shipped and the fact that people can use your libraries if they have it.
So for a long time, we'd understood the packaging problem in Python. And one of the first things
we did at Anaconda (Continuum Analytics became Anaconda) was organize the PyData ecosystem in
conjunction with NumFOCUS. We actually started NumFOCUS with some other folks in the community
the same year we started Anaconda. I said, we're going to build a corporation, but we also got to
reify the community aspect and build a nonprofit. So we did both of those.
Can we pause real quick and can you say what is PyPI, the Python Package Index,
like this whole story of packaging in Python? Yeah, that's what I'm going to get to,
actually. This is exactly the journey I'm on. It's to sort of explain packaging in Python.
I think it's best expressed through the conversation I had with Guido at a conference where I said,
so, you know, packaging is kind of a problem. And Guido said, I don't ever care about packaging.
I don't use it. I don't install new libraries. I'm like, I guess if you're the language creator
and if you need something, you just put it in the distribution. Maybe you don't worry about
packaging. But Guido has never really cared about packaging, right? And never really cared about
the problem of distribution. Somebody else's problem. And that's a fair position to take,
I think, as a language creator. In fact, there's a philosophical question about,
should you have different development packaging managers? Should you have a package manager
per language? Is that really the right approach? I think there are some answers of,
it is appropriate to have development tools. And there's an aspect of development tool that is
related to packaging. And every language should have some story there to help their developers
create. So you should have language specific development tools that relate to package managers.
But then there's a very specific user story around package management that those language
specific package managers have to interact with and currently aren't doing a good job of that.
That was one of the challenges, that people did not see that difference, and the
difference still exists today. Conda always was user-focused. I'm going to use Python to do data science. I'm going
to use Python to do something. How do I get this installed? It was always focused on that.
So it didn't have a develop. Classic example is pip has a pip develop. It's like, I want to install
this into my current development environment today. Now, conda doesn't have that concept because
it's not part of the story. For people who don't know, pip is a Python-specific
package manager. That's exceptionally popular. That's probably like the default thing
you learn. It's the default user. So the story there emerged because what happened is in 2012,
we had this meeting at the Google Googleplex and Guido was there to come talk about what we're
going to do. How are we going to make things work better? And Wes McKinney, me, Peter, Peter has a
great photo of me talking to Guido and he pretends we're talking about this story. Maybe we were,
maybe we were, but we did at that meeting talk about it and ask Guido, we need to fix packaging
in Python. People can't get this stuff. And he said, go fix it yourself. I don't think we're
going to do it. All right. The origin story right there. All right. We said, okay, he said to do
this ourselves. So at the same time, people did start to work on the packaging story in Python.
It just took a little longer. So in 2012, kind of motivated by our training courses we were
teaching, very similar to what you just mentioned about your mother. It was motivated by the same
purpose. Like, how do we get this into people's hands? And it's this big long process. It's
too expensive. It was actually hurting NumPy development because I would hear people were
saying, don't make that change to NumPy because I just spent a week getting my Python environment.
And if you change, if you change NumPy, I have to reinstall everything, and reinstalling is such a
pain, don't do it. I'm like, wait, okay, so now we're not making changes to a library because
of the installation problem that will cause for end users. Okay, there's a problem with pack,
there's a problem with installation, we got to fix this. So we, we said, we're going to make a
distribution of Python. And we'd previously done that, previously done that at Enthought.
I wanted to make one that we would give away for free that one could just get.
Like, that was critical that we just get it, you know, it wasn't tied to a product, it was just
you could get it. And then we had constantly thought about, well, do we just leverage RPM?
Do we, but the challenge has always been we want a package manager that works on Windows,
Mac OS X and Linux the same. Right. And it wasn't there. Like, you don't have anything like that.
You have, for people who don't know, RPM is an operating system specific package manager.
Correct. It's operating system specific. Yes. So the design
question is, do you create an umbrella package manager that's cross operating system?
Yes. That was the decision. And a neighboring design question is, do you also create a package
manager that spans multiple programming languages? Exactly. That was the world we faced. And we
decided to go multiple operating systems and programming language independent,
because even Python and particularly what was important was SciPy has a bunch of Fortran in it.
Right. And scikit-learn has links to a bunch of C++. There's a lot of compiled code.
And the Python package manager, especially early on, didn't even support that. So in 2000, so we
released Anaconda, which was just a distribution of libraries, but we started to work on conda in
2012. First version of conda came out in early 2013, summer of 2013. And it was a package manager.
So you could say conda install scikit-learn. In fact, that was the scikit-learn was a fantastic
project that emerged. Kind of, it was the classic example of the scikits. I talked
earlier about SciPy being too big to be a single library. Well, what the community had
done is said, let's make scikits. And there's scikit-image, there's scikit-learn, there's a
lot of scikits. And it was a fantastic move that the community did. I didn't do it. I was like,
okay, that's a good idea. I didn't like the name. I didn't like the fact you type scikit-image. I
was like, that's got to be simpler. sklearn, we got to make that smaller. I don't like typing all this
stuff for imports. So I was kind of a pressure that way. But I love the energy. I love the fact
that they went out and they did it. And lots of people, Jarrod Millman, and then of course, Gaël,
and there's people I'm not even naming, that scikit-learn really emerged as a fantastic project.
And the documentation around that is also incredible. And the fact is it was incredible.
Exactly. I don't know who did that, but they did a great job. A lot of people in Inria, a lot of
people, a lot of European contributors. Andreas, there's some Andreas in the US. There's a lot of
just people I just adore. I think are amazing people. Awesome use of SciPy. I love the fact
that they were using SciPy. Effectively, it's something I love, which is machine learning.
But you couldn't install it. Because there's so many, it's just involved. So many dependencies.
Right? Yeah. So our use case of conda was conda install scikit-learn.
Right. And it was the best way to install scikit-learn in 2013 to really 2018, 17, 18. Pip finally
caught up. I still think you should conda install scikit-learn rather than pip install scikit-learn,
but you can pip install scikit-learn. The issue is the package they created was wheels. And
pip does not handle the multi-vendor approach. They don't handle the fact you have C++ libraries
you're depending on. They just stop at the Python boundary. And so what you have to do in the wheel
world is you have to vendor. You have to take all of the binary and vendor it. Now, if a change
happens in an underlying dependency, you have to redo the whole wheel. So TensorFlow is a good
example, but you should not pip install TensorFlow. It's a terrible idea. People do it because of the
popularity of pip, many people think, oh, of course, that's how I install everything in Python.
This is one of the big challenges. You take a GitHub repository or just a
basic blog post, the number of times pip is mentioned over conda is like 100x to one.
Correct. Correct. And that was increasing. It wasn't true earlier because pip didn't exist.
Like conda came first. But that's like the long tail of the internet documentation user generated.
So you think, how do I install TensorFlow? You're just not going to see conda in that first page.
It's not correct, exactly. Not today. You would have in 2016, 2017.
And it's sad because conda solves a lot of usability issues.
Correct. Especially super challenging thing. I don't know. One of the big pain points for me
was just on the computer vision side, OpenCV installation. Perfect example.
I don't know if conda solved that one. Conda has an OpenCV package.
I don't know. I certainly know pip has not solved it. I mean, there's complexities there because
I actually don't know. I should probably know a good answer for this. But if you compile OpenCV
with certain dependencies, you'll be able to do certain things. So there's this kind of
flexibility of what options you compile with. And I don't think it's trivial to do that
in conda. So conda has a notion of variants of a package. You can actually
have different compilation versions of a package. So not just the versions different,
but oh, this is compiled with these optimizations on. So conda does have an answer.
Has flavors. Has flavors basically. Well, pip, as far as I know, does not have flavors.
No. Pip generally hasn't thought deeply about the binary dependency problem.
Right? That's why fundamentally, it doesn't work for the SciPy ecosystem.
It barely, you can sort of paper over it and duct tape and it kind of works until it doesn't
and it falls apart entirely. So it's been a mixed bag. Like, and I've been having lots of
conversations with people over the years because again, it's an area where if you understand some
things, but not all the things, but they've done a great job of community appeal. This is an area
where I think Anaconda as a company needs to do some things in order to make conda more community
centric. Right? And this is a, I talk about this all the time. There's a balance between
you have every project starts with what I call company backed open source. Even if the company
is yourself, this is one person doing business as, but ultimately for products to succeed
virally and become massive influencers, they have to create, they have to get community
people on board. They have to get other people on board. So it has to become community driven.
And a big part of that is engagement with those people, empowering people, governance around it.
And there was, and what happened with conda in the early days,
like pip emerged and we did, we did do some good things. Conda-forge,
the conda-forge community is sort of the community recipe creation community. But conda itself,
I still believe, and you know, Peter is CEO of Anaconda, he's my co-founder. I ran Anaconda
till 2017, 2018. Peter is still at Anaconda, right? We're still great friends. We talk all
the time. I love him to death. There's a long story there about like why and how we can cover in
some, some other podcasts, perhaps sort of more, maybe a more business focused one. But, but
this is one area where I think conda should be more community driven. Like he should be pushing
more to get more community contributors to conda and let, let the, like, Anaconda
shouldn't be fighting this battle. Yeah. Right. It's actually, it's really a developers. Like
you said, like help the developers and then they'll actually move us the right direction.
The problem I have is many of the cool kids I know don't use conda.
And that to me is confusing. It is confusing. It's a really a matter of,
conda has some challenges. First of all, conda still needs to be improved. There's lots of
improvements to be made. And then it's that aspect of wait, who's doing this and the fact that then
the PyPA really stepped up. Like they were not solving the problem at all. And now they kind
of got to where they're solving it for the most part. And then effectively you could get like
conda solved a problem that was there and it still does. And it's still like, you know,
there's still great things it can do. But, and we still use it all the time at Quansight and
with other clients, but with, but you can kind of do similar things with pip and Docker. Right.
So, especially with the web development community, that part of it again is this,
is there's a lot of different kind of developers in the Python ecosystem. And
there's still a lack of some clear understanding. I go to the Python conference all the time and
that there's only a few people in the PyPA who get it. And then others who are just massively
trumpeting the power of pip, but just do not understand the problem.
Yeah. So one of the obvious things to me from a mom, from a non programmer perspective is the
cross-operating-system usability that's much more natural. So they use Windows and just,
it seems much easier to, to recommend conda there. But then you should also recommend it
across the board. So I'll, I'll definitely sort of, but what I recommend now is a hybrid,
I do. I mean, I have no problem. Is it possible to use? Oh, it is. It is. What I, like build the
environment with pip, with conda, build an environment with conda, and then pip install on
top of that. That's fine. Be careful about pip installing OpenCV or TensorFlow or,
because if somebody's allowed that, it's going to be most surely done in a way that
can't be updated that easily. So install like the big packages, the infrastructure with conda,
and then the weirdos that like the weird like implementation for some. I had a,
there's a cool library I used that based on your location and time of day and date,
tells you the exact position of the sun relative to the, to the earth. And it's just like a simple
library, but it's very precise. And I was like, all right, but that was, that was, and it's pip.
Well, the thing they did really well is Python developers who want to get their stuff published,
they, you have to have a pip recipe, right? I mean, even if it's, you know, the challenge is,
and there's a key thing that needs to be added to pip, just simply add to pip the ability to
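For the hybrid setup described above (conda for the big binary packages, pip on top), one way to see which tool put each package into the current environment is to read the `INSTALLER` file that installers may write into a package's metadata directory. The sketch below uses only the standard library's `importlib.metadata`; the helper name `installers_by_package` is made up for this example. It rests on an assumption: pip records `pip` there, and many conda-built packages record `conda`, but the file is optional, so some packages will show up as `unknown`.

```python
from collections import Counter
from importlib import metadata


def installers_by_package():
    """Map each installed distribution name to the tool recorded in INSTALLER."""
    result = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            name = None  # skip distributions with unreadable metadata
        if not name:
            continue
        installer = dist.read_text("INSTALLER")  # None when the file is absent
        result[name] = installer.strip() if installer else "unknown"
    return result


packages = installers_by_package()
summary = Counter(packages.values())
for tool, count in sorted(summary.items()):
    print(f"{tool}: {count} package(s)")
```

In a mixed environment this gives a quick inventory of what would need care on an update, for example a heavy binary package that turns out to have been installed with pip.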
defer to a system package manager, like, because it's, you know, recognized, you're not going to
solve all the dependency problem. So let like give up and allow the, allow a system package
to work that way. If Anaconda is installed and it has pip, it would default to conda to install
stuff. But Red Hat RPM would default to RPM to install, install more things. Like that's the,
that's a key, not difficult, but some work, some work, feature that needs to be added.
That's an example of something like, I know we need to go and do it. I mean,
it's where I wish I had more money. I wish I was more successful in the, in the business side,
trying to get there. But I wish my, you know, my family, friends and fools community that I know
was larger, was larger and had more money because I know tons of things to do effectively with
more resources. But, you know, I have not yet been successful at channeling tons of, you know, some,
you know, I'm happy with what we've done. We've created again at Quansight, what we created to
get Anaconda started. We created community to get Anaconda started, done it again with Quansight.
Super excited by that. By the way, it took three years to do it. What is Quansight? What is its
mission? We've talked a few times about different fascinating aspects of it, but it's like big
picture. What is Quansight? Big picture, Quansight. Quansight is, its mission is to connect data to
an open economy. So it's basically consulting in the PyData ecosystem, right? It's a consulting
company. And what I've said when I started it was we're trying to create products, people, and
technology. So it's divided into two, two groups and a third one as well. The two groups are a
consulting services company that just helps people do data science and data engineering and data
management better and more efficiently. Like full stack, like full stack. Data science, full thing.
We'll help you build an infrastructure if you're using Jupyter. We need, we do staff augmentation,
need more programmers, help you use Dask more effectively, help you use GPUs more effectively.
Just basically a lot of people need help. So we do training as well to help people, you know,
both immediate help and then get, get, learn from somebody. We've added a bunch of stuff too. We've
kind of separated some of these other things into another company called OpenTeams that I've kind
of currently started. One of the things I loved about what we did at Anaconda was creating a community
innovation team. And so I wanted to replicate that. This time, we did a lot of innovation
at Anaconda. I wanted to do innovation, but also contribute to the projects that existed,
like create a place where maintainers, so the SciPy and NumPy and all these projects we already
started can pay people to work on them and keep them going. So that's Labs. Quansight Labs is a
separate organization. It's a non-profit mission. The profits of Quansight help fund it. And in
fact, every project that we have at Quansight, a portion of the money goes directly to Quansight
Labs to help keep it funded. So we've got several mechanisms by which we keep Quansight Labs funded.
And currently, I'm really excited about labs because it's been a mission for a long time.
What kind of projects are within labs? So labs is working to make the software better,
like make NumPy better, make SciPy better. It only works on open source. So, you know,
if somebody wants to, so, you know, companies do, we have a thing called a community work order,
we call it. If a company says, I want to make Spyder better, okay, cool. You can pay for a
month of a developer of Spyder or developer of NumPy or developer of SciPy. You're not,
you can't tell them what you want them to do. You can give them your priorities and things you
wish existed. And they'll work on those priorities with the community to get what the community
wants and what emerges with the community wants. Is there some aspect on the consulting side that
is helping as we were talking about morphology and so on? Is there a specific application
that are particularly like driving, sort of, inspiring the need for updates to SciPy and
NumPy models? Correct. Absolutely. Absolutely. GPUs are absolutely one of them.
GPUs. And new hardware beyond GPUs. I mean, Tesla's Dojo chip. I'm hoping we'll have a chance
to work on that, perhaps. Things like that are definitely driving it. The other thing that's driving it
is scalable, like speed and scale. How do I write NumPy code or NumPy-like code if I want it to
run across a cluster? You know, oh, that's Dask or maybe it's Ray. I mean, there's sort of ways to
do that now. Or there's Modin and there's, so pandas code, NumPy code, SciPy code, scikit-learn
code that I want to scale. So that's one big area. Have you gotten a chance to chat with
Andrej and Elon about, because like... No, I would love to, by the way. I would very much love to. I just
saw their Tesla AI Day video. Yeah. Super exciting. So this is one of the, you know, I love great
engineering, software engineering teams and engineering teams in general, and they're doing
a lot of incredible stuff with Python. They're like revolutionizing so many aspects of the machine
learning pipeline that's operating in the real world. And so much of that is Python. And like you
said, the guy running, you know, Andrej Karpathy, running Autopilot is tweeting about optimization
of NumPy versus... I would love to talk to him. In fact, we have at Quansight, we've been fortunate
enough to work with Facebook on PyTorch directly. We have about 13 developers at Quansight. Some
of them are in Labs working directly on PyTorch. On PyTorch. On PyTorch. Right. So I basically
started Quansight. I went to both TensorFlow and PyTorch and said, hey, I want to help connect what
you're doing to the broader SciPy ecosystem. Because I see what you're doing. We have this
bigger mission that we want to make sure we don't, you know, lose energy here. So, and Facebook
responded really positively and I didn't get the same reaction. Not yet. Not yet. I love the folks
in TensorFlow. I really love the folks at TensorFlow too. They're fantastic. I think it's just how
it integrates with their business. I mean, like I said, there's a lot of reasons. Just the timing,
the integration with their business, what they're looking for. They're probably looking for more
users. And I was looking to kind of cut up some development effort and they couldn't receive that
as easily, I think. So I'm hoping, I'm really hopeful and love the people there. What's the
idea behind OpenTeams? So OpenTeams, I'm super excited about OpenTeams because it's one of the,
I mentioned my idea for investing directly in open source. So that's a concept called FairOSS.
But one of the things we, when we started Quansight, we knew we would do is we develop
products and ideas and new companies might come out. At Anaconda, this was clear, right?
At Anaconda, we did so much innovation that like five or six companies could have come out of that.
And we just didn't structure it so they could. But in fact, they have, you look at Dask,
there's two companies coming out of Dask. You know, Bokeh could be a company. There's like
lots of companies that could exist off the work we did there. And so I thought, oh, here's a recipe
for an incubation, a concept that we could actually spawn new companies and new innovations. And then
the idea has always been, well, money they earn should come back to fund the open source project.
So Labs is, you know, I think there should be a lot of things like Quansight Labs. I think
this concept is one that scales. You could have a lot of open source research labs.
Along the way, so in 2018, when the bigger idea came how to make open source investable,
I said, oh, I need to write it. I need to create a venture fund. So we created a venture fund
called Quansight Initiate at the same time. It's an angel fund. Really, it's, you know,
we started to learn that process. How do we actually do this? How do we get LPs? How do we
actually go in this direction and build a fund? And I'm like, every venture fund should have an
associated open source research lab. There's just no reason not to. Like our venture fund, the carried
interest portion of it goes to the lab. It directly will fund the lab.
That's fascinating by the way. So you use the power of the organic formation of teams in the
open source community. And then like naturally, that leads to a business that can make money.
There are some, yeah, correct. And then it always maintains and loops back to the open
source. Loops back to open source. Exactly. I mean, to me, it's a natural fit. There's something,
there's absolutely a repeatable pattern there. And it's also beneficial because, oh, I have,
I have natural connections to the open source. If I have an open source research lab, like
they'll all be out there talking to people. And so we've had a chance to talk to a lot of
early stage companies. And we, and our fund focused on the early stage. So Quansight has the
services, the lab, the fund, right? In that process, a lot of stuff started to happen.
And we're like, oh, you know, we started to do recruiting and support and training.
And I was starting to build a bigger sales team and marketing team and people besides
just developers. And one of the challenges with that is you end up with different cultural aspects.
You know, developers, you know, there's a, in any company you go to, you can kind of go look,
is this a business led company, a developer led company? Do they kind of coexist?
How are they, what's the interface between them? There's always a bit of a tension there,
like we were talking about before, you know, what is the tension there. With OpenTeams, I
thought, wait a minute, we can actually just create like this concept of Quansight plus Labs.
It's, well, while it's specific to the PyData ecosystem, the concept is general for all
open source. So OpenTeams emerged as a, oh, we can create a business development company for
many, many Quansights, like thousands of Quansights. And it can be a marketplace to connect,
essentially be the enterprise software company of the future. If you look at what enterprise
software wants from the customer side, and during this journey, I've had the chance to work and,
and sell to lots of companies, Exxon and Shell and JPMorgan, Bank of America, like the Fortune
100, and talk to a lot of people in procurement and see what are they buying and why are they buying.
So, you know, I don't know everything, but I've learned a lot about, oh, what are they really
looking for? And they're looking for solutions. They're constantly given products from the,
from enterprise software. Here's open source, leave the enterprise software, now I buy it,
and then they have to stitch it together into a solution. Open source is fantastic for gluing
those solutions together. So, whereas they keep getting new platforms they're trying to buy,
but most open source, what most enterprises want is tools that they can customize that,
that are as inexpensive as they can. Yeah. And so you almost want to maintain the connection
to the open source, because that's, yes. So OpenTeams is about solving enterprise software
problems. Brilliant idea, by the way. With a connect, but we do it, honoring the topology,
we don't hire all the people. We are a network connecting the sales energy and the procurement
energy, and we work on the business side, get the deals closed, and then have a network of partners
like Quansight and others, who we hand the deals to, right, to actually do the work. And then we
have to maintain, I feel like we have to maintain some level of quality control, so that the client
can rely on OpenTeams to ensure their deliveries. It's not just, here's a lead, go figure that out,
but no, we're going to make sure you get what you need. Right. By the way, it's such a skill,
and I don't know if I have the patience. I will have the patience to talk to the business people,
or more specifically, I mean, there's all kinds of flavors of business people, or like marketing
people. There's a challenge. I hear what you're saying, because I've had the same challenge.
Yeah. And it's true. There's sometimes you think, okay, this is way overwrought.
Yeah. So you have to become an adult and you have to, because the companies have needs,
they have ways to make money, and they're also going to learn and grow, and yet it's your job
to kind of educate them on the best way, like the value of open source, for example.
Right. And I'm really grateful for all my experiences over the past 14 years,
understanding that side of it, and still learning, for sure, but not just understanding from
companies, but also dealing with marketing professionals and sales professionals, and
people that make a career out of that, and understanding what they're thinking about,
and also understanding, well, let's make this better. We can really make a place. OpenTeams,
I see as the transmission layer between companies and open source communities producing enterprise
software solutions. Like, eventually, we want to, like today, we're taking on SAS and MATLAB
and tools that we know we can replace for folks. Really, anytime you have a software tool at
an organization where you have to do a lot of customization or make it work for you, like,
it's not you're just buying this thing off the shelf and it works. It's like, okay,
you buy this system, and you customize it a lot, usually with expensive consultants
to actually make it work for you. All of those should be replaced by open source foundations
with the same customization. Really, if you're doing such important work, such important work in
these giant organizations that are doing exactly that, taking some proprietary software and hiring
a huge team of consultants that customize it, and then that whole thing gets outdated quick.
Correct. And so, I mean, that's brilliant. The one solution to that is kind of what Tesla is
doing a little bit of, which is basically build up a software engineering team, like build a team
from scratch. Build a team from scratch. And companies are doing it well. That's what they're
doing right now. Yeah, exactly. And that's okay. And you're creating a pathology for some of that.
You're right. You just don't have to do it. That's not the only answer, right? And so,
other companies can access this, be more accessible. We really, let's really say,
OpenTeams is the future of enterprise software. We're still early. Like, this idea just percolated
over the past year, as we've kind of grown Quansight and realized the extensibility of it.
We just finished our seed round to help, you know, get more salespeople and then push the
messaging correctly. And there's lots of tools we're building to make this easier. Like,
we want to automate the processes. We feel like a lot of the power is the efficiency of the sales
process. There's a lot of wasted energy in small teams and the sales energy to get into large
companies and make a deal. There's a lot of money spent on that process.
Creating the tools and processes. Make that super seamless. So, a single company can go, oh, I've
got my contract with OpenTeams. We have a subscription they can get. They can make that
procurement seamless. And then the fact they have access to the entire open source ecosystem. And
we have a, you know, we have a part of our work that's embracing open source ecosystems and making
sure we're doing things useful for them. We're serving them. And then companies making sure
they're getting solutions they care about. And then figuring out which, which targets we have,
you know, we're, we're not taking on all of open source, all of enterprise software yet.
But, but we're, we're, well, this feels like the future. The idea and the vision is brilliant.
Can I ask you, why do you think Microsoft bought GitHub? And what do you think is the future of
GitHub? Great point. Great point. I thought it was a brilliant move. I think they did because
Microsoft has always had a developer-centric culture. Like, they always have. Like, one of the
things Microsoft's always done well is understand that their power is developers, right? Steve
Ballmer didn't necessarily make a, make a good meme about how he approached that. But
Nadella's broadening that. I think that's why, because they recognize GitHub is where developers
are at, right? And so. But do they have a vision like OpenTeams type of situation, right? Think so
yet? Are they just basically throwing money at developers to show their support? I think so.
Without a topology, like you put it like a way to leverage that, like to give developers actual
money. Right. I don't think so. I think they're still, it's an enterprise software company,
and they make a bunch of money. They make a bunch of games. They have a big company. They
sell products. I think part of it is they know there's opportunity to make money from GitHub,
right? There's definitely a business there, you know, to sell to developers or to sell to
people using development. I think there's part of that. I think part of it is also there's,
they had definitely wanted to recognize that you need to value open source to get great developers,
which is, which is an important concept that was emerging over the past 10 years that, you know,
PyData. We were able to convince JPMorgan to support PyData because of that fact, right?
That was where the money for them putting a couple hundred thousand into supporting PyData for
several conferences was they want developers, and they realized that developers want to
participate in open source. So enterprise software folks don't always understand how
their software gets used. Having spent a lot of time on the floors at JPMorgan, at Shell and
at ExxonMobil, you see, oh, these companies have large development teams. And then you're,
they're kind of dealing with the, what's being delivered to them. So I really feel kind of a
privilege that I had a chance to learn from some of these people and see what they're doing and even
work alongside them, you know, as a consultant, using my, using open source and trying to make
this work inside of a large organization. Some of it is actually for a large organization.
Some of it is messaging to the world that you care about developers and you're the cool,
you care. Like, for example, like if Ford, because I talked to them, like car companies,
right, they, they want to attract, you know, you want to take on Tesla and Autopilot,
you want to take it, right? And so what do you do there? You show that you're cool, like you,
you try to show off that you care about developers and they have a lot of trouble doing that. And
like one way, I think like Ford should have bought GitHub. They just show off. Like these
old school companies and it's in a lot of different industries. There's probably different
ways. It's probably an art, to show that you care about developers, and the developers, it's,
it's exactly what you said. Like for example, just spitballing here, but like Ford or somebody like
that could give $100 million to the development of NumPy and like, like literally look at like
the top most popular projects in Python and just say, we're just going to give money. Right.
Like that's going to immediately make you cool. They could actually. Yeah. And in fact,
we set up NumFOCUS to make it easy. Yeah. But the challenge is also you have to have some
business development. Like it's a bit of a, it's a bit of a seeding problem, right? And you look
at how, I've talked to the folks at the Linux Foundation, I know how they're doing it. I know how, and starting
NumFOCUS, because we had two babies in 2012, one was Anaconda, one was NumFOCUS, right? And they
were both important efforts. They had distinct journeys and super grateful that both existed
and still grateful both exist. But there's different energies in getting donations as there
is getting, this is important to my business. Like I'm selling something that this is not a,
this is, I'm going to make money this way. Like if you can tie it, if you can tie the message to
an ROI for the company, that's more effective. It's much more effective, right? So, and there
are rational arguments to make. I've tried to have conversations with marketing, especially
marketing departments. Like very early on, it was clear to me that, oh, you can just take a
fraction of your marketing budget and just spend it on open source development. And you get better
results from your marketing. Like because. How did those, can I, sorry, I'm going to try not to
go around here. No, it's fine. What have you learned from the interaction with the marketing
folks on that kind of, because you gave a great example of something that will obviously
be much better investment in terms of marketing is supporting open source projects.
The challenge is not dissimilar from the challenge you have in academia of the different colleges.
Knowledge gets very specific and very channeled. And so people get, they get a lot of learning
in the thing they know about. And it's hard then to bridge that and to get them to think
differently enough to have a sense that you might have something to offer. Because it's different.
It's like, well, how do I implement that? How do I, what do I do with that? Like,
do I, which budget do I take from? Do I slow down my spend on Google ads or my spend on
Facebook ads? Or do I not hire a content creator instead? Like, there's an operational aspect to
that that's something that you have to be the CMO, right? Or the CEO. You have to get the right level.
So you have to hire at a high position level. People that care about this.
Right. Or they won't know how, right? And because you can also do it very clumsily.
Right. And I've seen, because you can, you absolutely have to honor and recognize the
people you're going to. And the fact that if you just throw money at them,
it could actually create more problems.
Can I just say, this is not you saying, can I just, because I just need, I need to say this.
I've been very surprised how often marketing people are terrible at marketing.
I feel like the best marketing is doing something novel and unique that anticipates the future.
It feels like so much of the marketing practice is like what they took in school or maybe they're
studying from what was the best thing that was done in the past decade. And they're just repeating
that over and over, as opposed to innovating, like taking the risk. To me, marketing is taking
the big risk. That's a great point. And being the first one to risk.
Yeah. There's an aspect of data observation from that risk, right? That's,
that's, I think, shared what they're doing already. But it absolutely, it's about,
I think it's content. Like there's this whole world on content marketing that you could almost
say, well, yeah, it can get over, you can get, you can get inundated with stuff that's not
relevant to you. Whereas what you're saying would be highly relevant and highly useful and highly,
highly beneficial.
Yeah. But it's, it's a risk. I mean, that's why sort of there's a lot of innovative ways of doing
that. Tesla's an example of people that basically don't do marketing. They do marketing
in a very, like, it's like Elon hired a person who's just good at Twitter for running Tesla's
Twitter account. No, right, right. I mean, that's exactly what you want to be doing. You want to
be constantly innovating in the... Right. There's an aspect of telling, I mean, I've definitely seen
people doing great work where you're not talking about it. Like I would say that's actually a
problem I have right now with Quansight Labs. Quansight Labs has been doing amazing work,
really excited about it. But we have not been talking about it enough. We haven't been...
And there's different ways to talk about it. There's different ways to, there's different
channels to which to communicate. There's also like, I'll just throw some shade at companies I love.
So for example, iRobot, I just had a conversation with them. They make Roombas.
Sure. And they, I think I love, they're incredible robots, but like, every time they do
advertisement, not advertisement, but like marketing type stuff, it just looks so corporate.
And to me, the incredible, maybe wrong in the case of iRobot, I don't know. But to me, when
you're talking about engineering systems, it's really nice to show off the magic of the engineering
and the software and the, all the, all the geniuses behind this product and the tinkering and like
the raw authenticity of what it takes to build that system versus the marketing people who want
to have like pretty people, like standing there all pretty with the robots, like moving perfectly.
So to me, there's some aspect, it's like speaking to the hackers, you have to throw some bones,
some, some care towards the engineers, the developers, because there's some aspect, one,
for the hiring, but two, there's an authenticity to that kind of communication that's really
inspiring to the end user as well. Like if they know that brilliant people, the best in the world,
they're working at your company, they start to believe that that product that you're creating
is really good. It's interesting because your initial reaction would be, wait, there's different
users here. Why would you do that? You know, my wife bought a Roomba, but she, and she,
you know, loves developers, she loves me, but she doesn't care about that culture. So essentially,
what you said is actually the authenticity because everyone has a friend or one knows people,
there's word of mouth. I mean, if you word of mouth is so, so, yeah, exactly. And then I think
it's the lack of that, that realization, there's this halo effect that influences your general
marketing. Interesting. For some stupid reason, I do have a platform and it seems that the reason
I have a platform, many others like me, millions of others, is like the authenticity and like,
we get excited naturally about stuff. And like, I don't want to get excited about that iRobot video
because it's boring, it's marketing, it's corporate, as opposed to I wanted to do some fun,
this is, this is me like a shout out to iRobot, is they're not letting me get into the robot.
Yeah. Well, there's an aspect of, that could be benefiting from a, from a culture of modularity,
like add-ons and that could actually dramatically help. You've seen that over history. I mean,
Apple is an example of a company like that or, or the, like, I can see, I can see what your point
is, is that you have something that needs to be, it needs to be adopted broadly, the concept needs
to be adopted broadly. And if you want to go beyond this one device, you need to engage this community.
Yeah. And connecting to the, the open source, as you said. I got to ask you,
you're a programmer, one of the most impactful programmers ever, you've led many programmers,
you lead many programmers, what are some from a programmer perspective, what makes a good programmer,
what makes a productive programmer, is there advice you can give to be a great programmer,
as well? Great, great question. And there are times in my life I'd probably answer this even
better than I hope maybe give an answer today, because I thought about this numerous times,
like right now I've spent so much time recently hiring salespeople that,
that your mind is a little bit on something else.
On something else. But I, I reflected on the past and also, you know, I have some really,
the only way I can do this, I have some really great programmers that I work with,
who lead the teams that they, they lead. And my goal is to inspire them and, and hopefully
help them, encourage them and be, help them encourage with their teams.
I would say there's a number of things, a couple of things. One is curiosity, like you,
I think a programmer without curiosity is mundane, like you'll lose interest, you won't do your best
work. So it's sort of, it's an affect, it's sort of, are you, you can have some curiosity about
things. I think two, don't try to do everything at once, recognize that you're, you know, we're
limited as humans, you're limited as a human. And each one of us are limited in different ways,
you know, we all have our different strengths and skills. So it's adapting the art of programming
to your skills. One of the things that always works is to limit what you're trying to solve.
Right. So if you're part of a team, usually maybe somebody else has put the architecture
together and they've given a portion to you. If you're, if you're young, if you're not
part of a team, it's sort of breaking down the problem into smaller parts is essential for
you to make progress. It's very easy to take on a big project and try to do it all at once,
and you get lost, and then you do it badly. And so thinking about, you know, very concretely,
what you're doing, defining, you know, defining the inputs and outputs, defining what you want to
get done. Even just talking about that and like writing down before you write code is what are
you trying to accomplish. I mean, very specific about it really, really helps. I think using
other people's work, right? Don't, don't be afraid that somehow you're like, you should do it all.
Like nobody does. Stand on the shoulders of giants and copy and paste some Stack Overflow.
Copy and paste Stack Overflow. It's like, but, but don't just copy and paste. This is particularly
relevant in the era of Codex and the, you know, the auto-generated code, which is essentially,
I see as an indexing of Stack Overflow. Right, exactly. Secondly, it's like,
It's a search engine. It's a search engine over Stack Overflow, basically. So it's not,
I mean, it's, we've had this for a while, but really, you want to cut and paste, but, but,
but not blindly. Like, absolutely, I've cut and paste to understand, but then you understand,
Oh, this is what this means. Oh, this is what it's doing. And understand, you know, as much as you
can. So it's critical. That's where the curiosity comes in. If you're just blindly cutting and
pasting, you're not going to understand. And so understand and then, you know, be, be sensitive
to hype cycles, right? Every, every so often there's always a, oh, test-driven
development is the answer. Oh, object oriented is the answer. Oh, like there's always an answer,
you know, agile is the answer. Be cautious of jumping onto a hype cycle. Like likely there's
signal, like there's a thing there that's, that's actually valuable you can learn from,
but it's almost certainly not the answer to everything you need.
What lessons do you draw from you having created NumPy and SciPy, like in, in service of sort of
answering the question of what it takes to be a great programmer and giving advice to people?
How can you be the next person to create a SciPy? Yeah. So one is listen,
to listen, to who, to, to people that have a problem, right? Which is everybody, right? But,
but listen and listen to many and, and try to then do, like don't, you're going to have to do an
experiment, you know, do fall down. Don't be afraid to fall down. Don't be afraid. The first thing
you do is probably going to suck and that's okay, right? It's, it's honestly, I think iteration is
the key to innovation. And, and it's that, it's almost that psychological hesitation we have
to just iterate. Like, yeah, we know, we know it's not great, but the next one will be better.
I mean, just, just keep learning and keep proving and keep improving. So it's, it's an attitude.
And then it does take intense concentration, right? Good things don't just happen. It's not
quite like TikTok or like Facebook, you know, you can't scroll your way to good programming,
right? There are, you know, sincere like hours of, of deep, don't be afraid of the deep problem.
Like often people will run away from something because, oh, I can't solve this. And you might
be right, but give it an hour, give it a couple of hours and see. And, you know, just five minutes
is not going to, not going to give you that. Was it lonely when you were building SciPy and NumPy?
Hugely. Yeah, absolutely lonely in the sense of you have to have an inner drive and that inner
drive for me always comes from, I have to, I have to see that this is right in some angle. I have
to believe it, that this is the right approach, the right thing to do. With SciPy, it was like,
oh yeah, the world needs libraries in Python. Clearly, Python is popular enough with enough
influential people to start and it needs more libraries. So that is a good, you know, in itself.
So I'm going to go do that good. So find a good, find a thing that you know is good and just,
just work on it. So that has to happen. And it is, and you kind of have to have enough
realization of your mission to be okay with the naysayer or the fact that not everybody joins
you up front. In fact, one thing I've talked to people a lot, I've seen a lot of projects come
and some fail. Like not everything I've done has actually worked perfectly. I've tried a bunch of
stuff that, okay, that didn't really work or this isn't working and why. But you see the patterns and
one of the key things is you can't even know for six months. I say 18 months right now. If you're
starting a new project, you got to give it a good 18 month run before you even know if the feedback
is there. Like it's, you're not going to know in six months. You might have the perfect thing,
but six months from now it's still kind of still emerging. So give it time because you're dealing
with humans and humans have an inertial energy that just doesn't change that quickly. Let me ask
a silly question, but like you said, you're focused on the sales side of things currently,
but back when you were actually programming maybe in the 90s, you talked about IDEs. What's the
setup that you have that brings you joy? Keyboard, number of screens, Linux? I do still
like to program some, but it's not as much as I used to. I have two projects I'm super interested
in trying to find funding for them, trying to figure out some good teams for them, but I could
talk about those. But yeah, I'm an Emacs guy. Great. Thank you. The superior editor, everybody.
I've got, I don't often delete tweets, but one of the tweets I deleted was when I said Emacs was
better than Vim and then the hate I got. It is. I was like, I'm walking away from this.
I do too. I don't push it. I mean, I'm just joking, of course. Yeah, exactly. It's kind of like,
but people do take the editor seriously. I did as a joke. It is, but there's something beautiful
to me about Emacs, but for people that love Vim, there's something beautiful to them about that.
I do use Vim for quick editing. Like command line, if I say quick editing, I will still sometimes
use it, but not much. Like it's simple, corrective single editor character. So when you were
developing SciPy, you were using Emacs. Yeah. SciPy, NumPy are all written in Emacs on that Linux
box and CVS and then SVN version control. Git came later. Like Git has, I love distributed branch
stuff. I think Git is pretty complicated, but I love the concept. And also, of course, GitHub is,
and then GitLab make Git definitely consumable. But that came later. Did you ever touch Lisp
at all? The emotional feelings about all the parentheses? Yeah. So great question. So I find
myself appreciating Lisp today much more than I did early. Because when I came to programming,
I knew programming, but I was a domain expert. And to me, the parentheses were in the way.
It's like, wow, it's just all this, it just gets in the way of my thinking about what I'm doing.
So why would I have all these? That was my initial reaction to it. And now as I appreciate kind of
the structure that kind of naturally maps to the, to a logical thinking about a program,
I can appreciate them. And why it's actually, you could create editors that make it not so
problematic, right? Honestly. So I actually have a much greater appreciation of Lisp and things like
Clojure and there's Hy, which is a Python, you know, a Lisp that compiles to Python bytecode.
I think it's challenging. Like typically, these languages are, I even saw a whole data
science programming system in Lisp that somebody created, which is cool. But again,
it's the, I think it's the lack of recognition of the fact that there exists what I call occasional
programmers. People that are never going to be programmers for a living. They don't want to have
all this cuteness in their head. They want just, it's why BASIC, you know, Microsoft had
the right idea with BASIC in terms of having that be the language of Visual Basic, the language of
Excel and SQL, SQL Server. They should have converted that to Python 10 years ago. The
world would be a better place if they had, but
there's also, there's the beauty and the magic to the history behind a language like Lisp, you know,
some of the most interesting people in the history of computer science and artificial
intelligence have used Lisp. So you feel, well, it's back to that language. When you,
when you have a language, you can think in it and it helps you think about it and attracts
certain kinds of people that think in a certain kind of way. And then that's, that's there.
Okay. So what about like small laptop with a tiny keyboard, or is there like
three screens? You know, good question. I've never gotten into the big, the many screens,
to be honest. I mean, and maybe it's because in my head, I kind of just, I just swap between
windows. Like, partly because I guess I really can't process three screens at once anyway.
Like, I just am looking at one and I just flip, you know, I flip an application open.
So what about where it's really helpful is actually when I'm trying to, you know, here's
data and I want to input it from here. This is the only time I really need another screen.
So now, because you're both a developer, you lead developers, but then there's also these businesses
and there's sales people, you're working with large companies, operations people, hiring people,
the whole thing, which operating system is your favorite at this, at this point? So Linux was the
early days. So yeah, I love Linux on the server side. And in the early days, I had my own
Linux desktop. I've been on Mac laptops for 10 years now.
Yeah. This is what leadership looks like.
You switch to Mac. Okay, great.
Yeah, pretty much. I mean, just the fact that I had to do PowerPoints, I had to do presentations and,
you know, plug in, I just couldn't mess with plugging in laptops. It wouldn't project.
And yeah.
So you mentioned also Quansight Labs and things like that.
Can you give advice on how to hire great programmers and great people?
Yeah, I would say produce an open source project, get people contributing to it and hire those people.
Yeah. I mean, you're doing it sort of, you might be perhaps a little biased, but that's
probably 100% really good advice.
I find it hard to hire. I still find it hard to hire. Like, in terms of, I don't think,
I mean, it's not hard to hire if I've worked with somebody for a couple of weeks,
but a couple, an hour or two of interviews, I have no idea.
So that instinct, that radar of knowing if you're good or not, you find that you're still not able
to really do it.
It's really hard. I mean, the resume can help, but again, the resume is like a presentation
of the things they want you to see, not the reality of, and there's also,
you have to understand what you're hiring for. There are different stages and different kinds
of skills. And so it isn't just a, one of the things I talk a lot about internally at my
company is that the whole idea of measuring ourselves against a single axis is flawed,
because we're not, it's a multi-dimensional space. And how do you order a multi-dimensional space?
There isn't one ordering. So this whole idea, you immediately have projected into a thing.
And you're talking about hiring or best or worst or better or not better. So what is the thing
you're actually needing? And you can hire for that. There is such a thing. Generally,
I really value people who have the affect that care about open source. So in some cases,
their affinity to open source is simply kind of a filter of an affect. However, I have found
this interesting dichotomy between open source contributors and product creation. I don't
know if it's fully true, but there does seem to be the more, the more experienced, the more
affect somebody has to an open source community, the less ability to actually produce product
that they have. And the opposite is kind of true too. The more product focused are, I find a lot
of people, I've talked to a lot of people who produce really great products. And they, they
have a, they're looking over the open source communities, kind of wanting to participate
and play, but they've played here and they do a great job here. And then they don't necessarily
have some of the same. I don't think that, I don't think that's entirely necessary. I think part
of it is cultural, how that's, how they've emerged. Because one of the things that open
source communities often lack is great product management, like some product management energy
that. That's brilliant. But you want both of those energies in the same place together.
Yes, you really do. And so it's a lot of it's creating these teams of people that have these
needed skills and attributes that are hard. And so, so one of the big things I look for is
somebody that fundamentally recognizes their need to learn, like one of the values that we,
we have in all of the things we do is learning. Like, if somebody thinks they know it all,
they're going to struggle. And some of that is just, there's more basic things like humility,
just being humble in the face of all the things you don't know. And that's like step one of
learning. That's step one of learning, right? And, you know, I've spent a lot of time learning,
right? Other people spend a lot more time, but I've spent a lot of time learning. I went,
you know, my whole goal was to get a PhD because I love school and I wanted to be a scientist. And
then what I found is what's been written about elsewhere as well: the more I learned, the
more I didn't know, the more I realized, man, I know about this, but this is such a tiny thing
in the global scope of what I might want to know about. So I need to be listening a whole lot better
than, than I am just talking. That's changed a little bit. Actually, my wife says that I used
to be a better listener. Now that I have, I'm so full of all these ideas I want to do, she kind
of says, you got to give people time to talk. So you've succeeded on multiple dimensions. So one is
the tenure track faculty, the other is just creating all these products and building up the businesses
then working with businesses. Do you have advice for young people today in high school, in college,
of how to live a life as non-linear and as successful as yours, a life that could be,
they could be proud of? Well, that's, that's a super compliment. I'm humbled by that actually. I,
I would say a life that you can be proud of, honestly, one thing that I've said to people is,
first, find people you love and care about them. Like family matters to me a lot and family means
people you love and have committed to, right? So it's, can be whatever you, you mean by that,
but it's, you need to have a foundation. So find people you love and want to commit to and do that
because it anchors you in a way that nothing else can, right? And then, and then you find
other things and then kind of from out there, you find other kinds of things you can commit to,
whether it's ideas or, or people or groups of people. So, you know, especially in high school,
I would say don't settle on what you think you know, right? Give yourself 10 years to think
about the world. Like there's, I see a lot of high school students who seem to know everything
already. I think I did too. I think it's maybe natural, but, but recognize that the things you
care about, you might change your perspective over time. I certainly have over time is that,
you know, I was really passionate about one specific thing and I've kind of softened.
You know, I was a big, I didn't like the Federal Reserve, right? And there's still,
we can have a longer conversation about monetary policy and finances, but, but
I'm a little more nuanced in my, in my perspective at this point. But, you know, that's, that's
one area where you learn about something, go, I want to attack it, you know, build, don't destroy.
Like build, like so much, so often the tendency is to not like something, they want to go attack it,
build something, build something to replace it, build up, you know, attract people to your new
thing. You'll get far, you'll be far more, far better, right? You don't need to destroy something
to build something else. So that's, I guess, generally. And then, you know, definitely
let curiosity, you know, follow your curiosity and, and don't just follow the money.
And all of that, like you said, is grounded in family, friendship, and ultimately love.
Yes.
Which is a great way to end it. Travis, you're one of the most impactful people in the
engineering and the computer science, in the human world. So I truly appreciate everything you've
done. And I really appreciate that you would spend your valuable time with me. It was an honor.
It was a real pleasure for me. I appreciate that.
Thanks for listening to this conversation with Travis Oliphant. To support this podcast,
please check out our sponsors in the description. And now, let me leave you with something that
in the programming world is called Hodgson's law. Every sufficiently advanced LISP application
will eventually be reimplemented in Python. Thank you for listening and hope to see you next time.