The following is a conversation with Dmitri Dolgov, the CTO of Waymo, which is an autonomous
driving company that started as the Google self-driving car project in 2009 and became Waymo in 2016.
Dmitri was there all along. Waymo is currently leading in the fully autonomous vehicle space
in that they actually have an at-scale deployment of publicly accessible autonomous vehicles
driving passengers around with no safety driver, with nobody in the driver's seat.
This, to me, is an incredible accomplishment of engineering on one of the most difficult
and exciting artificial intelligence challenges of the 21st century.
Quick mention of a sponsor followed by some thoughts related to the episode.
Thank you to Trial Labs, a company that helps businesses apply machine learning
to solve real-world problems. Blinkist, an app I use for reading through summaries of books,
BetterHelp, online therapy with a licensed professional, and CashApp, the app I use to
send money to friends. Please check out the sponsors in the description to get a discount
and to support this podcast. As a side note, let me say that autonomous and semi-autonomous
driving was the focus of my work at MIT and is a problem space that I find fascinating
and full of open questions from both a robotics and a human psychology perspective.
There's quite a bit that I could say here about my experiences in academia on this topic
that revealed to me, let's say, the less admirable sides of human beings. But I choose
to focus on the positive, on solutions, on brilliant engineers like Dmitri and the team
at Waymo who work tirelessly to innovate and to build amazing technology that will define our
future. Because of Dmitri and others like him, I'm excited for this future. And who knows,
perhaps I, too, will help contribute something of value to it. If you enjoy this thing,
subscribe on YouTube, review it with five stars on Apple Podcasts, follow on Spotify,
support on Patreon, or connect with me on Twitter at Lex Fridman. And now here's my
conversation with Dmitri Dolgov. When did you first fall in love with robotics or even computer
science more in general? Computer science first at a fairly young age. Then robotics happened
much later. I think my first interesting introduction to computers was in the late 80s
when we got our first computer. I think it was an IBM, an IBM AT. Remember those things
that had, like, a turbo button in the front? That you would press to, you know, make the thing
go faster. Did they already have floppy disks? Yeah, yeah, yeah. Like the 5.25-inch ones.
I think there was a bigger one, then the five-inch, then the three-inch ones.
Yeah, I think ours was the five-inch. I don't know, maybe before that there were the giant plates,
but we didn't get that. It was definitely not the three-inch ones.
Anyway, so we got that computer. I spent the first few months just playing video games,
as you would expect. I got bored of that. So I started messing around and trying to figure
out how to make the thing do other stuff. Got into exploring, programming. And a couple of years
later, I got to a point where I actually wrote a game, a little game. And a game developer,
a Japanese game developer, actually offered to buy it from me for a few hundred bucks. And
for a kid in Russia, that's a big deal. That's a big deal, yeah. I did not take the deal.
Wow, integrity. Yeah. That was not the most astute financial move
that I made in my life, looking back at it now. I instead, well, I had a reason. I put it
online. It was, what did you call it back in the days? It was a freeware thing. It was not open
source, but you could upload the binaries. You would put the game online. And the idea was that
people like it and then they contribute and they send you little donations. So I did my quick
math of, like, of course, thousands and millions of people are going to play my game, send me a
couple of bucks apiece. Should definitely do that. As I said, not the best financial move. You were
already playing with business models at that age. Remember what language it was? What programming
language was it? Pascal. Pascal. And it had a graphical component, so it's not text-based? Yeah,
it was, like, I think, 320 by 200, whatever it was. I think that was kind of the early
version of VGA resolution, right? And I actually think the reason why this company wanted to buy it is
not, like, the fancy graphics or the implementation. It was maybe the idea of the actual game. The idea
of the game. One of the things, it's so funny. I used to play this game called Golden Axe. And
the simplicity of the graphics and something about the simplicity of the music, like,
it still haunts me. I don't know if that's a childhood thing. I don't know if that's the
same thing for Call of Duty these days for young kids. But I still think that when
the games are simple, that simple purity, like, allows your imagination to take over,
thereby creating a more magical experience. Like, now with better and better graphics, it
feels like your imagination doesn't get to create worlds, which is kind of interesting. It could
be just an old man on a porch, like waving at kids these days that have no respect. But I still
think that graphics almost get in the way of the experience. I don't know. Flappy Bird?
Yeah. I don't know if that matches it. It's close. I don't, yeah. But that's more, that's
more like the Tetris world, where they optimally, masterfully, like, create a fun,
short-term dopamine experience, versus I'm more referring to, like, role-playing games where
there's, like, a story. You can live in it for months or years. Like, there's the Elder Scrolls series,
which is probably my favorite set of games. That was a magical experience. And the
graphics were terrible. The characters were all randomly generated, but, I don't know,
it pulls you in. There's a story. It's like an interactive version of a Tolkien
world. And you get to live in it. I don't know. I miss it. It's one of the things that
suck about being an adult is there's no, you have to live in the real world as opposed to the Elder
Scrolls world. You know, whatever brings you joy, right? Minecraft, right? Minecraft's a great
example. You create. Like, it's not the fancy graphics, but it's the creation of your own
worlds. Yeah, that one is crazy. You know, one of the pitches for being a parent that people
tell me is that you can like use the excuse of parenting to go back into the video game world.
And like, that's like, you know, father, son, father, daughter time, but really you just get
to play video games with your kids. So anyway, at that time, did you have any ridiculous, ambitious
dreams of where, as a creator, you might go as an engineer? What did you think of yourself
as, as an engineer? As a tinkerer? Or did you want to be, like, an astronaut or something like that?
You know, I'm tempted to make something up about, you know, robots, engineering or,
you know, mysteries of the universe. But that's not the actual memory that pops into my mind
when you ask me about childhood dreams. So I'll actually share the real thing.
When I was maybe four or five years old, I, you know, as we all do, I thought about
what I wanted to do when I grew up. And I had this dream of being a traffic control cop.
You know, they don't have those today, I think, but, you know, back in the 80s, and, you know,
in Russia, you're probably familiar with this, Lex, they had these, you know, police officers
that would stand in the middle of an intersection all day, and they had these black-and-white
striped batons that they would use to, you know, control the flow of traffic. And, you know,
for whatever reason, I was strangely infatuated with this whole process. And like that, that was
my dream. That's what I wanted to do when I grew up. And, you know, my parents, both physics
profs, by the way, I think were, you know, a little concerned with that level of ambition
coming from their child at that age. Well, it's an interesting, I don't know if you can relate,
but I very much love that idea. I have an OCD nature that I think lends itself very closely to the
engineering mindset, which is you want to kind of optimize, you know, solve a problem by creating
an automated solution, like a set of rules, a set of rules you can follow, and thereby
make it ultra-efficient. I don't know if it was of that nature for you. I certainly have that.
Like, SimCity and factory-building games, all those kinds of things
kind of speak to that engineering mindset. Or did you just like the uniform?
I think it was more of the latter. I think it was the uniform and, you know, the striped baton
that made cars go in the right directions, that drove it. But I guess, you know,
I did end up, I guess, you know, working in the transportation industry one way or another.
No uniform, though. No, that's right. Maybe it was my, you know, deep inner infatuation with the,
you know, traffic control batons that led to this career.
Okay, what, when did you, when was the leap from programming to robotics?
That happened later. That was after grad school. And actually, you know,
self-driving cars was, I think, my first real hands-on introduction to robotics.
I never really had that much hands-on experience in school and training. I, you know,
worked on applied math and physics. Then in college, I did more abstract computer science.
And it was after grad school that I really got involved in robotics, which was actually
self-driving cars. And, you know, that was a big, big flip.
What, what grad school?
So I went to grad school in Michigan, and then I did a postdoc at Stanford,
which is, that was the postdoc where I got to play with self-driving cars.
Yeah, so we'll return there. Let's go back to, to Moscow. So I, you know,
for episode 100, I talked to my dad, and also I grew up with my dad, I guess.
So I had to put up with him for many years. And he, he went to the Phystech, or MIPT.
It's weird to say in English because I've heard all of this in Russian, Moscow Institute of
Physics and Technology. And to me, that was like, I met some super interesting, as a child,
I met some super interesting characters. It felt to me like the greatest university in the world,
the most elite university in the world. And just the, the people that I met that came out of there
were, like, not only brilliant, but also special humans. It seems like that place really tested the
soul, both technically and, like, spiritually. That could be just my
romanticization of the place. I'm not sure. But so maybe you can speak to it. Is it
correct to say that you spent some time at Phystech? Yeah, that's right. Six years. I got my bachelor's
and masters in physics and math there. And it was interesting because my, my dad, and actually
both my parents, went there. And I think all the stories that I heard, just like you, Lex,
growing up about the place and, you know, how interesting and special and magical it was,
I think that was a significant, maybe the main reason I wanted to go there for college.
And also that I actually went back to Russia from the U.S. I graduated high school in the U.S.
You went back there. I went back there. Yeah. That, wow. Exactly the reaction
most of my peers in college had. But, you know, perhaps a little bit stronger, like,
pointing me out as this crazy kid. Were your parents supportive of that? Yeah. Yeah.
It goes to your previous question, they supported me in letting me pursue my passions and the
things that I was interested in. That's a bold move. Wow. What was it like there?
It was interesting. You know, definitely fairly hardcore on the fundamentals of, you know,
math and physics and, you know, lots of good memories from, you know, from those times.
So, okay. So, Stanford, how'd you get into autonomous vehicles?
I had the great fortune and great honor to join Stanford's DARPA urban challenge team
in 2006. This was the third in the sequence of the DARPA challenges. There were two Grand
Challenges prior to that. And then in 2007, they held the DARPA Urban Challenge. So, you know,
I was doing my postdoc. I had, I joined the team and worked on motion planning for, you know,
that competition. So, okay. So, for people who might not know, I know from a certain
perspective, autonomous vehicles is a funny world. In a certain circle of people, everybody knows
everything. And then in a certain circle, nobody knows anything in terms of general public. So,
it's interesting. It's a good question of what to talk about. But I do think that the urban challenge
is worth revisiting. It's a fun little challenge, one that, first of all, like sparked so much,
so many incredible minds to focus on one of the hardest problems of our time in artificial
intelligence. So, that's a success from a perspective of a single little challenge.
But can you talk about, like, what did the challenge involve? So, were there pedestrians?
Were there other cars? What was the goal? Who was on the team? How long did it take?
Like, any fun sort of specs? Sure, sure, sure. So, the way the challenge was constructed, just
a little bit of background: as I mentioned, this was the third competition in that series.
The first two were called the Grand Challenges. The goal there was to just
drive in a completely static environment. You know, you had to drive in a desert.
That was very successful. So, then DARPA followed with what they called the Urban Challenge, where
the goal was to build vehicles that could operate in more dynamic environments and share them with
other vehicles. There were no pedestrians there. But what DARPA did is they took over an abandoned
air force base. And it was kind of like a little fake city that they built out there. And they had
a bunch of robots, you know, cars that were autonomous in there all at the same time, mixed
in with other vehicles driven by professional drivers. And each car had a mission. So
there was a crude map that they received at the beginning, and they had a mission: go here,
and then there, and over here. And they kind of all were sharing this environment at the same
time. They had to interact with each other. They had to interact with the human drivers.
So, it's this very first, very rudimentary version of a self-driving car that, you know,
could operate in an environment shared with other dynamic actors. That, as you said,
you know, really, in many ways, you know, kickstarted this whole industry.
Okay. So, who was on the team? And how'd you do? I forget.
We came in second. Perhaps that was my contribution to the team. I think the Stanford team came in
first in the Grand Challenge, but then I joined the team. And, you know, you were the one with the
bug in the code. I mean, do you have sort of memories of some particularly challenging things or,
you know, one of the cool things, it's not, you know, this isn't a product. This isn't a thing
that, you know, you have a little bit more freedom to experiment. So, you can take risks. So, you
can make mistakes. So, is there interesting mistakes? Is there interesting challenges that
stand out to you or something like taught you a good technical lesson or a good philosophical
lesson from that time? Yeah. You know, definitely a very memorable time. Not really a challenge,
but like one of the most vivid memories that I have from the time. And I think that was actually
one of the days that, you know, really got me hooked on this whole field was the first time
I got to run my software on the car. And I was working on a part of our planning algorithm
that had to navigate in parking lots, so, you know, something called free-space
motion planning. The very first version of that, you know, we tried on the car.
It was on Stanford's campus in the middle of the night. And, you know, I had this little
course constructed with cones in the middle of a parking lot. So, we were there
at like 3 a.m., you know, by the time we got the code to, you know, compile and turn over.
And, you know, it drove. It could actually do something quite reasonable. And, you know,
it was of course very buggy at the time and had all kinds of problems, but it was pretty darn
magical. I remember going back and, you know, later at night trying to fall asleep and just,
you know, being unable to fall asleep for, you know, the rest of the night, just my mind was
blown. And, you know, that's what I've been doing ever since for more than a decade.
In terms of challenges and, you know, interesting memories, like on the day of the competition,
it was pretty nerve-wracking. I remember standing there with Mike Montemerlo, who was
the software lead and wrote most of the code. I think I did one little part of the planner. Mike,
you know, incredibly did, you know, pretty much the rest of it with a bunch of other
incredible people. But I remember standing on the day of the competition, you know,
watching the car, you know, with Mike. And the cars are completely empty, right? They're all
lined up in the beginning of the race. And then, you know, DARPA sends them, you know,
on their missions, one by one, and off they leave. And, you know, they had these sirens.
They all had their different sirens, right? Each siren had its own
personality, if you will. So, you know, off they go, and you don't see them. You just kind of,
and then every once in a while, they, you know, come a little bit closer to where
the audience is. And you can kind of hear, you know, the sound of your car. And, you know,
it seems to be moving along. So that, you know, gives you hope. And then, you know,
it goes away and you can't hear it for too long. You start getting anxious, right? So it's a little
bit like, you know, sending your kids to college and like, you know, kind of you invested in them.
You hope you built them properly, but, like, it's still
anxiety inducing. So that was an incredibly fun few days. In terms of, you know, bugs,
as we mentioned, you know, whether it was my bug that caused us the loss of the first place
is still a debate that, you know, I occasionally have with people on the CMU team. CMU came first.
I should mention that.
See, I haven't heard of them. But yeah.
No, you know, it's a small school. It's, it's, you know, a real fluke that,
you know, they happened to succeed at something robotics-related.
Very scenic though. So most people go there for the scenery. Yeah. It's a beautiful campus.
Unlike, unlike Stanford.
So for people, yeah, that's true. Unlike Stanford. For people who don't know, CMU is one of the
great robotics and sort of artificial intelligence universities in the world. CMU Carnegie Mellon
University. Okay. Sorry. Go ahead. Good. Good. So in the part that I contributed to,
which was navigating parking lots, the way, you know, that part of the mission worked is,
in a parking lot, you would get from DARPA an outline of the map. You basically got this,
you know, giant polygon that defined the perimeter of the parking lot, and there would be an entrance,
you know, or maybe multiple entrances or exits to it. And then you would get a goal
within that open space: an x, y, and heading where the car had to park. It had no information
about the obstacles that the car might encounter there. So it had to navigate kind of completely
free space from the entrance to the parking lot into that parking space. And then once
it, you know, parked there, it had to exit the parking lot. And, you know, while, of course,
encountering and reasoning about all the obstacles that it encounters in real time. So
our interpretation, at least my interpretation of the rules was that you had to reverse out of
the parking spot. And that's what our cars did, even if there was no obstacle in front. That's not
what CMU's car did. And it just kind of drove right through. So there's still a debate. And
of course, you know, if you stop and then reverse out and go out the different way, that
costs you some time. And so there's still a debate whether, you know, it was my poor implementation
that cost us extra time or whether it was, you know, CMU violating the important rule of the
competition. And, you know, I have my own opinion here in terms of other bugs. And like, I have to
apologize to Mike Montemerlo for sharing this on air. But it is actually one of the more memorable
ones. And it's something that's kind of become a bit of a metaphor and a label in the industry
since then, I think, at least in some circles, it's called the victory circle or victory lap.
And our cars did that. So in one of the missions in the urban challenge and one of the courses,
there was this big oval right by the start and finish of the race. So DARPA had
a lot of the missions finish in that same location. And it was pretty cool,
because you could see the cars come by, you know, kind of finish that leg of the trip and
that leg of the mission and then, you know, go on and finish the rest of it. And other vehicles
would, you know, come hit their waypoint and, you know, exit the oval and off they would go.
Our car, on the other hand, would hit the checkpoint, and then it would do an extra lap around the oval and
only then, you know, leave and go on its merry way. So over the course of, you know, the full day,
it accumulated some extra time. And the problem was that we had a bug where it wouldn't, you know,
start reasoning about the next waypoint and plan a route to get to that next point until it
hit the previous one. And in that particular case, by the time we hit that one, it was too late for
us to consider the next one and kind of make a lane change, so every time we would do, like,
an extra lap. So, you know, and that's the Stanford victory lap. The victory lap.
Oh, that's, there's, I feel like there's something philosophically profound in there somehow. But
I mean, ultimately, everybody is a winner in that kind of competition. And it's led to sort of
famously to the creation of Google self-driving car project and now Waymo. So can we give an
overview of how is Waymo born? How is the Google self-driving car project born? What is the mission?
What is the hope? What is it is the engineering kind of set of milestones that it seeks to
accomplish? There's a lot of questions in there. Yeah, I don't know.
But you're right. Kind of the DARPA, the Urban Challenge and the previous DARPA Grand Challenges
kind of led, I think, to a very large, you know, degree to that next step. And then, you know,
Larry and Sergey, Larry Page and Sergey Brin, Google founders, of course, saw that competition
and believed in the technology, and that's how the Google self-driving car project was born.
At that time, and we started in 2009, it was a pretty small group of us, about a dozen people
who came together to work on this project at Google. At that time, we had seen incredible early
results in the DARPA Urban Challenge. I think we were all incredibly excited about
where we got to. And we believed in the future of the technology, but we still had a very
rudimentary understanding of the problem space. So the first goal of this project in 2009 was to
really better understand what we're up against. And, you know, with that goal in mind, when we
started the project, we created a few milestones for ourselves that maximized learning. Well,
the two milestones were, you know, one was to drive 100,000 miles in autonomous mode, which was,
at that time, orders of magnitude more than anybody had ever done. And the second milestone
was to drive 10 routes. Each one was 100 miles long. They were specifically chosen to be kind
of extra spicy, extra complicated and sampled the full complexity of that domain. And you had to
drive each one from beginning to end with no intervention, no human intervention. So you
would get to the beginning of the course, you would press the button that would engage in
autonomy, and you had to go for 100 miles beginning to end with no interventions.
And it sampled, again, the full complexity of driving conditions. Some were on freeways. We
had one route that went through all the freeways and all the bridges in the Bay Area.
You know, we had some that went around Lake Tahoe and kind of mountain roads. We had some that
drove through dense urban environments like downtown Palo Alto and through San Francisco.
So it was incredibly interesting to work on. And it took us just under two years,
about a year and a half, a little bit more to finish both of these milestones. And in that
process, it was an incredible amount of fun, probably the most fun I had in my professional
career. And you're just learning so much. You know, the goal here was to learn
and prototype. You're not yet starting to build a production system, right? So, you know,
this is when you're kind of working 24/7 and you're hacking things together.
And you also don't know how hard this is. I mean, it's the point, like, so, I mean,
that's an ambitious, if I put myself in that mindset, even still, that's a really ambitious
set of goals, like just those two: picking 10 different, difficult, spicy routes,
and then having zero interventions. So, like, not saying gradually, you know,
over a period of 10 years, we're going to have a bunch of routes and gradually
reduce the number of interventions. It literally says, like, as soon as possible,
we want to have zero, and on hard routes. So, like, to me, if I was facing that, it's unclear
whether that takes two years or whether that takes 20 years. I mean, it took us under two.
I guess that speaks to a really big difference between doing something once and having a prototype
where you're going after, you know, learning about the problem versus how you go about
engineering a product, where you properly do evaluation, you look at metrics,
you drive a lot, and you're confident that you can do that. And I guess that's why it took
a dozen people, you know, 16 months or a little bit more than that, back in 2009 and 2010,
with the technology of, you know, more than a decade ago, that amount of time to achieve
that milestone of 10 routes, 100 miles each, with no interventions. And, you know, it took us a little
bit longer to get to, you know, a full driverless product that customers use. That's another really
important moment. Is there some memories of technical lessons? Or just one, like, what did
you learn about the problem of driving from that experience? I mean, we can now talk about, like,
what you learned from modern day Waymo. But I feel like you may have learned some profound things
in those early days, even more so, because it feels like what Waymo is doing now is trying to
figure out, you know, how to do scale, how to make sure you create a product, how to make sure it's,
like, safe, you know, those things, which are all fascinating challenges. But, like, you were
facing the more fundamental philosophical problem of driving in those early days, like,
what the hell is driving as an autonomous, or maybe I'm again romanticizing it. But is there
some valuable lessons you picked up over there at those two years?
A ton. The most important one is probably that we believed that it's doable. And we'd gotten far
enough into the problem that, you know, we had, I think, only a glimpse of the true complexity
of the domain. And it's a little bit like, you know, climbing a mountain where you kind of,
you know, see the next peak and you think that's kind of the summit, but then you get to that and
you kind of see that this is just the start of the journey. But we had sampled enough
of the problem space, and we had made enough rapid progress, even with the technology of 2009-2010,
that it gave us confidence to then pursue this as a real product. So, okay, so the next step,
you mentioned the milestones that you had in those two years. What are the next milestones that
then led to the creation of Waymo and Beyond? It was a really interesting journey. And, you know,
Waymo came a little bit later. Then, you know, we completed those milestones in 2010. That was the
pivot when we decided to focus on actually building a product using this technology.
The initial couple of years after that, we were focused on freeway driving, you know, what you would
call a driver assist, maybe an L3 driver assist program. Then around 2013, we had learned enough
about the space and thought more deeply about, you know, the product that we wanted to build,
that we pivoted. We pivoted towards this vision of, you know, building a driver and deploying
fully driverless vehicles without a person. And that's the path that we've been on since then.
It was exactly the right decision for us. So, there was a moment where you also considered,
like, what is the right trajectory here? What is the right role of automation in the task
of driving? It wasn't obvious from the early days that you want to go fully autonomous?
From the early days, it was not. I think it was around 2013, maybe, that became very clear. And
we made that pivot. And it also became very clear that even the way you go building a
driver assist system is fundamentally different from how you go building a fully driverless vehicle.
So, you know, we pivoted towards the latter. And that's what we've been working on ever since.
And so that was around 2013. Then there's been a sequence of really meaningful,
really important, defining milestones for us since then. In 2015, we had our first, actually the world's
first fully driverless ride on public roads. It was in a custom-built vehicle that we had.
You must have seen those. We called it the Firefly, the funny-looking, marshmallow-looking thing.
And we put a passenger, his name was Steve Mahan, a great friend of our project from the
early days. The man happens to be blind. So, we put him in that vehicle. The car had no steering
wheel, no pedals. It was in an uncontrolled environment. No lead or chase cars, no police
escorts. And we did that trip a few times in Austin, Texas. So, that was a really big milestone.
But that was in Austin. Yeah. Cool. Okay. And at that time, you know,
it took a tremendous amount of engineering. It took a tremendous amount of validation
to get to that point. But we only did it a few times. It was a fixed route.
It was not a controlled environment, but it was a fixed route, and we only did it a few times.
Then in 2016, end of 2016, beginning of 2017, is when we founded Waymo, the company. That was the
next phase of the project where we believed in kind of the commercial vision of this technology.
And it made sense to create an independent entity within that alphabet umbrella to pursue
this product at scale. Beyond that, later in 2017, was another really
huge step for us, a really big milestone. I think it was October of 2017
when we started regular driverless operations on public roads. On that first day of operations,
we drove 100 miles in one day in driverless fashion. And the most
important thing about that milestone was not the 100 miles in one day, but that it was the start
of kind of regular ongoing driverless operations. And when you say driverless,
it means no driver. That's exactly right. So on that first day, we actually had a mix. And
we didn't want to be on YouTube and Twitter that same day. So in many of the rides,
we had somebody in the driver's seat, but they could not disengage, like, the car could not be
disengaged. But actually, on that first day, some of the miles were driven with just a completely
empty driver's seat. And this is the key distinction that I think people don't realize is, you know,
that oftentimes when you talk about autonomous vehicles, there's often a driver in the seat
that's ready to take over what's called a safety driver. And then Waymo is really one of the only
companies that I'm aware of, or at least as like boldly and carefully and all that is
actually has cases. And now we'll talk about more and more, where there's literally no driver.
So that's another interesting case, where the driver is not supposed to disengage. That's like a nice middle ground: they're still there, but they're not supposed to disengage. But really, there's something magical about there being nobody in the driver's seat. Like, just to me, you mentioned the first time you wrote some code
for free space navigation of the parking lot, that was like a magical moment. To me, just sort of
as an observer of robots, the first magical moment is seeing an autonomous vehicle turn,
like make a left turn, like, apply sufficient torque to the steering wheel to where like,
there's a lot of rotation. And for some reason, and there's nobody in the driver's seat, for some
reason that, that communicates that here's a being with power that makes a decision. There's
something about like the steering wheel, because we perhaps romanticize the notion of the steering
wheel; it's so essential to our conception, our 20th-century conception, of a car. And seeing it turn the steering wheel with nobody in the driver's seat, that to me, and I think maybe to others, is really
powerful, like this thing is in control. And then there's this leap of trust that you give,
like, I'm going to put my life in the hands of this thing that's in control. So in that sense,
when there's no driver in the driver's seat, that's a magical moment for robots.
So I got the chance last year to take a ride in a Waymo vehicle, and that was the magical moment. There's like nobody in the driver's seat. It's like
the little details, you would think it doesn't matter whether there's a driver or not. But like,
if there's no driver, and the steering wheel is turning on its own, I don't know, that's magical.
It's absolutely magical. Like, I've taken many of these rides in a completely empty car. No human in the car: you call it on your cell phone, it pulls up, you get in, and it takes you on your way. There's nobody in the car but you, right? That's something we call fully driverless, our rider-only mode of operation. Yeah, it is magical. It is transformative. This is what we hear from our riders. It really changes your experience. And that really is what unlocks the real potential of this technology. But,
you know, coming back to our journey, that was 2017, when we started truly driverless operations. Then in 2018, we launched our public commercial service, called Waymo One, in Phoenix. In 2019, we started offering truly driverless, rider-only rides to our early rider population of users. And then 2020 has also been a pretty interesting year, less about the technology and more about the maturing and the growth of Waymo as a company. We raised our first round of external financing this year. We are part of Alphabet, so obviously we have access to significant resources, but on the journey of Waymo maturing as a company, it made sense for us to partially go external in this round. So we raised about $3.2 billion from that round. We've also started deploying the fifth generation of our driver, our hardware, which is on a new vehicle. It's also a qualitatively different set of self-driving hardware, now on the Jaguar I-PACE. So that was a very
important step for us. The hardware specs of the fifth generation: I apologize if I'm interrupting, but maybe talk about the generations, or focus on the fifth generation in terms of hardware specs. Like, what's on this car?
Sure. So we separate out the actual car that we are driving from the self-driving hardware we put on it. As I mentioned, this is the fifth generation. We started building our own hardware many, many years ago, and that Firefly vehicle also had a hardware suite that was mostly designed, engineered, and built in-house. Lidars are one of the more important components that we design and build from the ground up. On the fifth generation of our self-driving hardware, which we're switching to right now, we have, as with previous generations, lidars, cameras, and radars for sensing, and a pretty beefy computer that processes all that information and makes decisions in real time on board the car.
And it's really a qualitative jump forward in terms of the capabilities, the various parameters, and the specs of the hardware, compared to what we had before and compared to what you can get off the shelf in the market today.
Meaning from fifth to fourth or from fifth to first?
Definitely from first to fifth, but also from the fourth.
World's dumbest question.
Definitely from fourth to fifth; the last step as well is a big step forward.
So everything's in-house? Like, lidar is built in-house and cameras are built in-house?
You know, it's different for different components. We work with partners, and there are some components that we get from our manufacturing and supply chain partners. What exactly is in-house differs, but we do a lot of custom design on all of them: lidars, radars, cameras. The lidars are almost exclusively in-house, and some of the fundamental technologies there are completely unique to Waymo. That is also largely true about radars and cameras, though it's a little bit more of a mix in terms of what we do ourselves versus what we get from partners.
Is there something super sexy about the computer that you can mention that's not top secret? For people who enjoy computers: I mean, there's a lot of machine learning involved, but there's a lot of just basic compute too. You probably have to do a lot of signal processing on all the different sensors, and you have to integrate everything in real time. There's probably some kind of redundancy situation. Is there something interesting you can say about the computer for the people who love hardware?
It does have all of the characteristics, all the properties, that you just mentioned: redundancy, very beefy compute for general processing as well as inference on ML models. That is some of the more sensitive stuff that I don't want to get into for IP reasons, but I can share a little bit in terms of the specs of the sensors that we have on the car. We've actually shared some videos of what our lidars see in the world.
We have 29 cameras.
We have five lidars.
We have six radars on these vehicles.
And you can kind of get a feel for the amount of data that they're producing that all has to be
processed in real time to do perception, to do complex reasoning.
That kind of gives you some idea of how beefy those computers are.
But I don't want to get into specifics of exactly how we build them.
Okay.
Well, let me try some more questions that you can't get into the specifics of like GPU-wise.
Is that something you can get into?
You know, I know that Google works with GPUs and so on.
I mean, for machine learning folks, it's kind of interesting.
Or is there no...
How do I ask it?
I've been talking to people in the government about UFOs and they don't answer any questions.
So this is how I feel right now, asking about GPUs.
But is there something interesting that you could reveal? Or would you rather leave some of the compute up to our imagination?
Is there any, I guess, is there any fun trickery?
Like, I talked to Chris Lattner for a second time, and he was a key person on TPUs.
And there's a lot of fun stuff going on in Google in terms of hardware that optimizes
for machine learning.
Is there something you can reveal in terms of how much you mentioned customization,
how much customization there is for hardware for machine learning purposes?
I'm going to be like that government person you asked about UFOs.
But I guess I will say that compute is really important.
We have very data-hungry and compute-hungry ML models all over our stack.
And this is where both being part of Alphabet and designing our own sensors and the entire hardware suite together pays off. On one hand, you get access to really rich raw sensor data that you can pipe from your sensors into your compute platform, building the whole pipeline from raw sensor data to the big compute; on the other, you have the massive compute to process all that data.
And this is where we're finding that having a lot of control of that hardware part of the stack
is really advantageous.
One of the fascinating magical places to me, again,
I might not be able to speak to the details, but it is the other compute, which is like,
we're just talking about a single car.
But the driving experience is a source of a lot of fascinating data.
And you have the huge amount of data coming in on the car and the infrastructure of storing
some of that data to then train or to analyze or so on.
That's a fascinating piece of it. I understand a single car; I don't understand how you pull it all together in a nice way. Is that something you could speak to, in terms of the challenges of seeing the network of cars, bringing the data back, and analyzing things like edge cases of driving, so you can learn on them to improve the system, to see where things went wrong and where things went right, and analyze all that kind of stuff?
Is there something interesting there from an engineering perspective?
There's an incredible amount of really interesting work that's happening there,
both in the real-time operation of the fleet of cars and the information that they exchange
with each other in real-time to make better decisions, as well as the off-board component
where you have to deal with massive amounts of data for training your ML models,
evaluating the ML models for simulating the entire system and for evaluating your entire system.
This is where being part of Alphabet has, once again, been tremendously
advantageous. We consume an incredible amount of compute for ML infrastructure.
We build a lot of custom frameworks to get good at data mining: finding the interesting edge cases, both for training and evaluating individual components and ML models within the system, and for evaluating the entire system in simulation.
That first piece you mentioned, cars communicating with each other, essentially perhaps through a centralized point, that's fascinating too.
How much does that help you? If you imagine right now the number of Waymo vehicles is
whatever X, I don't know if you can talk to what that number is, but it's not in the hundreds of
millions yet. Imagine if the whole world is Waymo vehicles. That changes potentially the power
of connectivity. The more cars you have, I guess actually, if you look at Phoenix, because there's
enough vehicles, when there's some level of density, you can start to probably do some really
interesting stuff with the fact that cars can communicate with each other and thereby make
decisions. Is there something interesting there that you can talk to about how does that help
with the driving problem as compared to just a single car solving the driving problem by itself?
Yeah, it's a spectrum. First, I'd say that it helps, and it helps in various ways, but it's not required right now. The way we build our system, each car can operate independently.
They can operate with no connectivity. I think it is important that you have a fully autonomous,
fully capable driver, computerized driver that each car has. Then they do share information
and they share information in real time. It really helps. The way we do this today is whenever one
car encounters something interesting in the world, whether it might be an accident or a new
construction zone, that information immediately gets uploaded over the air and is propagated to
the rest of the fleet. That's kind of how we think about maps as priors in terms of the knowledge
of our fleet of drivers that is distributed across the fleet and it's updated in real time.
That's one use case. You can imagine that as the density of these vehicles goes up, they can exchange more information in terms of what they're planning to do, and start influencing how they interact with each other, as well as potentially sharing observations. If you have enough density of these vehicles, one car might be seeing something that is relevant to another car, something very dynamic. It's not part of updating your static prior of the map of the world; it's dynamic information that could be relevant to the decisions another car is making in real time. You can see them exchanging that information, and you can build on that. Again, I see that as an advantage, but it's not a requirement.
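The flow described here, where one car observes something new, uploads it, and the rest of the fleet receives it as an updated map prior, can be sketched roughly as a publish/subscribe pattern. All class and field names below are hypothetical illustrations, not Waymo's actual API:

```python
import time
from dataclasses import dataclass, field

@dataclass
class MapEvent:
    """A dynamic change one car observed, e.g. a new construction zone."""
    kind: str                 # "construction_zone", "accident", ...
    location: tuple           # (lat, lon)
    timestamp: float = field(default_factory=time.time)

class FleetMapPrior:
    """Central store that propagates observed events to every subscribed car."""
    def __init__(self):
        self.events = []
        self.subscribers = []

    def report(self, event: MapEvent):
        # One car uploads an observation; it is pushed to the whole fleet.
        self.events.append(event)
        for car in self.subscribers:
            car.update_prior(event)

class Car:
    def __init__(self, name, fleet: FleetMapPrior):
        self.name = name
        self.prior = []       # this car's local copy of fleet knowledge
        fleet.subscribers.append(self)

    def update_prior(self, event: MapEvent):
        self.prior.append(event)

    def observe(self, fleet: FleetMapPrior, event: MapEvent):
        # The observing car reports what it saw back to the fleet.
        fleet.report(event)

fleet = FleetMapPrior()
car_a, car_b = Car("car_a", fleet), Car("car_b", fleet)
car_a.observe(fleet, MapEvent("construction_zone", (33.45, -112.07)))
print(car_b.prior[0].kind)  # car_b now knows about the zone: construction_zone
```

The key property this sketch captures is the one Dmitri emphasizes: the shared prior is an optimization, and each car still drives correctly from its local copy even if no further updates arrive.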
What about the human in the loop? When I got a chance to take a ride in the Waymo, there's customer service: somebody who's able to dynamically tune in and help you out. What role does the human play in that picture? That's fascinating, the idea of tele-operation, being able to remotely control a vehicle. Here, what we're talking about is a human being able to help you out in a frictionless way. I don't know if they're able to actually control the vehicle. Is that something you could talk to? Yes. To be clear, we don't do tele-operation. We don't believe in tele-operation, for various reasons. That's not what we have in our cars.
We do, as you mentioned, have a version of customer support. We call it Live Help.
In fact, we find that it's very important for our rider experience, especially if it's your first trip. You've never been in a fully driverless, rider-only Waymo vehicle. You get in, and there's nobody there. You can imagine having all kinds of questions in your head about how this thing works. We've put a lot of thought into guiding our riders, our customers, through that experience, especially for the first time. They get some information on the phone if a fully driverless vehicle is being used to service their trip, and when they get into the car, an in-car screen and audio guide them and explain what to expect. They also have a button they can push that connects them to a real live human being they can talk to about this whole process. That's one aspect of it.
I should mention that there is another function that humans provide to our cars,
but it's not tele-operation. You can think of it as a bit more like fleet assistance,
kind of like traffic control that you have, where our cars, again, they're responsible on
their own for making all of the driving decisions that don't require connectivity.
Anything that is safety- or latency-critical is done purely autonomously by the onboard system. But there are situations where, if connectivity is available and a car encounters a particularly challenging situation, you can imagine a super hairy scene of an accident, the cars will do their best. They will recognize that it's an off-nominal situation, and they will do their best to come up with the right interpretation and the best course of action in that scenario. If connectivity is available, they can ask a human assistant to confirm those actions and perhaps provide a little bit of contextual information and guidance.
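The division of labor described above, onboard autonomy for anything safety- or latency-critical, with optional human confirmation when connectivity allows, might be sketched like this. The function and parameter names are purely illustrative, and the timeout value is an arbitrary assumption:

```python
def handle_off_nominal_scene(proposed_action, ask_fleet_assist, connected,
                             timeout_s=2.0):
    """
    The car always computes its own best interpretation and course of action.
    If connectivity is available, it may ask a remote human to confirm,
    but it never blocks a safety-critical decision on that answer.
    """
    if not connected:
        return proposed_action          # fully autonomous fallback
    try:
        confirmed = ask_fleet_assist(proposed_action, timeout=timeout_s)
    except TimeoutError:
        return proposed_action          # no answer in time: proceed on-board
    return confirmed if confirmed is not None else proposed_action

# Assistant confirms the car's plan.
plan = "route_around_accident_scene"
result_online = handle_off_nominal_scene(plan, lambda a, timeout: a,
                                         connected=True)
# No connectivity: the on-board plan stands.
result_offline = handle_off_nominal_scene(plan, None, connected=False)
```

The design point worth noting is that the human answer is advisory: every branch of the function falls back to the car's own proposed action, which mirrors the claim that nothing safety-critical depends on the link.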
October 8th was when Waymo launched the public version of its fully driverless,
that's the right term, I think, service in Phoenix. Is that October 8th?
That's right. It was the introduction of fully driverless, rider-only vehicles into our public Waymo One service. That's amazing. So anybody can get into a Waymo in Phoenix? We previously had people in our early rider program taking fully driverless rides in Phoenix, and just a little while ago we opened that mode of operation to the public. So I can download the app and go for a ride? There's a lot more demand right now for that service than we have capacity, so we're managing that, but that's exactly the right way to describe it.
Yeah, well, this is interesting: there's more demand than you can handle. What has been the reception so far? This is a product, right? That's a whole other discussion, of how compelling a product it is. But it's also one of the most transformational technologies of the 21st century, and it's also a tourist attraction; it's fun to be a part of it. It'd be interesting to hear what people say. What has the feedback been so far?
Still early days, but so far the feedback has been incredibly positive. We ask them for feedback during the ride, and we ask them for feedback after the ride as part of their trip. We ask them some questions, and we ask them to rate the performance of our driver. Most of our riders give us five stars in our app, which is absolutely great to see. They're also giving us feedback on things we can improve, and that's one of the main reasons we're doing this in Phoenix.
Over the last couple of years and every day today, we are just learning a tremendous amount of new
stuff from our users. There's no substitute for actually doing the real thing, actually having
a fully driverless product out there in the field with users that are actually paying us money to
get from point A to point B. This is a legitimate paid service. The idea is you use the app to go
from point A to point B. What's the freedom of the starting and ending places?
It's an area of geography where that service is enabled, a decent-sized territory. It's actually larger than the size of San Francisco. Within that, you have full freedom of selecting where you want to go. On your app, you get a map: you tell the car where you want to be picked up, where you want the car to pull over and pick you up, and then you tell it where you want to be dropped off. Of course, there are some exclusions in terms of where the car is allowed to pull over, but besides that, you can do it. That's amazing. So it's not like a fixed route, which just wouldn't be very, I guess, I don't know, maybe that's the question behind your question, but it's not pre-set.
Yes, I guess. Within the geographic constraints, within that area, you can be picked up and
dropped off anywhere. That's right. People use them on all kinds of trips. We have an incredible spectrum of riders. We have car seats, and people take their kids on rides; I think the youngest riders we've had in the cars are one or two years old. And there's a full spectrum of use cases: people take them to school, to go grocery shopping, to restaurants, to bars, to run errands, to go to the office, etc. People use them in their daily lives to get around, and we see all kinds of really interesting use cases. That is providing us incredibly valuable experience that we then use to improve our product.
So as somebody who's done a few long rants with Joe Rogan and others about the toxicity of the
internet and the comments and the negativity in the comments, I'm fascinated by feedback. I
believe that most people are good and kind and intelligent and can provide, even in disagreement,
really fascinating ideas. So on the product side, it's fascinating to me: how do you get the richest possible user feedback to improve? What are the channels that you use to measure? Because that's one of the magical things about autonomous vehicles: it's a frictionless interaction with the human. It's just giving a ride. So how do you get feedback from people in order to improve?
Yeah, great question. Various mechanisms. So as part of the normal flow, we ask people for
feedback. As the car is driving around, we have on the phone and in the car, and we have a touch
screen in the car, you can actually click some buttons and provide real-time feedback on how
the car is doing and how the car is handling a particular situation, both positive and negative.
So that's one channel. We have, as we discussed, customer support, or Live Help, where if a customer has a question or some sort of concern, they can talk to a person in real time. So that is another mechanism that gives us feedback. At the end of a trip, we also ask them how things went. They give us comments and a star rating, and we ask them to explain what went well and what could be improved. Our riders provide very rich feedback there. A large fraction are very passionate and very excited about this technology, so we get really good feedback. We also run specific UXR studies that go more in depth, both lateral and longitudinal studies, where we have deeper engagement with our customers and our user experience research team tracks them over time. Longitudinal studies, that's cool. That's exactly right. And that's another really valuable source of feedback.
And we're just discovering a tremendous amount. People go grocery shopping and they want to load 20 bags of groceries into our cars; that's one workflow that you maybe don't think about getting just right when you're building a driverless product. We have people who bike as part of their trip: they bike somewhere, then they get in our cars, they take apart their bike and load it into our vehicle, and where we should pull over and how that get-in and get-out process works provides very useful feedback. In terms of what makes a good pickup and drop-off location, we get really valuable feedback. And in fact, we had to
do some really interesting work with high definition maps and thinking about walking
directions. If you imagine you're in a store, in some giant space, and then you want to be
picked up somewhere, if you just drop a pin at the current location, which is maybe in the
middle of a shopping mall, what's the best location for the car to come pick you up?
And you can have simple heuristics, where you just take the Euclidean distance and find the nearest spot where the car can pull over that's closest to you. But oftentimes that's not the most convenient one. I have many anecdotes where that heuristic breaks in horrible ways. One example I often mention is somebody who wanted to be picked up in Phoenix, and the car picked a location that was the closest to where the pin was dropped on the map in terms of latitude and longitude. But it happened to be on the other side of a parking lot that had a row of cacti, and the poor person had to walk all around the parking lot to get to where they wanted to be, in 110-degree heat. So that was a bummer. So we took all of that feedback from our users, incorporated it into our system, and improved it.
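The failure mode in the cactus anecdote, where the nearest pull-over spot by straight-line distance is not the nearest by walking distance, is easy to illustrate. The coordinates, distances, and spot names below are entirely made up for the example:

```python
import math

def euclidean(p, q):
    """Straight-line distance between two 2D points."""
    return math.dist(p, q)

# Hypothetical pull-over spots: an (x, y) position plus the actual walking
# distance in meters from the rider's pin, accounting for fences, cacti, etc.
spots = [
    {"name": "across_the_cacti", "pos": (10, 0), "walk_m": 400},
    {"name": "same_side_curb",   "pos": (0, 30), "walk_m": 30},
]
pin = (0, 0)  # where the rider dropped the pin

# Naive heuristic: pick the spot nearest by straight-line distance.
naive = min(spots, key=lambda s: euclidean(pin, s["pos"]))

# Better heuristic: pick the spot nearest by actual walking distance.
better = min(spots, key=lambda s: s["walk_m"])

print(naive["name"])   # across_the_cacti: 10 units away as the crow flies
print(better["name"])  # same_side_curb: 30 m of actual walking
```

The sketch shows why richer map data matters: computing `walk_m` at all requires knowing walking routes, which is exactly the high-definition-map and walking-directions work mentioned above.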
Yeah, I feel like that requires AGI to solve: the problem of when you're, and this is a very common case, in a big space of some kind, like an apartment building, some large space, and you call the Waymo from there. Where is the pin supposed to drop? I feel like that requires AGI. Okay, but the alternative, which I think the Google search engine has taught us, is that there's something really valuable about the perhaps slightly dumb but really powerful answer, which is: what was done in the past by others? What was the choice made by others?
In Google search, when you have billions of searches, when they recommend what you might possibly mean, they suggest based not just on some machine learning thing, which they also do, but on what was successful for others in the past, finding the thing people were happy with. Is that integrated at all in Waymo? Like, what pickups worked for others?
It is. I think you're exactly right. It's an interesting problem: naive solutions have interesting failure modes. So there's definitely lots that can be done to improve, both by learning from what works and what doesn't work in actual hails, and by getting richer data and more information about the environment, richer maps. But you're absolutely right that there are some properties of solutions, in terms of the effect they have on users, that are much, much better than others. Predictability and understandability are important. You can have something that is not quite as optimal but is very natural and predictable to the user, and works the same way all the time, and that matters a lot for the user experience. But to get to the basics, the pretty fundamental property is that the car actually arrives where you told it to, right? You can always see it on the map and move it around if you don't like it, but that property, that the car actually shows up reliably, is critical. And there, compared to some of the human-driven analogs, I think you can have more predictability. Actually, if I digress into a little bit of detail here, the fact that it's your phone and the car, two computers, talking to each other can lead to some really interesting things we can do in terms of the user interface. Both in terms of function, like the car showing up exactly where you told it you want it to be, but also some really interesting things on the user interface. Like, as the car is driving, as you call it and it's on the way to come pick you up,
and of course you get the position of the car and the route on the map, and it actually follows that route, but it can also share some really interesting information about what it's doing. As our cars are coming to pick you up, if a car is coming up to a stop sign, it will actually show you that it's sitting there because it's at a stop sign, or at a traffic light it will show you that it's sitting at a red light.
So, you know, they're little things, right? But I find those little touches really interesting, really magical. It's just little things like that that you can do to delight your users. You know, this makes me think of some products that I just love. Like,
there's a company called Rev, rev.com, where, for this podcast for example, I can drag and drop a video, and then they do all the captioning. It's humans doing the captioning, but they automate everything about connecting you to the humans, and they do the captioning and transcription. It's all effortless. I remember when I first started using them, it was like, life is good, because it was so painful to figure that out earlier. The same thing with something called iZotope RX, a company I use for cleaning up audio. The sound cleanup they do, it's drag and drop, and it just cleans everything up very nicely. Another experience like that I had was with Amazon one-click purchase. I mean, other places do that now, but just the effortlessness of purchasing, making it frictionless. It kind of communicates to me, I'm a fan of design, I'm a fan of products, that you can just create a really pleasant experience. The simplicity of it, the elegance, just makes you fall in love with it. So, do you think about this kind of stuff? I mean,
that's exactly what we've been talking about: the little details that somehow make you fall in love with the product. We went from the Urban Challenge days, where love was not part of the conversation, probably, to this point where there are human beings and you want them to fall in love with the experience. Is that something you're trying to optimize for, trying to think about? Like, how do you create an experience that people love? Absolutely. I think that's the vision: removing
any friction or complexity from getting our users, our riders, to where they want to go.
Making that as simple as possible. And beyond that, just transportation: making things and goods get to their destinations as seamlessly as possible. I talked about a drag-and-drop experience, where you kind of express your intent and then it just magically happens. For our riders, that's what we're trying to get to: you download an app, you click, and a car shows up. It's the same car, it's very predictable, and it's a safe and high-quality experience. And then it gets you, in a very reliable, very convenient, frictionless way, to where you want to be.
And along the journey, I think we also want to do little things to delight our users.
Like the ride-sharing companies: because they don't control the experience, I think they can't necessarily make people fall in love with the experience. Or maybe they haven't put in the effort. But if I just speak to the ride-sharing experience I currently have, it's just very convenient. There's a lot of room for falling in love with it, though. We can speak to car companies: car companies do this well. You can fall in love with a car, right, and be a loyal car person, whatever. Like, I like badass hot rods, I guess a '69 Corvette. Though at this point, owning a car is so 20th century, man. But is there something about the Waymo experience where you hope that people will fall in love with it? Is that part of it? Or is it just about making a convenient ride, not ride-sharing, I don't know what the right term is, but just a convenient A-to-B autonomous transport? Or do you want them
to fall in love with Waymo? So maybe elaborate a little bit. I mean, almost like from a business
perspective, I'm curious, like how do you want to be in the background invisible? Or do you want to be
a source of joy that's very much in the foreground? I want to provide the best, most enjoyable
transportation solution. And that means building it, building our product and building our service
in a way that people do kind of use in a very seamless, frictionless way in their day-to-day
lives. And I think that does mean, you know, in some way falling in love in that product, right?
It just kind of becomes part of your routine. It comes down in my mind to safety, predictability
of the experience, and privacy, I think, aspects of it, right? Our cars, you get the same car,
you get very predictable behavior, and that is important, and if you're going to use it in your
daily life. Privacy, and when you're in a car, you can do other things. You're spending a bunch,
just another space where you're spending a significant part of your life. And so not having
to share it with other people who you don't want to share it with, I think is a very nice
property. Maybe you want to take a phone call or do something else in the vehicle. And safety
on the quality of the driving, as well as the physical safety of not having to share that ride
is important to a lot of people. What about the idea that when there's somebody
like a human driving, and they do a rolling stop on a stop sign, sometimes you get an Uber,
a Lyft, or whatever, like human driver, and they can be a little bit aggressive as drivers.
It feels like there is not all aggression is bad. Now, that may be a wrong, again, 20th century
conception of driving. Maybe it's possible to create a driving experience. If you're in the back
busy doing something, maybe aggression is not a good thing. It's a very different kind of
experience, perhaps. But it feels like in order to navigate this world, you need to,
how do I phrase this? You need to kind of bend the rules a little bit, or at least test the rules.
I don't know what language politicians use to discuss this, but whatever language they use,
you flirt with the rules. I don't know. But you sort of have a bit of an aggressive way of driving
that asserts your presence in this world, thereby making other vehicles and people respect your
presence, and thereby allowing you to navigate through intersections in a timely fashion.
I don't know if any of that made sense, but how does that fit into the experience of
driving autonomously? This is you're hitting a very important point of a number of behavioral
components and parameters that make your driving feel assertive and natural and comfortable
and predictable. Our cars will follow rules. They will do the safest thing possible in all
situations and be clear on that. But if you think of really, really good drivers, think about
professional limo drivers. They will follow the rules. They're very, very smooth, and yet they're
very efficient, but they're assertive. They're comfortable for the people in the vehicle.
They're predictable for the other people outside the vehicle that they share the environment with,
and that's the kind of driver that we want to build. Maybe there's a sport analogy there.
You can do in many sports, the true professionals are very efficient in their movements. They don't
do hectic flailing. They're smooth and precise, and they get the best results. That's the kind
of driver that we want to build. In terms of aggressiveness, you can roll through the stop
signs. You can do crazy lane changes. It typically doesn't get you to your destination faster,
and it's typically not the safest, most predictable, or most comfortable thing to do. But there is
a way to do both. That's what we're doing, trying to build the driver that is safe, comfortable,
smooth, and predictable. Yeah, that's a really interesting distinction. I think in the early
days of autonomous vehicles, the vehicles felt cautious as opposed to efficient,
and still probably. But when I rode in the Waymo, I mean, it was quite assertive.
It moved pretty quickly. One of the surprising feelings was that it actually
went fast, and it didn't feel awkwardly cautious the way autonomous vehicles can. I've also
programmed autonomous vehicles, and everything I've ever built felt awkwardly either overly
aggressive, especially when it was my code, or awkwardly cautious is the way I would put it.
And Waymo's vehicle felt assertive, and I think efficient is the right terminology here.
I also like the professional limo driver comparison, because we often think of an Uber driver, or a bus
driver, or a taxi driver. The funny thing is, people think taxi drivers are professionals.
That's like saying I'm a professional walker, just because I've been walking all my life.
I think there's an art to it. And if you take it seriously as an art form,
then there's a certain way that mastery looks like. It's interesting to think about what does
mastery look like in driving? And perhaps what we associate with aggressiveness is unnecessary.
It's not part of the experience of driving. It's like unnecessary fluff, not real efficiency.
You can create a good driving experience within the rules. You're the first person to tell me
this, so it's kind of interesting. I need to think about this. But that's exactly what it felt like
with Waymo. I kind of had this intuition. Maybe it's the Russian thing, I don't know,
that you have to break the rules in life to get anywhere. But maybe it's possible that that's
not the case in driving. I have to think about that. But it certainly felt that way on the streets
of Phoenix when I was there in Waymo. That was a very pleasant experience, and it wasn't
frustrating in that, like, come on, move already kind of feeling. That wasn't there.
Yeah. I mean, that's what we're going after. I don't think you have to pick one. I think
truly good driving gives you both efficiency, assertiveness, but also comfort and predictability
and safety. And that's what fundamental improvements in the core capabilities truly unlock.
And you can kind of think of it as a precision and recall trade-off. You have certain capabilities
of your model, and then, when you have some curve of precision and recall, it's very easy to
move things around. You can choose your operating point and your trade-off of precision
versus recall, false positives versus false negatives. And you can tune things on that curve
and be kind of more cautious or more aggressive, but then aggressive is bad or cautious is bad.
But true capabilities come from actually moving the whole curve up. And then you are kind of
on a very different plane of those trade-offs. And that's what we're trying to do here,
is to move the whole curve up. Before I forget, let's talk about trucks a little bit.
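The precision/recall framing above can be made concrete with a toy sketch in Python. The scores and labels here are invented for illustration, not anything from Waymo: tuning a threshold only moves you along a fixed curve, while a stronger model lifts the whole curve.

```python
# Toy illustration of the precision/recall trade-off described above.
# Scores and labels are made up; they stand in for a detector's outputs.

def precision_recall(scores, labels, threshold):
    """Compute precision and recall at a given score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = [1, 1, 1, 1, 0, 0, 0, 0]

# A weaker model: its scores only loosely separate positives from negatives.
weak_scores = [0.9, 0.6, 0.4, 0.3, 0.7, 0.5, 0.2, 0.1]
# A stronger model on the same task: cleaner separation, the whole curve moves up.
strong_scores = [0.95, 0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]

for threshold in (0.35, 0.65):  # choosing an operating point on the curve
    pw = precision_recall(weak_scores, labels, threshold)
    ps = precision_recall(strong_scores, labels, threshold)
    print(f"t={threshold}: weak P/R={pw}, strong P/R={ps}")
```

At either threshold, the stronger model dominates the weaker one on both precision and recall, which is the "different plane of trade-offs" being described.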
So I also got a chance to check out some of the Waymo trucks. I'm not sure if we want to go too
much into that space, but it's a fascinating one. So maybe you can mention at least briefly,
Waymo is also now doing autonomous trucking. And how different philosophically and technically
is that whole space of problems? It's one of our two big products and commercial applications
of our driver: ride-hailing and deliveries. We have Waymo One and Waymo Via, moving people
and moving goods. Trucking is an example of moving goods. We've been working on trucking since 2017.
It is a very interesting space. And your question of how different is it,
it has this really nice property that the first order challenges like the science,
the hard engineering, whether it's hardware or onboard software or off-board software,
all of the systems that you build for training your ML models for evaluating your entire system.
Those fundamentals carry over. The true challenges of driving, perception,
semantic understanding, prediction, decision-making, planning, evaluation,
the simulator, ML infrastructure, those carry over. The data and the application
and the domains might be different, but the most difficult problems, all of that carries over
between the domains. So that's very nice. So that's how we approach it. We're
investing in the core, the technical core. And then there is specialization of
that core technology to different product lines, to different commercial applications.
So just to tease it apart a little bit, on trucks, starting with the hardware,
the configuration of the sensors is different. They're different physically,
geometrically, different vehicles. So for example, we have two of our main lasers
on the trucks, one on each side, so that we don't have blind spots, whereas on the JLR
I-PACE, we have one sitting at the very top. But the actual sensors are almost the same,
or largely the same. So all of the investment that over the years we've put into building our
custom lidars, custom radars, putting the whole system together, that carries over very nicely.
Then on the perception side, the fundamental challenges of seeing, understanding the world,
whether it's object detection, classification, tracking, semantic understanding, all that
carries over. Yes, there's some specialization when you're driving on freeways. Range becomes
more important. The domain is a little bit different. But again, the fundamentals carry over
very, very nicely. Same, you get into prediction or decision making. The fundamentals of what it
takes to predict what other people are going to do, to find the long tail, to improve your system
in that long tail of behavior prediction and response, that carries over and so on. So
I mean, that's pretty exciting. By the way, does Waymo Via include using the smaller vehicles for
transportation of goods? That's an interesting distinction. So I would say there's three
interesting modes of operation. So one is moving humans, one is moving goods, and one is like
moving nothing, zero occupancy, meaning, like, you're going to the destination in an empty vehicle.
The third one, if that's the entirety of it, is the less exciting
from the commercial perspective. Well, I mean, in terms of, like, if you think about what's inside
a vehicle as it's moving, some significant fraction of the vehicle's movement
has to be empty. I mean, it's kind of fascinating. Maybe just on that small point, is there
different control and like policies that are applied for a zero occupancy vehicle,
so a vehicle with nothing in it? Or is it just move as if there is a person inside?
Or with some subtle differences? As a first order approximation,
there are no differences. And if you think about safety and comfort and quality of driving,
only part of it has to do with the people or the goods inside of the vehicle. You want
to drive smoothly, as we discussed, not purely for the benefit of whatever you have
inside the car. It's also for the benefit of the people outside, fitting naturally and predictably
into the whole environment. So yes, there are some second order things you can do. You can
change your route and optimize maybe your fleet things at the fleet scale. And you would take
into account whether some of your cars are actually serving a useful trip, whether with
people or with goods, whereas other cars are driving completely empty to that next valuable
trip that they're going to provide. But those are mostly second order effects. Okay, cool.
So Phoenix is an incredible place. And what you've announced in Phoenix is kind of amazing.
But you know, that's just like one city. How do you take over the world?
I mean, I'm asking for a friend. One step at a time. Is that the cartoon, Pinky and the Brain?
Yeah. Okay. But you know, gradually is the true answer. So I think the heart of your question
is... Can I answer a better question than the one you asked? You're asking a great question. Answer that one.
I'm just going to phrase it in the terms that I want to answer. Perfect. That's exactly right.
Brilliant. Please. Where are we today? And what happens next? And what does it take to go beyond
Phoenix? And what does it take to get this technology to more places and more people
around the world? So our next big area of focus is exactly that larger scale commercialization
and just scaling up. Phoenix gives us that platform and gives us
that foundation upon which we can build. And there are a few really challenging aspects
of this whole problem that you have to pull together in order to build the technology, in order to
deploy it into the field to go from a driverless car to a fleet of cars that are providing a
service and then all the way to commercialization. So this is what we have in Phoenix. We've taken
the technology from a proof point to an actual deployment and have taken our driver from one
car to a fleet that can provide a service. Beyond that, if I think about what it will take to scale
up and deploy in more places with more customers, I tend to think about three main dimensions,
three main axes of scale. One is the core technology, the hardware and software,
core capabilities of our driver. The second dimension is evaluation and deployment.
The third one is the product, commercial, and operational excellence. So I can talk
a bit about where we are along each one of those three dimensions, about where we are today
and what will happen next. On the core technology, on the hardware and software,
you know, that together comprise the driver, we obviously have that foundation that is providing
fully driverless trips to our customers as we speak, in fact. And we've learned a tremendous
amount from that. So now what we're doing is we are incorporating all those lessons
into some pretty fundamental improvements in our core technology, both on the hardware side and
on the software side, to build a more general, more robust solution that will enable us to
massively scale beyond Phoenix. So on the hardware side, all of those lessons
are now incorporated into this fifth-generation hardware platform that is being deployed right
now. The fourth generation, the thing that we have right now
driving in Phoenix, is good enough to operate fully driverlessly, night and day,
various speeds and various conditions. But the fifth generation is the platform upon which
we want to go to massive scale. We've really made qualitative improvements in terms of the
capability of the system, the simplicity of the architecture, the reliability, the redundancy.
It is designed to be manufacturable at very large scale and provides the right unit economics.
So that's the next big step for us on the hardware side.
And that's already built for scale, the version five?
That's right.
And is that a coincidence or should we look into a conspiracy theory that it's the same version
as the Pixel phone? Is that what's the hardware they...
I can neither confirm nor deny.
All right, cool. So that's that axis. What else?
So hardware is a very discrete jump, but similar to how we're making that change from
the fourth generation hardware to the fifth, we're making similar improvements on the software side
to make it more robust and more general and allow us to quickly scale beyond Phoenix.
So that's the first dimension of core technology.
The second dimension is evaluation and deployment. How do you measure your system?
How do you evaluate it? How do you build the release and deployment process where
with confidence, you can regularly release new versions of your driver into a fleet?
How do you get good at it so that it is not a huge tax on your researchers and engineers?
So how do you build all these processes, the frameworks, the simulation, the evaluation,
the data science, the validation, so that people can focus on improving the system
and the releases just go out the door and get deployed across the fleet?
So we've gotten really good at that in Phoenix. That's been a tremendously difficult problem,
but that's what we have in Phoenix right now that gives us that foundation.
And now we're working on incorporating all the lessons that we've learned to make it more
efficient to go to new places and scale up and just stamp things out.
So that's that second dimension of evaluation and deployment.
And the third dimension is product, commercial, and operational excellence.
And again, Phoenix there is providing an incredibly valuable platform.
That's why we're doing things end-to-end in Phoenix. We're learning, as we discussed a little
earlier today, a tremendous amount of really valuable lessons from our users getting really
incredible feedback. And we'll continue to iterate on that and incorporate all those lessons
into making our product even better and more convenient for our users.
So you're converting this whole process of Phoenix into something that could be copy
and pasted elsewhere. So like, perhaps you didn't think of it that way when you were doing the
experimentation in Phoenix, but so how long did you basically, and you can correct me, but you've,
I mean, it's still early days, but you've taken a full journey in Phoenix, right?
As you were saying, of like what it takes to basically automate, I mean, it's not the entirety
of Phoenix, right? But I imagine it can encompass the entirety of Phoenix at some near-term date.
But that's not even perhaps important, like as long as it's a large enough geographic area.
So what, how copy-pasteable is that process currently? And how, like, you know, like when
you copy and paste in Google Docs, I think now, or in Word, you can like apply source formatting
or apply destination formatting. So when you copy and paste the Phoenix into like, say, Boston,
how do you apply the destination formatting? Like how much of the core of the entire process
of bringing an actual public transportation, autonomous transportation service to a city
is there in Phoenix that you understand enough to copy and paste into Boston or wherever?
So we're not quite there yet. We're not at a point where we're kind of massively copy and pasting
all over the place. But Phoenix, what we did in Phoenix, and we very intentionally have chosen
Phoenix as our first full deployment area, you know, exactly for that reason to kind of tease
the problem apart, look at each dimension, you know, focus on the fundamentals of complexity and
de-risking those dimensions, and then bringing the entire thing together to get all the way,
force ourselves to learn all those hard lessons on this technology, hardware and software,
on the evaluation and deployment, on operating a service, operating a business,
actually serving our customers all the way, so that we're fully informed about the most
difficult, most important challenges to get us to that next step of massive copy and pasting,
as you said. And that's what we're doing right now. We're incorporating all those things that we
learned into that next system that then will allow us to kind of copy and paste all over the place
and to massively scale to, you know, more users and more locations. And we just talked a little
bit about, you know, what does that mean along those different dimensions. So on the hardware
side, for example, again, it's that switch from the fourth to the fifth generation. And the fifth
generation is designed to kind of have that property. Can you say what other cities you're
thinking about? Like, I'm thinking about, sorry, we're in San Francisco now, I thought I want
to move to San Francisco, but I'm thinking about moving to Austin. I don't know why people are
not being very nice about San Francisco currently; maybe it's just in vogue right
now. But Austin seems... I visited there, and I was in a Walmart. It's funny, these moments, like,
turn your life. There's this very nice woman with kind eyes who just, like, stopped and said,
"He looks so handsome in that tie, honey," to me. This has never happened to me in my life, but
just the sweetness of this woman is something I've never experienced, certainly in the streets of
Boston, but even in San Francisco where people wouldn't, that's just not how they speak or think.
I don't know, there's a warmth to Austin that I love. And since Waymo does have a little bit of a
history there, is that a possibility? Is this your version of asking the question of like,
you know, Dmitri, I know you can't share your commercial and deployment roadmap,
but I'm thinking about moving to San Francisco or Austin, so, like, blink twice if you think I
should move to one. That's true. That's true. You got me. We've been testing all over the place,
I think we've been testing in more than 25 cities. We drive in San Francisco, we drive in Michigan
for snow. We are doing significant amount of testing in the Bay Area, including San Francisco.
Which is not like, because we're talking about the very different thing, which is like a full-on,
large geographic area, public service. You can't share. Okay.
What about Moscow? When is that happening? Taking on Yandex? Are you paying attention to those folks?
They're doing, you know, a lot of fun stuff. I mean, maybe as a way of a question,
you didn't speak to sort of like policy or like, is there tricky things with government and so on?
Like, is there other friction that you've encountered except sort of technological friction
of solving this very difficult problem? Is there other stuff that you have to overcome
when deploying a public service in a city? That's interesting.
It's very important. We put significant effort in creating those partnerships and those relationships
with governments at all levels, local governments, municipalities, state level,
federal level. We've been engaged in very deep conversations from the earliest days of our
projects. At all of these levels, whenever we go to test or operate in a new area,
we always lead with the conversation with the local officials. But the result of that investment
is that no, these are not challenges we have to overcome, but it is very important that we
continue to have this conversation. Yeah, a lot of politicians do. Okay, so Mr. Elon Musk said that
LiDAR is a crutch. What are your thoughts? I wouldn't characterize it exactly that way.
You know, I think LiDAR is very important. It is a key sensor that we use, just like other
modalities. As we discussed, our cars use cameras, LiDARs, and radars. They are all very important.
They are at the physical level. They are very different. They have very different physical
characteristics. Cameras are passive. LiDARs and radars are active. They use different wavelengths.
That means they complement each other very nicely, and together, combined, they can be used to
build a much safer and much more capable system. To me, it's more of a question, why the heck would
you handicap yourself and not use one or more of those sensing modalities when they undoubtedly
just make your system more capable and safer? What might make sense for one product or one business
might not make sense for another one. If you're talking about driver assist technologies,
you make certain design decisions, and you make certain trade-offs, and you make different ones
if you're building a driver that you deploy in fully driverless vehicles. LiDAR, specifically,
when this question comes up, typically the criticisms or counterpoints that I hear
are cost and aesthetics. I don't find either of those, honestly, very compelling. On the cost side,
there's nothing fundamentally prohibitive about the cost of LiDARs. Radars used to be very expensive
before people made certain advances in technology and started to manufacture them
at massive scale and deploy them in vehicles, similar with LiDARs. This is where the LiDARs
that we have on our car, especially the fifth generation, we've been able to make some pretty
qualitative discontinuous jumps in terms of the fundamental technology that allow us to
manufacture those things at very significant scale and at a fraction of the cost of both
our previous generation as well as a fraction of the cost of what might be available on the market
off the shelf right now. That improvement will continue. I think cost is not a real issue.
Second one is aesthetics. I don't think that's a real issue either.
Beauty is in the eye of the beholder. You can make LiDAR sexy again.
I think you're exactly right. I think it is sexy. Honestly, I think form follows function.
You know, actually, somebody brought this up to me. I mean, all forms of LiDAR, even
like the ones that are big, you can make look beautiful. There's no sense in which you can't
integrate it into design. There's all kinds of awesome designs. I don't think small and
humble is beautiful. It could be brutalism or it could be harsh corners. Like I said,
like hot rods, I don't necessarily like, oh man, I'm going to start so much controversy with this.
I don't like Porsches. Okay, the Porsche 911, everyone says the most beautiful, no,
no. It's like a baby car. It doesn't make any sense. But anyway, beauty is in the eye of the
beholder. You're already looking at me like, what is this kid talking about? I'm happy to talk about it.
You're digging your own hole. I was talking about form and function, and my take on the beauty of the hardware that
we put on our vehicles. I will not comment on your Porsche monologues. Okay. All right.
So, but aesthetics, fine. But there's an underlying like philosophical question behind the kind of
lighter question is like, how much of the problem can be solved with computer vision,
with machine learning? So I think, without sort of getting into disagreements and so on,
it's nice to put it on a spectrum, because Waymo is doing a lot of machine learning as well.
It's interesting to think how much of driving, if we look at five years, 10 years, 50 years down
the road, what can be learned in almost more and more and more end-to-end way. If we look at what
Tesla is doing with as a machine learning problem, they're doing a multitask learning
thing where it's just, they break up driving into a bunch of learning tasks and they have
one single neural network and they're just collecting huge amounts of data that's training
that. I've recently hung out with George Hotz. I don't know if you know George.
I love him so much. He's just an entertaining human being. We were off mic talking about
Hunter S. Thompson. He's the Hunter S. Thompson of autonomous driving. Okay. So he, I didn't
realize this with Comma AI, but they're, like, really trying to go end-to-end.
Looking at the machine learning problem, they're really not doing multitask learning;
they're computing the drivable area as a machine learning task, and hoping that down the line,
this level two system that's driver assistance will eventually lead to allowing you to have a
fully autonomous vehicle. Okay. There's an underlying deep philosophical question there,
technical question of how much of driving can be learned. So LiDAR is an effective tool today
for actually deploying a successful service in Phoenix, right? That's safe, that's reliable,
et cetera, et cetera. But the question, and I'm not saying you can't do machine learning on LiDAR,
but the question is that, like, how much of driving can be learned eventually?
Can we do fully autonomous that's learned? Yeah. You know, learning is all over the place
and plays a key role in every part of our system. As you said, I would decouple the sensing modalities
from the ML and the software parts of it. LiDAR, radar, cameras, it's all machine learning,
all of the object detection, classification, of course. That's what these modern deep
nets and conv nets are very good at. You feed them raw data, massive amounts of raw data,
and that's actually what our custom-built LiDARs and radars are really good at.
And radars, they don't just give you point estimates of objects in space,
they give you raw physical observations. And then you take all of that raw information,
whether it's colors of the pixels, whether it's LiDAR returns and some auxiliary information
(it's not just angle and distance, right? It's much richer information that you
get from those returns), plus really rich information from the radars. You fuse it all
together and you feed it into those massive ML models that then lead to the best results in
terms of object detection, classification, state estimation.
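The idea of fusing raw information across modalities before any per-sensor decision is made can be sketched with a toy example. This is a hypothetical illustration in Python, not Waymo's actual architecture; the feature vectors and weights are invented.

```python
# Toy illustration of early (joint) vs. late fusion -- not Waymo's actual stack.
# Feature values and weights below are invented for illustration only.

def late_fusion(camera_feats, lidar_feats, radar_feats):
    """Each sensor makes its own decision; the decisions are averaged afterward.
    Cross-modality correlations are lost at the point of fusion."""
    per_sensor = [sum(f) / len(f) for f in (camera_feats, lidar_feats, radar_feats)]
    return sum(per_sensor) / len(per_sensor)

def early_fusion(camera_feats, lidar_feats, radar_feats, weights):
    """One model sees the raw features from all modalities at once,
    so it can weight and combine evidence across sensors jointly."""
    combined = camera_feats + lidar_feats + radar_feats  # concatenate features
    return sum(w * f for w, f in zip(weights, combined))

camera = [0.2, 0.9]   # e.g. color/texture evidence
lidar = [0.8, 0.1]    # e.g. range/shape evidence
radar = [0.5]         # e.g. velocity evidence

# A learned joint model could discover that one lidar channel matters most here;
# these weights are a stand-in for what training would produce.
weights = [0.1, 0.1, 0.6, 0.0, 0.2]
print(late_fusion(camera, lidar, radar))
print(early_fusion(camera, lidar, radar, weights))
```

The point of the sketch is only structural: in the late-fusion path, each modality is collapsed to a single decision before combination, while the early-fusion path lets one model weigh all the raw evidence jointly.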
So there's a fusion side to it. I mean, that's something that people didn't do
for a very long time, which is fusing the information together early on, at the sensor
fusion level, I guess, so that the sensory information that the vehicle receives
from the different modalities, or even from different cameras, is combined before it is fed
into the machine learning models. Yeah. So I think this is one of the trends.
You're seeing more of that. You mentioned end-to-end. There's different interpretations of end-to-end.
There is kind of the purest interpretation: I'm going to have one model that goes from
raw sensor data to, like, steering torque and gas and brakes. That's too much.
I don't think that's the right way to do it. There's smaller versions of end-to-end
where you're kind of doing more end-to-end learning, or co-training, or backpropagation
of kind of signals back and forth across the different stages of your system.
There are really good ways to do it. It gets into some fairly complex design choices, where on one hand,
you want modularity and composability of your system. But on the other hand,
you don't want to create interfaces that are too narrow, too brittle, too engineered, where you're
giving up on the generality of the solution, or you're unable to properly propagate signal,
you know, rich signal forward, and losses, you know, back, so you can optimize the whole system
jointly. So I would decouple it. And I guess what you're seeing in terms of the fusion of the sensing
data from different modalities, as well as kind of fusion at the temporal level: going from,
you know, frame by frame, where you would have one net that would do frame-by-frame
detection on camera, and then, you know, something that does frame by frame on lidar, and then
radar, and then you fuse it in a weaker, engineered way later. The field over the last decade has
been evolving in more kind of joint fusion, more end-to-end models that are solving some of these
tasks jointly and there's tremendous power in that. And you know, that's the progression that
our stack has been on as well. Now, I would decouple the kind of sensing and how
that information is used from the role of ML in the entire stack. And you know, I guess there's
trade-offs and you know, modularity and how do you inject inductive bias into your system, right?
There's tremendous power in being able to do that. So you know, there's no
part of our system that does not heavily, you know, leverage
data-driven development or state-of-the-art ML. There's mapping, there's the simulator,
there's perception, you know, object-level perception, whether it's semantic
understanding, prediction, decision-making, you know, so forth and so on. And of course,
object detection and classification, like finding pedestrians and cars and cyclists and,
you know, cones and signs and vegetation, and being very good at estimating detection,
classification and state estimation, that's just table stakes. Like, that's step zero
of this whole stack. You can be incredibly good at that, whether you use cameras or lidars or
radars, but that's just, you know, table stakes. That's just step zero. Beyond that,
you get into the really interesting challenges of semantic understanding, the perception level.
You get into scene-level reasoning. You get into very deep problems that have to do with
prediction and joint prediction and interaction, so-called interaction between all of the actors
in the environment, pedestrians, cyclists, all the cars, and you get into decision-making, right?
So how do you build all of these systems? We leverage ML very heavily in all of these components.
I do believe that the best results you achieve by kind of using a hybrid approach, having
different types of ML, having different models with different degrees of inductive bias
that you can have, and combining kind of model-free approaches with some,
you know, model-based approaches and some rule-based, physics-based systems. So, you know,
one example I can give you is traffic lights. There's a problem of the detection of traffic
light state, and obviously that's a great problem for, you know, computer vision,
conv nets; you know, that's their bread and butter, right? That's how you build that.
But then the interpretation of, you know, of a traffic light, you don't need to learn
that, right? You don't need to build some, you know, complex ML model that, you know,
infers with some, you know, precision and recall that red means stop. Like, it's a very
clear engineered signal with very clear semantics, right? So you want to induce that bias. Like,
how you induce that bias, and whether, you know, it's a constraint or a cost
function in your stack, but like, it is important to be able to inject that clear semantic
signal into your stack. And, you know, that's what we do. But then the question of like,
and that's when you apply it to yourself, when you are making decisions, whether you want to
stop for a red light, you know, or not. But if you think about how other people treat
traffic lights, we're back to the ML version of that. Because, you know, they're supposed to stop
for a red light, but that doesn't mean they will. So then you're back in the like very heavy
ML domain where you're picking up on like very subtle keys about, you know, they have to do
with the behavior of objects and pedestrians, cyclists, cars, and the whole thing, you know,
the entire configuration of the scene that allow you to make accurate predictions on whether they
will in fact stop or run a red light. So it sounds like already, for Waymo, like machine learning
is a huge part of the stack. Not just, obviously, the level-zero object detection that
nothing other than machine learning can do, but also prediction, modeling what the other
parties, the other entities in the scene, are going to do.
Machine learning is more and more playing a role in that as well.
Of course. Oh, absolutely.
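The hybrid treatment of traffic lights described above can be sketched in miniature: the ego vehicle's reaction to a red light is an engineered rule, while the question of whether another driver will stop is probabilistic and learned. This is only an illustrative toy, not Waymo's actual stack; the feature names and the tiny logistic model with made-up weights are hypothetical stand-ins for a real trained predictor.

```python
import math

def ego_may_cross(light_state: str) -> bool:
    """Engineered semantics: a clear rule, not a learned model."""
    return light_state == "green"

def p_agent_stops(decel: float, dist_to_line: float, speed: float) -> float:
    """Probability-style estimate that another driver will stop for a red
    light. A real system would use a trained model over rich scene features;
    this logistic form with made-up weights just illustrates the idea."""
    score = 2.0 * decel + 0.5 * dist_to_line - 0.3 * speed
    return 1.0 / (1.0 + math.exp(-score))

# Ego decision: hard constraint, no precision/recall involved.
assert not ego_may_cross("red")

# Other agents: probabilistic predictions from subtle behavioral cues.
braking_car = p_agent_stops(decel=3.0, dist_to_line=20.0, speed=10.0)   # near 1.0
speeding_car = p_agent_stops(decel=0.0, dist_to_line=5.0, speed=18.0)   # near 0.0
print(braking_car > 0.9, speeding_car < 0.1)  # → True True
```

The point of the split is that the clear semantics of the signal can be injected as a constraint for the ego vehicle, while other agents' compliance stays a prediction problem.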
I think, going back to the earliest days, even the DARPA Grand Challenge, the team was
leveraging machine learning. That was pre-ImageNet, and it was a very different type of ML.
And actually, this was before my time, but the Stanford team during the Grand Challenge
had a very interesting machine-learned system that would use lidar and camera when driving
in the desert. They built a model that would extend the range of free-space reasoning.
You get a clear signal from lidar, and then the model would say: hey, this stuff in the camera
kind of sort of looks like the stuff in lidar, and I know the stuff I've seen in lidar was
free space, so I'm very confident this is free space; let me extend that free-space zone into
the camera range. That would allow the vehicle to drive faster. And then we've been building
on top of that, staying at and pushing the state of the art in all kinds of ML over the years.
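The free-space extension Dmitri describes can be sketched as a simple self-supervised loop: camera patches within lidar range are labeled free or not-free by the lidar, an appearance model is fit to those labels, and the model then classifies patches beyond lidar range. A minimal sketch, assuming toy two-dimensional appearance features and a nearest-centroid classifier, both hypothetical stand-ins for the real system:

```python
import numpy as np

def fit_centroids(patches, labels):
    """Fit a nearest-centroid appearance model: the mean feature vector of
    lidar-confirmed free-space patches vs. obstacle patches."""
    free = patches[labels == 1].mean(axis=0)
    obstacle = patches[labels == 0].mean(axis=0)
    return free, obstacle

def extend_free_space(model, far_patches):
    """Score camera patches beyond lidar range: call a patch free space
    if it looks more like lidar-confirmed free space than like obstacles."""
    free, obstacle = model
    d_free = np.linalg.norm(far_patches - free, axis=1)
    d_obst = np.linalg.norm(far_patches - obstacle, axis=1)
    return d_free < d_obst

# Toy data: 2-D appearance features (say, brightness and texture) for
# near-range patches labeled by lidar, plus far-range patches to score.
rng = np.random.default_rng(0)
near_free = rng.normal([0.8, 0.2], 0.05, size=(50, 2))  # smooth, bright road
near_obst = rng.normal([0.3, 0.7], 0.05, size=(50, 2))  # rough, dark brush
patches = np.vstack([near_free, near_obst])
labels = np.array([1] * 50 + [0] * 50)

model = fit_centroids(patches, labels)
far = np.array([[0.78, 0.22], [0.32, 0.68]])  # patches beyond lidar range
print(extend_free_space(model, far))  # → [ True False]
```

The lidar supplies the labels for free, so no human annotation is needed; that is what makes the scheme attractive for extending perception range.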
And in fact, from the earliest days, I think 2010, maybe 2011, was probably the year
where Google got pretty heavily involved in machine learning, in deep nets. At that time
it was probably the only company that was very heavily investing in both state-of-the-art ML
and self-driving cars, and they go hand in hand. We've been on that journey ever since,
pushing a lot of these areas in terms of research at Waymo, and we collaborate very heavily
with the researchers in Alphabet on all kinds of ML: supervised ML, unsupervised ML.
We've published some interesting research papers in the space, especially recently.
It's just super, super active.
Super, super active.
Of course, there's the more mature stuff, like convnets for object detection.
But there's some really interesting, really active work happening in bigger models,
models that have more structure to them, not just large bitmaps, models that reason about
temporal sequences. And some of the interesting breakthroughs we've seen in language models,
transformers, GPT-3 and friends: there are some really interesting applications of those core
breakthroughs to the problems of behavior prediction, as well as decision-making and planning.
You can think about it this way: the trajectories of how people drive share a lot of the
fundamental structure of the language problem. There's a sequential nature, there's a lot of
structure in the representation, and there's strong locality, kind of like in sentences,
where words that follow each other are strongly connected. But there's also larger context
that doesn't have that locality, and you see that in driving too: what's happening in the
scene as a whole has very strong implications on the next step in that sequence, whether
you're predicting what other people are going to do, whether you're making your own decisions,
or whether, in the simulator, you're building generative models of humans walking,
cyclists riding, other cars driving.
Oh, that's all really fascinating. It's fascinating to think that transformer models,
all the breakthroughs in language and NLP, might be applicable to driving at the higher level,
at the behavior level. Let me ask about pesky little creatures called pedestrians and cyclists.
They seem, so humans are a problem: if we could get rid of them, I would. But unfortunately,
they're also a source of joy and love and beauty, so let's keep them around. They're also our
customers.
From your perspective, yes, yes, for sure. There's also some money in it, very good.
But I don't even know where I was going. Oh, yes, pedestrians and cyclists.
You know, they're a fascinating injection of uncertainty into the system,
of a kind of game-theoretic dance of what to do. And also, they have perceptions of their own,
and they can tweet about your product. So you don't want to run them over
from that perspective. I mean, I don't know, I'm joking a lot, but I think in seriousness,
like pedestrians are a complicated computer vision problem, a complicated behavioral problem.
Is there something interesting you could say about what you've learned from a machine learning
perspective, from also an autonomous vehicle, and a product perspective about just interacting
with the humans in this world?
Yeah, just to state on the record: we care deeply about the safety of
pedestrians, even the ones that don't have Twitter accounts. Thank you. All right, cool. Not me.
But yes, I'm glad somebody does. But in all seriousness, safety of vulnerable road users,
pedestrians or cyclists is one of our highest priorities. We do a tremendous amount of
testing and validation and put a very significant emphasis on the capabilities of our systems that
have to do with safety around those unprotected vulnerable road users.
As we just discussed, in Phoenix we have completely empty cars, completely
driverless cars, driving in this very large area. And some people use them to go to school,
so they'll drive through school zones. Kids are a very special class of those vulnerable
road users: you want to be super, super safe and super, super cautious around those.
So we take it very, very, very seriously. And what does it take to be good at it?
An incredible amount of performance across your whole stack. It starts with hardware.
And again, you want to use all of the sensing modalities available to you. Imagine driving
on a residential road at night, making a turn, and your headlights don't cover
some part of the space where a kid might run out. Lidars are amazing at that:
they see just as well in complete darkness as they do during the day. So again,
it gives you that extra margin in terms of capability and performance and safety and quality.
And in fact, in these kinds of situations, our system oftentimes detects something
in some cases even earlier than our trained operators in the car might, especially
in conditions like very dark nights. So it starts with sensing. Then perception has to be incredibly
good. You have to be very, very good at detecting pedestrians in all kinds of situations and all
kinds of environments, including people in weird poses, people kind of running around and being
partially occluded. So that's step number one. Then you have to have very high accuracy and
very low latency in terms of your reactions to what these actors might do. And we've put
a tremendous amount of engineering and tremendous amount of validation in to make sure our system
performs properly. And oftentimes it does require a very strong reaction to do the safe thing. We
actually see a lot of cases like that. That's the long tail of really rare, really crazy events
that contribute to the safety around pedestrians. One example that comes to mind actually
happened in Phoenix. We were driving along, I think it was a 45-mile-per-hour road,
so pretty high-speed traffic, and there was a sidewalk next to it, and there was a cyclist
on the sidewalk. We were in the right lane, right next to the sidewalk; it was a multi-lane road.
As we got close to the cyclist on the sidewalk, it was a woman, she tripped and fell,
just fell right into the path of our vehicle. And our car, this was actually with a test driver,
and our test driver did exactly the right thing: they reacted and came to a stop. It required
both very strong steering and strong application of the brake. And then we simulated what our
system would have done in that situation. And it did exactly the same thing. And that speaks to
all of those components of really good state estimation and tracking. Imagine a person
on a bike, and they're falling over, and they're doing that right in front of you.
Things are changing: the appearance of that whole thing is changing. The person goes
one way, falling onto the road, ending up flat on the ground in front of you; the
bike goes flying the other direction. The two objects that used to be one are now
splitting apart. And the car has to detect all of that, and milliseconds matter.
And it's not good enough to just brake. You have to like steer and brake and there's traffic around
you. So it all has to come together. And it was really great to see in this case, in other cases
like that, that we're actually seeing in the wild, that our system is performing exactly the way
that we would have liked and is able to avoid collisions like this.
Such an exciting space for robotics: in that split second, making decisions of life and death.
The stakes are high in a sense, but it's also beautiful, for somebody who loves
artificial intelligence, the possibility that an AI system might be able
to save a human life. That's kind of exciting as a problem. It's probably terrifying
for an engineer to wake up and think about, but it's also exciting because
it's in your hands. Let me try to ask a question that's often brought up about autonomous vehicles.
And it might be fun to see if you have anything interesting to say, which is about the trolley
problem. The trolley problem is an interesting philosophical construct, and there are many
others like it, that highlights the difficult ethical decisions we humans
have before us in this complicated world. Specifically, it's a choice between
killing a group X of people versus a group Y of people: if you did nothing, you would kill
five people, but if you decided to swerve out of the way, you would only kill one person.
Do you do nothing, or do you choose to do something? You can construct all kinds of ethical
thought experiments of this kind that, I think, at least on a positive note, inspire you to
introspect on what the physics of our morality are. And there usually aren't good answers
there. I think people love it because
it's just an exciting thing to think about. I think people who build autonomous vehicles usually
roll their eyes, because this, as constructed, literally never comes up
in reality. You never have to choose between killing one of two groups of people.
But I wonder if you can speak to it: is there something interesting to you, as an engineer of
autonomous vehicles, within the trolley problem? Or, maybe more generally, are there
difficult ethical decisions that you find an algorithm must make?
On the specific version of the trolley problem, which one would you do if you were driving?
The question itself is a profound question, because we humans ourselves cannot answer it,
and that's the very point. I would kill both.
I think you're exactly right in that humans are not particularly good at this.
It's kind of phrased as what would a computer do, but humans are not very good either.
And actually, oftentimes, I think freezing and not doing anything, because you've
taken a few extra milliseconds to just process, ends up being the worst of the possible
outcomes. I do think that, as you've pointed out, it can be a bit of a distraction and a
bit of a red herring. I think it's an interesting philosophical discussion in the realm of philosophy
but in terms of how that affects the actual engineering and deployment of self-driving vehicles,
it's not how you go about building a system. We've talked about how you engineer a system,
how you go about evaluating the different components and the safety of the entire thing.
How do you inject the various model-based safety considerations? And yes,
you reason, in parts of the system, about the probability of a collision and
the severity of that collision, and that is incorporated; you have to properly reason
about the uncertainty that flows through the system. Those factors definitely play a role
in how the cars then behave, but they tend to be more of an emergent behavior. And you're
absolutely right that these cleanly constructed theoretical problems just don't occur in
the system. Going back to our previous discussion: which one do you choose?
Well, oftentimes you made a mistake earlier. You shouldn't be in that situation in the first
place. In reality, if you build a very good, safe, and capable driver, you
have enough clues in the environment that you drive defensively, so you don't put yourself in
that situation. And again, going back to that analogy of precision and recall: okay,
you can make a hard trade-off along the curve, but neither answer is really good. What instead you
focus on is moving the whole curve up, building the right capability
and the right defensive driving so that you don't put yourself in a situation like this.
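Dmitri's precision-and-recall analogy can be made concrete: with a fixed model, sliding the decision threshold only trades precision against recall along one curve, while a better model lifts the whole curve so both can be high at once. A toy illustration with made-up scores and labels:

```python
def precision_recall(scores, labels, threshold):
    """Precision and recall of 'positive' calls at a given score threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

labels = [1, 1, 1, 1, 0, 0, 0, 0]
weak   = [0.9, 0.7, 0.4, 0.2, 0.8, 0.6, 0.3, 0.1]  # weaker model: scores overlap
strong = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]  # better model: clean separation

# Moving the threshold on the weak model trades precision against recall...
print(precision_recall(weak, labels, 0.65))  # precise but misses positives
print(precision_recall(weak, labels, 0.15))  # catches everything, imprecise
# ...while the stronger model lifts the whole curve: both can be high at once.
print(precision_recall(strong, labels, 0.5))
```

With the weak scores, no threshold gives both perfect precision and perfect recall; with the strong scores, a mid threshold yields both at 1.0, which is the "move the whole curve up" point.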
I don't know if you have a good answer for this, but people love it when I ask this question
about books. Are there books in your life that you've enjoyed, philosophical fiction, technical,
that had a big impact on you as an engineer or as a human being? Everything from science fiction
to a favorite textbook, is there three books that stand out that you can think of?
Three books that impacted me. The first one, you probably know it well, but it's not
generally well known in the US or internationally: The Master and
Margarita. It's actually one of my favorite books. It's a novel by the Russian author
Mikhail Bulgakov. It's a great book. It's one of those books that you can reread your entire life
and it's very accessible. You can read it as a kid. The plot is interesting. It's the devil
visiting the Soviet Union. But you read it at different stages of your life and you enjoy it
for different, very different reasons. And you keep finding like deeper and deeper meaning
It definitely had an imprint on me, mostly from the cultural, stylistic aspect.
It's one of those books that is good and makes you think, but also has this really silly,
quirky, dark sense of humor.
Okay. It captures the Russian soul more than perhaps many other books. On a slight tangent,
just out of curiosity, one of the saddest things is I've read that book in English.
Did you by chance read it in English or in Russian?
In Russian, only in Russian. And actually, that is a question I have
posed to myself every once in a while: I wonder how well it translates, if it translates at all.
And there's the language aspect of it. And then there's the cultural aspect.
And actually, I'm not sure if either of those would work well in English.
Now, I forget their names, but when COVID lifts a little bit, I'm traveling to Paris
for several reasons. One is just that I've never been to Paris, and I want to go.
But also, some of the most famous translators of Dostoevsky, Tolstoy, and most of Russian
literature live there: a couple, a famous man and woman. And I'm going to have
a series of conversations with them. And in preparation for that, I'm starting to read
Dostoevsky in Russian. I'm really embarrassed to say that everything I've read
in Russian literature of serious depth has been in English, even though I can obviously
also read in Russian. For some reason, in the optimization of life, it seemed like the
improper decision to read in Russian, like I need to think in English, not in Russian.
But now I'm changing my mind on that. And so the question of how well it translates is a
really fundamental one, even with Dostoevsky. From what I understand, Dostoevsky translates
more easily; others don't as much. Obviously the poetry doesn't translate as well.
I'm also a big fan of the music of Vladimir Vysotsky; he obviously doesn't translate well.
People have tried. But The Master and Margarita, I don't know about that one. I've only
known it in English, and it was fun in English. So it's a curious question,
and I want to study it rigorously, from the machine learning aspect.
And also because I want to do a couple of interviews in Russia that I'm still unsure of how to
properly conduct an interview across a language barrier in a way that ultimately
communicates to an American audience. It's a fascinating question. There are a few Russian
people who I think are truly
special human beings. And I feel like I sometimes encounter this with some incredible scientists
and maybe you encounter this as well at some point in your life that it feels like because
of language barrier, their ideas are lost to history. It's a sad thing. I think about
Chinese scientists, or even authors, whose work we in the English-speaking world don't get to
appreciate; some of the depth of the culture is lost in translation. And I feel like
I would love to show that to the world. I'm just some idiot, but because I have at least
some semblance of skill in speaking Russian, and I know how to record
stuff on a video camera, I feel like I want to catch, for example, Grigori Perelman,
the mathematician, I'm not sure if you're familiar with him. I want to talk to him; he's a
fascinating mind, and bringing him to a wider English-speaking audience would be fascinating.
But that requires being rigorous about this question of how well Bulgakov translates.
I know it's a silly concept, but it's a fundamental one: how do you translate? That's
the thing Google Translate is also facing as a machine learning problem, but I wonder if it's
an even bigger problem for AI: how do we capture the magic that's there in the language?
I think that's a really interesting, really challenging problem. If you do read it,
The Master and Margarita in Russian, I'd be curious to get your opinion. And
I think part of it is language, but part of it is just centuries of culture that the cultures
are different. So it's hard to connect that. Okay, so that was my first one, right? You had
two more.
The second one, I would probably pick the science fiction by the Strugatsky brothers.
It's up there with Isaac Asimov and Ray Bradbury and company, but the Strugatsky brothers
appealed more to me; they made more of an impression on me growing up.
I apologize if I'm showing my complete ignorance, I'm so weak on sci-fi, but what did they write?
Roadside Picnic. Hard to Be a God. Beetle in the Anthill. Monday Starts on Saturday.
It's not just science fiction; it also has very interesting interpersonal and societal
questions, and some of the language is just completely hilarious.
That's right. Oh, interesting. Monday Starts on Saturday. So I need to read it.
Okay. Oh, boy. You put that one in the category of science fiction?
That one is more of a silly, humorous work. I mean, it is science fiction, right?
It's about this research institute, and it has deep parallels to serious research,
but the setting, of course, is that they're working on magic. That's their style.
And other books are very different. Hard to Be a God is about a more advanced society
being injected into a primitive world and how they operate there; there are some very deep
ethical questions in that one. They've got the full spectrum; some of it is more
adventure style. But I enjoy all of their books. There's probably one, actually, that
they considered their most important work. I think it's The Snail on the Slope, though
I'm not exactly sure how the title translates. I tried reading it a couple of times;
I still don't get it. But everything else
I fully enjoyed. For one of my birthdays as a kid, I got their entire collection;
it occupied a giant shelf in my room. And then over the holidays, my parents couldn't
drag me out of the room, and I read the whole thing cover to cover. I really enjoyed it.
And there's one more. For the third one, maybe a little bit darker, what comes to mind is
Orwell's 1984. You asked what made an impression on me, and about books people should read;
that one, I think, falls in the category of both. It's definitely one of those books that
you read and then just put down, and you stare into space for a while. That kind of work.
I think there are lessons there that people should not ignore.
And nowadays, with everything that's happening in the world, I can't help
having my mind jump to some parallels with what Orwell described:
this whole concept of doublethink, ignoring logic,
holding completely contradictory opinions in your mind without having that bother you,
sticking to the party line at all costs. There's something there.
I'm a huge fan of Animal Farm too, which is kind of a companion to Orwell's 1984,
another thought experiment of how our society may go in directions that we wouldn't like
it to go. But if anything has been heartbreaking to an optimist about 2020, it's that
society is kind of fragile. We have this special little experiment going on, and it's not
unbreakable. We should be careful to preserve whatever special thing we have going on.
I think 1984 and books like it, Brave New World, are helpful in thinking about how
stuff can go wrong in non-obvious ways, and it's up to us to preserve it.
It's a responsibility, and it's been weighing heavy on me, because, for some reason,
more than just my mom follows me on Twitter now. And I feel like I now somehow have a
responsibility to this world. It dawned on me that me and millions of
others are like the little ants that maintain this little colony. So we have a responsibility
not to, I don't know what the right analogy is, but not to put a flamethrower to the place. We want
to not do that. And there's interesting, complicated ways of doing that as 1984 shows.
It could be through bureaucracy. It could be through incompetence. It could be through misinformation.
It could be through division and toxicity. I'm a huge believer in like that love will be
somehow the solution. So, love and robots.
Love and robots. Yeah. I think you're exactly right. Unfortunately, I think it's less of a
flamethrower type of thing; in many cases, it's going to be more of a slow boil.
And that's the danger.
Let me ask: it's a fun thing to make a world-class roboticist, engineer, and leader
uncomfortable with a ridiculous question about life. What is the meaning of life, Dmitri,
from a robotics and a human perspective? You only have a couple of minutes, or maybe one
minute, to answer.
I don't know if that makes it more difficult or easier, actually. I'm very tempted
to quote one of the stories by Isaac Asimov, appropriately titled The Last
Question, short story, where the plot is that, you know, humans build this supercomputer, you
know, this AI intelligence. And once it gets powerful enough, they pose this question to it.
How can the entropy in the universe be reduced? And the computer replies,
hang on, as of yet, insufficient information to give a meaningful answer, right? And then,
you know, thousands of years go by and they keep posing the same question and computer, you know,
gets more and more powerful and keeps giving the same answer, you know, as of yet, insufficient
information to give a meaningful answer or something along those lines, right? And then,
it keeps happening and happening. Fast forward millions of years into the future, billions
of years, and at some point it's the only entity in the universe; it has absorbed all of
humanity and all knowledge in the universe, and it keeps posing the same question to itself.
Finally, it gets to the point where it is able to answer that question. But of course,
at that point the heat death of the universe has occurred, and it's the only entity,
so there's nobody else to provide that answer to. The only thing it can do is answer
by demonstration: it recreates the Big Bang and resets the clock.
But I can try to give a different version of the answer, maybe
not on the behalf of all humanity. I think that that might be a little presumptuous for me to
speak about the meaning of life on the behalf of all humans. But at least, you know, personally,
it changes, right? And if you think about kind of what gives you and your life meaning and purpose
and kind of what drives you, it seems to change over time, right? And the lifespan of your existence.
You know, when you just enter this world, right, it's all about kind of new experiences, right?
You get like new smells, new sounds, new emotions, right? And like, that's what's driving you, right?
You're experiencing new amazing things, right? And that's magical, right? That's pretty awesome,
right? That gives you kind of a meaning. Then you get a little bit older and you start more
intentionally learning about things. Actually, I guess before you start intentionally learning,
it's probably fun: fun is the thing that gives you meaning and purpose, the
thing you optimize for. And fun is good. Then you start learning, and I
guess that this joy of comprehension and discovery is another thing that gives you meaning and purpose
and drives you, right? Then, you know, you learn enough stuff and you want to give some of it back,
right? And so impact and contributions back to technology or society, people,
you know, local or more globally becomes a new thing that drives a lot of kind of your behavior
and something that gives you purpose and that you derive positive feedback from, right?
And so on and so forth; you go through various stages of life. If you have kids,
like that definitely changes your perspective on things, you know, I have three that definitely
flips some bits in your head in terms of what you care about and what you optimize for and,
you know, what matters, what doesn't matter, right? So, you know, and so on and so forth,
right? And it seems to me that, you know, it's all of those things. And as you kind of go through
life, you want these to be additive, right? New experiences, fun, learning, impact:
you want to keep accumulating. I don't want to stop having fun or experiencing new things,
but I think it's important that it all becomes additive, as opposed to a replacement or
subtraction. That's as far as I've gotten, but ask me in a few years, I might have one or two more
to add to the list. And before you know it, time is up just like it is for this conversation.
But hopefully it was a fun ride. It was a huge honor to meet you. As you know, I've been a fan
of yours and of the Google self-driving car project and Waymo for a long time. If we look back
at the 21st century, I truly believe this will be one of the most exciting things we descendants
of apes have created on this earth. So I'm a huge fan, and I can't wait to see what you do next.
Thanks so much for talking to me.
Thanks. Thanks for having me. I'm also a huge fan, honestly. I really enjoyed it. Thank you.
Thanks for listening to this conversation with Dmitry Dolgov. And thank you to our sponsors,
Trial Labs, a company that helps businesses apply machine learning to solve real-world problems,
Blinkist, an app I use for reading through summaries of books, BetterHelp, online therapy
with a licensed professional, and Cash App, the app I use to send money to friends.
Please check out these sponsors in the description to get a discount and to support this podcast.
If you enjoy this thing, subscribe on YouTube, review it with five stars on Apple Podcasts,
follow on Spotify, support on Patreon, or connect with me on Twitter at Lex Fridman.
And now, let me leave you with some words from Isaac Asimov. Science can amuse and
fascinate us all, but it is engineering that changes the world. Thank you for listening
and hope to see you next time.