Boris Sofman: Waymo, Cozmo, Self-Driving Cars, and the Future of Robotics

Lex Fridman Podcast

Conversations about science, technology, history, philosophy and the nature of intelligence, consciousness, love, and power. Lex is an AI researcher at MIT and beyond.

The following is a conversation with Boris Sofman, who is the Senior Director of Engineering
and Head of Trucking at Waymo, the autonomous vehicle company, formerly the Google Self-Driving
Car Project.
Before that, Boris was the co-founder and CEO of Anki, a robotics company that created
Cozmo, which, in my opinion, is one of the most incredible social robots ever built.
It's a toy robot, but one with an emotional intelligence that creates a fun and engaging
human-robot interaction.
It was truly sad for me to see Anki shut down when it did.
I had high hopes for those little robots.
We talk about this story and the future of autonomous trucks, vehicles, and robotics
in general.
I spoke with Steve Viscelli recently on episode 237 about the human side of trucking.
This episode looks more at the robotic side.
This is the Lex Fridman podcast.
To support it, please check out our sponsors in the description.
And now, here's my conversation with Boris Sofman.
Who is your favorite robot in science fiction, books, or movies?
WALL-E and R2-D2, where they were able to convey such an incredible degree of intent, emotion,
and character attachment without having any language whatsoever, just purely through
the richness of emotional interaction.
Those were fantastic.
And then the Terminator series just really brought up a pretty wide range, right?
But I kind of love this dynamic where you have this incredible Terminator itself that
Arnold played.
And then he was kind of like the inferior previous generation version that was totally
outmatched in terms of kind of specs by the new one, but still kind of held his own.
And so it's kind of interesting where you realize how many levels there are on the spectrum
from human to kind of potentials in AI and robotics to futures.
And so that movie really, as much as it was kind of a direct world in a way, was actually
quite fascinating.
Gets the imagination going.
Well, from an engineering perspective, of both the movies you mentioned, WALL-E and Terminator,
the humanoid robot is probably achievable, maybe not with the realism in
terms of skin and so on, but that humanoid form, we have that humanoid form.
It seems like a compelling form.
Maybe the challenge is that it's super expensive to build, but you can imagine, maybe not a
machine of war, but you can imagine Terminator-type robots walking around.
And then the same, obviously, with WALL-E. For people who don't know, you basically
created the company Anki, which created a small robot with a big personality called Cozmo
that does exactly what WALL-E does: somehow, with very few basic visual tools, it
is able to communicate a depth of emotion.
And that's fascinating.
But then again, the humanoid form is super compelling.
Cozmo is very distant from a humanoid form.
And then the Terminator has a humanoid form and you can imagine both of those actually
being in our society.
That's true.
And it's interesting because it was very intentional to go really far away from human form when
you think about a character like Cozmo or like WALL-E, where you can completely rethink
the constraints you put on that character, what tools you leverage, and then how you
actually create a personality and level of intelligence and interactivity that actually matches
the constraints that you're under, whether it's mechanical or sensors or the AI of the day.
This is why I was always very surprised by how much energy people put towards trying
to replicate human form in a robot because you actually take on some pretty significant
kind of constraints and downsides when you do that.
The first of which is obviously the cost where it's just the articulation of a human body
is just so magical in both the precision as well as the dimensionality that to replicate
that even in its reasonably close form takes like a giant amount of joints and actuators
and motion and sensors and encoders and so forth.
But then you're almost like setting an expectation that the closer you try to get to human form,
the more you expect the strengths to match.
And that's not the way AI works.
There's places where you're way stronger and there's places where you're weaker.
And by moving away from human form, you can actually change the rules and embrace your
strengths and bypass your weaknesses.
And at the same time, the human form has way too many degrees of freedom to play with.
It's kind of counterintuitive just as you're saying, but when you have fewer constraints,
it's almost harder to master the communication of emotion.
You see this with cartoons, like stick figures.
You can communicate quite a lot with just very minimal, like two dots for eyes and a
line for a smile.
I think you can almost communicate arbitrary levels of emotion with just two dots in a
line.
And that's enough.
And if you focus on just that, you can communicate the full range.
And then if you do that, then you can focus on the actual magic of human and dot line
interaction versus all the engineering mess.
Like dimensionality, voice, all these sort of things, they actually become a crutch where
you get lost in a search space almost.
And so some of the best animators that we've worked with, when they were coming up and building
their expertise, almost studied by forcing themselves through projects where
all you have is like a ball that can kind of jump and manipulate itself, or other
really, really aggressive constraints where you're forced to extract the
deepest level of emotion.
And so in a lot of ways, when we thought about Cozmo, you're right,
if we had to describe it in one small phrase, it was bringing a Pixar
character to life in the real world.
And so that's what we were going for.
And in a lot of ways, what was interesting is WALL-E, which we studied
incredibly deeply.
And in fact, some of our team had worked previously at Pixar and
on that project.
They intentionally constrained WALL-E as well,
even though in an animated film you could do whatever you wanted, because it forced
you to really saturate the smaller number of dimensions.
And you sometimes end up getting a far more beautiful output because you're pushing at
the extremes of this emotional space in a way that you just wouldn't, because you get
lost in the surface area
if you have something that is just infinitely articulable.
So if we backtrack a little bit, you thought of Cozmo in 2011, and in 2013 actually designed
and built it.
What is Anki?
What is Cozmo?
I guess, who is Cozmo, and what was the vision behind this incredible little robot?
We started Anki back while we were still in graduate school.
So myself and my two co-founders, we were PhD students in the Robotics Institute at Carnegie
Mellon and so we were studying robotics, AI, machine learning, kind of different areas.
One of my co-founders was working on walking robots for a period of time.
And so we all had a deeper passion for applications of robotics and AI where there's like a spectrum
where there's people that get really fascinated by the theory of AI, machine learning, and robotics
where whether it gets applied in the near future or not is less of a factor on them,
but they love the pursuit of the challenge.
And that's necessary.
There's a lot of incredible breakthroughs that happen there.
We're probably closer to the other end of the spectrum where we love the technology and
all the evolution of it, but we were really driven by applications like how can you really
reinvent experiences and functionality and build value that wouldn't have been possible
without these approaches.
And that's what drove us.
And we had some experiences through previous jobs and internships where we got to see the
applied side of robotics.
And at that time, there were actually relatively few applications of robotics outside
of pure research or industrial applications, military applications and so forth.
There were very few outside of that.
So maybe iRobot was one exception, and maybe there were a few others, but for the
most part, there weren't that many.
And so we got excited about consumer applications of robotics where you could leverage way higher
levels of intelligence through software to create value and experiences that were just
not possible in those fields today.
And we saw kind of a pretty wide range of applications that varied in the complexity
of what it would take to actually solve those.
And what we wanted to do was to commercialize this into a company, but actually take a bottom-up
approach where we could have a huge impact in a space that was ripe to have an impact
at that time and then build up off of that and move into other areas.
And entertainment became the place to start because you had relatively little innovation
in the toy space, the entertainment space. You had these really rich experiences in video
games and movies, but there was this chasm in between.
And so we thought that we could really reinvent that experience.
And there was a really fascinating transition technically that was happening at the time
where the cost of components was plummeting because of the mobile phone industry and then
the smartphone industry.
And so the cost of a microcontroller, of a camera, of a motor, of memory, of microphones,
cameras was dropping by orders of magnitude.
And then on top of that, with the iPhone coming out in, I think it was 2007,
it started to become apparent within a couple of years that this could become a really incredible
interface device and the brain with much more computation behind a physical world experience
that wouldn't have been possible previously.
And so we really got excited about that and how we could push all the complexity from the physical
world into software by using really inexpensive components, but putting huge amounts of complexity
into the AI side.
And so Cozmo became our second product, and the one that we're probably most proud
of.
The idea there was to create a physical character that had enough understanding and awareness
of the physical world around it and the context that mattered to feel like he was alive.
And to be able to have these emotional kind of connections and experiences with people
that you would typically only find inside of a movie.
And the motivation very much was Pixar, like we had an incredible respect and appreciation
for what they were able to build in this really beautiful fashion in film.
But it was always, one, virtual, and two, a story on rails that
had no interactivity to it.
It was very fixed, and it obviously had a magic to it.
But you really start to hit a different level of experience when you're
actually able to physically interact with a robot.
And then that was your idea with Anki, like the first product was the cars.
So basically you take a toy and add intelligence into it in the same way you
would add intelligence into AI systems within a video game, but now you're bringing it into
the physical space.
So the idea is really brilliant, which is you're basically bringing video games to life.
Exactly.
And we literally used that exact same phrase because in the case of Drive, this was a parallel
of the racing genre.
And the goal was to effectively have a physical racing experience, but have a virtual state
at all times that matches what's happening in the physical world.
And then you can build a video game off of that, and you can have different characters,
different traits for the cars, weapons and interactions and special abilities and
all these sorts of things that you think of virtually, but then you can have them physically.
And one of the things that we were like really surprised by that really stood out and immediately
led us to accelerate the path towards Cozmo is that things that feel
like they're really constrained and simple in the physical world, they have an amplified
impact on people where the exact same experience virtually would not have anywhere near the
impact, but seeing it physically really stood out.
And so effectively, with Drive, we were creating a video game engine for the
physical world.
And then with Cozmo, we expanded that video game engine to create a character and kind
of an animation and interaction engine on top of it that allowed us to start to create
these much more rich experiences.
And a lot of those elements were almost like a proving ground for what human-robot
interaction would feel like in a domain that's much more forgiving, where you can make mistakes
in a game.
It's okay if a car goes off the track or if Cozmo makes a mistake.
And what's funny is that we were so worried about that, but
in reality, we realized very quickly that those mistakes can be endearing.
And if you make a mistake, as long as you realize you make a mistake and have the right
emotional reaction to it, it builds even more empathy with the character.
That's brilliant.
Exactly.
So when the thing you're optimizing for is fun, you have so much more freedom to fail,
to explore.
And also in the toy space, like all of this is really brilliant.
I've got to backtrack. It seems that for a roboticist to jump in the direction
of fun is a brilliant move, because you have the freedom to explore, to design all
those kinds of things, and you can also build cheap robots.
If you're not chasing perfection, as with toys, it's understood
that you can go cheaper, which means the robot is still expensive, but it's actually affordable
for a large number of people.
So it's a really brilliant space to explore.
Yeah, that's right.
And in fact, we realized pretty quickly that like perfection is actually not fun because
like in a traditional robotics sense, the first kind of path planner, and this is
the part that I worked on out of the gate was like a lot of the kind of AI systems where
you have these vehicles and cars racing, kind of making optimal maneuvers to try to kind
of get ahead.
And you realize very quickly that like that's actually not fun because you want the like
chaos from mistakes.
And so you start to kind of intentionally almost add noise to the system in order to kind of
create more of a realism in the exact same way the human player might start really ineffective
and inefficient, and then start to kind of increase their quality bar as they progress.
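The idea of deliberately adding noise to an otherwise optimal planner can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Anki's actual Drive code: the `Maneuver` type, its `score` field, and the `skill` parameter are all hypothetical. Each candidate maneuver's score is perturbed with Gaussian noise whose spread shrinks as skill rises, so a low-skill opponent often picks a suboptimal move while a full-skill one always picks the best.

```python
import random
from dataclasses import dataclass


@dataclass
class Maneuver:
    """A candidate move for an AI-controlled car (hypothetical type)."""
    name: str
    score: float  # hypothetical quality metric: higher means closer to optimal


def choose_maneuver(candidates, skill, rng):
    """Pick a maneuver, deliberately imperfect at lower skill levels.

    skill is in [0, 1]. At 1.0 no noise is added, so the best-scoring
    maneuver always wins; lower skill widens the Gaussian noise added to
    every score, so weaker opponents make more (fun) mistakes.
    """
    if not 0.0 <= skill <= 1.0:
        raise ValueError("skill must be in [0, 1]")
    noise_sd = 1.0 - skill  # standard deviation of the injected noise

    def noisy_score(m):
        return m.score + rng.gauss(0.0, noise_sd)

    return max(candidates, key=noisy_score)


if __name__ == "__main__":
    rng = random.Random(42)
    options = [Maneuver("optimal_line", 1.0), Maneuver("wide_turn", 0.9),
               Maneuver("late_brake", 0.7)]
    for skill in (0.2, 0.6, 1.0):
        picks = [choose_maneuver(options, skill, rng).name for _ in range(1000)]
        # Fraction of races where the opponent took the optimal line.
        print(skill, picks.count("optimal_line") / len(picks))
```

Raising `skill` as the player improves is one simple way to mirror the progression from "ineffective and inefficient" toward a higher quality bar; a real game would likely tune the noise per maneuver type rather than use one global knob.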
And there's a really, really aggressive constraint that's forced on you by being a consumer product,
where the price point matters a ton, particularly in entertainment, where you
can't make a $1,000 product unless you're going to meet the expectations of a $1,000 product.
And so in order to make this work, the cost of goods had to be well under $100.
In the case of Cozmo, we got it under $50 end to end, fully packaged and delivered.
And the retail price was under $200.
So okay, if we sit down like at the early stages, if we go back to that and you're sitting
down and thinking about what Cozmo looks like from a design perspective and from a cost
perspective, I imagine that was part of the conversation.
First of all, what came first?
Did you have a cost in mind?
Is there a target you're trying to chase?
Did you have a vision in mind, like size?
Because there are a lot of qualities to Cozmo.
For people who don't know, you should definitely check it out: there's a display, there are eyes
on the little display, and those eyes are pretty low resolution, right?
But they're still able to convey a lot of emotion, and there's this arm that can sort of lift
stuff.
But there's something about arm movement that adds even more depth.
The face communicates emotion, sadness, disappointment, and happiness.
And then the arm kind of communicates: I'm trying here, I'm doing my best in this complicated
world.
Exactly.
So it's interesting because all of Cozmo is only four degrees of freedom, and two of
them are the two treads, which is for basic movement.
And so you literally have only a head that goes up and down, a lift that goes up and
down, and then your two wheels.
And you have sound and a screen, a low resolution screen.
And with that, it's actually pretty incredible what you can come up with. Like you
said, it's a really interesting give and take, because there were a lot of ideas far beyond
that, as you can imagine: how big is it?
How many degrees of freedom?
What does he look like?
What does he sound like?
How does he communicate?
It's a formula that actually scales way beyond entertainment.
This is the formula for human-robot interface more generally: you almost have
this triangle between the physical aspects of it, the mechanics, the industrial design,
what's mass-producible, the cost constraints and so forth.
You have the AI side of how do you understand the world around you, interact intelligently
with it, execute what you want to execute.
So perceive the environment, make intelligent decisions and move forward.
And then you have the character side of it.
Most companies that have done anything in human-robot interaction really miss the mark or under-invest
in the character side of it.
They over-invest in the mechanical side of it and then get varied results on the AI side
of it.
The thinking is that if you put more mechanical flexibility into it, you're going to do better.
You don't necessarily. You actually create a much higher bar for ROI because
now your price point goes up, your expectations go up and if the AI can't meet it or the overall
experience isn't there, you miss the mark.
So how did you, through those conversations, get the cost down so much and make it so simple?
There's a big theme here because you come from the mecca of robotics, which is the Carnegie
Mellon University Robotics Institute.
Of all the people I've interacted with that come from there, or just the world experts
in robotics, they would never build something like Cozmo.
So where did that come from, the simplicity?
It came from this combination of a team that we had, and it was quite cool because, by the way,
if you ask anybody that's experienced in the toy and entertainment space,
they'd tell you you'll never sell a product over $99.
That was fundamentally false, and we believed it to be false.
It was just that the experience had to meet the mark.
And so we pushed past that amount, but there was a pressure where the higher you go, the
more seasonal you become and the tougher it becomes.
And so on the cost side, we very quickly partnered up with some previous contacts that we worked
with where just as an example, our head of mechanical engineering was one of the earliest
heads of engineering at Logitech and has a billion units of consumer products in circulation
that he's worked on.
So like crazy low cost, high volume consumer product experience.
We had a really great mechanical engineering team and just a very practical mindset where
we were not going to compromise on feasibility in the market in order to chase something
that would be a neighbor.
And we pushed a huge amount of expectations onto the software team where, yes, we're going
to use cheap noisy motors and sensors, but we're going to fix it on the software side.
Then we found on the design and character side, there was a faction from more of
a game design background that thought that Cozmo should be very games-driven,
where you create a whole bunch of game experiences and it's all about game mechanics.
And then there was a faction, which my co-founder and I were most involved in, that really
believed in being character-driven.
And the argument is that you will never compete with what you can do virtually from a game
standpoint, but you actually on a character side put this into your wheelhouse and put
it more towards your advantage because a physical character has a massively higher impact physically
than virtually.
Okay, I have to just pause on that because this is so brilliant.
For people who don't know, Cozmo plays games with you, but there's also a depth of character,
and actually, when I was playing with it, I wondered exactly what the compelling
aspect of this is, because, obviously, I'm biased, but to me the character is what I enjoyed
most, honestly, or what got me to return to it.
That's right.
But that's a fascinating discussion of you're right, ultimately you cannot compete on the
quality of the gaming experience.
The physical world is just too restrictive and you don't have a graphics engine, it's
like all this.
But on the character side, we clearly moved in that direction as the winning
path, and we immediately went towards Pixar and
Carlos Baena, who had been at Pixar for nine years. He'd worked on tons
of the movies, including WALL-E and others, and just immediately spoke the language,
and it just clicked on how you think about that kind of magic and drive. And then we built
out a team with him as like a really kind of prominent kind of driver of this with different
types of backgrounds and animators and character developers where we put these constraints on
the team, but then got them to really try to create magic despite that and we converged
on this system that was at the overlap of character and the character AI, where
if you imagine the dimensionality of emotions, happy, sad, angry, surprised, confused, scared,
you think of these extreme emotions, and we almost put this challenge
to the team to populate this library of responses: how do you show the extreme response that
goes to the extreme of the spectrum on angry or frustrated or whatever? And so that gave
us a lot of intuition and learnings and then we started parametrizing them where it wasn't
just a fixed recording but they were parametrized and had randomness to them where you could
have infinite permutations of happy and surprised and so forth and then we had a behavioral
engine that took the context from the real world and would interpret it and then create
kind of probability mappings on what sort of responses you would have that actually
made sense. And so if Cozmo saw you for the first time in a day, he'd be really surprised
and happy in the same way that the first time you walk in and like your toddler sees you,
they're so happy but they're not going to be that happy for the entirety of your next
two hours, but you have this spike in response. Or if you leave him alone for too
long, he gets bored and starts causing trouble, like nudging things off the table. Or if
you beat him in a game, the most enjoyable emotions are him getting frustrated and grumpy
to a point where our testers and our customers would be like, I had to let him win because
I don't want him to be upset and so you start to like create this feedback loop where you
see how powerful those emotions are and just to give you an example, something as simple
as eye contact, you don't think about it in a movie, just like it kind of happens like
you know, camera angles and so forth but that's not really a prominent source of interaction.
But when a physical character like Cozmo makes eye contact with you,
it builds a universal kind of connection, from kids all the way through adults, and it was truly
universal. It was not like people stopped caring after 10 or 12 years old. And so we started
doing experiments, and we found something as simple as increasing the amount of eye contact,
like the number of times in a minute that he'll look over for your approval and kind
of make eye contact.
Just by I think doubling it, we increased the play time engagement by 40%, like you see
these sort of like kind of interactions where you build that empathy and so we studied pets,
we studied virtual characters, and a lot of times dogs are actually one of the
most perfect influences behind these sorts of interactions. And what we realized
is that the games were not there to entertain you, the games were to create context to bring
out the character. And if you think about the types of games that you played, they were
relatively simple, but they were always meant to create scenarios of either tension or winning
or losing or surprise or whatever the case might be and they were purely there to just
like create context to where an emotion could feel intelligent and not random and in the
end it was all about the character.
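The combination described here, parametrized animations with built-in randomness plus a behavioral engine that turns real-world context into probabilities over responses, can be illustrated with a small sketch. Everything below is a hypothetical simplification, not Anki's engine: the emotion names, the context keys (`face_visible`, `times_seen_today`, `minutes_alone`, `lost_game`), the weights, and the parameter ranges are assumptions chosen to mirror the examples in the conversation (the spike of happiness on the first sighting of the day, boredom when left alone, grumpiness after losing a game).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Animation:
    """One concrete playback of an emotional response (hypothetical fields)."""
    emotion: str
    intensity: float  # 0..1, scales how big the head/lift motion is
    speed: float      # playback-speed multiplier


def emotion_weights(context):
    """Map world context to unnormalized weights over emotions (illustrative rules)."""
    weights = {"happy": 1.0, "surprised": 0.2, "bored": 0.1, "frustrated": 0.0}
    seen = context.get("times_seen_today", 0)
    if context.get("face_visible"):
        # Big spike on the first sighting of the day, decaying with each repeat.
        weights["happy"] += 4.0 / (1 + seen)
        if seen == 0:
            weights["surprised"] += 4.0
    # Boredom grows the longer the robot is left alone.
    weights["bored"] += 0.1 * context.get("minutes_alone", 0)
    if context.get("lost_game"):
        weights["frustrated"] += 3.0
    return weights


def sample_animation(emotion, base_intensity, rng):
    """Parametrized response: same emotion, jittered parameters, so no two plays match."""
    intensity = min(1.0, max(0.0, base_intensity + rng.uniform(-0.1, 0.1)))
    speed = rng.uniform(0.85, 1.15)
    return Animation(emotion, intensity, speed)


def choose_response(context, rng):
    """Sample an emotion in proportion to its weight, then a randomized animation of it."""
    weights = emotion_weights(context)
    emotions = list(weights)
    emotion = rng.choices(emotions, weights=[weights[e] for e in emotions], k=1)[0]
    share = weights[emotion] / sum(weights.values())  # more likely -> more intense
    return sample_animation(emotion, share, rng)


if __name__ == "__main__":
    rng = random.Random(7)
    print(choose_response({"face_visible": True, "times_seen_today": 0}, rng))
    print(choose_response({"minutes_alone": 45}, rng))
```

Because the animation parameters are resampled on every call, the same context rarely produces an identical response, which is one way to get the "infinite permutations" effect; the decaying first-sighting boost is a stand-in for whatever habituation model a production system would use.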
So yeah, there's so many elements to play with here.
So you said dogs. What lessons do we draw from cats, who don't seem to give a damn about
you.
Is that just another character?
It's just another character. And so in early inspirations, we thought
it would be really incredible if you had a diversity of characters where you almost help
encourage which direction it goes, just like in a role-playing game. Think
of something like the seven dwarfs, and initially we even thought that it would be amazing if
their characters actually had strengths and weaknesses that shaped whatever
they end up doing: some are scared, some are arrogant, some are super warm and kind
of friendly.
And we focused on one because it made it very clear that we got to build out enough depth
here because you're kind of trying to expand.
It's almost like how long can you maintain a fiction that this character is alive to
where the person's explorations don't hit a boundary which happens almost immediately
with typical toys and even with video games.
How long can we create that immersive experience to where you expand the boundary and one of
the things we realized is that you're just way more forgiving when something has a personality
and it's physical.
That is the key that unlocks robotics interacting in the physical world and more generally is
that when you don't have a personality and you make a mistake as a robot, the stupid
robot made a mistake.
Why is it not perfect?
When you have a character and you make a mistake, you have empathy and it becomes endearing
and you're way more forgiving.
And that was the key that I think goes far, far beyond entertainment.
It actually builds the depth of the personality, the mistakes.
So let me ask the movie Her question then.
Cozmo feels like the early days of something that will obviously be prevalent
throughout society at a scale that we cannot even imagine.
My sense is it seems obvious that these kinds of characters will permeate society, and we
will be friends with them, will be interacting with them in different ways.
In the way we, I mean, you don't think of it this way, but when you play video games,
they're often cold and impersonal.
But even then, you think about role-playing games, you become friends with certain characters
in that game.
They don't remember much about you.
They're just telling a story, it's exactly what you're saying.
They exist in that virtual world.
But what if they acknowledged that you exist in this physical world?
If the characters in the game remember that you exist, that you, like for me, like Lex,
they understand that I'm a human being who has like hopes and dreams and so on.
It seems like there's going to be billions, if not trillions, of Cozmos in the world.
So if we look at that future, there's several questions to ask.
How intelligent does that future Cozmo need to be to create fulfilling relationships like
friendships?
Yeah, it's a great question.
And part of it was the recognition that it's going to take time to get there, because it
has to be a lot more intelligent. It was good enough to be a magical experience
for an eight-year-old.
It's a higher bar to be like a pet in the home or to help with functional interfaces
in an office environment or in a home and so forth.
And so the idea was that you build on that and you kind of get there, and as technology
becomes more prevalent and less expensive and so forth, you can start to kind of work
up to it.
But you're absolutely right at the end of the day.
We almost equated it to how the touchscreen created like this really novel interface to
physical kind of devices like this.
This is the extension of it where you have much richer physical interaction in the real
world.
This is the enabler for it.
And it shows itself in a few kind of really obvious places.
Take something as simple as a voice assistant.
Most people will never tolerate an Alexa or a Google Home just starting a conversation
proactively when you weren't kind of expecting it because it feels weird.
It's like you were listening and now you're kind of, it feels intrusive.
But if you had a character, like a cat that touches you and gets your attention, or a toddler,
you never think twice about it.
And what we found really immediately is that with these types of characters, like Cozmo,
that would roam around and get your attention,
and with a future version that was always on, called Vector,
people were way more forgiving.
And so you could initiate interaction in a way that is not acceptable for machines.
And in general, you know, there's a lot of ways to customize it, but it makes people
who are skeptical of technology much more comfortable with it.
There was like, there were a couple of really, really prominent examples of this.
So when we launched in Europe, and we were in, I think, a dozen countries, if I remember
correctly, but like we were, we went pretty aggressively in launching in Germany and France
and in UK.
And we were very worried in Europe because there's obviously like a really socially higher
bar for privacy and security where you've heard about how many companies have had troubles
on the things that might have been okay in the US, but like they're just not okay in
Germany and France in particular.
And so we were worried about this because you have Cozmo and, in our future
product, Vector, where you have cameras and microphones that are connected,
and like you're playing with kids and like in these experiences, and you're like, this
is like ripe to be like a nightmare if you're not careful.
And journalists are notoriously really, really tough on these sorts of things.
We were shocked and we prepared so much for what we would have to encounter.
We were shocked in that not once from any journalists or customer, did we have any complaints beyond
like a really casual kind of question.
And it was because of the character where when the conversation came up, it was almost
like, well, of course he has to see and hear how else is he going to be alive and interacting
with you.
And it completely disarmed this like fear of technology that enabled this interaction
to be much more fluid.
And again, like entertainment was a proving ground, but that is like, you know, there's
like ingredients there that carry over to a lot of other elements down the road.
That's hilarious, that we're a lot less concerned about privacy if the thing has value
and charisma.
I mean, that's true for all of human to human interaction too.
It's an understanding of intent where like, well, he's looking at me, he can see me.
If he's not looking at me, he can't see me.
Right.
So it's almost like you're communicating intent.
And with that intent, people were kind of more understanding and calmer.
And it's interesting; it was just the earliest version of
starting to experiment with this, but it was an enabler.
And, and then, and then you have like completely different dimensions where like, you know,
kids with autism had an incredible connection with Cozmo that just went beyond anything
we'd ever seen.
And we have like these just letters that we would receive from parents and we had some
research projects kind of going on with some universities on studying this.
But there's an interesting dimension there that got unlocked,
one that just hadn't existed before, that has these really interesting links into society
and is a potential building block of future experiences.
So if you look out into the future, do you think we will have, beyond a particular game,
you know, a companion like in the movie Her, or like a Cozmo that asks you
how your day went, you know, like a friend? How many years away from that do
you think we are?
What's your intuition?
Good question.
So I think the idea of a different type of character, closer to
a pet-style companionship, will come way faster.
And there are a few reasons. One is that doing something like in Her is effectively
almost general AI, and the bar is so high that if you miss it by a bit, you hit the
uncanny valley, where it just becomes creepy and not appealing.
Because the closer you try to get to a human in form and interface and voice the harder
it becomes, whereas you have way more flexibility on still landing a really great experience
if you embrace the idea of a character.
That's one of the other reasons why we didn't have a voice.
And it's also why a lot of video game characters, like the Sims, for example, don't have a voice
when you think about it; it wasn't just a cost savings for them.
It was actually for all of these purposes, it was because when you have a voice, you
immediately narrow down the appeal to some particular demographic or age range or kind
of style or gender.
If you don't have a voice, people interpret what they want to interpret, and an eight-year-old
might get a very different interpretation than a 40-year-old, but you create a dynamic
range.
And so you just, you can lean into these advantages much more in something that doesn't resemble
a human.
And so that'll come faster.
I don't know when a human-like one will come; that's still just complete R&D at this
point.
The chat interfaces are getting way more interesting and richer, but it's still a long way to go
to kind of pass the test of, you know.
Well, let me, like, let's consider, like, let me play devil's advocate.
So Google is a very large company that's servicing, it's creating a very compelling product that
wants to provide a service to a lot of people, but let's go outside of that.
You said characters.
It feels like, and you also said that it requires general intelligence to be a successful participant
in a relationship, which could explain why I'm single.
But I honestly want to push back on that a little bit because I feel like, is it possible
that if you're just good at playing a character in a movie, there's a bunch of characters,
if you just understand what creates compelling characters and then you just are that character
and you exist in the world and other people find you and they connect with you just like
you do when you talk to somebody at a bar, I like this character.
This character is kind of shady, I don't like them.
You pick the ones that you like and, you know, maybe it's somebody that reminds you of your
father or mother, I don't know what it is, but the Freudian thing, but there's some kind
of connection that happens, and that's the Cozmo you connect to.
That's the future Cozmo you connect to.
And it's, so I guess the statement I'm trying to make, is it possible to achieve a depth
of friendship without solving general intelligence?
I think so.
You create intelligent kind of constraints and just you set expectations and constraints
such that in the space that's left, you can be successful and so you can do that by having
a very focused domain that you can operate in.
For example, you're a customer support agent for a particular product and you create intelligence
and a good interface around that or, you know, kind of in the personal companionship side,
you can't be everything to everyone across the board.
You kind of solve those constraints and I think it's possible.
My worry is that right now I don't see anybody who has picked up where Cozmo
left off and is pushing on it in the same way.
And so I don't know if it's a sort of thing where similar to like how, you know, in dot
com, there were all these concepts that we considered like, you know, that didn't work
out or like failed or like we're too early or whatnot and then 20 years later, you have
these like incredible successes on almost the same concept.
Like it might be that sort of thing where like there's another pass at it that happens
in five years or in 10 years.
But it does feel like that appreciation of that, like the three-legged stool, if you
will, between like, you know, the hardware, the AI and the character that balance.
It's hard to... I'm not aware of anywhere right now where that same kind of aggressive
drive, with the value on the character, is happening.
To me, just a prediction, exactly as you said, something that looks an awful lot like Cozmo,
not in the actual physical form, but in the three-legged stool, something like that in
some number of years will be a trillion dollar company.
I don't understand.
Like it's obvious to me that like character, not just as robotic companions, but in all
our computers, they'll be there.
It's like Clippy was like two legs of that stool or something like that.
Yeah.
I mean, those are all different attempts.
What's really confusing to me is that these attempts are born, everybody gets excited,
and for some reason they die, and then nobody else tries to pick it up.
And then maybe a few years later, a crazy guy like you comes around with just enough
brilliance and vision to create this thing and is born.
A lot of people love it.
A lot of people get excited, but maybe the timing is not right yet.
And then when the timing is right, it just blows up, it just keeps blowing up more and
more until it just blows up and I guess everything in the full span of human civilization collapses
eventually.
Yeah.
And that wouldn't surprise me at all.
And like what's going to be different in another five years or 10 years or whatnot?
Physical component costs will continue to come down in price and mobile devices and
computation is going to become more and more prevalent as well as cloud as a big tool to
offload cost.
AI is going to be a massive transformation compared to what we dealt with where everything
from voice understanding to just a broader contextual understanding and mapping of semantics
and understanding scenes and so forth.
And then the character side will continue to progress as well because that magic does
exist.
It just exists in different forms and you have just the brilliance of the tapping and animation
and these other areas where that was a big unlock in film obviously.
And so I think the pieces can reconnect and the building blocks are actually going to
be way more impressive than they were five years ago.
So in 2019, Anki, the company that created Cozmo, the company that you started, had to
shut down.
How did you feel at that time?
Yeah.
It was tough.
That was a really emotional stretch and it was a really tough year.
Like about a year ahead of that was actually a pretty brutal stretch because we were kind
of life or death on many, many moments, just navigating these insane kind of just ups and
downs and barriers.
And the thing, just rewinding a tiny bit, that ended up being really
challenging about it as a business is that from a commercial standpoint and customer
reception standpoint, there's a lot of things you could point to that were like, you know,
pretty big successes.
So millions of units, like, you got to like pretty serious revenue, like kind of close
to 100 million annual revenue, number one kind of product in kind of various categories.
But it ended up being very seasonal, where something like 85% of our volume
was in Q4 because it was a, you know, a present and it was expensive to market it and explain
it and so forth.
And even though the volume was really sizable and the reviews were really fantastic,
forecasting and planning for it and managing the cash operations was just brutal.
Like it was absolutely brutal.
You don't think about this when you're starting a company or when you have a few million
in revenue, because your biggest costs are just your head count
and operations and everything's ahead of you.
But we got to a point where, you know, you, if you look at the entire year, you have to
operate your company, pay all the people and so forth, you have to pay for the manufacturing,
the marketing and everything else to do your sales and mostly November, December and then
get paid in December, January by retailers.
And those swings were really rough and just made it so difficult, because
the more successful we became, the more wild those swings became, because you'd have to
spend, you know, tens of millions of dollars on inventory, tens of millions of
dollars on marketing, and tens of millions of dollars on payroll and everything else.
And then the dip gets bigger while you're waiting for Q4, and it's not a business
that is recurring month to month or predictable, and then you're
locking in your forecast in July, maybe August, if you're lucky.
And it's also like very hit driven and seasonal where like you don't have this sort of continued
kind of slow growth like you do in some other consumer electronics industries.
And so before then, hardware kind of went out of favor too.
You had Fitbit and GoPro drop from 10 billion to 1 billion in valuation, and
hardware companies were getting valued at like 1x revenue oftentimes, which is tough, right?
And so we effectively kind of got caught in the middle where we were trying to quickly
evolve out of entertainment and move into some other categories.
But you can't let go of that business because like that's what you're valued on, that's
what you're raising money on.
But there was no path to pure profitability just there, because it was such a specific type
of price point and so forth.
And so we tried really hard to make that transition and yeah, we had a financing round that fell
apart at the last second and effectively there was just no path to kind of get through that
and get to the next kind of high holiday season.
And so we ended up selling some of the assets and kind of winding down the company.
I mean, it was brutal, like we, I was very transparent with the company like in the team
while we were going through it where actually despite how challenging that period was, very
few people left.
I mean, like people loved the vision, the team, the culture of the like kind of chemistry
and kind of what we were doing.
There was just a huge amount of pride there and then we wanted to see it through and we
felt like we had a shot to kind of get through these checkpoints.
We ended up, I mean, by brutal, I mean literally days of cash runway three or four
different times in the year, you know, kind of before it, where you're
playing games of chicken on negotiating credit line timelines and like repayment terms and
how to get like a bridge loan from an investor.
It's just like level of stress that like as hard as things might be anywhere else like
and you'll never come, you know, come close to that where you feel that like responsibility
for, you know, 200 plus people, right?
And so we were very transparent during our fundraise on who we're talking to, the challenges
that we have, how it's going and when things are going well, when things were tough.
And so it wasn't a complete shock when it happened, but it was just very emotional where
like, you know, like, you know, when we announced it finally that like, you know, we, you know,
basically we're just like watching kind of like, you know, the runway and trying to kind
of time it.
And when we realized that like we didn't have any more outs, we wanted to like kind of wind
it down, make sure that it was like clean and, you know, we could like kind of take
care of people the best we could, but people broke down crying at the all-hands,
and so we all had to step in for a bit, and it was just very, very emotional.
But the beautiful part is like afterwards, like everybody stayed at the office to like
two, three in the morning, just like drinking and hanging out and telling stories and celebrating.
And it was just like one of the best for many people was like the best kind of work experience
that they had.
And there was a lot of pride in what we did.
And it wasn't anything obviously we could point to that like, Hey, if only we had done
that different things would have been completely different.
It was just like the physics didn't line up.
And but the experience was pretty incredible, but it was hard.
Like it was, it had this feeling that there was just like incredible beauty in both the
technology and products and the team that, you know, there's, there's a lot there that
like in the, you know, right context could have been pretty incredible, but it was emotional.
Yeah, just thinking, I mean, just looking at this company, like you said, the product
and technology, but the vision, the implementation, you got the cost down very low and the compelling,
the nature of the product was great.
So many robotics companies failed at this: the robot was too expensive.
It didn't have the personality.
It didn't really provide any value, like a sufficient value to justify the price.
So like you succeeded where basically every single other robotics company or most of them
that are like, go in the category of social robotics have kind of failed.
And I mean, it's, it's, it's quite tragic.
I remember reading that, I'm not sure if I talked to you before that happened or not,
but I remember, you know, I'm distant from this.
I remember being heartbroken reading that because, if Cozmo's not going to succeed,
what is going to succeed?
Because that to me was incredible.
Like, it was an incredible idea: the cost is down to the minimum; it's just the
most minimal design in physical form that you could do.
It's really compelling.
The balance of games, so it's a, it's a fun toy.
It's a great gift for all kinds of age groups, right?
It's just, it's compelling in every single way.
And it seemed like it was a huge success, and it failing was, I don't know, there was
heartbreak on many levels for me, just as an external observer. I was thinking, how
hard is it to run a business?
That's what I was thinking.
If this failed, it must have failed because of the business, since it's obviously not the product.
Maybe it's some aspect of the manufacturing and so on, but I'm now realizing it's also
not just that; it's sales, marketing, all those.
Oh, it's everything, right?
How do you explain something that's a new category to people who have all
these preconceptions? And so, you know, it had some of the hardest elements
of if you were to pick a business, it had some of the hardest customer dynamics because
like to sell a $150 product, you got to convince both the child to want it and the parents
to agree that it's valuable.
So you're having like this dual prong marketing challenge, you have manufacturing, you have
like really high precision on the components that you need, you have the AI challenges.
So there were a lot of tough elements, but there was this feeling of really great
alignment of unique strengths across all these different areas: an incredible
character and animation team, between Carlos and a character director
who came on board, and really great people there.
The AI side, the manufacturing, where we never missed a launch, right?
And we actually hit that quality bar, so yeah, it was heartbreaking.
But here's one neat thing is like, we had so much like fan mail from kind of kids parents
like, I actually like there was a bunch that collected in the end that I actually saved
and like I never, it was too emotional to open it and I still haven't opened it.
And so I actually have this giant envelope of like a stack this much of like letters
from, you know, kids and families, just like every, you know, presentation, permutation
you can imagine.
And so I'm planning, I don't know, maybe a five-year
reunion, just inviting everybody over, and we'll just kind of dig into
it and kind of bring back some memories.
But you know, good impact and well, I think there will be companies, maybe Waymo and
Google will be somehow involved that will carry this flag forward and will, will make
you proud whether you're involved or not.
I think this is one of the greatest robotics companies in the history of robotics.
So you should be proud, it's still tragic to know that, you know, because you read all
the stories of Apple and let's see, SpaceX and like companies that were just on the verge
of failure several times through that story, and it's almost like a roll of
the dice that they succeeded.
And here's a roll of the dice that just happened to go the other way. And that's the appreciation that
when you really like talk to a lot of the founders, like everybody goes through those
moments and sometimes it really is a matter of like, you know, timing a little bit of
luck, like some things are just out of your control and, and you get a much deeper appreciation
for just the dimensionality of, of that challenge, but the great thing is, is that like a lot
of the team actually like stayed together.
And so there were actually, you know, a couple of companies where we kept big chunks
of the team together, and we actually helped align this, you know, to help
people out as well.
And one of them was Waymo where a majority of the AI and robotics team actually had the
exact background that you would look for and like kind of AV space was a space that a lot
of us like, you know, worked on in grad school, were always passionate about and ended up,
you know, maybe the time, you know, serendipitous timings from another perspective where like
kind of landed in a really unique circumstance that's actually been quite exciting too.
So I'm interested in your thoughts: Cozmo still lives on under Digital Dream Labs, I think.
Is that, are you tracking the progress there or is it too much pain?
Is it, are you, is that something that you're excited to see where that goes?
So I'm keeping an eye on it, of course, just out of curiosity and obviously just care
for the product line.
I think it's deceptive how complex it is to manufacture and evolve that product line,
and how much expertise is required to complete the picture and be able to move
that forward and I think that's going to make it pretty hard to do something really substantial
with it.
It would be cool if like even the product in the way it was was able to be manufactured.
Yes.
Again, that would.
Which is the current goal, I suppose.
Yeah.
Which would be neat.
But I think it's deceptive how tricky that is on like everything from the quality control,
the details and then like technology changes that forces you to re-invent and update certain
things.
So I haven't been super close to it, but just kind of keeping an eye on it.
Yeah, it's really interesting how it's deceptively difficult just as you're saying.
For example, those same folks, and I spoke with them, are partnering up with Rick and
Morty creators to do the Butter Robot.
Yeah.
I love the idea.
I just recently, I kind of half-assed watched Rick and Morty previously, but now I just
watched like the first season, it's such a brilliant show.
I did not understand how brilliant that show is.
And obviously, I think in season one is where the Butter Robot comes along for just a few
minutes or whatever, but I just fell in love with the Butter Robot, the sort of the, that
particular character.
Just like you said, there's characters you can create, personality you can create in
that particular robot who's doing a particular task realizes, you know, like realizes, asks
the existential question, the myth of Sisyphus question that Camus writes about, is this
all there is?
Because he moves butter, but, you know, that realization is a beautiful little realization
for a robot to have: that its purpose is limited to this particular task.
It's humor, of course, it's darkness, it's a beautiful mix, but so they want to release
that Butter Robot, but something tells me that to do the same depth of personality as
Cozmo had, the same richness, across the manufacturing, the AI, the storytelling,
and the design, is going to be very, very difficult.
It could be a cool sort of a toy for Rick and Morty fans, but to create the same depth
of existential angst that the Butter Robot symbolizes is really, that's the brave effort
you've succeeded at with Cozmo, but it's not easy; it's really difficult.
You can fail on almost any one of the kind of dimensions and like, and yeah, it takes,
you know, unique convergence of a lot of different skill sets to try to pull that off.
On this topic, let me ask you for some advice, because as I've been watching Rick and Morty,
as I told myself, I have to build the Butter Robot, just as a hobby project, and so I got
a nice platform for it with treads and there's a camera that moves up and down and so on.
I'll probably paint it, but the question I'd like to ask, there's obvious technical questions
I'm fine with, communication, the personality, storytelling, all those kinds of things.
I think I understand the process of that, but how do you know when you got it right?
So with Cozmo, how did you know this is great or something is off?
Is this brainstorming with the team?
Do you know it when you see it?
Is it like, love at first sight, it's like, this is right, or like, I guess if we think
of it as an optimization space, is there Uncanny Valley where you're like, that's not right,
or this is right, or are a lot of characters right?
We stayed away from Uncanny Valley just by having such a different mapping where it didn't
try to look like a dog or a human or anything like that, and so you avoided having like
a weird pseudo similarity, but not quite hitting the mark.
But you could like just fall flat where just like a personality or a character emotion
just didn't feel right.
And so it actually mirrored very closely to kind of the iterations that a character
director at Pixar would have, where you're running through it and you can virtually kind
of like see what it'll look like.
We created a plugin to where we actually used like Maya, the animation tools, and then we
created a plugin that perfectly matched it to the physical one.
And so you could like test it out virtually and then push a button and see it physically
play out.
And there's like subtle differences.
So you want to like make sure that that feedback loop is super easy to be able to test it live.
And then sometimes you would just feel it, that it's right, and intuitively know.
And then you'd also do, we did user testing, but it was very, very often that like the
into, like if we found it magical, it would scale and be magical more broadly.
There were not too many cases where like, like we were pretty decent about not like
getting to it, you know, geeking out or getting too attached to something that was super unique
to us, but trying to kind of like, you know, put a customer hat on and does it truly kind
of feel magical.
And so in a lot of ways, it just gave a lot of autonomy to the character team to really
think about the, you know, character board and mood boards and storyboards and like what's
the background of this character and how would they react.
And they went through a process that's actually pretty familiar, but now had to operate under
these unique constraints, but the moment where it felt right took a fairly similar
journey to that of a character in an animated film, actually. It's quite cool.
Well, the thing that's really important to me, and I wonder if it's possible, well,
I hope it's possible, pretty sure it's possible is for me, even though I know how it works
to make sure there's sufficient randomness in the process, probably because it would
be machine learning based, that I'm surprised that I don't, I'm surprised by certain reactions,
I'm surprised by certain communication.
Maybe that's in the form of a question: were you surprised by certain things Cozmo did,
like certain interactions?
Yeah, we made it intentionally so that there would be some surprise and a decent
amount of variability in how it'd respond in certain circumstances.
And so in the end, this isn't general AI; this is a giant spectrum and library
of parameterized emotional responses and an emotional engine that would
map the current state of the game, its emotions, the world, the people who were playing
with it, and so forth, to what's happening.
But we could make it feel spontaneous by creating enough diversity and randomness, while still
staying within the bounds of what felt very realistic, to make that work.
And then what was really neat is that we could get statistics on how much of that space we
were saturating, and then add more animations and more diversity in the places that would
get hit more often, so that you stay ahead of the, you know, the curve and maximize the
chance that it stays feeling alive.
And so, but then when you like combine it like the permutations and kind of like the
combinations of emotions stitched together, sometimes surprised us because you see them
in isolation.
But when you actually see them and you see them live, you know, relative to some event
that happened in the game or whatnot, like it was kind of cool to see the combination
of the two.
And it's not too different in other robotics applications where like you get, you get so
used to thinking about like the modules of a system and how things progress through a
tech stack that the real magic is when all the pieces come together and you start getting
the right emergent behavior in a way that's easy to lose when you just kind of go too
deep into any one piece of it.
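The emotional engine described above (a library of parameterized responses picked with some randomness, plus usage statistics showing which parts of the space get hit most often) can be illustrated with a small sketch. This is not Anki's implementation: the events, mood buckets, clip names, and the `EmotionEngine` class are all hypothetical, and a single `happiness` parameter stands in for whatever richer emotional state the real engine tracked.

```python
import random
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical animation library (not Anki's): each trigger, a game event
# combined with a coarse mood, maps to several interchangeable clips.
LIBRARY = {
    ("won_game", "happy"): ["victory_dance", "fist_pump", "spin_cheer"],
    ("won_game", "grumpy"): ["reluctant_nod", "smug_shrug"],
    ("lost_game", "happy"): ["shrug_laugh", "sheepish_look"],
    ("lost_game", "grumpy"): ["table_slam", "sulk_turn", "head_shake"],
}


@dataclass
class EmotionEngine:
    happiness: float = 0.5  # assumed single emotion parameter: 0.0 grumpy, 1.0 happy
    rng: random.Random = field(default_factory=random.Random)
    usage: Counter = field(default_factory=Counter)

    def mood(self) -> str:
        # Collapse the continuous parameter into a coarse mood bucket.
        return "happy" if self.happiness >= 0.5 else "grumpy"

    def react(self, event: str) -> str:
        key = (event, self.mood())
        # Random choice among curated clips: varied, but always in bounds.
        clip = self.rng.choice(LIBRARY[key])
        # Record how often each trigger fires.
        self.usage[key] += 1
        # Winning nudges the mood up, losing nudges it down, clamped to [0, 1].
        delta = 0.1 if event == "won_game" else -0.1
        self.happiness = min(1.0, max(0.0, self.happiness + delta))
        return clip

    def saturation(self) -> dict:
        # Plays per available clip; high values flag triggers needing more variants.
        return {key: n / len(LIBRARY[key]) for key, n in self.usage.items()}


if __name__ == "__main__":
    engine = EmotionEngine(rng=random.Random(42))
    for event in ["won_game", "lost_game", "lost_game", "lost_game", "won_game"]:
        print(event, "->", engine.react(event))
    print(engine.saturation())
```

Sorting the output of `saturation()` after many simulated sessions would show which triggers replay their clips most often, which is the kind of signal the passage describes using to decide where to add animations first.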
Yeah, when the system is sufficiently complex, there is something like emergent behavior
and that's where the magic is.
As a human being, you can still appreciate the beauty of that magic at the final
system level.
First of all, thank you for humoring me on this.
It's really, really fascinating.
I think a lot of people would love this.
I'd love to ask just one last thing on the Butter Robot, I promise.
In terms of speech, Cozmo is able to communicate so much with just movement and face.
Do you think speech is too much of a degree of freedom?
Is speech a feature or a bug of deep interaction or emotional interaction?
Yeah.
For a product, it's too deep right now.
It's just not real.
You would immediately break the fiction because the state of the art is just not good enough.
And that's on top of just narrowing down the demographic where like the way you speak
to an adult versus a way you speak to a child is very different, yet a dog is able to appeal
to everybody.
And so right now, there is no speech system that is rich enough and subtly realistic enough
to feel appropriate.
And so we very, very quickly moved away from it.
Now, speech understanding is a different matter where understanding intent, that's a really
valuable input, but giving it back requires like a way, way higher bar given kind of where
today's world is.
And so that realization, that you can do surprisingly much with either no speech or kind of tonal
sounds, the way WALL-E, R2-D2, and other characters are able to, is quite powerful,
and it generalizes across cultures and across ages really, really well.
I think we're going to be in that world for a little while, where it's still very much
an unsolved problem how to make something like that; it touches on the uncanny valley thing.
So if you have legs and you're a big humanoid-looking thing, you have very different expectations
and a much narrower degree of what's going to be acceptable by society
than if you're a robot like Cozmo or WALL-E or some other form where you can kind of
reinvent the character, speech has that same property where speech is so well understood
in terms of expectations by humans that you have far less flexibility on how to deviate
from that and lean into your strengths and avoid weaknesses.
But I wonder if, obviously, there are certain kinds of speech that activate
the uncanny valley and break the illusion faster.
So I guess my intuition is we would be able to create some speech-based
personalities sooner than others.
So for example, I could think of a robot that doesn't know English and is learning English.
Those kinds of personalities.
A fiction where you're intentionally kind of like getting a toddler level of speech.
So that's exactly right.
So you can tie it into the experience, where it is a more limited character, or you
embrace the lack of emotions or the lack of dynamic range in the speech capabilities
and emotions as part of the character itself, and you've seen that in
fictional characters as well.
But that's why this podcast works.
And you kind of had that with, I don't know, I guess, like Data and some of the other
ones.
Yeah, exactly.
But yeah, so you have to, and that becomes a constraint that lets you meet the bar.
See, I honestly think also if you add drunk and angry, that gives you more constraints
that allow you to be dumber from an NLP perspective; there are certain aspects.
So if you modify human behavior, like, so forget the sort of artificial thing where
you don't know English toddler thing, we, if you just look at the full range of humans,
I think we, there's certain situations where we put up with like lower level of intelligence
in our communication.
Like if somebody's drunk, we understand the situation that they're probably under the
influence.
Like we understand that they're not going to be making any sense, anger is another one
like that.
I'm sure there are a lot of other situations like this, maybe, yeah, again, language, lost
in translation, that kind of stuff, that I think if you play with that... what is it?
The Ukrainian boy that passed the Turing test, you know, I'd play with those ideas.
I think that's really interesting.
And then you can create compelling characters, but you're right.
That's a dangerous sort of road to walk because you're adding degrees of freedom that can
get you in trouble.
Yeah.
And that's why you have these big pushes, for most of the last decade plus,
where you'd have full human-replica robots, really down to the
skin in some places.
My personal feeling is like, man, like that's not the direction that's most fruitful right
now.
Beautiful art.
Yeah.
It's not in terms of a rich, deep fulfilling experience.
Yeah, you're right.
Yeah.
You're creating a minefield of potential places to feel off, and then you're sidestepping where
the biggest functional AI challenges are, which is to actually have
really rich productivity that justifies the higher
price points.
And that's part of the challenge is like, yeah, like robots are going to get to like
thousands of dollars, tens of thousands of dollars and so forth, but you can imagine
what sort of expectation of value that comes with it.
And so that's where you want to be able to invest the time and depth.
And so going down the full human replica route creates a gigantic distraction and really,
really high bar that can end up sucking up so much of your resources.
So it's weird to say, but you happen to be, at this point, one of the greatest roboticists
ever because you created this little guy, you're part, obviously, of a great team that created
the little guy with a deep personality and are now switching to an entirely, well, maybe
not entirely, but a different, fascinating, impactful robotics problem, which is autonomous
driving and more specifically the biggest version of autonomous driving, which is autonomous
trucking.
So you are at Waymo now.
Can you give us a big picture overview?
What is Waymo?
What is Waymo Driver?
What is Waymo One?
What is Waymo Via?
Can you give an overview of the company and the vision behind the company?
For sure.
Waymo, by the way, it's been eye-opening just how incredible the people and
the talent are, and how in one company you almost have to create, I don't know, 30 companies'
worth of technology and capability to solve the full spectrum of it.
So yeah, so I've been at Waymo since 2019, it's about two and a half years.
So Waymo is focused on building what we call a driver, which is creating the ability to
have autonomous driving across different environments, vehicle platforms, domains and use cases.
As you know, it got started in 2009; it was almost like an immediate successor
to the DARPA Grand Challenge and Urban Challenge that were incredible catalysts
for this whole space.
And so Google started this project and then eventually Waymo spun out.
And so what Waymo is doing is creating the systems, both hardware, software, infrastructure,
everything that goes into it to enable and to commercialize autonomous driving.
This hits on consumer transportation and ride sharing, and vehicles in urban environments.
And as you mentioned, it hits on autonomous trucking to transport goods.
So in a lot of ways, it's transporting people and transporting goods.
But at the end of the day, the underlying capabilities required to do that are
more aligned than one might expect, where it's the fundamentals of being able to understand
the world around you, process it, make intelligent decisions and prove that we are at a level
of safety that enables large scale autonomy.
So from a branding perspective, the Waymo Driver is the system that's irrespective
of the particular vehicle it's operating in.
You have a set of sensors that perceive the world, can act in that world and move this
whatever the vehicle is through the world.
That's right.
And so in the same way that you have a driver's license, and your ability to drive isn't
tied to a particular make and model of a car.
And of course there are special licenses for other types of vehicles, but the fundamentals
of a human driver very, very largely carry over, and then there are uniquenesses related
to a particular environment or domain or a particular vehicle type that add
some extra challenges.
But that's exactly right.
It's the underlying systems that enable a physical vehicle without a human driver to
very successfully accomplish a task that previously wasn't possible without 100% human driving.
And then there's Waymo One, which is transporting people, from a brand perspective.
And just in case we refer to it so people know.
And then there's Waymo Via, which is the trucking component.
Why Via, by the way?
What is that?
Is it just a cool sounding name, or is there an interesting story there?
It is a pretty cool sounding name.
It's a cool sounding name.
I mean, when you think about it, it's just like, well, we're going to transport it via
this and that.
Oh cool.
It's kind of like an allusion to the mechanics of transporting something.
Yes.
Cool.
And it is a pretty good grouping.
And the interesting thing is that even the groupings kind of blur, where Waymo One is
like human transportation and there's a fully autonomous service in the Phoenix area that
like every day is transporting people.
And it's pretty incredible to like just see that operate at reasonably large scale and
just kind of happen.
And then on the Via side, it doesn't even have to be long-haul trucking. Long-haul trucking is a major
focus of ours, but down the road, you can stitch together vehicle transportation
as well for local delivery.
And a lot of the requirements for local delivery overlap very heavily with consumer
transportation, obviously, you know, given that you're operating on a lot of the same
roads and navigating the same safety challenges.
And so, yeah, and Waymo very much is a multi-product company that has ambitions in both.
They have different challenges and both are tremendous opportunities.
But the cool thing is that there's a huge amount of leverage and this kind of core technology
stack now gets pushed on by both sides.
And that adds its own unique challenges.
But the success case is that the challenges that you push on, they get leveraged across
all platforms and all domains.
So from an engineering perspective, are the teams integrated?
It's a mix.
So there's a huge amount of centralized kind of core teams that support all applications.
So think of something like the hardware team that develops the lasers and the compute and
integrates them into vehicle platforms.
This is an experience that carries over across any application that we have, and
it ebbs and flows with both.
Then there are really unique perception challenges, planning challenges, other
types of challenges where there's a huge amount of leverage on a core tech stack.
But then there's like dedicated teams that think of how do you deal with a unique challenge?
For example, an articulated trailer with varying loads that completely changes the physical
dynamics of a vehicle that doesn't exist on a car, but becomes one of the most important
kind of unique new challenges on a truck.
So what's the long-term dream of Waymo Via, the autonomous trucking effort that Waymo
is doing?
Yeah.
So we're starting with developing L4 autonomy for Class 8 trucks.
These are trucks with 53-foot trailers that capture a pretty sizable percentage of goods
transportation in the country.
Long term, the opportunity is obviously to expand to much more diverse types of vehicles,
types of good transportation, and start to really expand in both the volume and the route
feasibility that's possible.
And so just like we did on the car side, you start with a single route with a very specific
operating kind of domain and constraints that allow you to solve the problem.
But then over time, you start to really try to push against those boundaries and open
up deeper feasibility across routes, across surface streets, across environmental conditions,
across the type of goods that you carry, the versatility of those goods, and how little
supervision is necessary to just start to scale this network.
And long term, it's a pretty incredible enabler, where today you already have a giant shortage
of truck drivers.
It's a shortage of over 80,000 truck drivers.
That's expected to grow to hundreds of thousands in the years ahead.
You have really, really quickly increasing demand from e-commerce and just distribution
of where people are located.
You have one of the deepest safety challenges of any profession in the US where there's
a huge, huge, huge kind of challenge around fatigue and around kind of the long routes
that are driven.
And even beyond kind of the cost and necessity of it, there are fundamental constraints built
into our logistics network that are tied to the type of human constraints and regulatory
constraints that are tied to trucking today.
For example, there are limits on how long a driver can be driving in a single day before they're
not allowed to drive anymore, which is a very important safety constraint.
What that does is it enforces limitations on how far a jump with a single driver can
be, and makes you very subject to availability of drivers, which influences where warehouses
are built, which influences how goods are transported, which influences costs.
You start to have an opportunity on everything from plugging into existing fleets and brokerages
and the existing logistics network and just immediately start to have a huge opportunity
to add value from a cost, fuel, insurance, and safety standpoint all the way
to completely reinventing the logistics network across the United States and enabling something
completely different than what it looks like today.
Yeah.
I'll have published before this a great conversation with Steve Viscelli, where we talked
about manual driving.
He echoed many of the same things that you were talking about, but we talked about much
of the fascinating human stories of truck drivers.
He was also a truck driver for a bit as a grad student to try to understand the depth
of the problem.
Fascinating lives.
We have some drivers that have four million miles of lifetime driving experience.
It's pretty incredible.
Learning from them, some of them are on the road for 300 days a year.
It's a very unique type of lifestyle.
There's fascinating stuff there.
Just like you said, there's actually a shortage of people, of truck drivers taking the
job, counter to what I think is publicly believed.
There's an excess of jobs and a shortage of people to take up those jobs.
Like you said, it's such a difficult problem, and these are experts at driving, at solving
this particular problem, and it's fascinating to learn from them to understand how hard
this problem is.
That's the question I want to ask you from a perception, from a robotics perspective.
What's your sense of how difficult is autonomous trucking?
Maybe you can comment on which scenarios are super difficult, which are more manageable.
Is there a way to convert into words how difficult the problem is?
Yeah, that's a good question.
As you can expect, it's a mix.
Some things become a lot easier or at least more flexible.
Some things are harder.
On the things that are the tailwinds, the benefits: a big focus of automating trucking,
especially initially, is really the long-haul freeway stretch of it, because
that's where a majority of the value is captured.
On a freeway, you have a lot more structure and a lot more consistency across freeways
across the US compared to surface streets where you have a way higher dimensionality
of what can happen, lack of structure, lack of consistency, and variability across cities.
You can leverage that consistency to tackle, at least in that respect, a more constrained
AI problem, which has some benefits to it.
You can itemize much more of the sort of things you might encounter and so forth.
Those are benefits.
Is there a canonical freeway and city we should be thinking about?
Is there a standard thing that's brought up in conversation often?
Here's a stretch of road.
What is it?
When people talk about traveling across country, they'll talk about New York to San Francisco.
Is that the route?
Is there a stretch of road that's nice and clean, and then there's cities with difficulties
in them that you think of as the canonical problems to solve here?
Starting with the car side, Waymo very intentionally picked the Phoenix area, and the San Francisco
area as a follow-on once we had driverless, where when you think of consumer transportation
and the ride-sharing economy, a big percentage of that market is captured in the densest
cities in the United States and so really pushing out and solving San Francisco becomes
a really huge opportunity and importance and places one dot on the spectrum of complexity.
The Phoenix area, starting with Chandler and expanding more broadly in the Phoenix metropolitan
area, I believe the fastest growing city in the US.
It's a higher medium-sized city, but growing quickly and still captures a really wide range
of complexities and so getting to driverless there actually exposes you to a lot of the
building blocks you need for the more complicated environments.
In a lot of ways, there's a thesis that if you start to place a few of these dots where
San Francisco has these types of unique challenges, dense pedestrians, all this complexity, especially
when you get into the downtown areas and so forth, and Phoenix has a really interesting
spectrum of challenges, maybe other ones like LA add freeway focus and so forth, you start
to cover the full set of features that you might expect and it becomes faster and faster
if you have the right systems and the right organization to then open up the fifth city
and the tenth city and the twentieth city.
On trucking, there are similar properties, where obviously there are uniquenesses on freeways
when you get into really dense environments, and then the real opportunity to get
even more value is to think about how you expand with some of the surface street challenges.
But for example, right now, we have a big facility that we're finishing
building in Q1 in the Dallas area.
That'll allow us to do testing from the Dallas area on routes like Dallas to Houston, Dallas
to Phoenix, going out east and-
Dallas to Austin.
Austin, so that triangle-
Waymo should come to Austin.
Well, Waymo, the car side, was in Austin for a while.
Yes, I know.
Come back.
Yeah, but trucking is actually, Texas is one of the best places to start because of both
volume, regulatory, weather, there's a lot of benefits.
On trucking, a huge opportunity is Port of LA going east.
In a lot of ways, a lot of the work is to start to stitch together a network and converge
to Port of LA where you have the biggest port in the United States.
The amount of goods going east from there is pretty tremendous, and then obviously there are
channels everywhere, and then you have extra complexities as you get into snow and
inclement weather and so forth. But what's interesting about trucking is every single
route segment that you add increases the value of the whole network, and so it has this kind
of network effect and cumulative effect that's very unique. And so there are all these dimensions
that we think about, and in a lot of ways, Dallas, as a really unique hub that opens up
a lot of options, has become really valuable.
So, a million questions I could ask. First of all, you mentioned level four.
For people who totally don't know, there are these levels of automation, and level four
refers to kind of the first step that you could recognize as fully autonomous driving.
Level five is really fully autonomous driving and level four is kind of fully autonomous
driving and then there are specific definitions depending on who you ask what that actually
means, but for you, what does the level four mean?
And you mentioned freeway, let's say like there's three parts of long haul trucking.
Maybe I'm wrong in this, but there's freeway driving.
There's like truck stop and then there's more urban-y type of area.
So which of those do you want to tackle?
Which of them do you include under level four?
How do you think about this problem?
What do you focus on? Where is the biggest impact to be had in the short term?
So the goal is we got to get to market as fast as we can because the moment you get
to market, you just learn so much and it influences everything that you do.
And one of the experiences I carried over from before is that you add constraints,
you figure out the right compromises, you do whatever it takes because getting to market
is so critical.
But here with autonomous driving, you can get to market in so many different ways.
And so one of the simplifications that we intentionally have put on is using what we
call transfer hubs, where you can imagine depots that are at the entry points to metropolitan
areas like let's say Dallas, like the hub that we're building, which does a few things
that are very valuable.
So from a first product standpoint, you can automate transfer hub to transfer hub and
that path from the transfer hub to the full freeway route can be a very intentional single
route that you can select for the features that you feel you want to handle at that point
in time.
And you build the hub specifically designed for autonomous trucking?
And that's what's going to happen, actually, and you need to come out in January and check
it out because it's going to be really cool.
Not only is it our main operating headquarters for our fleet there, but it will be the first
fully ground up designed driverless hub for autonomous trucks in terms of where do they
enter, where do they depart, how do you think about the flow of people, goods, everything.
It's quite cool and it's really beautiful on how it's thought through.
And so early on, it is totally reasonable to do the last five miles manually to get
to the final kind of depot to avoid having to solve the general surface street problem,
which is obviously very complex.
Now when the time comes, and we are increasingly, already we're pushing on some of this, but
we will increasingly be pushing on surface street capabilities to build out the value
chain to go all the way depot to depot instead of transfer hub to transfer hub.
And we have probably the best advantages in the world because of all the Waymo experience
on surface streets.
But that's not the highest ROI right now, where the highest ROI is hub to hub and get
the routes going.
And so when you ask what's L4: L4 can be applied to any operating domain or scope, but
effectively, for the places where we say we're ready for autonomous operation,
we are operating 100% as a self-driving truck with no human behind the wheel.
That is L4 autonomy.
And it doesn't mean that you operate in every condition.
It doesn't mean you operate on every road.
But for a particularly well defined area, operating conditions, routes kind of domain,
you are fully autonomous.
And that's the difference between L4 and L5.
And most people would agree that at least any time in the foreseeable future, L5 is just
not even really worth thinking about because there's always going to be these extremes.
And so it's a race and almost like a game where you think of what is the sequence of
expanded capabilities that create the most value and teach us the most and create this
feedback loop where we're building out and unlocking more and more capability over time.
I got to ask you, just curious, but first of all, I have to visit
the Dallas facility when I'm allowed to, because it's super cool.
It's like robot on the giving and the receiving end.
The truck is a robot and the hub is a robot.
Yeah, it's got to be very robot friendly.
Yeah, that's great, I will feel at home.
What's the sensor suite like on the hub if you can just high level mention it?
Does the hub have like lidars?
Is the truck doing most of the intelligence or is the hub also intelligent?
Yeah, so most of it will be the truck and everything is connected.
So we have our servers where we know exactly where every truck is, we know exactly what's
happening at a hub and so you can imagine like a large backend system that over time
starts to manage timings, goods, delivery, windows, all these sort of things.
And so you don't actually need to, there might be special cases where that is valuable to
equip some sensors in the hub, but a majority of the intelligence is going to be on the
truck, because whatever is relevant to the truck should be seen by the truck
and can be relayed remotely for any sort of kind of cognizance or decision making.
But there's a distinct type of workflow where where do you check trucks?
Where do you want them to enter?
What if there's many operating at once?
Where's the staging area to depart?
How do you set up the flow of humans and human cars and traffic so that you minimize the
interaction between humans and kind of self-driving trucks?
And then how do you even intelligently select the locations of these transfer hubs that
are both really great service locations for a metropolitan area and there could be over
time many of them for a metropolitan area while at the same time leaning into the path
of least resistance to lean into your current capabilities and strengths so that you minimize
the amount of work that's necessary to unlock the next kind of big bar.
I have a million questions.
So first is the goal to have no human in the truck?
The goal is to have no human in the truck.
Now, of course, right now we're testing with expert operators and so forth, but that is the goal.
Now, there might be circumstances where it makes sense to have a human, and obviously
these trucks can also be manually driven.
So sometimes we talk with our fleet partners about how you can buy a Waymo-equipped
Daimler truck down the road, and on the routes that are autonomous, it's autonomous; on the
routes that are not, it's human driven.
Maybe there's driver-assist functionality that adds safety systems and so forth.
But as soon as they become, as soon as we expand in software, the availability of driverless
routes, the hardware is forward compatible to just now start using them in real time.
And so you can imagine this mixed use, but at the end of the day, the largest value proposition
is where you're able to have no constraints on how you can operate this truck and it's
100% autonomous with nobody inside.
That's amazing.
Let me ask on the logistics front, because you mentioned the opportunity to revamp,
or build from scratch, some of the ideas around logistics.
I don't want to throw too much shade, but from talking to Steve, my understanding is logistics
is not perhaps as great as it could be in the current trucking environment.
Maybe you can break down why, but there are probably competing companies.
There's just a mess; maybe some of it is literally that it's old school, like it's
not computerized, like truckers are almost like contractors.
There's an independence and there's not a nice interface where they can communicate
where they're going, where they're at, all those kinds of things.
And so it just feels like there's so much opportunity to digitize everything to where
you could optimize the use of human time, optimize the use of all kinds of resources.
How much are you thinking about that problem?
How fascinating is that problem?
How difficult does it, how much opportunity is there to revolutionize the space of logistics
in autonomous trucking, in trucking period?
It's pretty fascinating.
It's one of the most motivating aspects of all this where, yes, there's a mountain of
problems that you have to solve to get to the first checkpoints and first driverless
and so forth.
And inevitably, in a space like this, you plug in initially into the existing system
and start to learn and iterate, but that opportunity is massive.
And so a couple of the factors that play into it.
So first of all, there's obviously just the physical constraints of driving time, driver
availability.
Some fleets have a 95% attrition rate right now because of just these demands and gaps
in competition and so forth.
And then it's also incredibly fragmented, where you would be shocked: when you look
at the industry and think of the top 10 players, the biggest fleets, like the
Walmarts and FedExes and so forth, the percentage of the overall trucking market that's captured
by the top 10 or 50 fleets is surprisingly small.
The average truck operation is like a one to five truck family business.
And so there's just a huge amount of fragmentation, which makes for really interesting challenges
in stitching together through load boards and brokerages, and some people run their
own fleets.
And this world's evolving, but it is one of the less digitized and optimized worlds
that there is.
And the part that is optimized is optimized to the constraints of today.
And even within the constraints of today, this is a $900 billion industry in the US, and
it's continuing to grow.
It feels like from a business perspective, if I were to predict, while trying to
solve the autonomous trucking problem, Waymo might first solve the logistics problem, because
that would already be a huge impact.
So on the way to solving autonomous trucking, there's so much opportunity
to significantly improve human-driven trucking, the timing, the logistics, so you
use humans optimally.
The handoffs; if you get really ambitious, you start to expand beyond: how does the fulfillment
center work, and how does the transfer hub work, how does the warehouse work?
There's a lot of opportunities to start to automate these chains, and a lot of the inefficiency
today is because you have a delay.
Port of LA has a bunch of ships right now waiting outside of it because they can't dock
because there's not enough labor inside of the Port of LA.
That means there's a big backlog of trucks, which means there's a big backlog of deliveries,
which means the drivers aren't where they need to be.
So you have this huge chain reaction, and your feasibility of readjusting in this network
is low because everything's tied to humans and manual processes or distributed processes
across a whole bunch of players.
One of the biggest enablers is, yes, we have to solve autonomous trucking first, and that,
by the way, that's not like an overnight thing, that's decades of continued expansion
and work, but the first checkpoint in the first route is not that far off.
But once you start enabling it, you start to learn about how the constraints of autonomous
trucking, which are very, very different than the constraints of human trucking, and again,
strengths and weaknesses, how do you then start to leverage that and rethink a flow
of goods more broadly?
And this is where the learnings of really partnering with some of the largest fleets
in the US, and the sort of learnings that they have about the industry and the sort
of needs that they have, and what would change if you just really broke this one constraint
that holds up the whole network, or what if you enabled this other constraint?
That actually drives the roadmap in a lot of ways because this is not an all-or-nothing
problem.
You start to unlock more and more functionality over time.
Which functionality most enables this optimization ends up being part of the discussion.
But you're totally right, you fast forward to five years, 10 years, 15 years, and you
think about very generalized capability of automation and logistics, as well as the ability
to poke into how those handoffs work.
The efficiency goes far beyond just the direct cost of today's unit economics of a truck.
It goes toward reinventing the entire system, in the same way that you see in other industries
that when you get to enough scale, you can really rethink how you build around your new
set of capabilities, not the old set of capabilities.
Yeah, to use the analogy or metaphor or whatever: autonomous trucking is like email versus
mail.
Then with email, you're still doing the communication, but it opens up all kinds of varieties of
communication that you didn't anticipate.
That's right.
Constraints are just completely different.
Yeah.
There's definitely a property of that here.
We're also still learning about it because there are a lot of really fascinating and sometimes
really elegant things that the industry has done where there's companies whose entire
existence is around, despite the constraints, optimizing as much as they can out of it.
Those lessons do carry over, but it's an interesting merger of worlds to think about, well, what
if this was completely different?
How would we approach it?
The interesting thing is that for a really, really, really long time, it's actually going
to be the merger between how to use autonomy and how to use humans that leans into each
of their strengths.
Yeah.
Then we're back to Cozmo, human-robot interaction.
The interesting thing about Waymo is that there's the passenger vehicle, the transportation
of humans, and transportation of goods.
You could see over time they might kind of meld together more, because you'll probably
have zero-occupancy vehicles moving around, and then you have transportation of goods for
short distances and then for slightly longer distances and then slightly longer and then
there'll be this.
Then you just see the difference between a passenger vehicle and a truck is just size
and you can have different sizes and all that kind of stuff and at the core, you can have
a Waymo Driver that, as long as you have the same sensor suite, you can just
think of it as one problem.
That's why over time, these do kind of converge where in a lot of ways, a lot of the challenges
we're solving are freeway driving, which are going to carry over very well to the vehicles,
to the car side.
But then there are unique challenges, like you have very different dynamics in your
vehicle where you have to see much further out in order to have the proper response time
because you have an 80,000-pound fully loaded truck.
That's a very, very different type of braking profile than a car.
You have really interesting dynamic limits because of the trailer, whereas it's actually
very, very hard to physically flip a car; most risk
in a car is from just collisions.
It's very hard in any normal operation, unless you hit something,
to actually roll over.
On a truck, you actually have to drive much closer to the physical bounds of the safety
limits, but you actually have real constraints because you could have really interesting
interactions between the cabin and the trailer.
There's something called jackknifing if you turn too quickly; you have roll risks and
so forth.
You spend a huge amount of time understanding those boundaries.
Those boundaries change based on the load that you have, which is also an interesting
difference and you have to propagate that through the algorithm so that you're leveraging
your dynamic range, but always staying within safety bounds, but understanding what those
safety bounds are.
We have this really cool test facility where we take it to the max and actually imagine
a truck with these giant training wheels on the back of the trailer and you're pushing
it past the safety limits in order to try to actually see where it rolls.
You define this high-dimensional boundary, which then gets captured in software to stay
safe and actually do the right thing, but it's kind of fascinating the sort of challenges
you have there.
Then all of these things drive really interesting challenges from perception to unique behavior
prediction challenges, and obviously in the planner, where you have to think about merging and creating
gaps with a 53-foot trailer and so forth, and then obviously the platform itself is
very different.
We have different numbers of sensors, sometimes types of sensors, and you also have unique
blind spots that you have because of the trailer, which you have to think about.
It's a really interesting spectrum, and in the end, you try to capture these special
cases as clean augmentations of the existing tech stack, because a majority
of what we're solving is actually generalizable freeway driving across different platforms.
Over time, they all start to merge ideally where the things that are unique are as minimal
as possible, and that's where you get the most leverage, and that's why Waymo can take
on two trillion-dollar opportunities and be nowhere near 2x the cost or investment or size.
In fact, it's much, much smaller than that because of the high degree of leverage.
What kind of sensors, can you speak to that, does a long-haul truck need to have? LiDAR, vision,
how many, what are we talking about here?
Yeah.
It's more than the car, so very loosely you can think of it as 2x, but it varies depending
on the sensor.
We have dozens of cameras, radar, and then multiple LiDARs as well.
You'll see one difference where the cars have a central main sensor pod on the roof in the
middle and then some hood sensors for blind spots.
The truck moves to two main sensor pods on the outside, where you would typically have
the mirrors next to the driver.
Effectively, it goes as far out as possible, kind of up front.
On the cabin, not all the way in the front, but where the mirrors for the driver would
be.
Those are the main sensor pods, and the reason they're there is because if you had one in
the middle, the trailer's higher than the cabin and you would be occluded by this awkward
wedge.
Too much occlusion.
Too much occlusion.
Then you would add a lot of complexity to the software to make up for that, and it's just
unnecessary complexity.
There are probably so many fascinating designs you're going to see, because you could
bring the LiDAR up higher and have it in the center or something.
You have all kinds of choices to make here, decisions that ultimately will probably
define the industry.
By having two on the side, there are actually multiple benefits.
One is you're just beyond the trailer, so you can see fully flush with the trailer.
You eliminate most of your blind spots except right behind the trailer, which is great because
now the software carries over really well: the same perception system you use on
the car side, largely that architecture can carry over, and you can retrain some models
and so forth, so you leverage it a lot.
It also actually helps with redundancy, where there's a really nice built-in redundancy
for all the LiDARs, cameras, and radars: you can afford to have any one of them fail
and you're still okay, and at scale, every one of them will fail.
And you will be able to detect when one of them fails because of the redundancy: it's
giving you data that's inconsistent with the rest of the system.
That's right.
And it's not just that they no longer give data.
It could be that they're fouled, or they stop giving data because some electrical thing gets
cut or part of your compute goes down.
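One simple way to picture "detecting a failed sensor because its data is inconsistent with the rest" is a leave-one-out consensus check. This is a minimal sketch, not Waymo's method: it assumes every sensor reports a range estimate for the same tracked object, treats a missing reading (`None`) as a dropout, and flags any sensor whose estimate is farther than a hypothetical tolerance from the median of the other sensors. The sensor names and numbers are illustrative.

```python
from statistics import median
from typing import Dict, List, Optional


def flag_inconsistent_sensors(readings: Dict[str, Optional[float]],
                              tolerance_m: float = 1.0) -> List[str]:
    """Return names of sensors that dropped out or disagree with the consensus.

    readings maps a (hypothetical) sensor name to its range estimate in meters
    for the same tracked object, or None if the sensor produced no data.
    """
    # A sensor that stopped producing data is flagged outright.
    flagged = [name for name, value in readings.items() if value is None]
    valid = {name: v for name, v in readings.items() if v is not None}
    for name, value in valid.items():
        # Compare each sensor against the median of all the *other* sensors,
        # so a single faulty unit cannot drag the reference toward itself.
        others = [v for other, v in valid.items() if other != name]
        if not others:
            continue  # nothing to compare against
        if abs(value - median(others)) > tolerance_m:
            flagged.append(name)
    return flagged


# Hypothetical cycle: the left radar disagrees, the right camera is silent.
cycle = {
    "lidar_left": 42.1,
    "lidar_right": 42.3,
    "camera_left": 41.8,
    "radar_left": 55.0,
    "camera_right": None,
}
print(flag_inconsistent_sensors(cycle))
```

The median is used rather than the mean because, with only a handful of redundant units, one wildly wrong reading would otherwise shift the reference enough to mask itself.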
So what's neat is that you have way more sensors: part of it is field of view and occlusions,
part of it is redundancy, and part of it is new use cases.
So there are new types of sensors to optimize for long range and the sensing horizon
that we look for on our vehicles, which is unique to trucks because it's much
further out than for a car.
But the majority we actually use across both cars and trucks, so we use the same
compute and the same fundamental baseline sensors: cameras, radar, IMUs.
And so you get great leverage from all of the infrastructure and the hardware development
as a result.
So what about cameras?
What role does... So LIDAR is this rich set of information; it has its strengths and some
weaknesses. The camera is a rich source of information that has some strengths and its weaknesses.
What role does LIDAR play?
What role do vision cameras play in this beautiful problem of autonomous trucking?
It is beautiful.
It's like so much that comes together.
And at which point do they come together?
So let's start with LIDAR.
So LIDAR has been one of Waymo's big strengths and advantages: we developed our own LIDAR
in-house, and after many generations, in both cost and functionality, it is the best in the
space.
Which generation?
Because I know there's this cool, I love versions that are increasing.
Which version of the hardware stack is it currently, officially, publicly?
So some parts iterate more than others.
I'm trying to remember on the sensor side.
So the entire self-driving system, which includes sensors and compute, is fifth generation.
I can't wait until there's iPhone style announcements for new versions of the Waymo hardware stack.
Well, we try to be careful because, man, when you change the hardware, it takes a lot to
retrain the models and everything.
So we just went through that in going from the Pacificas to the Jaguars.
And so the Jaguars and the trucks have the same generation now.
But yeah, the LIDAR, it's incredible.
And so Waymo has leaned into that as a strength.
And so a lot of the near-range perception system that obviously carries over a lot from
the car side uses LIDAR as a very prominent primary sensor.
But then obviously, everything has its strengths and weaknesses.
And so in the near-range, LIDAR is a gigantic advantage.
And it has its weaknesses when it comes to occlusions in certain areas, rain and weather,
things like that.
But it's an incredible sensor.
And it gives you incredible density, perfect location precision, and consistency, which
is a very valuable property for applying an ML approach.
Can you elaborate on consistency?
Yeah.
When you have a camera, the position of the sun, the time of day, and various other properties
can have a big impact: whether there's glare, the field of view, things like that.
So the signal is consistent in the face of a changing external environment.
Yeah.
Daytime, nighttime, it's about 3D physical existence, in effect.
You're seeing beams of light that physically bounce off of something and come back.
And so whatever the external conditions are, the shape of a human, the sensor reading
from a human or from a car or from an animal, you have reliability there, which ends up
being valuable for the long tail of challenges.
Now LIDAR is the first sensor to drop off in terms of range, and ours has a really good
range, but at the end of the day, it drops off.
And so particularly for trucks, on top of the general redundancy that you want for near range,
and the complements through cameras and radar for occlusions and for complementary information
and so forth, when you get to long range, you have to be radar and camera primary.
Because your LIDAR data will fundamentally drop off beyond a certain distance, and you have
to be able to see objects further out.
Now cameras have incredible range: with a high-density, high-resolution camera,
you can get data well past a kilometer, and that's potentially of huge value.
Now the signal drops off, the noise is higher, detecting is harder, classifying is harder.
And one that you may not think about: localizing is harder, because you can be off by two
meters in where something's located a kilometer away, and that's the difference between being
on the shoulder and being in your lane.
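The two-meters-at-a-kilometer point is easy to check with basic geometry. This sketch considers only bearing (angular) error and ignores depth error, road curvature, and camera pitch. It shows that roughly 0.11 degrees of angular error already produces about 2 m of lateral error at 1 km, which is more than half of a standard 12 ft (about 3.7 m) US Interstate lane. The same angular error costs only about 0.2 m at 100 m.

```python
import math


def lateral_error_m(range_m: float, bearing_error_deg: float) -> float:
    """Lateral position error caused by a bearing (angular) error at a given range."""
    return range_m * math.tan(math.radians(bearing_error_deg))


def bearing_error_for(range_m: float, lateral_error: float) -> float:
    """Bearing error (degrees) that produces a given lateral error at a given range."""
    return math.degrees(math.atan2(lateral_error, range_m))


angle = bearing_error_for(1000.0, 2.0)  # angular error that means 2 m off at 1 km
print(f"bearing error for 2 m at 1 km: {angle:.3f} deg")
print(f"same bearing error at 100 m: {lateral_error_m(100.0, angle):.2f} m")
```

Because lateral error grows linearly with range, a localization accuracy that is comfortable in the near field becomes the difference between "shoulder" and "my lane" at long range.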
And so you have like interesting challenges there that you have to solve, which have a
bunch of approaches that come into it.
Radar is interesting because it also has longer range than LIDAR, and it gives you speed information.
So it becomes very, very useful for dynamic information of traffic flow, vehicle motions,
animals, pedestrians, like just things that might be useful signals.
And it helps with weather conditions where radar actually penetrates weather conditions
in a better way than other sensors.
And so it's kind of interesting where we've kind of started to converge towards not thinking
about a problem as a LIDAR problem or a camera problem or radar problem, but it's a fusion
problem where these are all like large scale ML problems where you put data into the system.
And in many cases, you just look for the signals that might be present in the union of all
of these and leave it to the system as much as possible to start to really identify how
to extract that.
And then there are places where we have to intervene and actually include more, but no single sensor
is in a great position to really solve this problem end to end without a huge extra challenge.
That's fascinating.
There's a question that's probably still open: at which point do you fuse
them?
Yeah.
Do you solve the perception problem for each sensor suite individually, the LIDAR suite
and the camera suite, or do you do some kind of heterogeneous fusion, or do you fuse at
the very beginning?
Is there a good answer or at least an inkling of intuitions you can come up with?
Yeah, so people refer to this as early fusion or late fusion.
So late fusion might be that you have the camera pipeline and the LIDAR pipeline, and then,
when it gets to final semantics and classification and tracking, you fuse
them together and figure out which one's best.
There's more and more evidence that early fusion is important.
And that is because late fusion does not allow you to pick up on the complementary strengths
and weaknesses of the sensors.
Weather's a great example: if you do late fusion, you have an incredibly hard
problem for any single sensor to solve in rain, because you have reflections
from the LIDAR.
You have weird kinds of noise from the camera, blah, blah, blah, right?
But the combination of all of them can help you filter and get to the real signal
that then gets you as close as possible to the original stack, and be much more fluid
about the strengths and weaknesses, where your camera is much more susceptible to
fouling on the actual lens from rain or random stuff, whereas other sensors might be a little
bit more resilient.
And so there's an element of logic that always happens late in the game, but that fusion
early on, especially as you move towards ML and large-scale data-driven approaches, just
maximizes your ability to pull out the best signal you can out of each modality before
you start making constraining decisions that end up being hard to unwind late in the stack.
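A toy sketch of why making per-sensor decisions early loses information. It assumes each sensor reports a log-likelihood ratio (LLR) for "object present" and that sensor noise is independent, so LLRs simply add. This is a simplified stand-in, not real early fusion (which combines raw data or learned features inside one model), and the sensor scores and thresholds are hypothetical. It only demonstrates the point above: a hard per-sensor threshold discards weak evidence that would have been decisive in combination.

```python
from typing import Dict


def late_fusion(llrs: Dict[str, float], per_sensor_threshold: float = 2.0) -> bool:
    """Each sensor decides on its own first; declare an object if a majority agree."""
    votes = [llr > per_sensor_threshold for llr in llrs.values()]
    return sum(votes) > len(votes) / 2


def soft_fusion(llrs: Dict[str, float], threshold: float = 2.0) -> bool:
    """Combine evidence before deciding: under the independence assumption,
    log-likelihood ratios from different sensors add."""
    return sum(llrs.values()) > threshold


# Hypothetical rain scenario: every sensor sees only weak evidence of an object.
rain = {"lidar": 1.2, "camera": 0.9, "radar": 1.5}
print("late fusion detects:", late_fusion(rain))  # no single sensor clears its threshold
print("soft fusion detects:", soft_fusion(rain))  # combined evidence (3.6) clears the bar
```

In practice the sensors are not independent and the combination is learned, but the failure mode of the late, hard-decision path is the same.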
So how much of this is a machine learning problem?
What role does ML, machine learning, play in this whole problem of autonomous driving,
autonomous trucking?
It's massive and it's increasing over time.
If you go back to the Grand Challenge days and the early days of AV development,
there was ML, but it was not the mass-scale, data-driven style of ML.
It was learning models, but in a more structured kind of way.
And it was a lot of heuristic and search-based approaches and planning and so forth.
You can make a lot of progress with these types of approaches across the board,
an almost deceptive amount of progress.
We can get pretty far, but then you start to really grind the further you get in some
parts of the stack if you don't have an ability to absorb a massive amount of experience in
a way that scales very sublinearly in terms of human labor and human attention.
And so when you look at the stack, the perception side was probably the first to get really revolutionized
by ML, and that goes back many years, because ML for computer vision and these types
of approaches was a lot of the early push in deep learning.
And so there's always a debate on the spectrum between end-to-end ML, which
is maybe a little bit too far, and architecting it so that you have modules
but enough ability to think about long-tail problems and so forth.
But at the end of the day, you have big parts of systems that are very ML and data-driven
and we're increasingly moving that direction all the way across the board, including behavior
where even when it's not like a gigantic ML problem that covers like a giant swath end-to-end,
more and more parts of the system have this property where you want to be able to put
more data into it and it gets better.
And one of the realizations has been that as you drive tens of millions of miles and
try to solve new expansions of domains without regressing in your old ones,
it becomes intractable for a human to approach that in the way that robotics has traditionally
approached some elements of the tech stack.
So are you trying to create a data pipeline specifically for the trucking problem?
Like how much leveraging of the autonomous driving is there in terms of data collection?
And how unique is the data required for the trucking problem?
So we use all the same infrastructure.
So labeling workflows, ML workflows, everything, so that actually carries over quite well.
We heavily reuse the data, to the point where almost every model that we have on a truck, we started
with the latest car model.
So it's almost like a good background model.
Yeah.
It's like you can think of like, despite the different domain and different numbers of
sensors and position of sensors, there's a lot of signals that carry over across driving.
And so it's almost like pre-training and getting a big boost out of the gate where you can
reduce the amount of data you need by a lot.
And it goes both ways, actually.
And so we're increasingly thinking about our data strategy on how we leverage both of these.
So if you think about how other agents react to a truck, yeah, it's a little bit different,
but the fundamentals are really: what will other vehicles on the road do?
There's a lot of carryover that's possible.
And in fact, just to give you an example, we're constantly kind of like adding more
data from the trucking side.
But as of right now, when we think of our, like one of our models, behavior prediction
for other agents on the road, like vehicles, 85% of that data comes from cars.
And a lot of that 85% comes from surface streets because we just had so much of it and it was
really valuable.
And so we're adding in more and more, particularly in the areas where we need more data.
But you get a huge boost out of the gate.
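One simple way to realize a training mixture like the 85% car / 15% truck split mentioned above is weighted sampling of data sources. This is a minimal sketch under that assumption only: the source names and weights are hypothetical, and it says nothing about how Waymo actually builds its datasets. Raising a source's weight is the knob for "adding more data in the areas where we need it."

```python
import random
from collections import Counter
from typing import Dict, List


def sample_sources(weights: Dict[str, float], n: int, seed: int = 0) -> List[str]:
    """Draw n training-example sources according to (hypothetical) mixture weights."""
    rng = random.Random(seed)
    names = list(weights)
    return rng.choices(names, weights=[weights[k] for k in names], k=n)


# Hypothetical mixture mirroring the split quoted above.
mixture = {"car": 0.85, "truck": 0.15}
counts = Counter(sample_sources(mixture, n=100_000, seed=42))
print({source: count / 100_000 for source, count in counts.items()})
```

Seeding the generator keeps the mixture reproducible across training runs, which matters when comparing model versions.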
Just all the different visual characteristics of roads, lane markings, pedestrians, all
that, that's still relevant.
It's all still relevant.
And then just the fundamentals of how you detect the car, does it really change that
much?
Whether you're detecting it from a car or a truck?
The fundamentals of how a person will walk around your vehicle, it'll change a little
bit.
The basics, like there's a lot of signal in there that as a starting point to a network
can actually be very valuable.
Now we do have some very unique challenges where there's a sparsity of events on a freeway.
The frequency of events happening on a freeway, whether it's interesting objects in the road
or incidents or even like from a human benchmark, like how often does a human have an accident
on a freeway is far more sparse than on a surface street.
And so that leads to really interesting data problems where you can't just drive infinitely
to encounter all the different permutations of things you might encounter.
And so there you get into interesting tools like structured testing and data collection,
data augmentation, and so forth.
And so there's really interesting kind of technical challenges that push some of the
research that enables these new suites of approaches.
What role does simulation play?
Really good question.
So Waymo simulates about a thousand miles for every mile it drives.
So you think of...
In both, so across the board?
Yeah.
Because you think of, for example, well, if we've driven over 20 million miles, that's
over 20 billion miles in simulation.
Now, how do you use simulation?
It's multi-purpose.
So you use it for basic development.
So you want to do...
Make sure you have regression prevention and protection of everything you're doing.
That's an easy one.
When you encounter something interesting in the world, let's say there was an issue with
how the vehicle behaved versus an ideal human.
You can play that back in simulation and start augmenting your system and seeing how you
would have reacted to that scenario with this improvement or this new area.
You can create scenarios that become part of your regression set after that point.
Then you start getting into real hill climbing, where you say, hey, I need
to improve this system.
I have these metrics that are really correlated with final performance.
How do I know how well I'm doing?
The actual physical driving is the least efficient form of testing.
It's expensive and it's time-consuming.
Grab a large-scale batch of historical data and simulate it to get a signal:
over a random sample of 100,000 miles, how has this metric changed versus
where we are today?
You can do that far more efficiently in simulation than just driving with that new system on
board.
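The replay-and-compare loop described above, where every interesting scenario joins the regression set and each change is checked against it, can be sketched as follows. This is a minimal illustration, not Waymo's tooling: `Scenario`, the metric (minimum gap to a lead vehicle), the tolerance, and the two toy "software versions" are all hypothetical stand-ins for running a real simulator.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Scenario:
    """A logged or hand-built scenario (fields are hypothetical placeholders)."""
    scenario_id: str
    lead_gap_m: float        # initial gap to a lead vehicle
    lead_decel_mps2: float   # how hard that vehicle brakes


# Stand-in for "simulate this software version on a scenario and return a metric"
# (here, minimum gap in meters; higher is better).
Evaluate = Callable[[Scenario], float]


def find_regressions(baseline: Evaluate, candidate: Evaluate,
                     scenarios: List[Scenario], tolerance: float = 0.5) -> Dict[str, float]:
    """Replay every scenario with both versions; return those where the candidate got worse."""
    regressions: Dict[str, float] = {}
    for s in scenarios:
        delta = candidate(s) - baseline(s)
        if delta < -tolerance:
            regressions[s.scenario_id] = delta
    return regressions


# Hypothetical toy versions: the candidate is better on open freeway
# but reacts worse when the gap is short.
def baseline_version(s: Scenario) -> float:
    return s.lead_gap_m - 2.0 * s.lead_decel_mps2


def candidate_version(s: Scenario) -> float:
    penalty = 1.0 if s.lead_gap_m > 30.0 else 3.0
    return s.lead_gap_m - penalty * s.lead_decel_mps2


suite = [Scenario("short_gap_cut_in", 20.0, 4.0), Scenario("open_freeway", 50.0, 2.0)]
print(find_regressions(baseline_version, candidate_version, suite))
```

The candidate improves the average, yet the per-scenario comparison still surfaces the one case it made worse, which is exactly what a regression set is for.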
Then you go all the way to the validation phase, where you actually look at your human-relative
safety: how well you're performing on the car side or the trucking side relative to
a human.
A lot of that safety case is actually driven by taking all of the physical operational
driving, which probably includes a lot of interventions where the driver took over just
in case.
Then you simulate those forward and see whether anything would have happened.
In most cases, the answer is no, but you can simulate it forward.
You can even start to do really interesting things where you add virtual agents to create
harder environments.
You can fuzz the locations of physical agents.
You can muck with the scene and stress test the scenario from a whole bunch of different
dimensions.
Effectively, you're trying to more efficiently sample this infinite dimensional space, but
try to encounter the problems as fast as possible because what most people don't realize is
the hardest problem in autonomous driving is actually the evaluation problem in many
ways, not the actual autonomy problem.
If you could, in theory, evaluate perfectly and instantaneously, you can solve that problem
in a really fast feedback loop quite well, but the hardest part is being really smart
about this suite of approaches on how can you get an accurate signal on how well you're
doing as quickly as possible in a way that correlates to physical driving.
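Fuzzing the locations of physical agents, mentioned above, can be sketched as jittering each logged agent position and re-evaluating a safety metric on every variant. This is a minimal illustration under stated assumptions: agents are static points, the ego path is a list of sampled waypoints, clearance is measured only to those waypoints, and the noise level, scene, and helper names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]


def fuzz_agents(agents: List[Point], sigma_m: float, n_variants: int,
                seed: int = 0) -> List[List[Point]]:
    """Create scenario variants by adding Gaussian noise to each logged agent position."""
    rng = random.Random(seed)
    return [[(x + rng.gauss(0.0, sigma_m), y + rng.gauss(0.0, sigma_m)) for x, y in agents]
            for _ in range(n_variants)]


def min_clearance(ego_path: List[Point], agents: List[Point]) -> float:
    """Smallest distance between any sampled ego waypoint and any (static) agent."""
    return min(math.dist(p, a) for p in ego_path for a in agents)


# Hypothetical logged scene: ego drives along y = 0, two agents in adjacent lanes.
ego_path = [(float(x), 0.0) for x in range(0, 101, 5)]
logged_agents = [(40.0, 3.7), (70.0, -3.7)]
variants = fuzz_agents(logged_agents, sigma_m=0.8, n_variants=1000, seed=7)
worst = min(min_clearance(ego_path, v) for v in variants)
print(f"logged clearance {min_clearance(ego_path, logged_agents):.2f} m, "
      f"worst fuzzed clearance {worst:.2f} m")
```

A single logged drive becomes a thousand slightly harder variants, a cheap way to sample the neighborhood of a real scenario instead of waiting to encounter it on the road.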
Can you explain the evaluation problem?
Which metric are you evaluating towards? Is this all about safety, and what are the
performance metrics that we're talking about?
In the end, you care about end safety.
That's what's deceptive: there are a lot of companies that have a great demo.
The path from a really great demo to being able to go driverless can be deceptively long,
even when that demo looks like it's driverless quality.
The difference is that the thing that keeps you from going driverless is not the stuff
you encounter on a demo.
It's the stuff that you encounter once at 100,000 miles or 500,000 miles.
That is at the root of what is most challenging about going driverless because any issue you
encounter, you can go and fix it, but how do you know you didn't create five other issues
that you haven't encountered yet?
Those were painful learnings in Waymo's history that Waymo went through and led to us then
finally being able to go driverless in Phoenix and now are at the heart of how we develop.
Evaluation is simultaneously about final end safety, how ready you are to go driverless,
which may be as direct as: what is your human-relative collision rate for all these types
of scenarios and severities, to make sure that you're better than the human bar by a
good amount.
That's not actually the most useful for development.
For development, it's much more analog metrics that are part of the art of finding what are
the properties of driving that give you a way quicker signal that's more sensitive than
a collision that can correlate to the quality you care about and push the feedback loop
to all of your development.
A lot of these are, for example, comparisons to human drivers, like manual drivers, on
how do you do relative to a human driver in various dimensions or various circumstances.
Let me ask a tricky question.
If I brought you a truck, how would you test it?
Alan Turing came along and you said-
This one can't tell if it's a human driver or an autonomous driver.
But not the human because humans are flawed.
It's different, but yeah.
How do you actually know you're ready, basically, and how do you know it's good enough?
By the way, this is a reason why Waymo released a safety framework for the car side because
one, it sets the bar so nobody cuts below it and does something bad for the field that
causes an accident.
Two, it's to start the conversation on framing what does this need to look like.
We'll end up doing the same thing for the trucking side.
It ends up being a different portfolio of approaches.
There's easy things like, are you compliant with all these fundamental rules of the road?
You never drive above the speed limit.
That's actually pretty easy.
You can fundamentally prove either that it's impossible to violate that rule, or
you can itemize the scenarios where it comes up, run a test, and show
that you pass that test and therefore can handle that scenario.
Those are traditional structured-testing, systems-engineering approaches where you can just
quantify. Fault rates are another example: when something fails, how do you deal
with it?
You're not going to drive around and randomly wait for it to fail.
You're going to force a failure and make sure that you can handle it, on closed courses, in
simulation, or on the road, and run through all the permutations of failures, which you
can oftentimes itemize for some parts of the system, like hardware.
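Itemizing hardware failure permutations, as described above, lends itself to exhaustive enumeration. This sketch assumes a hypothetical layout (two side pods, each with one LiDAR, camera, and radar) and a hypothetical rule that the system stays operational while every modality keeps at least one working unit. Real fault handling would involve fallback behaviors rather than a single pass/fail, so treat this only as an illustration of enumerating every combination instead of waiting for failures to happen.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical sensor layout: two side pods, each with one unit of every modality.
SENSORS: Dict[str, str] = {
    "lidar_left": "lidar", "lidar_right": "lidar",
    "camera_left": "camera", "camera_right": "camera",
    "radar_left": "radar", "radar_right": "radar",
}


def survives(failed: FrozenSet[str]) -> bool:
    """Hypothetical rule: operational while every modality keeps at least one working unit."""
    remaining = {kind for name, kind in SENSORS.items() if name not in failed}
    return remaining == set(SENSORS.values())


def unsurvivable_faults(max_simultaneous: int) -> List[Tuple[str, ...]]:
    """Enumerate every combination of up to max_simultaneous failures; report the bad ones."""
    bad = []
    for k in range(1, max_simultaneous + 1):
        for combo in combinations(sorted(SENSORS), k):
            if not survives(frozenset(combo)):
                bad.append(combo)
    return bad


print(unsurvivable_faults(1))  # every single failure is covered by its redundant twin
print(unsurvivable_faults(2))  # only losing both units of one modality violates the rule
```

Each reported combination becomes a concrete forced-failure test to run on a closed course or in simulation.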
The hardest part is behavioral where you have just infinite situations that could in theory
happen and you want to figure out the combinations of approaches that can work there.
You can probably pass the Turing test pretty quickly, even if you're not completely ready
for driverless because the events that are really hard will not happen that often.
Just to give you some perspective: for a human, like a truck driver on a freeway,
a serious event happens once every 1.3 million miles, and something that actually
involves a really serious injury happens once every 28 million miles.
Those are really rare and so you could have a driver that looks like it's ready to go,
but you have no signal on what happens there and so that's where you start to get creative
on combinations of sampling and statistical arguments, focused structured arguments where
you can simulate those scenarios and show that you can handle them and metrics that
are correlated with what you care about, but you can measure much more quickly and get
to a right answer and that's what makes it pretty hard.
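A quick calculation shows why real-world miles alone give "no signal" on events this rare. Modeling events as a Poisson process, the probability of seeing zero events in m miles when the true interval is M miles is exp(-m/M). To claim with 95% confidence that your rate is no worse than the human rate, you need about M * ln(20), roughly 3M, failure-free miles (the familiar "rule of three"). This sketch uses only the 1.3 million and 28 million figures quoted above. It covers merely matching the human rate; showing you are better by a good margin takes even more miles, which is why the structured and statistical arguments matter.

```python
import math


def miles_needed_zero_events(miles_between_events: float, confidence: float = 0.95) -> float:
    """Failure-free miles needed to claim, at the given confidence, that the event
    rate is no worse than one per miles_between_events.

    Poisson model: P(0 events in m miles) = exp(-m / M).
    Requiring exp(-m / M) <= 1 - confidence gives m >= M * ln(1 / (1 - confidence)).
    """
    return miles_between_events * math.log(1.0 / (1.0 - confidence))


for label, interval in [("serious event", 1.3e6), ("serious injury", 28e6)]:
    needed = miles_needed_zero_events(interval)
    print(f"{label}: {needed / 1e6:.1f} million failure-free miles")
```

Under these assumptions that is roughly 3.9 million and 84 million miles respectively, far beyond what a demo or a single test campaign can cover.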
In the end, you end up borrowing a lot of properties from aerospace and space shuttles
and so forth where you don't get the chance to launch it a million times just to say you're
ready because it's too expensive to fail.
You go through a huge amount of structured approaches in order to validate it and then
by thoroughness you can make a strong argument that you're ready to go.
This is actually a harder problem in a lot of ways though because you can think of a
space shuttle as getting to a fixed point or an airplane and you freeze the software
and then you prove it and you're good to go.
Here you have to get to a driverless quality bar, but then continue to aggressively change
the software even while you're driverless.
Also, there's the full range of the external environment, whereas with the shuttle,
you're basically testing the systems, the internal stuff, and you have a lot of control
over the external stuff.
The hard part is how do you know you didn't get worse in something that you just changed?
In a lot of ways, the Turing test starts to fail pretty quickly because you start to
feel driverless quality pretty early in that curve.
If you think about it, in most really good AV demos, maybe you'll sit there for 30 minutes.
You've driven 15 miles or something like that.
To go driverless, what's the rate of issues you need to have? You won't even encounter them.
Let's try something different then.
Let's try a different version of the Turing test, which is like an IQ test.
There's these difficult questions of increasing difficulty.
They're designed.
You don't know them ahead of time.
Nobody knows the answer to them.
Is it possible to, in the future, orchestrate basically really-
Obstacle courses, almost, that maybe change every year, and that represent... if you can
pass these, they don't necessarily represent the full spectrum.
That's it, yeah.
They won't be conclusive, but you can at least get a really quick read and filter.
You're able to, because you didn't know them ahead of time, I don't know.
Probably-
Like construction zones, failures.
Or driving anywhere in Russia.
Yeah.
Weather, cut-ins, dense traffic, merging, lane closures, animals, foreign objects on
the road that pop out on short notice, mechanical failures (sensors, braking, a popped tire), weird
behaviors by other vehicles like a hard brake or something reckless that they've done, fouling
of sensors like bugs or bird poop or something.
You have these extreme conditions where you have a nasty construction zone where everything
shuts down and you have to get pulled to the other side of the freeway with a temporary
lane like that.
Those are conditions where we do that to ourselves.
We itemize everything that could possibly happen to give you a starting point to how
to think about what you need to develop and at the end of the day, there's no substitute
for real miles.
If you think of traditional ML, you know how there's a validation set where you hold out
some data and real world driving is the ultimate validation set.
That's in the end the cleanest signal, but you can do a really good job on creating an
obstacle course.
You're absolutely right.
In the end, if there were such a thing as an automated readiness test, it would be these extreme
conditions: a red light runner, a really reckless pedestrian that's jaywalking, a cyclist
that makes a really awkward maneuver.
That's actually what keeps you from going driverless.
In the end, that is the long tail.
Yeah, and it's interesting to think about; that to me is the Turing test.
The Turing test means a lot of things, but to me, in driving, the Turing test is exactly
this validation set that is handcrafted.
I don't know if you know him.
There's a guy named François Chollet.
He thinks about how to design a test for general intelligence.
He designs these IQ tests for machines.
The validation set for him is handcrafted.
It requires human genius or ingenuity to create a really good test, and you truly hold it
out.
It's an interesting perspective on the validation set, which is make that as hard as possible.
Not a generic representation of the data, but this is the hardest thing.
The hardest stuff.
Yeah.
It's like Go.
You'll never fully itemize all the world states that you'll encounter, and so you have
to come up with different approaches.
This is where you start hitting the struggles of ML, where ML is fantastic at optimizing
the average case.
It's a really unique craft to think about how you deal with the worst case, which is
what we care about in AV space when using an ML system on something that occurs super
infrequently.
You don't care about the worst case really on ads, because if you miss a few, it's not
a big deal, but you do care about it on the driving side.
Typically you'll never fully enumerate the world.
You have to take a step back and abstract away: what are the signals that you care about
and the properties of a driver that correlate to defensive driving and avoiding nasty situations,
so that even though you'll always be surprised by things you encounter, you feel good
about your ability to generalize from what you've learned.
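The gap between optimizing the average case and caring about the worst case can be shown with two hypothetical error profiles that an average-case metric cannot tell apart. The numbers are invented purely for illustration: model B is better on 999 of 1,000 scenarios but fails catastrophically on one, and its mean error is identical to model A's.

```python
from statistics import mean

# Hypothetical per-scenario error for two models on the same 1,000 scenarios.
model_a = [1.0] * 1000            # uniformly mediocre
model_b = [0.5] * 999 + [500.5]   # better almost everywhere, one catastrophic case

for name, errors in [("A", model_a), ("B", model_b)]:
    print(f"model {name}: mean error {mean(errors):.2f}, worst case {max(errors):.1f}")
```

For ad ranking the two are interchangeable; for driving, the single catastrophic scenario in model B is what keeps you from going driverless, which is why evaluation has to look at the tail and not just the mean.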
All right.
Let me ask you a tricky question.
To me, the two companies that are building, at scale, some of the most incredible robots
ever built are Waymo and Tesla.
There are very distinct approaches, technically and philosophically, in these two systems.
Let me ask you to play sort of devil's advocate and then the devil's advocate to the devil's
advocate.
It's a bit of a race.
Of course, everyone can win, but if Waymo wins this race to level four, why would they
win?
What aspect of the approach do you think would be the winning aspect?
If Tesla wins, why would they win and which aspect of their approach would be the reason?
Just building some intuition, not from a business perspective or any of that, but
technically.
Yeah.
Yeah.
We could summarize, I think, and maybe you can correct me: one of the more distinct aspects
is that Waymo has a richer suite of sensors, LiDAR and vision.
Tesla has now removed radar.
They do vision only.
Tesla has a larger fleet of vehicles operated by humans, so it's already deployed in the
field in a larger, what do you call it, operational domain, and Waymo is more
focused on a specific domain and growing it with fewer vehicles.
Both are fascinating approaches.
I think there are a lot of brilliant ideas.
Nobody knows the answer, so I'd love to get your comments on the lay of the land.
Yeah, for sure.
Maybe I'll start with Waymo and you're right, both incredible companies and just a gigantic
respect to everything Tesla has accomplished and how they pushed the field forward as well.
On the Waymo side, there is a fundamental advantage in the fact that it is focused and
geared towards L4 from the very beginning.
We've customized the sensor suite for it, the hardware, the compute, the infrastructure,
the tech stack, and all of the investment inside the company.
That's deceptively important because there's a giant spectrum of problems you have to solve
in order to really do this from infrastructure to hardware to autonomy stack to the safety
framework.
That's an advantage, because there's a reason why it's fifth-generation hardware and
why all of those learnings that went into the driverless program become such an advantage: because
you learn a lot as you drive and you optimize for the best information you have.
But fundamentally, there's a big, big jump with every order of magnitude of miles you drive
in what you learn, and in the gap from really decent progress on L2 and
so forth to what it takes to actually go L4.
At the end of the day, there's a feeling that Waymo has, there's a long way to go.
Nobody's won, but there are a lot of advantages in all of these buckets, where it's the only
company that has shipped a fully driverless service where you can go and use it
at a decently sizable scale, and those learnings can feed forward into how to solve the
more general problems.
You see this process, you've deployed it in Chandler.
You don't know the timeline exactly, but you could see the steps.
The steps seem almost incremental.
It's become more engineering than totally blind R&D.
Because it works in one place and then you move to another place and you grow it this
way.
And just to give you an example, we fundamentally changed our hardware and our software stack
almost entirely from what went driverless in Phoenix to what is the current generation
of the system on both sides.
Because the things that got us to driverless, even though it got to driverless, way beyond
human relative safety, it is fundamentally not well set up to scale in an exponential
fashion without getting into huge scaling pains.
And so those learnings, you just can't shortcut.
And so that's an advantage.
And so there are a lot of open challenges to get through, technical and organizational: how
do you solve problems that are increasingly broad and complex like this, and work on multiple
products?
But there's a feeling that, okay, the ball's in our court, there's a head start there.
Now we've got to go and solve it.
And I think that focus on L4, it's a fundamentally different problem.
If you think about it, like, let's say we were designing an L2 truck that was meant to be
safer and help a human.
You could do that with far less sensors, far less complexity, and provide value very quickly.
Arguably with what we already have today, just packaged up in a good product.
But you would take a huge risk in having a gap from even the compute and sensors, not
to mention the software, to then jump from that system to an L4 system.
So it's a huge risk, basically.
So again, allow me to play devil's advocate and argue
for the Tesla approach.
So what you just laid out makes perfect sense and is exactly right.
There are some open questions here, which is, it's possible that investing more in faster
data collection, which is essentially what Tesla is doing, will get us there faster,
if the sensor suite doesn't matter as much and machine learning can do a lot of the work.
The open question, which you mentioned before, is how much of driving can be learned
end to end?
That's the open question.
Obviously both the Waymo approach and the vision-only machine learning approach will solve
driving eventually.
Yeah.
The question is of timeline.
What's faster?
That's right.
And what you mentioned, if I were to make the opposite argument, what puts Tesla in the
strongest position?
It's data.
That is their superpower: they have access to real-world data, effectively with
a safety driver.
They found a way to get paid by safety drivers instead of paying safety drivers.
It's brilliant.
But all joking aside, one, it is incredible that they've built a business that's incredibly
successful that can now be a foundation and bootstrap really aggressive investment in
autonomy space.
If you can do it, that's always an incredible advantage.
And on the data side, it is a giant amount of data, if you can use it the right way to
then solve the problem.
The ability to collect and filter through the things that matter at real-world scale,
across a large distribution, that is huge.
It's a big advantage.
And so then the question becomes, can you use it in the right way, and do you have the
right software systems and hardware systems in order to solve the problem?
And you're right that in the long term, there's no reason to believe that pure camera systems
can't solve the problem that humans obviously are solving with vision systems.
But...
That's question one.
It's a big risk.
So there's no argument that it's not a risk, right?
And it's already such a hard problem.
And so much of that problem, by the way, is even beyond the perception side; some of the
hardest elements of the problem are on the behavioral side, in decision-making and the
long-tail safety case.
If you are adding risk and complexity on the input side from perception, you're now making
a really, really hard problem, one that on its own is already almost insurmountable, even harder.
And so the question is just how much?
And this is where you can easily get into a little bit of a trap, similar to how
you evaluate how good an AV company's product is: you go and do a trial, a test run
with them, a demo run, which they've optimized like crazy and so forth, and it feels good.
Do you put any weight in that?
You know that that gap is still pretty large.
Same thing on the perception case.
The long tail of computer vision is really, really hard.
And there's a lot of ways that that can come up.
And even if it doesn't happen that often at all, when you think about the safety bar
and what it takes to actually go full driverless, not incredible driver assistance,
but full driverless, that bar gets crazy high.
And not only do you have to solve it on the behavioral side, but now you have to push
computer vision beyond, arguably, where it's ever been pushed.
And so now, on top of the broader AV challenge, you have a really hard perception
challenge as well.
So there's perception, there's planning, there's human-robot interaction.
To me, what's fascinating about what Tesla is doing in this march toward level four is that,
because it's in the hands of so many humans, you get to see video, you get to see humans.
I mean, forget companies, forget businesses.
It's fascinating for humans to be interacting with robots.
That's incredible.
And they're actually helping kind of push it forward.
And that is valuable, by the way; even for us, a decent percentage of our data is
human driving.
We intentionally have humans drive a higher percentage than you might expect, because
that creates some of the best signals to train the autonomy.
And so that on its own is valuable.
So together, we're kind of learning about this problem in an applied sense, just like
you did with Cozmo.
When you're chasing an actual product, a robot-based product that people are going to use,
you have to contend with the reality of what it takes to build a robot that successfully
perceives the world and operates in the world, and what it takes to have a robot that
interacts with other humans in the world.
And that, to me, is one of the most interesting problems humans have ever undertaken.
Because in trying to create an intelligent agent that operates in a human world,
you're also understanding the nature of intelligence itself.
How hard driving is, that's still not answered for me.
I still don't understand that.
And all the subtle cues, even little things like your interaction with a pedestrian where
you look at each other and just go, okay, go.
That's hard to do without a human driver, right?
And you're missing that dimension, how do you communicate that?
So there's really, really interesting elements here.
Now, here's what's beautiful.
Can you imagine that when autonomous driving is solved, how much of the technology foundation
of that space can go and have tremendous just transformative impacts on other problem areas
and other spaces that have subsets of these same problems?
It's just incredible to talk about that.
Well, it's both a pro and a con that autonomous driving is so safety-critical.
So once you solve it, it's beautiful because there are so many applications that are a lot
less safety-critical.
But the con of that is that it's so hard to solve.
And the same journalists that you mentioned who get excited about a demo are the ones who
write long articles about the failure of your company if there's one accident caused
by a robot.
And society is just so tense, waiting for robots to fail.
You're in such a high stake environment, failure has such a high cost.
And it slows down development.
It slows down development.
Yeah.
Like, the team definitely noticed that once you go driverless, as we did in Phoenix,
and you continue to iterate, your iteration pace slows down, because the fear of regression
forces so much more rigor.
Obviously you have to find a compromise on how often we release driverless builds, because
every time you release a driverless build, you have to go through this validation process,
which is very expensive and so forth.
So it is interesting.
It's like it is one of the hardest things.
There's no other industry where you wouldn't release products way, way quicker once you
start to provide even portions of the value that you provide; healthcare is maybe the
other one.
That's right.
But at the same time, we've gotten there; think of surgery.
You have surgery, there's always a risk, but it's really, really bounded.
You know that there's an accident rate when you go out and drive your car today.
And you know what the fatality rate in the US is per year.
We're not banning driving because there was a car accident, but the bar for us is way
higher, and we take it very seriously: you have to not only be better than a human,
but you probably have to be far better than a human at scale, by a big margin, and you
have to be able to really, really thoughtfully explain all of the ways that we validate,
in a way that is comfortable for humans to understand, because a lot of the jargon that
we use internally just doesn't compute at the end of the day.
We have to be able to explain to society how we quantify the risk and acknowledge
that there is some non-zero risk, but that it's far beyond human-relative safety.
Here's the thing, to push back a little bit and bring Cozmo back into the conversation.
You said something quite brilliant at the beginning of this conversation that I think
probably applies to autonomous driving, which is: there's this desire to make autonomous
cars much safer than human-driven cars, but if you create a product that's really compelling,
and the leadership, the engineers, and the product itself are able to explain and
communicate intent, then I think people may be willing to put up with a thing
that might be even riskier than humans, because they understand the value of taking risks.
You mentioned the speed limit.
Humans understand the value of going over the speed limit.
Humans understand the value of going fast through a yellow light when you're in Manhattan
streets pushing through crossing pedestrians.
They understand that.
This is a much more tense topic of discussion, so this is just me talking.
In Cozmo's case, there was something about the way this particular robot communicated,
the energy it brought, the intent it was able to communicate to humans, that made you
understand that of course it needs to have a camera.
Of course it needs to have this information, and in that same way, to me, of course a car
needs to take risks.
Of course there are going to be accidents.
If you want a car that never has an accident, have a car that just doesn't go anywhere,
but that's tricky because that's not a robotics problem.
Many accidents are not even due to you.
There's a big difference though.
That's not a personal decision.
You're also impacting obviously the rest of the road, and we're facilitating it.
There's a higher ethical and moral bar, which obviously then translates into, as a society
and from a regulatory standpoint, what comes out of it, where it's hard for us to ever
see this even being a debate in the sense that you have to be beyond reproach from a
safety standpoint because if you're wrong about this, you could set the entire field
back a decade.
This is me speaking.
If we look into the future, I personally believe that there will be less and less
focus on safety.
The bar will still be very, very high.
Yeah.
Meaning, after autonomy is very common and accepted.
Yeah.
Not so common as to be everywhere, but there has to be a transition, because I think for
innovation, just like you were saying, to explore ideas, you have to take risks.
I think if autonomy in the near term is to become prevalent in society, I think people
need to be more willing to understand the nature of risk.
The value of risk.
It's very difficult.
You're right, of course, with driving, but that's the fascinating nature of it.
It's a life and death situation that brings value to millions of people.
So you have to figure out what do we value about this world?
How much do we value?
How deeply do we want to avoid hurting other humans?
That's right.
You can imagine a scenario where Waymo has a system that, even when it's beyond
human-relative safety and will provably, statistically save lives, still requires a thoughtful
navigation of that fact versus society's readiness and perception, and the education of
society and regulators and everything else. It's multidimensional and it's not
a purely logical argument, but ironically, the logic can actually help with the emotions.
Just like any technology, there are early adopters, and there's kind of a curve that happens
after it.
And eventually, celebrities: you get The Rock in a Waymo vehicle and then everybody just
comes along.
And then everybody just calms down because of The Rock.
If he posts about it.
Yeah.
And it's an open question how this plays out.
I mean, maybe we're pleasantly surprised and people just realize that this
is such an enabler of life, of efficiency and cost and everything, that there's a pull.
I fully believe that this will go from a thoughtful kind of movement, tiptoeing,
kind of a push, to society realizing how wonderful of an enabler this could become,
and it becomes more of a pull. It's hard to know exactly how that'll play out.
But at the end of the day, both the goods transportation and the people transportation
sides have that property, where it's not easy.
There are a lot of open questions and challenges to navigate, and there are obviously the
technical problems to solve as a prerequisite, but there is such an opportunity, on a scale
that very few industries in the last 20 or 30 years have even had a chance to tackle,
that maybe we're pleasantly surprised by how quickly that tipping point turns into a
societal pull to embrace the benefits of this.
Yeah.
I hope so.
It seems like in the last few decades, there have been tipping points for technologies where
things change overnight.
From taxis to ride-sharing services, all that shift; I mean, there's just shift
after shift after shift that requires digitization and technology.
I hope we're pleasantly surprised on this.
So there are millions of long-haul trucks in the United States now.
Do you see a future where there are millions of Waymo trucks, and maybe, broadly speaking,
Waymo vehicles, just like ants running around the United States' freeways and local roads?
Yeah.
In other countries too.
Like, you look back decades from now and it might be one of those things that just feels
so natural, and then it becomes almost an interesting oddity that we had none of it
decades earlier.
And it'll take a long time to grow and scale.
Many different challenges appear at every stage, but over time, this is one of
the most enabling technologies that we have in the world today.
It'll feel like: what was the world like before the internet?
What was the world like before mobile phones?
It's going to have that sort of a feeling to it on both sides.
It's hard to predict the future, but do you sometimes think about weird ways it might
change the world, surprising ways?
So obviously there are more direct ways, like increased efficiency; it'll enable
a lot of logistics optimizations, that kind of thing.
It will probably change our roadways and all that kind of stuff, but it could also
change society in some interesting ways.
Do you ever think about how it might change cities, how it might change our lives, all that
kind of stuff?
Yeah.
Obviously, where people live versus work becomes more distributed, because the pain of
commuting changes; it just gets easier.
And I don't know, there's a lot of options that open up.
The layout of cities themselves and how you think about car storage and parking, obviously
just enables a completely different type of experience in urban environments.
I think there was a statistic that something like 30% of the traffic in cities during rush
hour is caused by the pursuit of parking, or something like that, some really high stat.
So those obviously kind of open up a lot of options.
Flexibility on goods will enable new industries and businesses that never existed before,
because now the efficiency becomes more palatable.
Goods delivery timing, consistency, and flexibility are going to change.
The way we distribute the logistics network will change.
The way we then can integrate with warehousing, with shipping ports.
You can start to think about greater automation through the whole stack, and how the
ripples through that supply chain become much more agile versus very grindy, the way they
are today, where adaptation is very tough and there are a lot of constraints.
I think it'll be great for the environment, it'll be great for safety.
Probably about 95% of accidents today are statistically due to inattention or
things that are preventable with the strengths of automation.
It'll be one of those things where industries will shift, but the net creation is going
to be massively positive and then we just have to be thoughtful about the negative implications
that will happen in local places and adjust for those.
But I'm an optimist in general for the technology where you could argue a negative on any new
technology, but you start to see that if there is a big demand for something like this, in
almost all cases, it's an enabling factor that's going to propagate through society.
Particularly as life expectancies get longer and so forth, there's just a lot more need
for a greater percentage of the population to just be serviced with a high level of
efficiency because otherwise we're going to have a really hard time scaling to what's
ahead in the next 50 years in front of us.
You're absolutely right.
Every technology has negative consequences, positive consequences.
We tend to just focus on the negative a little bit too much.
In fact, autonomous trucks are often brought up as an example of artificial intelligence
and robots in general taking our jobs.
As we've talked about briefly here, and as I talked about a lot with Steve, it is a concern
that automation will take away certain jobs.
It will create other jobs, so there's temporary pain, hopefully temporary, but pain is pain,
and people suffer, and that human suffering is really important to think about.
But trucking is, I would say, and there's a lot written on this, far from the thing
that will cause the most pain.
Yeah.
There are even more positive properties about trucking: not only is there a huge
shortage, which is going to increase, but the average age of truck drivers is getting
closer to 50 because younger people don't want to come into it.
They're trying to incentivize, lower the age limit, all these sorts of things, and the
demand is just going to increase.
It depends on the person, but in most cases, the least favorable types of routes are the
massive long-haul routes where you're on the road away from your family 300-plus days a year.
Steve's talked about the pain of those kinds of routes from a family perspective.
You're basically away from family; it's not just the hours of work, but it's
also just time away from family, and just-
The obesity rate is through the roof because you're just sitting all day.
It's really, really tough and that's also where the biggest kind of safety risk is because
of fatigue.
When you think of the gradual evolution of how autonomous trucking comes in, first of all,
it's not overnight.
It's going to take decades to phase in; there's just a long, long road ahead.
But the routes and the portions of trucking that are going to require humans the longest
and benefit the most from humans are the short-haul and most complicated, more urban
routes, which are also the more pleasant ones, with less continuous driving time, more
flexibility on geography and location, and you get to sleep at your own home.
Very importantly, if you optimize the logistics, you're going to use humans much better and
thereby pay them much better, because one of the biggest problems is that truck drivers are
currently paid by how much they drive, so they really feel the pain of inefficient logistics:
if they're just sitting around for hours, not driving, waiting, which they often do,
they're not getting paid for that time.
Logistics has a significant impact on the quality of life of a truck driver.
And a high percentage of trucks are empty because of inefficiencies in the system.
Yeah, and the other thing is, when you increase the efficiency of a system like this,
the overall net volume of the system tends to increase.
The entire market cap of trucking is going to go up when the efficiency improves and
facilitates both growth in industries and better utilization of trucking, and that
on its own just creates more and more demand.
So of all the places where AI comes in and starts to really reshape an industry, this is
one where there are just a lot of positives that, for at least the foreseeable future,
seem really lined up in a good way to come in and help with the shortage and start to
optimize for the routes that are most dangerous and most painful.
Yeah, so this is true for trucking, but if we zoom out broader, automation and AI, and
technology broadly, I would say, but automation especially, has the potential in the next
couple of decades to shift the kinds of jobs available to humans.
And so that results in, like I said, human suffering because people lose their jobs,
there's economic pain there, and there's also a pain of meaning.
So for a lot of people, work is a source of meaning, it's a source of identity, of pride,
pride in getting good at the job, pride in craftsmanship and excellence, which is what
truck drivers talk about, but this is true for a lot of jobs.
And is that something you think about as a sort of a roboticist zooming out from the
trucking thing?
Like where do you think it would be harder to find activity and work that's a source
of identity, a source of meaning in the future?
I do think about it, because you want to make sure that you worry about the entire system,
not just the part that we play in it, but what the ripple effects of it are down
the road.
And over a long enough time window, there's a lot of opportunity to put in the right policies
and the right opportunities to kind of reshape and retrain and find those openings.
And so just to give you a few examples: for both trucking and cars, we have remote assistance
facilities that are there to interface with customers, monitor vehicles, and provide
very focused assistance in areas where the vehicle may want to request
help in understanding an environment.
So those are jobs that get created and supported.
I remember taking a tour of one of the Amazon facilities, where you've probably seen
the Kiva Systems robots: these orange robots that have automated the picking and
collecting of items in the warehouse in a really elegant and beautiful way.
It's actually one of my favorite applications of robotics of all time.
I think I came across the company around 2006, and it was just amazing.
And what was the-
The warehouse robots, the transport little thing.
So basically, instead of a person going and walking around and picking the seven items
in your order, these robots go and pick up a shelf and move it over, so that the seven
shelves that contain the seven items are lined up in a row, and a laser or whatever points
to what you need to get, and you go and pick it and place it to fill the order.
And so the people are fulfilling the final orders.
What was interesting about that is that when I was asking them about the impact
on labor, when they transitioned that warehouse, the throughput increased so much that the
jobs shifted toward the final fulfillment. Even though the robots entirely took over
the search for the items themselves, the jobs stayed: there were actually roughly the same
number of jobs necessary, but the throughput increased by, I think, over 2x or some
amount, right?
So you have these situations that are not zero-sum games, in this really interesting
way, and the optimist in me thinks that there are these types of solutions in almost any
industry, where the growth that's enabled creates opportunities that you can then leverage.
But you've got to be intentional about finding those and really helping make those links,
because even if you can make the argument that there's a net positive, locally there are
always tough hits that you've got to be very careful about.
That's right.
You have to have an understanding of that link, because there's a short period of time,
whether training is required or just a mental or physical transition or whatever is required,
that's still going to be short-term pain, the uncertainty of it.
There are families involved. I mean, it's exceptionally difficult on a human level,
and you have to really think about that. You can't just look at economic metrics;
it's always human beings.
That's right.
And you can't even just take it as, okay, well, we need to subsidize or whatever,
because there is an element of personal pride, where the majority of people
don't want to just be okay; they want to actually have a craft, like you said,
and have a mission and feel like they're having a really positive impact.
And so my personal belief is that there's a lot of transferability of skill sets that
is possible, especially if you create a bridge and an investment to enable it.
And to some degree, that's our responsibility as well in this process.
You mentioned Kiva robots, Amazon.
Let me ask you about the Astro robot; I don't know if you've seen it.
Amazon has announced a home robot that has a screen and looks an awful lot like Cozmo,
with a different vision, I think, probably.
What are your thoughts about home robotics in this kind of space?
There have been quite a bunch of home robots, social robots, that very unfortunately have
closed their doors, for various reasons: perhaps they were too expensive,
there are manufacturing challenges, all that kind of stuff.
What are your thoughts about Amazon getting into the space?
Yeah, we had some signs that they were getting into it long, long, long ago.
Maybe they were a little too interested in Cozmo during our conversations,
but they were also actually very good partners for us, as we integrated a lot
of shared technology.
If I could also get your thoughts on this: you could think of Alexa, the Echo, as a robot
as well.
Do you see those as fundamentally different just because one can move and look around?
Is that fundamentally different from the thing that just sits in place?
It opens up options, but my first reaction is, I have my doubts that this one's
going to hit the mark, because for the price point that it's at, and the kind
of functionality and value propositions that they're trying to put out, it's still searching
for the killer application that justifies, I think it was a $1,500 price point, or somewhere
around there.
That's a really high bar. Enthusiasts and early adopters will obviously pursue it,
but you have to really hit a high mark at that price point. That's why we were always very
cautious about jumping too quickly to the more advanced systems that we really wanted
to make: they would have raised the bar so much, and you have to be able to hit it with
today's cost structures and technologies.
The mobility is an angle that hasn't been utilized, but it has to be utilized in the
right way, and so the biggest challenge is going to be: can you meet the bar of
the mass-market consumer? Think of our neighbors, our friends, parents: would they
find deep, deep value in this at a mass scale that justifies the price point?
I think that's, in the end, one of the biggest challenges for robotics, especially consumer
robotics, where you have to meet that bar; it becomes very, very hard.
There's also the higher bar, just like you were saying with Cozmo: a thing that can look
one way and then turn around and look at you is either a super desirable quality or
a super undesirable quality, depending on how much you trust the thing. And so there's
a problem of trust to solve there, a problem of personality; it's the quote-unquote
problem that Cozmo solved so well, which is that you trust the thing, and that has
to do with the company, with the leadership, with the intent that's communicated by the
device and the company and everything together.
Yeah, exactly right, and I think they also have to retrace some of the learnings on the
character side. As usual, a lot of companies are great at the hardware side of it and
can think about those elements, and then there's the thinking about the AI challenges,
where the advantage of Alexa is a pretty huge boost for them.
The character side of it is pretty novel territory for technology companies, and so that will
take some iterations. But yeah, I hope there's continued progress in the space
and that the thread doesn't go dormant for too long. It's going to take a
while to evolve into the ideal applications, but this is one of Amazon's, I guess you
could call it, it's definitely part of their DNA, and in many cases it's also a strength:
they're very willing to iterate aggressively and move quickly.
Take risks.
Take risks.
You have deep pockets.
Yeah.
You'll have more misfires than an Apple would, but it's different styles and different
approaches, and at the end of the day, there are a few familiar elements there for
sure, which was kind of-
Homage.
That's one way to put it.
Why is it so hard, at a high level, to build a robotics company, a robotics company that
lives for a long time?
If you look at it, I thought Cozmo for sure would live for a very long time.
That, to me, was an exceptionally successful vision and idea and implementation.
iRobot is an example of a company that has pivoted in all the right ways to survive and
arguably thrive, by having a driver that constantly provides profit, which is the vacuum
cleaner. And of course there's Amazon: they're taking risks because they can afford it,
since they have other sources of revenue, right?
But outside of those examples, most robotics companies fail.
Why do they fail?
Why is it so hard to run a robotics company?
iRobot's impressive because they found a really, really great fit of where the technology
could satisfy a really clear use case and need and they did it well and they didn't
try to overshoot from a cost-to-benefit standpoint.
Robotics is hard because it tends to be more expensive.
It combines way more technologies than a lot of other types of companies do.
If I were to say one thing that is maybe the biggest risk in a robotics company
failing, it's that it can be either a technology in search of an application, or they try
to bite off an offering that has a mismatch of price to function, and the mass-market
appeal just isn't there. And consumer products are just hard.
After all the years, I definitely feel a lot of the battle scars, because not only do
you have to hit the function, you have to educate and explain, get awareness up, and deal
with different kinds of consumers.
There's a reason why a lot of technologies start in the enterprise space and then
continue forward into the consumer space; you even see AR starting to make that shift
with HoloLens and so forth in some ways.
Consumers, and the price points at which they're willing to be attracted in a mass-market
way, and I don't mean 10,000 enthusiasts bought it, I mean 2 million, 10 million,
50 million, mass-market kind of interest: that bar is very, very high.
And typically robotics is novel enough and non-standardized enough that it pushes on price
points so much that you can easily get out of range, where the capabilities of today's
technology, or just the function that was picked, don't line up. And so product-market fit
is very important.
So the space of killer apps or rather super compelling apps is much smaller because it's
easy to get outside of the price range for most consumers.
And it's not constant, right?
Yeah.
And that's why we picked entertainment, because the quality was just so low in physical
entertainment that we felt we could leapfrog that and still create a really compelling
offering at a price point that was defensible, and that proved to be true.
And over time, that same opportunity opens up in healthcare, in home applications, in
commercial applications, in broader, more generalized interfaces, but there are missing
pieces in order for that to happen, and all of those have to be present for it to line
up.
And we see these sorts of trends in technology, where technologies that start in one
place evolve and grow into another.
Things start in gaming, some things start in space or aerospace, and then move into
the consumer market.
And sometimes it's just a timing thing, right, like how many stabs at what became the iPhone
were there over the 20 years before that, which just weren't quite ready in the function relative
to the price point and complexity.
And sometimes it's a small detail of the implementation that makes all the difference,
which is why design is so important.
And I think, yeah, like the new generation UX, right?
And that's tough and oftentimes all of them have to be there and it has to be like a perfect
storm.
But yeah, history repeats itself in a lot of ways in a lot of these trends, which is
pretty fascinating.
Well, let me ask you about the humanoid form.
What do you think about the Tesla bot and humanoid robotics in general?
So obviously, to me, autonomous driving, Waymo and the other companies working in the space,
that seems to be a great place to invest, a potentially revolutionary robotics
application, a focused application.
What's the role of humanoid robotics?
Do you think Tesla bot is ridiculous?
Do you think it's super promising?
Do you think it's interesting, full of mystery?
Nobody knows.
What do you think about this thing?
Yeah, I think today, humanoid form robotics is research.
There's very few situations where you actually need a humanoid form to solve a problem.
If you think about it, wheels are more efficient than legs.
Joints and degrees of freedom, beyond a certain point, just add a lot of complexity
and cost.
So if you're doing a humanoid robot, oftentimes it's in the pursuit of a humanoid robot, not
in the pursuit of an application for the time being, especially when you have the gaps in
interface and AI that we talk about today.
So anything Elon does, I'm interested in following.
So there's an element of that.
No matter how crazy.
Yeah, how crazy it is.
I'll pay attention and I'm curious to see what comes out of it.
So it's like you can't ever ignore it, but it's definitely far afield from their kind
of core business, obviously.
What's interesting to me, and I've disagreed with Elon a lot about this, is that to me the compelling
aspect of the humanoid form and a lot of robots, Cozmo, for example, is the human-robot
interaction part.
From Elon Musk's perspective, the Tesla bot has nothing to do with the human.
It's a form that's effective for the factory because the factory is designed for humans.
But to me, the reason you might argue for the humanoid form is because at a party, it's
a nice way to fit into the party.
The humanoid form has a compelling notion to it in the same way that Cozmo is compelling.
I would argue, if we were arguing about this, that it's cheaper to build a Cozmo-like
form.
But if you wanted to make an argument, which I have with Jim Keller, you could actually
make a humanoid robot pretty cheaply; it's possible.
And then the question is, all right, if you're using it in an application where it can be flawed,
where it can have a personality and be flawed in the same way that Cozmo is, then maybe it's
interesting for integration into human society.
That to me is an interesting application of a humanoid form.
Because humans are drawn, like I mentioned to you with legged robots, we're drawn to legs
and limbs and body language and all that kind of stuff.
And even a face, even if you don't have the facial features, which you might not want
to have, to reduce the creepiness factor, all that kind of stuff.
But yeah, that to me, the humanoid form is compelling.
But in terms of that being the right form for the factory environment, I'm not so sure.
Yeah.
For the factory environment, right off the bat, what are you optimizing for?
Is it strength?
Is it mobility?
Is it versatility?
That completely changes the look and feel of the robot that you create.
And almost certainly the human form is over-designed for some dimensions and constrained
for some dimensions.
And so what are you grasping?
Is it big?
Is it little?
How do you customize it and make it customizable for the different needs if that was the optimization?
And then for the other one, I could totally be wrong.
I still feel that the closer you try to get to a human, the more you're subject to the
biases of what a human should be.
And you lose flexibility to shift away from your weaknesses and towards your strengths.
And that changes over time, but there are ways to make really approachable and natural interfaces
for robotic characters and deployments in these applications that do not at all look
like a human directly, but that actually creates way more flexibility and capability and role
and forgiveness in interface and everything else.
Yeah, it's interesting, but I'm still confused by the magic I see in legged robots.
Yeah.
So there is a magic.
So I'm absolutely amazed at it from a technical curiosity standpoint, and at the magic that the
Boston Dynamics team can do with walking and jumping and so forth.
Now, there's been a long journey to try to find an application for that sort of technology,
but wow, that's incredible technology, right?
So then you kind of go towards, okay, are you working back from a goal of what you're
trying to solve, or are you working forward from a technology and looking for an application?
And I think it's a bi-directional search oftentimes, but the two have to meet.
And that's where humanoid robotics is kind of close to that, in that it's a decision about
a form factor and a technology that it forces, which doesn't have a clear justification of
why that's the killer app from the other end.
But I think the core fascinating idea with the Tesla bot is one that's carried by
Waymo as well: when you're solving the general robotics problem of perception and control,
where there are very clear applications like driving, then as you get better and better
at it, when you have something like the Waymo Driver, the whole world starts to look
like a robotics problem.
So it's very interesting.
So with object detection, classification, segmentation, tracking, planning, like it's, yeah.
So there's no reason, I mean, I'm not speaking for Waymo here, but for moving goods, there's
no reason a Transformer-like thing couldn't take the goods up an elevator, and slowly
expand what it means to move goods and expand more and more of the world into a robotics
problem.
Well, that's right.
And you start to think of it as an end-to-end robotics problem, from loading to everything.
And even the truck itself: today's generation is integrating into today's understanding of
what a vehicle is, the Jaguar I-PACE, the Freightliners from Daimler.
There's nothing that stops us from down the road after starting to get to scale to expand
these partnerships to really rethink what would the next generation of a truck look
like that is actually optimized for autonomy, not for today's world.
And maybe that means a very different type of trailer.
Maybe there's a lot of things you could rethink on that front, which is on its own very, very
exciting.
Let me ask you, like I said, you went to the Mecca of robotics, which is CMU, Carnegie
Mellon University.
You got a PhD there.
So maybe by way of advice and maybe by way of story and memories, what does it take to
get a PhD in robotics at CMU?
And maybe you can throw in there some advice for people who are thinking about doing work
in artificial intelligence and robotics and are thinking about whether to get a PhD.
So I was at CMU for undergrad as well, and didn't know anything about robotics
coming in and was doing electrical computer engineering, computer science, and really
got more and more into kind of AI.
And then I fell in love with autonomous driving, and at that point, CMU was just, by a big
margin, such an incredible central spot of investment in that area.
And so what I would say is that robotics, for all the progress that's happened, is still
a really young field.
There's a huge amount of opportunity.
Now that opportunity has shifted, where something like autonomous driving has moved from being
very research- and academics-driven to being commercially driven, where you see the investments
happening in commercial.
Now there are other areas that are much younger, and you see grasping and manipulation
making the same sort of journey that autonomy made, and there are other areas as
well.
What I would say is the space moves very quickly.
Anything you do a PhD in, like in most areas, will evolve and change as technology
changes and constraints change and hardware changes and the world changes.
And so the beautiful thing about robotics is it's super broad.
It's not a narrow space at all and it could be a million different things in a million
different industries.
And so it's a great opportunity to come in and get a broad foundation on AI, machine
learning, computer vision, systems, hardware, sensors, all these separate things.
You do need to like go deep and find something that you're like really, really passionate
about.
Obviously, like just like any PhD, this is like a five, six year kind of endeavor and
you have to love it enough to go super deep to learn all the things necessary to be super
deeply functioning in that area and then contribute to it in a way that hasn't been done before.
And in robotics, that probably means more breadth because robotics is rarely kind of
like one particular kind of narrow technology.
And it means being able to collaborate with teams, where one of the coolest aspects
of the experience that I cherish from my PhD is that we actually had a pretty
large AV project that for that time was like a pretty serious initiative where you got to
like partner with a larger team and you had the experts in perception and the experts
in planning and the staff and the mechanical engineers.
For the DARPA challenge.
So I was working on a project called UPI back then, which was basically the off-road version
of the DARPA challenge.
It was a DARPA-funded project for basically a large off-road vehicle that you would
drop and then give a waypoint 10 kilometers away, and it would have to navigate
a completely unstructured...
In an off-road environment.
Yeah.
So like forest, ditches, rocks, vegetation.
And so it was like a really, really interesting kind of a hard problem where like wheels would
be up to my shoulders.
It's like gigantic, right?
Yeah, by the way, AV for people stands for autonomous vehicles.
Autonomous vehicles.
Yeah.
Sorry.
And so what I think is like the beauty of robotics, but also kind of like the expectation
is that there's spaces in computer science where you can be very, very narrow and deep.
In robotics, one of the necessities, but also the beauty of it, is that it forces you to be excited
about that breadth and that partnership across the different disciplines that enable it.
But that also opens up so many more doors, where you can go and do robotics in almost
any category, where robotics isn't really an industry.
It's like AI, right?
It's like the application of physical automation to all these other worlds.
And so you can do robotic surgery, you can do vehicles, you can do factory automation,
you can do healthcare, you can leverage the AI around the sensing to think about static
sensors and scene understanding.
So I think that's got to be the expectation and the excitement.
And it breeds people that are probably a little bit more collaborative and more excited about
working in teams.
If I could briefly comment on the fact that the robotics people I've met in my life from
CMU and MIT, they're really happy people.
Because I think it's the collaborative thing.
I think you're not like sitting in like the fourth basement.
Yes, exactly.
When you're doing machine learning purely software, it's very tempting to just disappear
into your own hole and never collaborate.
And that breeds a little bit more of the silo mentality of like, I have a problem.
It's almost like negative to talk to somebody else or something like that.
But robotics folks are just very collaborative, very friendly.
And there's also an energy of like, you get to confront the physics of reality often,
which is humbling and also exciting.
So it's humbling when it fails and exciting when it finally works.
It's like a purity of the passion.
And you've got to remember that right now, robotics and AI are all the rage, autonomous
vehicles and all this, but 15 or 20 years ago, it wasn't that lucrative.
People that went into robotics, they did it because they thought it was just the coolest
thing in the world to make physical things intelligent in the real world.
And so there's like a raw passion where they went into it for the right reasons and so
on.
So it's really great space.
And that organizational challenge, by the way: when you think about the challenges
in AV, we talk a lot about the technical challenges, but the organizational challenges are through the
roof, when you think about what it takes to build an AV system and you
have companies that are now thousands of people.
And you look at other really hard technical problems like an operating system, it's pretty
well established.
Like you kind of know that there's the file system, there's virtual memory, there's this,
there's that, there's like caching and like, and there's like a really reasonably well
established modularity and APIs and so forth.
And so you can kind of like scale it in an efficient fashion.
That doesn't exist anywhere near to that level of maturity in autonomous driving right now.
And tech stacks are being reinvented, organizational structures are being reinvented.
You have problems like pedestrians that are not isolated problems.
They're part sensing, part behavior prediction, part planning, part evaluation.
And like one of the biggest challenges is actually how do you solve these problems where
the mental capacity of a human is starting to get strained on how do you organize it
and think about it where, you know, you have this like multi-dimensional matrix that needs
to all work together.
And so that makes it kind of cool as well because it's not like solved at all from, you
know, like, what does it take to actually scale this, right?
And then you look at like other gigantic challenges that have, you know, that have been successful
and are way more mature.
There's a stability to it.
And like maybe the autonomous vehicle space will get there.
But right now, there are just as many organizational challenges as technical ones,
on how you solve these problems that touch on so many different areas and efficiently
tackle them while maintaining progress among all these constraints while scaling.
By way of advice, what advice would you give to somebody thinking about doing a robotic
startup?
You mentioned Cozmo.
Somebody that wanted to carry the Cozmo flag forward, the Anki flag forward, looking back
at your experience, looking forward at a future that will obviously have such robots.
What advice would you give to that person?
Yeah, it was the greatest experience ever.
And there are things you learn navigating a startup
that are very hard to encounter in a typical
work environment.
And it's just wonderful.
You got to be ready for it.
It's not as like, you know, the, the glamour of a startup, there's just like, just brutal
emotional swings up and down.
And so having co-founders actually helps a ton.
Like I would not, cannot imagine doing it solo, but having at least somebody where on your
darkest days, you can kind of like really openly just like have that conversation and, you
know, lean on to somebody that's, that's in the thick of it with you helps a lot.
What I would say,
What was the nature of darkest days and the emotional swings?
Is it worrying about the funding?
Is it worrying about whether any of your ideas are any good, or ever were good?
Is it like the self doubt?
Is it like facing new challenges that have nothing to do with the technology, like organizational
human resources, that kind of stuff?
What?
Yeah.
You come from a world in school where you feel that if you put in a lot of effort, you'll
get the right result, and input translates proportionally to output.
And, you know, you need to solve the problem set or do whatever and just get it
done.
A PhD tests that a little bit, but at the end of the day, if you put in the effort, you
tend to come out with enough results to get a PhD. In the startup
space, you could talk to 50 investors and they just don't see your
vision, and it doesn't matter how hard you tried and pitched.
You could work incredibly hard and you have a manufacturing defect.
And if you don't fix it, you're going to, you're out of business.
You need to raise money by a certain date and there's a, you got to have this milestone
in order to like have a good pitch and you do it.
You have to have this talent and you just don't have it inside the company or, you know,
you have to get 200 people, or however many people, along with you and bought
into the journey.
You're like disagreeing with an investor and they're your investors.
So it's just like, you know, it's like you, there's no walking away from it, right?
So, and it tends to be like those things where you just kind of get clobbered in so many
different ways that like things end up being harder than you expect.
And it's like such a gauntlet, but you learn so much in the process and there's a lot of
people that actually end up rooting for you and helping you like from the outside and
you get great mentors and you find fantastic people that step up in the
company and you have this like magical period where everybody's like, it's life or death
for the company, but like you're all fighting for the same thing and it's the most satisfying
kind of journey ever.
The things that make it easier and that I would recommend is like be really, really thoughtful
about the, the application.
Like there's a, there's a saying of like kind of, you know, team and execution and market
and like kind of how important are each of those.
And oftentimes the market wins and you come at it thinking that if you're smart enough
and you work hard enough and you're like, have the right talented team and so forth,
like you'll always kind of find a way through.
And it's surprising how much dynamics are driven by the industry you're in and the timing
of you entering that industry.
And so Waymo is a great example of it.
I don't know if there'll ever be another company, or suite of companies,
that has raised and continues to spend so much money at such an early phase of revenue
generation and productization, you know, from a P&L standpoint. It's
an anomaly by any measure of any industry that's ever existed, except for maybe the
US space program, right, but it's a multi-trillion-dollar opportunity,
which is so unusual. With that size of a market, where the progress shows
that you're de-risking it, you could apply whatever discounts you want off that trillion-dollar
market and it still justifies the investment that is happening, because being successful
in that space makes all the investment feel trivial.
Now by the same token, the size of the market, the size of the target audience,
the ability to capture that market share, how hard that's going to be, who the incumbents
are, that's probably one of the lessons I appreciate more than anything else,
where those things really, really do matter.
And oftentimes can dominate the quality of the team or execution because if you miss
the timing or you do it in the wrong space, you run into like the institutional kind of
headwinds of a particular environment, like, let's say you have the greatest idea in the
world, but you barrel into healthcare, but it takes 10 years to innovate in healthcare
because of a lot of challenges, right?
Like there's fundamental laws of physics that you have to think about.
And so the combination of like Anki and Waymo kind of drives that point home for me where
you can do a ton if you have the right market, the right opportunity, the right way to explain
it and you show the progress in the right sequence.
It actually can really significantly change the course of your journey and startup.
How much of it is understanding the market and how much of it is creating a new market?
How do you think about it? Like, the space of robotics is really interesting, you said it exactly right.
The space of applications is small relative to the cost involved.
So how much is like truly revolutionary thinking about like what is the application?
And then, yeah, but so like creating something that didn't really exist, like this is pretty
obvious to me, the whole space of home robotics, just everything that Cozmo did, I guess you
could talk about it as a toy and people would understand it, but it was much more than
a toy.
Yeah.
And I don't think people fully understand the value of that.
You have to create it and the product will communicate it.
Like just like the iPhone, nobody understood the value of no keyboard and a thing that can
do web browsing.
I don't think they understood the value of that until you create it.
Yeah.
So creating a foot in the door and an entry point still helps, because at the end of the
day, an iPhone replaced your phone.
And so it had a fundamental purpose, and all these things it did better, right?
And so then you could do ABC on top of it.
And then, if you remember the early commercials, there was always one
application of what it could do and then you get a phone call, right?
And so that was intentionally sending a message, something familiar, but then like you can
send a text message, you can listen to music, you can surf the web, right?
And so, you know, autonomous driving obviously anchors on that as well.
You don't have to explain to somebody the functionality of an autonomous truck, right?
Like there's nuances around it, but the functionality makes sense.
In the home, you have a fundamental advantage like we always thought about this because
it was so painful to explain to people what our products did and how to communicate
that super cleanly, especially when something was so experiential.
And so you compare like Anki to Nest, Nest had some beautiful products where they started
scaling and actually found really great success, and they had really clean
and beautiful marketing messaging because they anchored on reinventing existing categories
where it was a smart thermostat, right?
And like, and so you kind of are able to take what's familiar, anchor that understanding
and then explain what's better about it.
That's funny.
You're right.
Cozmo is a totally new thing.
Like, what is this thing?
We struggled.
We spent a lot of money on marketing.
We had a hard...
Like, we actually had far greater efficiency on Cozmo than anything else because we found
a way to capture the emotion in some little shorts to kind of lean into the personality
in our marketing.
And it became viral where like we had these kind of videos that would like go and get
like hundreds of thousands of views and like kind of like get spread and sometimes millions
of views.
And so, but it was like really, really hard.
And so finding a way to anchor on something that's familiar, but then grow
into something that's not, is an advantage. But then again, you do have
successes otherwise, like Alexa never had a comp, right?
You could argue that that's very novel and very new.
And there are a lot of other examples that kind of created a category, like
Kiva Systems.
I mean, they came in and, well, enterprise is a little easier, because it's
less susceptible to this: if you can argue a clear value proposition, it's a more
logical conversation that you can have with customers.
It's a little bit less emotional and subjective, but
Yeah.
In the home, you have to
Yeah.
It's like a home robot.
It's like, what does that mean?
Yeah.
And so then you really have to be crisp about the value proposition and what like really
makes it worth it.
And we, by the way, went through that same thing, and we almost hit a wall coming
out of 2013, where we were so big on explaining why our stuff was so high tech, and all the
great technology in it, and how cool it is and so forth, to having to make
a super hard pivot to why is it fun, and why does the random family of, you know,
four need this, right?
Yeah.
So it's learnings, but that's that's the challenge and I think like robotics tends to sometimes
fall into the new category problem, but then you got to be really crisp about why it needs
to exist.
And some of robotics, depending on the category, depending on the application, is a little bit
of a marketing challenge.
And I don't mean, I mean, it's not the kind of marketing that Waymo is doing,
that Tesla is doing, of showing off incredible engineering, incredible technology; it's convincing,
like you said, a family of four that this is transformative for your
life.
This is fun.
This is easy.
They don't care how much tech is in your thing.
They don't.
They don't care.
They need to know why they want it.
And some of that is just marketing.
Yeah.
And that's like Roomba, like, yes, they didn't, you know, like, go and, you know, have this
like, you know, huge, huge ramp into like the entirety of a kind of era of robotics and
so forth.
But like, they built a really great business and in a vacuum cleaner world and like everybody
understands what a vacuum cleaner is, and most people are annoyed by doing it.
And now you have one that like kind of does it itself, various degrees of quality.
But that is so compelling that like, it's easier to understand.
And like, and they had a very kind of, and I think they have like 15% of the vacuum cleaner
market.
So it's like pretty successful, right?
I think we need more of those types of thoughtful stepping stones in robotics, but the opportunities
are becoming bigger because hardware is cheaper, compute is cheaper, cloud is cheaper and AI
is better.
So there's a lot of opportunity.
If we zoom out from specifically startups and robotics, what advice do you have to high
school students, college students about career and living a life that you'd be proud of?
You lived one heck of a life, you're very successful in several domains.
If you can convert that into a generalizable potion, what advice would you give?
That's a very good question.
So it's very hard to go into a space that you're not passionate about and push hard
enough to be, to like maximize your potential in it.
And so there's always kind of like the saying of like, okay, follow your passion, great.
Try to find the overlap of where your passion overlaps with like a growing opportunity and
need in the world where it's not too different than the startup kind of argument that we
talked about where if you are...
Where your passion meets the market.
Right?
You know what I mean?
Because it's like, that's a beautiful thing where you can do what you love, but it also
just opens up tons of opportunities because the world's ready for it, right?
And so if you're interested in technology, that might point to like go and study machine
learning because you don't have to decide what career you're going to go into, but it's
going to be such a versatile space that's going to be at the root of like everything
that's going to be in front of us that you can have eight different careers in different
industries and be an absolute expert in this like kind of tool set that you wield that
can go and be applied.
And by the way, that doesn't apply to just technology, right?
It could be the exact same thing if you want to; you know, the same thought process applies
to design, to marketing, to sales, to anything. It's that versatility: when you're
in a space that's going to continue to grow, it's just like, what company do you join?
One that just is going to grow and the growth creates opportunities where the surface area
is just going to increase and the problems will never get stale and you can have many...
And so you go into a career where you have that sort of growth in the world that you're
in, you end up having so much more opportunity that organically just appears and you can
then have more shots on goal to find like that killer overlap of timing and passion
and skill set and point in life where you can like, you know, just really be motivated
and fall in love with something.
And then at the same time, like find a balance like there's been times in my life where I
worked like a little bit too obsessively and crazy and I think we kind of like tried to
correct that, you know, kind of the right opportunities, but, you know, I think I probably
appreciate a lot more now friendships that go way back, you know, family and things like
that.
And I kind of have the personality where I could easily, like, I have so much desire
to really try to optimize, you know, when I'm working on that, I can easily go
to kind of an extreme and now I'm trying to like kind of find that balance and make sure
that I have the friendships, the family, the relationship with the kids, everything. I
push really, really hard, but I try to find a balance, and I think
people can be happy on actually many kind of extremes on that spectrum, but it's easy
to kind of inadvertently make a choice by how, how you approach it that then becomes
really hard to unwind.
And so being very thoughtful about kind of all of those dimensions makes a lot of sense.
And so the comment, I think those are all interrelated, but at the end of the day,
love, passion and love, love towards, you said, family, friends and hopefully one day,
if your work pans out, Boris, is love towards robots, not the creepy kind, the good kind,
just just friendship and, and fun.
Yeah.
It's like another dimension to just how we interface with the world.
Yeah.
Yes, you're one of my favorite human beings, roboticist, you've created some incredible
robots, and I think inspired countless people.
And like I said, I hope Cozmo, I hope your work with Anki, lives on.
And I can't wait to see what you do with Waymo.
I mean, that's if we're talking about artificial intelligence technology, there's the potential
to revolutionize so much of our world.
That's it right there.
So thank you so much for the work you've done.
Thank you for spending your valuable time talking with me.
Thanks.
Thanks.
Thanks for listening to this conversation with Boris Sofman.
To support this podcast, please check out our sponsors in the description.
And now let me leave you with some words from Isaac Asimov.
If you were to insist I was a robot, you might not consider me capable of love in some mystic
human sense.
Thank you for listening and hope to see you next time.