The following is a conversation with Michael Littman, a computer science professor at Brown University
doing research on and teaching machine learning, reinforcement learning, and artificial intelligence.
He enjoys being silly and lighthearted in conversation, so this was definitely a fun one.
Quick mention of each sponsor, followed by some thoughts related to the episode.
Thank you to SimpliSafe, a home security company I use to monitor and protect my apartment.
ExpressVPN, the VPN I've used for many years to protect my privacy on the internet.
MasterClass, online courses that I enjoy from some of the most amazing humans in history,
and BetterHelp, online therapy with a licensed professional.
Please check out these sponsors in the description to get a discount and to support this podcast.
As a side note, let me say that I may experiment with doing some solo episodes in the coming month
or two. The three ideas I have floating in my head currently are to use, one, a particular moment
in history, two, a particular movie, or three, a book to drive a conversation about a set of
related concepts. For example, I could use 2001: A Space Odyssey or Ex Machina to talk about AGI
for one, two, three hours, or I could do an episode on the rise and fall of Hitler and Stalin
each in a separate episode, using relevant books and historical moments for reference.
I find the format of a solo episode very uncomfortable and challenging, but that just
tells me that it's something I definitely need to do and learn from the experience.
Of course, I hope you come along for the ride. Also, since we have all this
momentum built up on announcements, I'm giving a few lectures on machine learning at MIT this
January. In general, if you have ideas for the episodes, for the lectures, or for just short
videos on YouTube, let me know in the comments that I still definitely read despite my better
judgment and the wise sage advice of the great Joe Rogan. If you enjoy this thing, subscribe
on YouTube, review it with five stars on Apple Podcasts, follow on Spotify, support on Patreon,
or connect with me on Twitter at Lex Fridman. And now here's my conversation with Michael Littman.
I saw a video of you talking to Charles Isbell about Westworld, the TV series. You guys were
doing a thing where you're watching new things together, but let's rewind back. Is there a
sci-fi movie or book or show that was profound, that had an impact on you philosophically or
just specifically something you enjoyed nerding out about?
Yeah, interesting. I think a lot of us have been inspired by robots in movies. One that I really
like is, there's a movie called Robot and Frank, which I think is really interesting because it's
very near-term future where robots are being deployed as helpers in people's homes. And we
don't know how to make robots like that at this point, but it seemed very plausible. It seemed
very realistic or imaginable. And I thought that was really cool because they're awkward. They do
funny things. It raised some interesting issues, but it seemed like something that would ultimately
be helpful and good if we could do it right. Yeah, he was an older cranky gentleman.
He was an older cranky jewel thief. Yeah. It's kind of a funny little thing, which is he's a
jewel thief. And so he pulls the robot into his life, which is something you could imagine
taking a home robotics thing and pulling into whatever quirky thing that's involved in your
existence. Make it meaningful to you. Exactly so. Yeah. And I think from that perspective,
I mean, not all of us are jewel thieves. And so when we bring our robots into our lives,
it explains a lot about this apartment, actually. But no, the idea that people should have the
ability to make this technology their own, that it becomes part of their lives. And I think
it's hard for us as technologists to make that kind of technology. It's easier to mold people
into what we need them to be. And just that opposite vision, I think, is really inspiring.
And then there's an anthropomorphization where we project certain things on them,
because I think the robot was kind of dumb. But I have a bunch of Roombas that I play with and
they, you immediately project stuff onto them, much greater level of intelligence. We'll probably
do that with each other too. Much, much, much greater degree of compassion. That's right. One
of the things we're learning from AI is where we are smart and where we are not smart. Yeah.
Yeah. You also enjoy, as people can see, and I enjoyed myself watching you sing and even dance
a little bit, a little bit, a little bit of dancing. A little bit of dancing. That's not
quite my thing. As a method of education or just in life, you know, in general. So easy question.
What's the definitive, objectively speaking, top three songs of all time? Maybe something that
you know, to walk that back a little bit, maybe something that others might be surprised by.
Three songs that you kind of enjoy. That is a great question that I cannot answer. But instead,
let me tell you a story. Pick a question you do want to answer. That's right. I've been watching
the presidential debates and vice presidential debates and it turns out, yeah, it's really,
you can just answer any question you want. So it's a related question.
Yeah, well said. I really like pop music. I've enjoyed pop music ever since I was very young.
So 60s music, 70s music, 80s music, this is all awesome. And then I had kids and I think I stopped
listening to music and I was starting to realize that my musical taste had sort of frozen out.
And so I decided in 2011, I think, to start listening to the top 10 billboard songs each
week. So I'd be on the treadmill and I would listen to that week's top 10 songs so I could
find out what was popular now. And what I discovered is that I have no musical taste
whatsoever. I like what I'm familiar with. And so the first time I'd hear a song is the first
week that was on the charts, I'd be like, ugh. And then the second week, I was into it a little bit
and the third week, I was loving it. And by the fourth week, it's like just part of me. And so
I'm afraid that I can't tell you the most, my favorite song of all time because it's whatever
I heard most recently. Yeah, that's interesting. People have told me that there's an art to
listening to music as well. And you can start to, if you listen to a song, just carefully,
explicitly just force yourself to really listen. You start to, I did this when I was part of
Jazz Band and Fusion Band in college. You start to hear the layers of the instruments. You start
to hear the individual instruments and you can listen to classical music or to orchestra this
way. You can listen to jazz this way. And it's funny to imagine you now, to fast forward
to you listening to pop hits now as like a scholar listening to like Cardi B or something like that
or Justin Timberlake. No, not Timberlake, Bieber. They've both been in the top 10 since I've been
listening. They're still up there. Oh my God. I'm so cool. If you haven't heard Justin Timberlake's
top 10 in the last few years, there was one song that he did where the music video was set at
essentially NeurIPS. Oh, wow. Oh, the one with the robotics. Yeah. Yeah. Yeah. Yeah. Yeah. Yeah.
It's like at an academic conference and he's doing a demo and it was sort of a cross between
the Apple, like Steve Jobs kind of talk and NeurIPS. So, you know, it's always fun when
AI shows up in pop culture. I wonder if he consulted somebody for that. That's really
interesting. So, maybe on that topic, I've seen your celebrity in multiple dimensions,
but one of them is you've done cameos in different places. I've seen you in a TurboTax commercial
as like I guess the brilliant Einstein character. And the point is that TurboTax doesn't need
somebody like you. It doesn't need a brilliant person. Very few things need someone like me,
but yes, they were specifically emphasizing the idea that you don't need to be a computer expert
to be able to use their software. How did you end up in that world? I think it's an interesting
story. So, I was teaching my class. It was an intro computer science class for non-concentrators,
non-majors. And sometimes when people would visit campus, they would check in to say,
hey, we want to see what a class is like. Can we sit on your class? So, a person came to my class
who was the daughter of the brother of the husband of the best friend of my wife.
Anyway, basically a family friend came to campus to check out Brown and asked to come to my class
and came with her dad. Her dad is who I've known from various kinds of family events and so forth,
but he also does advertising. And he said that he was recruiting scientists for this ad,
this TurboTax set of ads. And he said we wrote the ad with the idea that we get like the most
brilliant researchers, but they all said no. So, can you help us find the like B level scientists?
And I'm like, sure, that's who I hang out with. So, that should be fine. So, I put together a list
and I did what some people called a Dick Cheney. So, I included myself on the list of possible
candidates, you know, with a little blurb about each one and why I thought that would make sense
for them to do it. And they reached out to a handful of them, but then they ultimately,
they YouTube stalked me a little bit and they thought, oh, I think he could do this. And they
said, okay, we're going to offer you the commercial. I'm like, what? So, it was such an interesting
experience because they have another world, the people who do nationwide kind of ad campaigns
and television shows and movies and so forth. It's quite a remarkable system that they have going
because it's like a set. Yeah. So, I went to, it was just somebody's house that they rented in
New Jersey, but in the commercial, it's just me and this other woman. In reality, there were 50
people in that room and another, I don't know, half a dozen kind of spread out around the house
in various ways. There were people whose job it was to control the sun. They were in the backyard
on ladders putting filters up to try to make sure that the sun didn't glare off the window in a way
that would wreck the shot. So, there was like six people out there doing that. There was three people
out there giving snacks, the craft table. There was another three people giving healthy snacks
because that was a separate craft table. There was one person whose job it was to keep me from
getting lost. And I think the reason for all this is because so many people are in one place at one
time, they have to be time efficient. They have to get it done. The morning they were going to do
my commercial and the afternoon they were going to do a commercial of a mathematics professor from
Princeton. They had to get it done. No wasted time or energy. And so there's just a fleet of people
all working as an organism and it was fascinating. I was just the whole time just looking around like
this is so neat. Like one person whose job it was to take the camera off of the cameraman
so that someone else whose job it was to remove the film canister, because every couple of takes,
they had to replace the film because film gets used up. It was just, I don't know, I was geeking
out the whole time. It was so fun. How many takes did it take? It looked the opposite, like there
weren't more than two people there. It was very relaxed. Right. Yeah. The person who I was in the scene
with is a professional. She's an improv comedian from New York City. And when I got there, they
had given me a script such as it was. And then I got there and they said, we're going to do this
as improv. I'm like, I don't know how to improv. I don't know what you're telling me to do here.
Yeah. Don't worry. She knows. I'm like, okay, we'll see how this goes. I guess I got pulled
into the story because like, where the heck did you come from? I guess in the scene. Like,
how did you show up in this random person's house? I don't know. Yeah. Well, I mean,
the reality of it is I stood outside in the blazing sun. There was someone whose job it was
to keep an umbrella over me because I started to sweat. I started to sweat. And so I would
wreck the shot because my face was all shiny with sweat. So there was one person who would dab me
off and hold an umbrella. But yeah, like the reality of it, like, why is this strange stalkery person
hanging around outside somebody's house? We're not sure. We'll have to
wait for the book. But are you, so you make, you make, like you said, YouTube, you make videos
yourself. You make awesome parody, sort of parody songs that kind of focus on a particular
aspect of computer science. How much, those seem really natural. How much production value goes
into that? Do you also have a team of 50 people? The videos, almost all the videos, except for
the ones that people would have actually seen, were just me. I write the lyrics. I sing the song.
I generally find a, like a backing track online because, like, I can't really play an instrument.
And then I do, in some cases, I'll do visuals using just like PowerPoint. Lots and lots of
PowerPoint to make it sort of like an animation. The most produced one is the one that people
might have seen, which is the overfitting video that I did with Charles Isbell. And that was
produced by the Georgia Tech and Udacity people because we were doing a class together. It was
kind of, I usually do parody songs, kind of to cap off a class at the end of a class.
So that one you're wearing, so it was just like Thriller. You're wearing the Michael Jackson,
the red leather jacket. The interesting thing with podcasting, that you're also into, that
I really enjoy, is that there's not a team of people. It's kind of more, because you know,
there's something that happens when there's more people involved than just one person,
that just the way you start acting, I don't know, there's a censorship. You're not given,
especially for like slow thinkers like me, you're not, and I think most of us are,
if we're trying to actually think we're a little bit slow and careful, it kind of
large teams get in the way of that. And I don't know what to do with that. To me,
it's very popular to criticize, quote unquote, mainstream media, but there is legitimacy
to criticizing them just the same. I love listening to NPR, for example, but it's clear that there's
a team behind it. There are constant commercial breaks. There's this kind of like rush of like,
okay, I have to interrupt you now, because we have to go to commercial, just this whole,
it creates, it destroys the possibility of nuanced conversation. Yeah, exactly.
Evian, which Charles Isbell, who I talked to yesterday, told me that Evian is naive backwards,
which the fact that his mind thinks this way is just quite brilliant. Anyway, there's a freedom
to this podcast. He's Dr. Awkward, which by the way, is a palindrome. That's a palindrome that
I happen to know from other parts of my life, and I just figured out, well, you know, use it against
Charles. Dr. Awkward. So what was the most challenging parody song to make? Was it the
Thriller one? No, that one was really fun. I wrote the lyrics really quickly, and then I gave it
over to the production team. They recruited an a cappella group to sing. That went really smoothly.
It's great having a team, because then you can just focus on the part that you really love,
which in my case is writing the lyrics. For me, the most challenging one, not challenging in a bad
way, but challenging in a really fun way, was I did, one of the parody songs I did is about
the halting problem in computer science, the fact that you can't create a program that can tell
for any other arbitrary program, whether it's actually going to get stuck in an infinite
loop or whether it's going to eventually stop. And so I did it to an 80s song because that's,
I hadn't started my new thing of learning current songs, and it was Billy Joel's The Piano Man.
Nice. Which is a great song. Yeah, yeah. Sing me a song. You're The Piano Man. Yeah, it's a great song.
So the lyrics are great because, first of all, it rhymes. Not all songs rhyme. I've done
Rolling Stones songs, which turn out to have no rhyme scheme whatsoever. They're just sort of yelling
and having a good time, which makes it not fun from a parody perspective, because like you can
say anything. But the lines rhyme, and there was a lot of internal rhymes as well. And so
figuring out how to sing with internal rhymes, a proof of the halting problem, was really challenging.
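For the curious, the result behind the song can be sketched with a few lines of hypothetical Python. The halts() decider here is an assumption made only to set up the contradiction; it is exactly the thing the theorem says cannot exist:

    # Suppose, for contradiction, we had a perfect decider:
    #   halts(program, argument) -> True if program(argument) eventually stops.
    def paradox(program):
        if halts(program, program):
            while True:       # predicted to halt, so loop forever instead
                pass
        else:
            return "done"     # predicted to loop forever, so halt immediately

    # Now ask whether paradox(paradox) halts. Either answer contradicts what
    # halts() predicted, so no general-purpose halts() can exist.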
And I really enjoyed that process. What about, last question on this topic, what about the dancing
in the Thriller video? How many takes did that take? So I wasn't planning to dance. They had me in
the studio and they gave me the jacket, and it's like, well, you can't, if you have the jacket
and the glove, like there's not much you can do. Yeah. So I think I just danced around. And then
they said, why don't you dance a little bit? There was a scene with me and Charles dancing
together. In that video? They did not use it in the video, but we recorded it. I don't remember.
Yeah, no, it was pretty funny. And Charles, who has this beautiful, wonderful voice,
doesn't really sing. He's not really a singer. And so that was why I designed the song with
him doing a spoken section and me doing the singing. Yeah, it's very like Barry White.
Yeah, smooth baritone. Yeah, yeah, it's great. That was awesome. So one of the other things
Charles said is that, you know, everyone knows you as like a super nice guy, super passionate about
teaching and so on. What he said, I don't know if it's true that despite the fact that you are
in cold blood, like, okay, I will admit this finally for the first time. That was, that was me.
It's the Johnny Cash song. Shot a man in Reno, just to watch him die.
But that you actually do have some strong opinions and some topics. So if this, in fact,
is true, what strong opinions would you say you have? Are there ideas you think, maybe in artificial
intelligence, machine learning, maybe in life, that you believe are true that others might,
you know, some number of people might disagree with you on?
So I try very hard to see things from multiple perspectives. There's, there's this great Calvin
and Hobbes cartoon where, do you know, okay, so Calvin's dad is always kind of a bit of a foil
and he talked Calvin into, Calvin had done something wrong. The dad talks him into like
seeing it from another perspective and Calvin, like this breaks Calvin because he's like,
oh my gosh, now I can see the opposite sides of things. And so it becomes like a Cubist cartoon
where there is no front and back, everything's just exposed and it really freaks him out. And
finally he settles back down. It's like, oh good, no, I can make that go away. But like I'm that,
I'm that, I live in that world where I'm trying to see everything from every perspective all the
time. So there are some things that I've formed opinions about that it would be harder, I think,
to disabuse me of. One is the super intelligence argument and the existential threat of AI is
one where I feel pretty confident in my feeling about that one. Like I'm willing to hear other
arguments, but like I am not particularly moved by the idea that if we're not careful, we will
accidentally create a super intelligence that will destroy human life. Let's talk about that.
Let's get you in trouble and record it on video. It's like Bill Gates, I think he said like some
quote about the internet, that it's just going to be a small thing. It's not going to really go
anywhere. And then I think Steve Ballmer said, I don't know why I'm sticking on Microsoft,
said something like smartphones are useless. There's no reason why Microsoft should
get into smartphones, that kind of thing. So let's get you, let's talk about AGI. As AGI is destroying
the world, we'll look back at this video and see. Now, I think it's really interesting to actually
talk about because nobody really knows the future. So you have to use your best intuition. It's very
very difficult to predict it, but you have spoken about AGI and the existential risks
around it and sort of basing your intuition that we're quite far away from that being a
serious concern relative to the other concerns we have. Can you maybe unpack that a little bit?
Yeah, sure, sure, sure. So as I understand it, that for example, I read Bostrom's book
and a bunch of other reading material about this sort of general way of thinking about the world.
And I think the story goes something like this, that we will at some point create computers that
are smart enough that they can help design the next version of themselves, which itself will be
smarter than the previous version of themselves and eventually bootstrapped up to being smarter
than us, at which point we are essentially at the mercy of this sort of more powerful intellect,
which in principle, we don't have any control over what its goals are. And so if its goals
are at all out of sync with our goals, like for example, the continued existence of humanity,
we won't be able to stop it. It'll be way more powerful than us and we will be toast.
So there's some, I don't know, very smart people who have signed on to that story,
and it's a compelling story. Now I can really get myself in trouble. I once wrote an op-ed about
this, specifically responding to some quotes from Elon Musk, who has been on this very podcast
more than once. E.A.I. Summoning the Demon. I think he said, but then he came to Providence,
Rhode Island, which is where I live, and said to the governors of all the states,
you're worried about entirely the wrong thing. You need to be worried about AI. You need to be
very, very worried about AI. And journalists kind of reacted to that. They wanted to get
people's take. And I was like, okay, my belief is that one of the things that makes Elon Musk
so successful and so remarkable as an individual is that he believes in the power of ideas. He
believes that if you have a really good idea for getting into space, you can get into space.
If you have a really good idea for a company or for how to change the way that people drive,
you just have to do it and it can happen. It's really natural to apply that same idea to AI.
You see the systems that are doing some pretty remarkable computational tricks,
demonstrations, and then to take that idea and just push it all the way to the limit and think,
okay, where does this go? Where is this going to take us next? And if you're a deep believer in
the power of ideas, then it's really natural to believe that those ideas could be taken to the
extreme and kill us. So I think his strength is also his undoing because that doesn't mean it's
true. It doesn't mean that that has to happen, but it's natural for him to think that.
So another way to phrase the way he thinks, and I find it very difficult to argue with that line
of thinking. So Sam Harris is another person from the neuroscience perspective that thinks like that
is saying, well, is there something fundamental in the physics of the universe that prevents
this from eventually happening? And that's, Nick Bostrom thinks in the same way that kind of zooming
out, yeah, okay, we humans now are existing in this like time scale of minutes and days. And so
our intuition is in this time scale of minutes, hours, and days. But if you look at the span of
human history, is there any reason why you can't see this in 100 years? And like, is there
something fundamental about the laws of physics that prevents this? And if it doesn't, then it
eventually will happen. Or we will destroy ourselves in some other way. And it's very difficult, I
find to actually argue against that. Yeah. Me too. And not sound like, not sound like you're just
rolling your eyes, like, ah, this is science fiction, we don't have to think about it. But even
worse than that, which is like, I don't know, kids, but like, I got to pick up my kids now.
Like, okay, I see, there's more pressing short-term. Yeah, there's more pressing short-term
things, like, stop it with this existential crisis, we have much, much shorter-term things, like
now, especially this year, there's COVID. So like any kind of discussion like that is,
like there's, you know, there's pressing things today. And then so the Sam Harris argument,
well, like any day, the exponential singularity can occur is very difficult to argue against.
I mean, I don't know. But part of his story is also, he's not going to put a date on it. It could
be in a thousand years. It could be in a hundred years. It could be in two years. It's just that
as long as we keep making this kind of progress, it's ultimately has to become a concern.
I kind of am on board with that. But the thing that the piece that I feel like is missing from
that, that way of extrapolating from the moment that we're in, is that I believe that in the process
of actually developing technology that can really get around in the world and really process and
do things in the world in a sophisticated way, we're going to learn a lot about what that
means, which that we don't know now, because we don't know how to do this right now.
If you believe that you can just turn on a deep learning network and it eventually
give it enough compute and it'll eventually get there, well, sure, that seems really scary,
because we won't be in the loop at all. We won't be helping to design or target these kinds of
systems. But I don't see that, that feels like it is against the laws of physics,
because these systems need help, right? They need to surpass the difficulty,
the wall of complexity that happens in arranging something in the form that will happen in.
Yeah. I believe in evolution. I believe that there's an argument, right? So there's another
argument, just to look at it from a different perspective, that people make for why they don't believe
in evolution. How could evolution work? It's sort of like a random set of parts assembling themselves
into a 747, and that could just never happen. So it's like, okay, that's maybe hard to argue
against, but clearly 747s do get assembled, they get assembled by us. Basically, the idea being that
there's a process by which we will get to the point of making technology that has that kind of
awareness. And in that process, we're going to learn a lot about that process, and we'll have
more ability to control it or to shape it or to build it in our own image. It's not something
that is going to spring into existence like that 747, and we're just going to have to contend with it
completely unprepared. That's very possible that in the context of the long arc of human history,
it will in fact spring into existence. But that that springing might take, like if you look at
nuclear weapons, like even 20 years is a springing in the context of human history. And it's very
possible just like with nuclear weapons that we could have, I don't know what percentage you want
to put at it, but the possibility of... Could have knocked ourselves out. Yeah, the possibility of
human beings destroying themselves in the 20th century. With nuclear weapons, I don't know,
you can, if you really think through it, you could really put it close to like, I don't know, 30, 40%,
given like the certain moments of crisis that happen. So like, I think one,
one like fear in the shadows that's not being acknowledged is it's not so much the AI will
run away, is that as it's running away, we won't have enough time to think through how to stop it.
Right, the fast takeoff, or foom. Yeah, I mean, my much bigger concern, I wonder what you think
about it, which is we won't know it's happening. So I kind of think that there's an AGI situation
already happening with social media, that our minds, our collective intelligence of human
civilizations is already being controlled by an algorithm. And like, we're already super, like,
the level of a collective intelligence, thanks to Wikipedia, people should donate to Wikipedia
to feed the AGI. Man, if we had a super intelligence that was in line with Wikipedia's values,
that's a lot better than a lot of other things I could imagine. I trust Wikipedia more than I
trust Facebook or YouTube as far as trying to do the right thing from a rational perspective.
Now, that's not where you were going, I understand that, but it does strike me that there's sort of
smarter and less smart ways of exposing ourselves to each other on the internet.
Yeah, the interesting thing is that Wikipedia and social media have very different forces.
You're right. I mean, if AGI was Wikipedia, it'd be just like this cranky, overly competent
editor of articles. There's something to that, but the social media aspect is not,
so the vision of AGI is as a separate system that's super intelligent. That's one key little
thing. I mean, there's the paperclip argument that's super dumb,
but super powerful systems. But with social media, you have a relatively like algorithms
we may talk about today, very simple algorithms that when something Charles talks a lot about,
which is interactive AI, when they start like having at scale, like tiny little interactions
with human beings, they can start controlling these human beings. So a single algorithm can
control the minds of human beings slowly, in ways we might not realize. It could start wars,
it could change the way we think about things. It feels like in the long arc of history, if I
were to sort of zoom out from all the outrage and all the tension on social media, that it's
progressing us towards better and better things. It feels like chaos and toxic and all that kind
of stuff. It's chaos and toxic, yeah. But it feels like actually, the chaos and toxic is similar
to the kind of debates we had from the founding of this country. There was a civil war that happened
over that period. And ultimately, it was all about this tension of like, something doesn't feel
right about our implementation of the core values we hold as human beings and they're constantly
struggling with this. And that results in people calling each other names, just being shady to each
other on Twitter. But ultimately, the algorithm is managing all that. And it feels like there's
a possible future in which that algorithm controls us into the direction of self-destruction and
whatever that looks like. Yeah. So, all right, I do believe in the power of social media to screw
us up royally. I do believe in the power of social media to benefit us, too. I do think that we're
in a, yeah, it's sort of almost got dropped on top of us. And now we're trying to, as a culture,
figure out how to cope with it. There's a sense in which, I don't know, there's some arguments
that say that, for example, I guess college-age students now, late college-age students now,
people who were in middle school when social media started to really take off,
may be, may be really damaged. This may have really hurt their development in a way that we
don't have all the implications of quite yet. That's the generation who, and I hate to make it
somebody else's responsibility, but they're the ones who can fix it. They're the ones who can figure
out, how do we keep the good of this kind of technology without letting it eat us alive?
And if they're successful, we move on to the next phase, the next level of the game. If they're not
successful, then, yeah, then we're going to wreck each other. We're going to destroy society.
So you're going to, in your old age, sit on a porch and watch the world burn because of the
TikTok generation that... I believe, well, so this is my kids' age, right? And certainly my
daughter's age, and she's very tapped in to social stuff, but she's also, she's trying to find that
balance, right, of participating in it and in getting the positives of it, but without letting
it eat her alive. And I think sometimes she ventures... I hope she doesn't watch this.
Sometimes I think she ventures a little too far and is consumed by it, and other times she gets
a little distance. And if there's enough people like her out there, they're going to navigate
these choppy waters. That's an interesting skill, actually, to develop. I talked to my dad about
it. You know, I've now somehow, this podcast in particular, but other reasons has received a
little bit of attention. And with that, apparently in this world, even though I don't shut up about
love and I'm just all about kindness, I have now a little mini army of trolls. It's kind of hilarious,
actually, but it also doesn't feel good. But it's a skill to learn to not look at that, like to
moderate actually, how much you look at that. The discussion I have with my dad is similar to,
it doesn't have to be about trolls, it could be about checking email, which is like, if you're
anticipating, you know, my dad runs a large institute at Drexel University, and there
could be stressful emails you're waiting on, like there's drama of some kind. And so like,
there's a temptation to check the email, if you sent an email and you're waiting on it, and that pulls you
in, into, it doesn't feel good. And it's a skill that he actually complains that he hasn't learned,
I mean, he grew up without it. So he hasn't learned the skill of how to shut off the internet and
walk away. And I think young people, while they're also being quote unquote damaged by like,
you know, being bullied online, all those stories, which are very like horrific, you basically can't
escape your bullies these days when you're growing up. But at the same time, they're also learning
that skill of how to be able to shut off the, like disconnect with it, be able to laugh at it,
not take it too seriously. It's fascinating. Like we're all trying to figure this out. Just like
you said, has it been dropped on us? And we're trying to figure it out. Yeah, I think that's
really interesting. And I guess I've become a believer in the human design, which I feel like
I don't completely understand. Like how do you make something as robust as us? Like we're so flawed
in so many ways. And yet, and yet, you know, we dominate the planet. And we do seem to manage
to get ourselves out of scrapes, eventually, not necessarily the most elegant possible way,
but somehow we get we get to the next step. And I don't know how I'd make a machine do that. I
generally speaking, like if I train one of my reinforcement learning agents to play a video
game, and it works really hard on that first stage over and over and over again, and it makes it
through it succeeds on that first level. And then the new level comes, and it's just like, okay,
I'm back to the drawing board. And somehow humanity, we keep leveling up. And then somehow
managing to put together the skills necessary to achieve success, some semblance of success in
that next level too. And, you know, I hope we can keep doing that. You mentioned reinforcement
learning. So you've had a couple years in the field. No, quite, you know, quite a few, quite a
long career in artificial intelligence broadly, but reinforcement learning specifically.
Can you maybe give a hint about your sense of the history of the field, and in some ways it's
changed with the advent of deep learning, but it has long roots, like how has it weaved in and out of
your own life? How have you seen the community change, or maybe the ideas that it's playing with
change? I've had the privilege, the pleasure of having almost a front row seat to a lot
of this stuff. And it's been really, really fun and interesting. So when I was in college,
in the 80s, early 80s, the neural net thing was starting to happen. And I was taking a lot of
psychology classes and a lot of computer science classes as a college student. And I thought,
you know, something that can play tic-tac-toe and just like learn to get better at it, that ought
to be a really easy thing. So I spent almost, almost all of my, what would have been vacations
during college, like hacking on my home computer, trying to teach it how to play tic-tac-toe.
In what programming language? Basic. Oh yeah. That's my first language. That's my native language.
Is that when you first fell in love with computer science, just like programming basic on that?
What was the computer, do you remember? I had a TRS-80 model one before they were called model
ones, because there was nothing else. I got my computer in 1979. So I would have been bar mitzvahed,
but instead of having a big party that my parents threw on my behalf, they just got me a computer,
because that's what I really, really, really wanted. I saw them in the mall, in Radio Shack,
and I thought, what, how are they doing that? I would try to stump them. I would give them math
problems, like one plus, and then in parentheses, two plus one. And it would always get it right.
I'm like, how do you know so much? I've had to go to an algebra class for the last few years to
learn this stuff, and you just seem to know. So I was smitten and got a computer, and I
think ages 13 to 15, I have no memory of those years. I think I just was in my room with the
computer. Listening to Billy Joel. Communing, possibly listening to the radio, listening
to Billy Joel. That was the one album I had on vinyl at that time. And then I got it on cassette
tape, and that was really helpful, because then I could play it. I didn't have to go down in my
parents' Wi-Fi, or hi-fi, sorry. And at age 15, I remember kind of walking out and like,
okay, I'm ready to talk to people again. Like, I've learned what I need to learn here.
And so yeah, so that was my home computer. And so I went to college, and I was like,
oh, I'm totally going to study computer science. And I opted, the college I chose specifically
had a computer science major. The college I really wanted to go to didn't, so bye-bye to them.
Which college did you go to? So I went to Yale. Princeton would have been way more convenient,
and it was just a beautiful campus, and it was close enough to home, and I was really excited
about Princeton. And I visited, and I said, so, computer science major? They're like, well, we have
computer engineering. I'm like, oh, I don't like that word, engineering. I like computer science.
I really, I want to do, like, you're saying hardware and software? They're like, yeah,
like, I just want to do software. I couldn't care less about hardware.
And you grew up in Philadelphia? I grew up outside Philly, yeah. Yeah.
Okay. So the local schools were like Penn and Drexel and Temple. Like,
everyone in my family went to Temple, at least at one point in their lives, except for me.
So, yeah, Philly family.
Yale had a computer science department, and that's when you,
it's kind of interesting you said 80s and neural networks. That's when neural networks were
a hot new thing, or a hot thing, period. So was that in college, when you first learned about
neural networks? Yeah. Yeah.
When you first learned, like, how did you... And it was in a psychology class, not in a CS class.
Oh, wow. Yeah.
Was it psychology or cognitive science, or like, do you remember, like, what context?
It was, yeah, yeah, yeah. So I was a, I've always been a bit of a cognitive psychology groupie.
So like, I studied computer science, but I like, I like to hang around where the
cognitive scientists are, because I don't know, brains, man, they're like, they're wacky, cool.
And they have a bigger picture view of things. They're a little less
engineering, I would say. They're more, they're more interested in the nature of cognition and
intelligence and perception and how the vision system works. Like, they're always asking bigger
questions. Now, with the deep learning community, there, I think more, there's a lot of intersections,
but I do find that the, the neuroscience folks actually, and cognitive psychology, cognitive
science folks, are starting to learn how to program, how to use neural, artificial neural
networks. And they are actually approaching problems in like, totally new, interesting ways.
It's fun to watch that grad students from those departments, like, approach a problem of machine
learning. Right, they come in with a different perspective. Yeah, they don't care about, like,
your ImageNet data set or whatever. They want, like, to understand the, the, the, like,
the basic mechanisms at the, at the neuronal level, at the functional level of intelligence.
So it's kind of, it's kind of cool to see them work. But yeah, okay, so you always love, you're
always a groupie of cognitive psychology. Yeah, yeah. And so, so it was in a class by Richard
Garrick. He was kind of my, my favorite psych professor in college. And I took like, three
different classes with him. And yeah, so that we, they were talking specifically the class, I think
was kind of a, there was a big paper that was written by Steven Pinker and Prince, I don't,
I'm blanking on Prince's first name, but Prince and Pinker and Prince, they wrote kind of a,
they were at that time, kind of like, I'm blanking on the names of the current people,
the cognitive scientists who are complaining a lot about deep networks.
Oh, Gary, Gary Marcus, and who else? I mean, there's a few, but Gary, Gary's the most feisty.
Sure. Gary's very feisty. And with this, with his co-author, they, they, you know, they're kind of
doing these kinds of takedowns where they say, okay, well, yeah, it does all these amazing,
amazing things, but here's a shortcoming, here's a shortcoming, here's a shortcoming. And so the
Pinker Prince paper is kind of like the, that generation's version of Marcus and Davis, right,
where they're, they're trained as cognitive scientists, but they're looking skeptically at
the results in the, in the artificial intelligence neural net kind of world and saying, yeah, it
can do this and this and this, but like, it can't do that. And it can't do that. And it can't do
that. Maybe in principle, or maybe just in practice at this point, but, but the fact of the matter is
you're, you've narrowed your focus too far to be impressed. You know, you're impressed with the,
the things within that circle, but you need to broaden that circle a little bit. You need to
look at a wider set of problems. And so, so we had, so I was in this seminar in college that was
basically a close reading of the Pinker Prince paper, which was like really thick. There was a
lot going on in there. And, and it, and it talked about the reinforcement learning idea a little
bit. I'm like, oh, that sounds really cool because behavior is what is really interesting to me
about psychology anyway. So making programs that, I mean, programs are things that behave.
People are things that behave. Like I want to make learning that learns to behave.
In which way was reinforcement learning presented? Is this talking about human and
animal behavior? Or are we talking about actual mathematical construct?
That's right. So that's a good question. Right. So this is, I think it wasn't actually talked about
as behavior in the paper that I was reading. I think that it just talked about learning.
And to me, learning is about learning to behave, but really neural nets at that point were about
learning, like supervised learning. So learning to produce outputs from inputs. So I kind of
tried to invent reinforcement learning. I, when I graduated, I joined a research group at Bellcore,
which had spun out of Bell Labs recently at that time because of the divestiture of the,
of long distance and local phone service in the 1980s, 1984. And I was in a group with Dave Ackley,
who was the first author of the Boltzmann machine paper. So the very first neural net paper that
could handle XOR, right? So XOR sort of killed neural nets, the very first generation.
The first winter. Yeah. The, the Perceptrons paper. And Hinton, along with the student Dave
Ackley, and, and I think there were other authors as well, showed that, no, no, no, with Boltzmann
machines, we can actually learn nonlinear concepts. And so everything's back on the table again. And
that kind of started that second wave of neural networks. So Dave Ackley was, he became my mentor
at, at Bellcore. And we talked a lot about learning and life and computation and how all
these things fit together. Now, Dave and I have a podcast together. So, so I get to kind of
enjoy that sort of his, his perspective once again, even, even all these years later.
And so I said, so I said, I was really interested in learning, but in the concept of behavior.
And he's like, Oh, well, that's reinforcement learning here. And he gave me Rich Sutton's
1984 TD paper. So I read that paper, I honestly didn't get all of it. But I got the idea, I got
that they were using, that he was using ideas that I was familiar with in the context of neural
nets and, and like sort of back prop. But with this idea of making predictions over time,
I'm like, this is so interesting, but I don't really get all the details I said to Dave. And
Dave said, Oh, well, why don't we have him come and give a talk? And I was like,
wait, what, you can do that? Like, these are real people? I thought they were just words. I thought
it was just like ideas that somehow magically seeped into paper. He's like, no, I, I, I know Rich,
like, we'll just have him come down and he'll give a talk. And so I was, you know, my mind was blown.
And so Rich came and he gave a talk at Bellcore. And he talked about what he was super excited,
which was, they had just figured out at the time, Q-learning. So Watkins had visited
Rich Sutton's lab at UMass, or Andy Barto's lab that Rich was a part of. And he was really excited
about this because it resolved a whole bunch of problems that he didn't know how to resolve in
the, in the earlier paper. And so, for people who don't know, TD, temporal difference. These are all just
algorithms for reinforcement learning. Right. And TD, temporal difference, in particular is about
making predictions over time. And you can try to use it for making decisions, right? Because if you
can predict how good an action's outcomes will be in the future, you can choose
the one whose outcome is better. But the theory didn't really support changing your behavior,
like the predictions had to be of a consistent process if you really wanted it to work.
And one of the things that was really cool about Q learning, another algorithm for
reinforcement learning is it was off policy, which meant that you could actually be learning
about the environment and what the value of different actions would be while actually
figuring out how to behave optimally. So that was a revelation.
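As a rough sketch of that off-policy idea, here is a minimal tabular Q-learning loop, assuming hypothetical env.reset()/env.step() hooks rather than anything Watkins actually wrote. The agent behaves with an exploratory epsilon-greedy policy, while the update bootstraps off the greedy max over the next state's actions, which is what makes it off-policy:

    import random
    from collections import defaultdict

    def q_learning(env, actions, episodes=1000, alpha=0.1, gamma=0.99, epsilon=0.1):
        Q = defaultdict(float)                     # Q[(state, action)] -> value estimate
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                # Behavior policy: epsilon-greedy, so the agent keeps exploring.
                if random.random() < epsilon:
                    action = random.choice(actions)
                else:
                    action = max(actions, key=lambda a: Q[(state, a)])
                next_state, reward, done = env.step(action)
                # Off-policy target: the greedy value of the next state,
                # regardless of what the behavior policy will actually do there.
                best_next = 0.0 if done else max(Q[(next_state, a)] for a in actions)
                Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
                state = next_state
        return Q

The single update line toward reward plus gamma times the max over next-state values is essentially the one-line equation discussed next.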
Yeah. And the proof of that is kind of interesting. I mean, that's really surprising to me when I
first read that, and then in Rich Sutton's book on the matter. It's, it's kind of beautiful
that a single equation, one line of code, can capture it, and like you can learn anything.
Yeah, like an equation and code. You're right. Like you can, the code that you can arguably,
at least if you like squint your eyes can say this is all of intelligence is that you can implement
that in a single, well, I think I started with Lisp, which is a shout out to Lisp,
like a single line of code, key piece of code, maybe a couple, that could do that is kind of
magical. It feels too good to be true. Well, and it sort of is. Yeah. It seems they require
an awful lot of extra stuff supporting it. But, but nonetheless, the idea is, the idea is really
good. And as far as we know, it is, it is a very reasonable way of trying to create adaptive
behavior, behavior that gets better at something over time. Did you find the idea of optimal
at all compelling that you could prove that it's optimal? So like one part of computer science
that it makes people feel warm and fuzzy inside is when you can prove something like
that a sorting algorithm worst case runs in N log N, and it makes everybody feel so good,
even though in reality, it doesn't really matter what the worst case is, what matters is like,
does this thing actually work in practice on this particular actual set of data that I,
that I enjoy? Did you? So here's that, here's a place where I have maybe a strong opinion.
Uh-oh. Which is like, you're right, of course, but no, no, like so, so what makes worst case
so great, right? If you have a worst case analysis, so great is that you get modularity.
You can take that thing and plug it into another thing and still have some understanding of what's
going to happen when you click them together, right? If it just works well in practice, in
other words, with respect to some distribution that you care about, when you go plug it into
another thing, that distribution can shift and can change and your thing may not work well anymore
and you want it to and you wish it does and you hope that it will, but it might not. And then,
ah. So you're, so you're saying you don't like machine learning.
But we have some positive theoretical results for these things. You know, you can come back at me
with, yeah, but they're really weak and yeah, they're really weak and you can even say that,
you know, sorting algorithms, like if you do the optimal sorting algorithm, it's not really the
one that you want and that might be true as well. But it is, the modularity is a really powerful
state. But as an engineer, you can then assemble different things that you can count on them to
be, I mean, it's interesting. It's a balance like with everything else in life. You don't want to
get too obsessed. I mean, this is what computer scientists do, which they tend to like get obsessed
and they over optimize things or they start by optimizing them and they over optimize.
So it's, it's easy to like get really granular about this thing. But like the step from an
N squared to an N log N sorting algorithm is a big leap for most real world systems. No matter
what the actual behavior of the system is, that's a big leap. And the same can probably be said for
other kind of first leaps that you would take on a particular problem. Like it's the picking the
low hanging fruit or whatever the equivalent of doing the not the dumbest thing, but the next
to the dumbest thing is picking the most delicious reachable fruit. Yeah. Most delicious reachable
fruit. I don't know why that's not a saying. And yeah. Okay. So, so you then this is the 80s and
this kind of idea starts to percolate of learning. At that point, I got to meet Rich Sutton. So
everything was sort of downhill from there. And that was, that was really the pinnacle of everything.
But then I, you know, then I felt like I was kind of on the inside. So then as interesting
results were happening, I could like check in with, with Rich or with Jerry Tesauro, who had a huge
impact on kind of early thinking in, in temporal difference learning and reinforcement learning,
and showed that you could solve problems that we didn't know how to solve any other way.
And so that was really cool. So it was good things were happening. I would hear about it from
either the people who were doing it or the people who were talking to the people who were doing it.
And so I was able to track things pretty well through, through the 90s. So wasn't most of
the excitement on reinforcement learning in the 90s era with, what is it, TD-Gammon? Like,
what's the role of these kind of little, like, fun game-playing things and breakthroughs in,
you know, exciting the community? Was that like, what were your, because you've also built,
or were part of building, a crossword puzzle solver, yeah, a solving program, called
Proverb. So, so you were interested in this as a problem, like in using games to
understand how to build intelligent systems. So like, what did you think about TD-Gammon? Like,
what did you think about that whole thing in the 90s? Yeah. I mean, I found the TD-Gammon result
really just remarkable. So I had known about some of Jerry's stuff before he did TD-Gammon. He did a
system, just more vanilla, well, not in, not entirely vanilla, but a more classical backpropy
kind of, uh, network for playing backgammon where he was training it on expert moves. So it was
kind of supervised, but the way that it worked was not to mimic the actions, but to learn internally
an evaluation function. So to learn, well, if the expert chose this over this, that must mean that
the expert values this more than this. And so let me adjust my weights to make it so that the network
evaluates this as being better than this. So it could learn from, from human preferences, it could
learn its own preferences. And then when he took the step from that to actually doing it as a full
on reinforcement learning problem where you didn't need a trainer, you could just let it play,
that was, that was remarkable. Right. And so I think as, as humans often do, as we've done in
the recent past as well, people extrapolate. And it's like, Oh, well, if you can do that, which
is obviously very hard, then obviously you could do all these other problems that we, that we want
to solve that we know are also really hard. And it turned out very few of them ended up being
practical, partly because I think neural nets, certainly at the time, were struggling to be
consistent and reliable. And so training them in a reinforcement learning setting was a bit of a
mess. I had, I don't know, generation after generation of like master's students who wanted to do
value function approximation, basically learn reinforcement learning with neural nets.
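A minimal sketch of what that kind of project typically looks like is semi-gradient TD(0) with a linear value function standing in for the neural net. The features() function and the env hooks (reset, step, policy) here are assumptions for illustration, not Tesauro's actual TD-Gammon setup:

    import numpy as np

    def td0_value_approx(env, features, dim, episodes=500, alpha=0.01, gamma=1.0):
        w = np.zeros(dim)                           # V(s) is approximated as w . features(s)
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                next_state, reward, done = env.step(env.policy(state))
                x, x_next = features(state), features(next_state)
                v_next = 0.0 if done else w @ x_next
                td_error = reward + gamma * v_next - w @ x   # prediction error over time
                w += alpha * td_error * x                    # nudge V(state) toward the TD target
                state = next_state
        return w

Swapping the linear approximator for a neural network is the step where, as described here, those student projects kept getting stuck.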
And over and over and over again, we were failing. We couldn't get the good results that Jerry
Tesauro got. I now believe that Jerry is a neural net whisperer. He has a particular
ability to get neural networks to do things that other people would find impossible. And it's not
the technology, it's the technology and Jerry together. Yeah. At which I think speaks to
the role of the human expert in the process of machine learning. Right. It's so easy. We're so
drawn to the idea that it's the technology that is where the power is coming from, that I think we
lose sight of the fact that sometimes you need a really good, just like, I mean, no one would think,
hey, here's this great piece of software. Here's like, I don't know, GNU Emacs or whatever. And it
doesn't that prove that computers are super powerful and basically going to take over the world.
It's like, no, Stallman is a hell of a hacker. Right. So he was able to make the code do these
amazing things. He couldn't have done it without the computer, but the computer couldn't have
done it without him. And so I think people discount the role of people like Jerry who
have just a particular set of skills. On that topic, by the way, as a small side note,
I tweeted Emacs is greater than Vim yesterday and deleted the tweet 10 minutes later when I
realized I was playing with fire. It started a war. I was like, oh, I was just kidding,
and I'm going to walk it back, and so on. So people still feel passionately about that particular
piece of software. Yeah, I don't get that because Emacs is clearly so much better. I don't understand.
But why do I say that? Because I spent a block of time in the 80s making my fingers
know the Emacs keys. And now that's part of the thought process for me. I need to express.
And if you take that, if you take my Emacs key bindings away, I become...
I can't express myself. I'm the same way with, I don't know if you know what it is,
but a Kinesis keyboard, which is this butt shaped keyboard. Yes, I've seen them.
Yeah. And they're very, I don't know, sexy, elegant. They're beautiful.
Yeah, they're gorgeous, way too expensive. But the problem with them, similar with Emacs is
once you learn to use it. It's harder to use other things.
It's hard to use other things. There's this absurd thing where I have like small, elegant,
lightweight, beautiful little laptops. And I'm sitting there in a coffee shop with a giant Kinesis
keyboard and a sexy little laptop. It's absurd. But you know, I used to feel bad about it,
but at the same time, you just kind of have to, sometimes it's back to the Billy Joel thing.
You just have to throw that Billy Joel record and throw Taylor Swift and Justin Bieber to the wind.
But I like them now because again, I have no musical taste. Like now that I've heard Justin
Bieber enough, I'm like, I really like his songs. And Taylor Swift, not only do I like her songs,
but my daughter's convinced that she's a genius. And so now I basically have signed on to that.
So yeah, that speaks to the back to the robustness of the human brain. That speaks to
the neuroplasticity that you can just, you can just like a mouse, teach yourself to,
or probably dog, teach yourself to enjoy Taylor Swift. I'll try it out. I don't know.
I try, you know what it has to do with just like acclimation, right? Just like you said,
a couple of weeks. Yeah.
That's an interesting experiment. I'll actually try that.
Like I'll listen to it. That wasn't the intent of the experiment,
just like social media. It wasn't intended as an experiment to see what we can take as a society,
but it turned out that way.
I don't think I'll be the same person on the other side of the week listening to Taylor Swift,
but let's try. You know, it's more compartmental. Don't be so worried. Like I get that you can be
worried, but don't be so worried because we compartmentalize really well. And so it won't
bleed into other parts of your life. You won't start, I don't know, wearing red lipstick or
whatever. Like it's fine. It's fine. Change fashion and everything. It's fine. But you know what? The thing
you have to watch out for is you'll walk into a coffee shop once we can do that again.
And recognize the song.
And you'll be, no, you won't know that you're singing along until everybody in the coffee
shop is looking at you. And then you're like, that wasn't me.
Yeah. That's the, you know, people are afraid of AGI. I'm afraid of the Taylor Swift takeover.
Yeah. And I mean, people should know that TD Gammon was, I guess, would you call it,
do you like the terminology of self play by any chance? So, so like systems that learn
by playing themselves. Just, I don't know if it's the best word, but uh...
So what's, what's the problem with that term?
Okay. So it's like the big bang, like it's, it's like talking to serious physicists.
Do you like the term big bang when, when it was early? I feel like it's the early days of self
play. I don't know. Maybe it was used previously, but I think it's been used by only a small group
of people. And so like, I think we're still deciding, is this ridiculously silly name,
a good name for the concept, potentially one of the most important concepts in artificial intelligence?
Okay. It depends how broadly you apply the term. So I used the term in my 1996 PhD dissertation.
Oh, wow. The actual term itself.
Yeah. Because Tesauro's paper was something like training up an expert backgammon
player through self-play. So I think it was in the title of his paper.
Oh, okay. If not in the title, it was definitely a term that he used.
There's another term that we got from that work, which is rollout. So I don't know if you,
do you ever hear the term rollout? That's a backgammon term that has now been applied
generally in computing. Well, at least in AI, because of TD Gammon.
Yeah. That's fascinating.
So how is self play being used now? And like, why is it, does it,
does it feel like a more general powerful concept? It's sort of the idea of, well,
the machine just going to teach itself to be smart.
Yeah. So that's, that's where maybe you can correct me, but that's where, you know,
the continuation of the spirit, and actually like literally the exact algorithms, of TD Gammon
are applied by DeepMind and OpenAI to learn games that are a little bit more complex.
When I was learning artificial intelligence, Go was presented to me
through Artificial Intelligence: A Modern Approach. I don't know if they explicitly
pointed to Go in those books as an unsolvable kind of thing, implying that these approaches
hit their limit with these particular kinds of games. So something,
I don't remember if the book said it or not, but something, maybe it was the professors,
instilled in me the idea that this is the limit of artificial intelligence, of the field.
It instilled in me the idea that if we can create a system that can solve the game of Go,
we've achieved AGI. That was kind of, nobody explicitly said this, but that was the feeling.
And so I was one of the people for whom it seemed magical when a learning system was able to beat
a human world champion at the game of Go. That was AlphaGo. And even more so
with AlphaGo Zero, then kind of renamed and advanced into AlphaZero,
beating a world champion or world-class player without any supervised learning on expert games,
doing it only by playing itself. So that is, I don't know what to make of it.
I think it'll be interesting to hear what your opinions are on just how exciting, surprising,
profound, interesting, or boring the breakthrough performance of AlphaZero was.
Okay. So AlphaGo knocked my socks off. That was so remarkable.
Which aspect of it?
They got it to work that they actually were able to leverage a whole bunch of different ideas,
integrate them into one giant system. Just the software engineering aspect of it is mind blowing.
I've never been a part of a program as complicated as the program that they built for that.
And just like Jerry Tesauro is a neural net whisperer, David Silver is a kind of neural
net whisperer too. He was able to coax these networks and these new, way-out-there architectures
to solve these problems that, as you said, when we were learning AI,
no one had an idea how to make it work. It was remarkable that these techniques that were so
good at playing chess and that could beat the world champion in chess couldn't beat your typical
go playing teenager in Go. So the fact that in a very short number of years we kind of ramped up to
trouncing people in Go just blew me away.
So you're kind of focusing on the engineering aspect, which is also very surprising.
I mean, there's something different about large, well-funded companies. I mean,
there's a compute aspect to it too, like that, of course, I mean, that's similar to Deep Blue,
right, with IBM. There's something important to be learned and remembered about a large company
taking the ideas that are already out there and investing a few million dollars into it,
or more. And so you're kind of saying the engineering is fascinating. Both on the,
with AlphaGo, probably just gathering all the data, right, of the expert games, like organizing
everything, actually doing distributed supervised learning. And to me, see, the engineering I kind
of took for granted. To me, philosophically, being able to persist in the face of like long
odds, because it feels like, for me, I would be one of the skeptical people in the room thinking
that you can't just learn your way to beating Go. Like it sounded like, especially with David Silver,
it sounded like David was not confident at all. It's funny how confidence works.
It's like, you're not cocky about it, but... Right, because if you're cocky about it,
you kind of stop and stall and don't get anywhere. Yeah, but there's a hope that's
unbreakable. Maybe that's better than confidence. It's a kind of wishful hope and a little dream,
and you almost don't want to do anything else. You kind of keep doing it. That seems to be the
story. But with enough skepticism that you're looking for where the problems are and fighting
through them, because you know, there's got to be a way out of this thing. And for him, it was
probably, there's a bunch of little factors that come into play. It's funny how these stories just
all come together. Like everything he did in his life came into play, which is like a love for video
games and also a connection to the nineties, to what happened with TD Gammon and so on. In some ways,
it's surprising. Maybe you can provide some intuition to it that not much more than TD
Gammon was done for quite a long time on the reinforcement learning front. Is that weird to
you? I mean, like I said, the students who I worked with, we tried to get, basically apply
that architecture to other problems, and we consistently failed. There were a couple really
nice demonstrations that ended up being in the literature. There was a paper about controlling
elevators, right, where it's like, okay, can we modify the heuristic that elevators use for
deciding, like a bank of elevators for deciding which floors we should be stopping on to maximize
throughput, essentially. And you can set that up as a reinforcement learning problem, and you can,
you know, have a neural net represent the value function so that it's taking
where are all the elevators, where are the button pushes, you know, this high-dimensional,
well, at the time, high-dimensional input, you know, a couple dozen dimensions,
and turn that into a prediction as to, oh, is it going to be better if I stop at this floor or not?
And ultimately, it appeared as though for the standard simulation distribution for people trying
to leave the building at the end of the day, that the neural net learned a better strategy
than the standard one that's implemented in elevator controllers. So that was nice. There was
some work that Satinder Singh et al. did on handoffs with cell phones, you know, deciding when should
you hand off from this cell tower to this cell tower. Oh, okay, communication networks, yeah.
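To make the elevator example a little more concrete, here is a minimal sketch of how a bank-of-elevators situation might be flattened into a feature vector and scored by a value function; the floor counts, feature choices, and the untrained linear stand-in for the neural net are purely illustrative, not details from the actual elevator or handoff work.

```python
# Illustrative sketch: encode an elevator-bank situation as a modest-dimensional
# vector and score it with a value function (here an untrained linear stand-in;
# the real work would train a network with TD updates against a simulator).
import numpy as np

N_FLOORS, N_CARS = 10, 4

def encode_state(car_floors, hall_calls, car_buttons):
    """Flatten elevator positions and pending button presses into one vector."""
    pos = np.zeros((N_CARS, N_FLOORS))
    pos[np.arange(N_CARS), car_floors] = 1.0          # where each car is
    return np.concatenate([pos.ravel(),
                           hall_calls.astype(float),   # pending calls per floor
                           car_buttons.ravel().astype(float)])

rng = np.random.default_rng(1)
state_dim = N_CARS * N_FLOORS + N_FLOORS + N_CARS * N_FLOORS
w = rng.normal(0, 0.01, state_dim)                     # stand-in value weights

def value(car_floors, hall_calls, car_buttons):
    # "Is it going to be better if I stop at this floor or not?" would be
    # answered by comparing the values of the states each choice leads to.
    return w @ encode_state(car_floors, hall_calls, car_buttons)

print(value(np.array([0, 3, 5, 9]),
            rng.integers(0, 2, N_FLOORS),
            rng.integers(0, 2, (N_CARS, N_FLOORS))))
```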
Yeah. And so a couple things seemed like they were really promising. None of them made it into
production that I'm aware of. And neural nets as a whole started to kind of implode around then.
And so there just wasn't a lot of air in the room for people to try to figure out, okay,
how do we get this to work in the RL setting? And then they found their way back in 10 plus years.
So you said AlphaGo was impressive, like it's a big spectacle. Is there...
Right, so then AlphaZero. So I think I may have a slightly different opinion on this than some
people. So I talked to Satinder Singh in particular about this. So Satinder was like Rich Sutton,
a student of Andy Barto. So they came out of the same lab, very influential machine learning,
reinforcement learning researcher. Now at DeepMind, as is Rich, though at different sites,
the two of them. Rich is in Alberta. And Satinder would be in England,
but I think he's working from Michigan at the moment. But he was much more impressed with AlphaGo
Zero, which didn't get a kind of a bootstrap in the beginning with human-trained games.
It just was purely self-play. Though the first one, AlphaGo, was also a tremendous amount of
self-play. They started off, they kickstarted the action network that was making decisions,
but then they trained it for a really long time using more traditional
temporal difference methods. So as a result, it didn't seem that different to me. It seems like,
yeah, why wouldn't that work? Once it works, it works. But he found that removal of that
extra information to be breathtaking. That's a game changer. To me, the first thing was more of a
game changer. The open question, I mean, I guess that's the assumption is that expert games might
contain within them a humongous amount of information. But we know that it went beyond
that. We know that it somehow got away from that information because it was learning strategies.
I don't think AlphaGo is just better at implementing human strategies. I think it
actually developed its own strategies that were more effective. And so from that perspective,
okay, well, so it made at least one quantum leap in terms of strategic knowledge. Okay,
so now maybe it makes three. But that first one is the doozy, getting it to work reliably
and for the networks to hold on to the value well enough. That was a big step.
Well, maybe you could speak to this on the reinforcement learning front. So starting
from scratch and learning to do something like the first random behavior to crappy behavior,
to somewhat okay behavior, it's not obvious to me that it's even possible to take those steps.
If you just think about the intuition, how the heck does random behavior become
somewhat basic intelligent behavior? Not human level, not super human level, but just basic.
But you're saying to you, the intuition is like, if you can go from human to super human level
intelligence on this particular task of game playing, then you're good at taking leaps. So
you can take many of them. That the system, I believe that the system can take that kind of
leap. Yeah. And also, I think that beginner knowledge in go, you can start to get a feel
really quickly for the idea that being in certain parts of the board seems to be
more associated with winning. Because it's not stumbling upon the concept of winning.
It's told that it wins or that it loses. Well, it's self play. So it both wins and loses.
It's told which side won. And the information is kind of there to start percolating around
to make a difference as to, well, these things have a better chance of helping you win and
these things have a worse chance of helping you win. And so it can get to basic play,
I think pretty quickly. Then once it has basic play, well, now it's kind of forced to do some
search to actually experiment with, okay, well, what gets me that next increment of improvement?
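As a minimal sketch of the kind of self-play loop being described, here is a toy example on the game of Nim, where the only training signal is which side won; the game, the tabular value function, and the crude outcome-based update are illustrative simplifications, far simpler than TD Gammon or AlphaZero.

```python
# A minimal self-play sketch on Nim: take 1-3 sticks, whoever takes the last
# stick wins. One value table is shared by both "players", and the only
# feedback is which side won. Everything here is illustrative.
import random

N = 21                       # starting pile size
V = {}                       # value of a pile size for the player about to move
EPS, ALPHA = 0.1, 0.2

def best_move(pile):
    # Prefer the move that leaves the opponent in the worst position for them.
    moves = [m for m in (1, 2, 3) if m <= pile]
    if random.random() < EPS:
        return random.choice(moves)
    return min(moves, key=lambda m: V.get(pile - m, 0.5))

for game in range(20000):
    pile, history = N, []
    while pile > 0:
        m = best_move(pile)
        history.append((pile, m))
        pile -= m
    # The player who made the last move won. Walk back through the game and
    # nudge each visited position toward "the mover here wins" or "loses".
    winner_moved_last = True
    for pos, m in reversed(history):
        target = 1.0 if winner_moved_last else 0.0
        V[pos] = V.get(pos, 0.5) + ALPHA * (target - V.get(pos, 0.5))
        winner_moved_last = not winner_moved_last

# With enough games, piles that are multiples of 4 (theoretical losses for the
# mover) should trend toward low values and the others toward high values.
print({p: round(V.get(p, 0.5), 2) for p in range(1, 9)})
```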
How far do you think? Okay, this is where you kind of bring up the Elon Musks and the Sam Harrises,
right. How far is your intuition about these kinds of self-play mechanisms being able to take
us? Because one of the ominous, but calmly stated, things that David
Silver said when I talked to him is that they have not yet discovered a ceiling for AlphaZero, for example,
on the game of Go or chess. Like, no matter how much compute they throw at it,
it keeps improving. So it's very possible that if you throw, you know,
some like 10x compute at it, it will improve by 5x or something like that. And when stated calmly,
it's so like, oh, yeah, I guess so. But, but like, and then you think like, well,
can we potentially have, like, continuations of Moore's law in a totally different way, like
a broadly defined Moore's law, that kind of exponential improvement, like, are we going to
have an AlphaZero that swallows the world? But notice it's not getting better at other things.
It's getting better at Go. And I think it's a, that's a big leap to say, okay, well, therefore,
better at other things. Well, I mean, the question is how much of the game of life can be turned
into that, right? So that, I think, is a really good question. And I think that, I don't think
we as a, I don't know, community, really know the answer to this. But
so, okay, so, so I went, I went to a talk by some experts on computer chess. So in particular,
computer chess is really interesting because for, you know, for, of course, for a thousand years,
humans were the best chess-playing things on the planet. And then computers, like, edged ahead of
the best person, and they've been ahead ever since. It's not like people have overtaken
computers. But computers and people together have overtaken computers. Right. So at least
last time I checked, I don't know what the very latest is, but last time I checked that there
were teams of people who could work with computer programs to defeat the best computer programs.
In the game of Go? In the game of chess. In the game of chess. Right. And so using the information
about how these things called ELO scores, this sort of notion of how strong a player are you.
There's kind of a range of possible scores, and you increment in score
basically if you can beat another player of a lower score 62% of the time or something like
that. Like there's some threshold of if you can somewhat consistently beat someone, then you are
of a higher score than that person. And there's a question as to how many times can you do that in
chess? Right. And so we know that there's a range of human ability levels that cap out with the best
playing humans. And the computers went a step beyond that. And computers and people together
have not gone, I think, a full step beyond that. It feels like the estimates that they have are that it's
starting to asymptote, that we've reached kind of the maximum, the best possible chess playing.
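For reference, this is the standard Elo expected-score relationship that the "62% of the time" threshold above is gesturing at; the exact percentage in the conversation is approximate, and a 100-point rating advantage works out to roughly a 64% expected score.

```python
# Expected score of player A against player B under the standard Elo model.
def elo_expected_score(r_a, r_b):
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400.0))

print(elo_expected_score(1600, 1500))   # ~0.64 for a 100-point advantage
```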
And so that means that there's kind of a finite strategic depth. Right. At some point, you just
can't get any better at this game. Yeah. I mean, I don't know, so I'll actually check that. I think
it's interesting because if you have somebody like Magnus Carlsen, who's using these chess programs
to train his mind, like to learn about chess. To become a better chess player. Yeah. And so like
that's a very interesting thing because it, we're not static creatures. We're learning together.
I mean, just like we're talking about social networks, those algorithms are teaching us just
like we're teaching those algorithms. So that's a fascinating thing. But I think the best chess
playing programs are now better than the pairs. Like, they have competitions between pairs, but
still, even if they weren't, it's an interesting question. Where's the ceiling?
So the, the David, the ominous David Silver kind of statement is like, we have not found the ceiling.
Right. But the, so the question is, okay, so I don't, I don't know his analysis on that. My,
from talking to Go experts, the depth, the strategic depth of Go seems to be substantially
greater than that of chess, that there's more kind of steps of improvement that you can make,
getting better and better and better and better. But there's no reason to think that it's infinite.
Infinite, yeah. And so it could be that it's, that what David is seeing is a kind of asymptoting,
that you can keep getting better, but with diminishing returns. And at some point,
you hit optimal play. Like in theory, all these finite games, they're finite.
They have an optimal strategy. There's a strategy that is the minimax optimal strategy.
And so at that point, you can't get any better. You can't beat that, that strategy. Now that
strategy may be, from an information processing perspective, intractable, right? All
the situations are sufficiently different that you can't compress it
at all. It's this giant mess of hard-coded rules, and we can never achieve that. But that still
puts a cap on how many levels of improvement we can actually make. But the thing
about self-play, if you put it, although I don't like doing that, in a broader category of
self-supervised learning, is that it doesn't require too much, or any, human labeling.
Yeah. Yeah. Human labels, or just human effort, human involvement past a certain point.
And the same thing you could argue is true for the recent breakthroughs in natural
language processing with language models. Oh, this is how you get to GPT-3.
Yeah. See how I did that? Uh, that was a good, good transition. Yeah.
Yeah. I practiced that for days, uh, leading up to this. No, uh, but like that's one of the
questions is can we find ways to formulate problems in this world that are important to us humans,
like more important than the game of chess, that, uh, to which self supervised kinds of
approaches could be applied, whether it's self play, for example, for like, maybe you could
think of like autonomous vehicles in, in simulation, that kind of stuff, or just robotics applications
and simulation, or, uh, in the self supervised learning where un, unannotated data or data
that's generated by humans naturally without extra costs, like the Wikipedia or like all of the
internet can be used to, to learn something about, to create intelligent systems that do
something, uh, really powerful that pass the touring test or that do some kind of superhuman
level performance. So what's your intuition, like trying to stitch all of it together about
our discussion of AGI, the limits of self play and your thoughts about maybe the limits of
neural networks in the context of language models. Is there some intuition in there that might be
useful to think about? Yeah, yeah, yeah. So, first of all, the whole transformer network
family of things, um, is really cool. It's really, really cool. I mean, for, you know, if you've ever,
back in the day, you played with, I don't know, Markov models for generating text and you've seen
the kind of text that they spit out and you compare it to what's happening now, it's, it's
amazing. It's so amazing. Now, it doesn't take very long interacting with one of these systems
before you find the holes, right? It's, it's not smart in any kind of general way. It's really
good at a bunch of things and it does seem to understand a lot of the statistics of language
extremely well. And that turns out to be very powerful. You can answer many questions with that,
but it doesn't make it a good conversationalist, right? And it doesn't make it a good storyteller.
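For contrast with the transformer models being discussed, here is a minimal sketch of the kind of Markov model for generating text mentioned a moment ago, a simple bigram chain; the toy corpus and the start word are purely illustrative.

```python
# A tiny bigram Markov-chain text generator: each next word is sampled from
# the words that followed the current word in a toy corpus.
import random
from collections import defaultdict

corpus = ("the robot learned to play the game and the robot learned to win "
          "the game the human watched the robot play").split()

# Build the bigram table: word -> list of words observed to follow it.
follows = defaultdict(list)
for w, nxt in zip(corpus, corpus[1:]):
    follows[w].append(nxt)

def generate(start="the", length=12):
    words = [start]
    for _ in range(length - 1):
        options = follows.get(words[-1])
        if not options:            # dead end: no observed continuation
            break
        words.append(random.choice(options))
    return " ".join(words)

print(generate())
```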
It just makes it good at imitating things it has seen in the past. The exact same thing could
be said by people voting for Donald Trump, about Joe Biden supporters, and by people voting for
Joe Biden, about Donald Trump supporters, uh, you know, that they're not intelligent, they're
just following, yeah, they're following things they've seen in the past. And, uh, so it's very,
it doesn't take long to find the flaws in their, uh, in their, like, natural language generation
abilities. Yes. Yes. So we're being very, that's interesting, critical of AI systems. Right. So,
so I've had a similar thought, which was that the stories that GPT-3 spits out are amazing
and very human like, and it doesn't mean that computers are smarter than we realize necessarily.
It partly means that people are dumber than we realize or that much of what we do day to day
is not that deep. Like we're just, we're just kind of going with the flow. We're saying whatever
feels like the natural thing to say next, not a lot of it is, is, is creative or meaningful or,
or intentional. But enough is that we actually get, we get by, right? And we, we do come up with
new ideas sometimes and we do manage to talk each other into things sometimes and we do sometimes
vote for reasonable people sometimes. But, um, but it's really hard to see in the statistics
because so much of what we're saying is kind of rote. Um, and so our metrics that we use to measure
how these systems are doing, don't reveal that because it's, it's, it's in the interstices
that, that is very hard to detect. But is your, do you have an intuition that with these language
models, if they grow in size? It's already surprising that when you go from GPT-2 to GPT-3,
there is a noticeable improvement. So the question now goes back to the ominous
David Silver in the ceiling. Right. So maybe there's just no ceiling. We just need more compute. Now,
I mean, okay. So now I'm speculating as opposed to before when I was completely on firm ground.
Yeah. All right. Um, I don't believe that you can get something that really can do language and use
language as a thing that doesn't interact with people. Like I think that it's not enough to just
take everything that we've said written down and just say, that's enough. You can just learn from
that and you can be intelligent. I think you really need to be pushed back at. I think that
conversations, even between people who are pretty smart, maybe the smartest things that we know, maybe
not the smartest things we can imagine, but we get so much benefit out of talking to each other and
interacting. Um, that's presumably why you have conversations live with guests is that, that there's
something in that interaction that would not be exposed by, oh, I'll just write you a story and
then you can read it later. And I think, I think because these systems are just learning from our
stories, they're not learning from being pushed back at by us, that they're fundamentally limited
into what they could actually become on this route. They have to get, you know,
shut down. Like, they have to have an argument with us and
lose a couple of times before they start to realize, oh, okay, wait, there's some nuance here that
actually matters. Yeah, that's actually subtle sounding, but quite profound that the interaction
with humans is, is essential. And the limitation within that is profound as well, because the
time scale, like the bandwidth at which you can really interact with humans is very low. So it's
costly. So you can't, one of the underlying things about self-play is it has to do with, you know,
a very large number of interactions. And so you can't really deploy reinforcement learning systems
into the real world to interact. Like you couldn't deploy a language model into the real world to
interact with humans, because it would just not get enough data relative to the cost it takes
to interact. Like the time of humans is, is expensive, which is really interesting. That's
the thing that takes us back to reinforcement learning, and trying to figure out if there's ways to
make algorithms that are more efficient at learning, that keep the spirit of reinforcement
learning and become more efficient. In some sense, this seems to be the goal. I'd love to hear what
your thoughts are. I don't know if you got the chance to see it, the blog post called The Bitter
Lesson. Oh, yes. By Rich Sutton. That makes an argument, hopefully I can summarize it,
perhaps you can. Yeah, but good. Okay. So I mean, I could try and you can correct me, which is,
he makes an argument that it seems if we look at the long arc of the history of the artificial
intelligence field, call it, you know, 70 years, that the algorithms from which we've seen the biggest
improvements in practice are the very simple, like dumb algorithms that are able to leverage
computation. And you just wait for the computation to improve. Like all the academics and so on
have fun by finding little tricks and, and congratulate themselves on those tricks. And
sometimes those tricks can be like big, that feel in the moment, like big spikes and breakthroughs,
but in reality, over the decades, it's still the same dumb algorithm that just waits for the
compute to get faster and faster. Do you find that to be an interesting argument
against the entirety of the field of machine learning as an academic discipline?
That we're really just a subfield of computer architecture. We're just kind of waiting around
for them to do their next thing. We really don't want to do hardware work. So that's right. I really
don't want to think about it. We're procrastinating. Yes, that's right. Just waiting for them to do
their job so that we can pretend to have done ours. So, uh, yeah, I mean, the argument reminds me a
lot of, I think it was a Fred Jelinek quote, an early computational linguist, who said, you know,
we're building these computational linguistic systems. And every time we fire a linguist,
performance goes up by 10%. Something like that. And so the idea of us building the knowledge in,
in that case, um, he was finding it to be much less successful than getting
rid of the people who know about language as a, you know, from a kind of, uh, scholastic
academic kind of perspective and replace them with more compute. And so I think this is kind
of a modern version of that story, which is, okay, we want to do better on machine vision.
You could build in all these, you know, motivated, part-based models that, you know, that just feel
like obviously the right thing that you have to have, or we can throw a lot of data at it and
guess what we're doing better with, with a lot of data. So I hadn't thought about it until this
moment in this way, but what I believe, well, I've thought about what I believe. What I believe is
that, you know, compositionality and what's the right way to say it, the complexity grows rapidly
as you consider more and more possibilities, like explosively. And so far, Moore's law has also been
growing explosively, exponentially. And so, so it really does seem like, well, we don't have to
think really hard about the algorithm design or the way that we build the systems, because
the best benefit we could get is exponential, and the best benefit that we can get from waiting
is exponential, so we can just wait. That's got to end, right? And there's hints now that Moore's
law is starting to feel some friction, starting to, the world is pushing back a little bit.
One thing I don't know, do lots of people know this? I didn't know this. I was trying to
write an essay, and yeah, Moore's law has been amazing, and it's enabled all sorts of things,
but there's also a kind of counter Moore's law, which is that the development cost for each successive
generation of chips also is doubling. So it's costing twice as much money. So the amount of
development money per cycle or whatever is actually sort of constant. And at some point,
we run out of money. So, or we have to come up with an entirely different way of
doing the development process. So, like, I guess I was always a bit skeptical of the, look,
it's an exponential curve, therefore it has no end. Soon, the number of people going to
NeurIPS will be greater than the population of the earth. That means we're going to discover
life on other planets. No, it doesn't. It means that we're in a sigmoid curve on the front half,
which looks a lot like an exponential. The second half is going to look a lot like diminishing returns.
Yeah, but the interesting thing about Moore's law, if you actually look at the technologies
involved, it's hundreds, not thousands of S curves stacked on top of each other. It's not
actually an exponential curve. It's constant breakthroughs. And then what becomes useful
to think about, which is exactly what you're saying, the cost of development, like the size of teams,
the amount of resources that are invested in continuing to find new S curves, new breakthroughs.
And yeah, it's an interesting idea. If we live in the moment, if we sit here today,
it seems to be the reasonable thing to say that exponentials end. And yet,
in the software realm, they just keep appearing to be happening. And it's so, I mean, it's so
hard to disagree with Elon Musk on this because I used to be one of those folks. I'm still one of
those folks. I studied autonomous vehicles. This is what I worked on. And it's like, you look
at what Elon Musk is saying about autonomous vehicles, well, obviously in a couple of years
or in a year or next month, we'll have fully autonomous vehicles. There's no reason why
we can't. Driving is pretty simple. Like it's just a learning problem. And you just need to convert
all the driving that we're doing into data and just, you know, train on that data.
And like we use only our eyes, so you can use cameras and you can train on it. And it's like,
yeah, that should work. And then you put that hat on, like the philosophical hat,
and but then you put the pragmatic hat on and it's like, these are the flaws of computer vision,
like this is what it means to try to scale. And then you put the human factors, the psychology
hat on, which is like, driving is actually a lot of cognitive science, or cognitive whatever
the heck you call it. It's really hard. It's much harder to drive than we realize. There's
a much larger number of edge cases. So building up an intuition around this is around exponential
is really difficult. And on top of that, the pandemic is making us think about exponentials,
making us realize that like, we don't understand anything about it, we're not able to intuit
exponentials. We're either ultra-terrified, some part of the population, and some part is, like,
the opposite of that, whatever, indifferent, carefree. And we're not managing it. Blasé.
Well, wow, that's French. So it's got an accent. So it's fascinating to think about what
the limits of this exponential growth of technology, not just Moore's law, it's technology,
how that rubs up against the Bitter Lesson and GPT-3 and self-play mechanisms,
that it's not obvious. I used to be much more skeptical about neural networks. Now I at least
give it a sliver of possibility that we'll all be very much surprised, and also,
you know, caught in a way that we are not prepared for, like in applications of
social networks, for example, because it feels like really good transformer models
that are able to do some kind of like very good natural language generation are the same kind
of models that can be used to learn human behavior and then manipulate that human behavior to gain
advertiser dollars and all those kinds of things, in the capitalist system. And
they arguably already are manipulating human behavior. Yeah. So, but not for self-preservation,
which I think is a big, that would be a big step. Like if they were trying to manipulate us to
convince us not to shut them off, I would be very freaked out. But I don't see a path to that from
where we are now. They, they, they don't have any of those abilities. That's not what they're trying
to do. They're trying to keep people on, on the site. But see the thing is this, this is the
thing about life on earth is they might be borrowing our consciousness and sentience. Like, so like
in the sense they do, because the creators of the algorithms have, like they're not, you know,
if you look at our body, we're not a single organism, we're a huge number of organisms with
like tiny little motivations, built on top of each other. In the same sense, the AI algorithms
are part of a system that includes human companies and corporations, right? Because
corporations are funny organisms in and of themselves that really do seem to have
self-preservation built in. And I think that's at the design level, I think they're designed to have
self-preservation be a focus. So you're right that, in that broader system that we're also a
part of and can have some influence on, it's, it's, it is much more complicated, much more powerful.
Yeah, I agree with that. So people really love it when I ask what three books, technical, philosophical,
fiction had a big impact on your life, maybe you couldn't recommend. We went with movies,
we went with Billy Joel, and I forgot what music you recommended, but. I didn't, I just said I
have no taste in music, I just like pop music. That was actually really skillful the way you
avoided that question. Thank you, thanks. I was, I'm going to try to do the same with the books. So
do you have a skillful way to avoid answering the question about three books you would recommend?
I'd like to tell you a story. So my first job out of college was at Bellcore, I mentioned that
before, where I worked with Dave Ackley. The head of the group was a guy named Tom Landauer, and I
don't know how well known he is now, but arguably he's the inventor and the
first proselytizer of word embeddings. So they developed a system shortly before I got to the
group, yeah, called latent semantic analysis, that would take words of English and
embed them in, you know, multi-hundred-dimensional space, and then use that as a way of, you know,
assessing similarity and basically doing, not reinforcement learning, sorry,
information retrieval, you know, sort of pre-Google information retrieval.
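As a minimal sketch of the latent semantic analysis idea just described, the example below builds a word-by-document count matrix, takes a truncated SVD, and compares words in the resulting low-dimensional space; the tiny documents and the two-dimensional embedding are illustrative stand-ins for the multi-hundred-dimensional system being described.

```python
# Toy latent semantic analysis: word-by-document counts -> truncated SVD ->
# low-dimensional word vectors used for similarity.
import numpy as np

docs = ["robots learn to play games",
        "agents learn to play and win games",
        "people read books and stories",
        "people write stories"]

vocab = sorted({w for d in docs for w in d.split()})
counts = np.array([[d.split().count(w) for d in docs] for w in vocab], dtype=float)

# Truncated SVD: keep the top k singular directions as the embedding space.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
word_vecs = U[:, :k] * S[:k]            # one k-dimensional vector per word

def similarity(w1, w2):
    a, b = word_vecs[vocab.index(w1)], word_vecs[vocab.index(w2)]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

print(similarity("games", "play"))      # words from similar contexts score higher
print(similarity("games", "books"))     # words from different contexts score lower
```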
And he was trained as an anthropologist, but then became a cognitive scientist. I was in the cognitive science research
group. It's, you know, like I said, I'm a cognitive science groupie. At the time, I thought I'd become
a cognitive scientist, but then I realized in that group, no, I'm a computer scientist, but I'm a
computer scientist who really loves to hang out with cognitive scientists. And he said, he studied
language acquisition in particular. He said, you know, humans have about this number of words of
vocabulary and most of that is learned from reading. And I said, that can't be true because I have a
really big vocabulary and I don't read. He's like, you must. I'm like, I don't think I do. I mean,
like stop signs, I definitely read stop signs, but like reading books is not, is not a thing that I
do a lot of. Do you read them, though? Or is it just, maybe, the red color? Do I read stop signs?
Yeah. No, it's just pattern recognition at this point. I don't sound it out.
Stop. So now I do, I wonder what that, oh yeah, stop the guns. So,
that's fascinating. So you don't, so I don't read very, I mean, obviously I read and I've read,
I've read plenty of books, but like some people like Charles, my friend Charles and others,
like a lot of people in my field, a lot of academics, like reading was really a central
topic to them in development. And I'm not that guy. In fact, I used to joke that when I got into
college that it was on kind of a help out the illiterate kind of program because I got to,
like I, in my house, I wasn't a particularly bad or good reader, but when I got to college,
I was surrounded by these people that were just voracious in their reading appetite. And they
would like, have you read this? Have you read this? Have you read this? And I'd be like, no,
I'm clearly not qualified to be at this school. Like there's no way I should be here. Now I've
discovered books on tape, like audio books. And so I'm much better. I'm more caught up. I read a
lot of books. A small tangent on that. It is a fascinating open question to me on the topic of
driving, whether, you know, supervised learning people, machine learning people think you have to
like drive to learn how to drive. To me, it's very possible that just by us humans, by first of all,
walking, but also by watching other people drive, not even being inside cars as a passenger. But
let's say being inside the car as a passenger, but even just like being a pedestrian and crossing
the road, you learn so much about driving from that. It's very possible that you can, without ever
being inside a car, be okay at driving once you get in it. Or like watching a movie, for example.
I don't know, something like that. Have you, have you taught anyone to drive? No. So, I have two
children and I learned a lot about car driving because my wife doesn't want to be the one in
the car while they're learning. So that's my job. So I sit in the passenger seat and it's really
scary. I have, you know, I have wishes to live. And they're, you know, they're figuring things out.
Now, they start off very, very much better than I imagine like a neural network would, right?
They get that they're seeing the world. They get that there's a road that they're trying to be on.
They get that there's a relationship between the angle of the steering. But it takes a while to
not be very jerky. And so that happens pretty quickly. Like the ability to stay in lane at
speed, that happens relatively fast. It's not zero shot learning, but it's pretty fast.
The thing that's remarkably hard, and this is I think partly why self-driving cars are really
hard, is the degree to which driving is a social interaction activity. And that blew me away.
I was completely unaware of it until I watched my son learning to drive. And I was realizing
that he was sending signals to all the cars around him. And those, in his case, he's always had
social communication challenges. He was sending very mixed confusing signals to the other cars.
And that was causing the other cars to drive weirdly and erratically. And there was no question in
my mind that he would, he would have an accident because they didn't know how to read him. There's
things you do with the speed that you drive, the positioning of your car, that you're constantly
like in the head of the other drivers and seeing him not knowing how to do that and having to be
taught explicitly. Okay, you have to be thinking about what the other driver is thinking was a
revelation to me. I was stunned. It's a creating kind of theories of mind of the other.
Theories of mind of the other cars. Yeah. Yeah. Which I just hadn't heard discussed in the self-driving
car talks that I've been to. Since then, there's some people who do consider those kinds of issues,
but it's way more subtle than I think. There's a little bit of work involved with that when you
realize, like when you especially focus not on other cars, but on pedestrians, for example,
it's literally staring you in the face. Yeah, yeah, yeah. So then when you're just like,
how do I interact with pedestrians? Like pedestrians, you're practically talking to an
octopus at that point. They've got all these weird degrees of freedom. You don't know what
they're going to do. They can turn around any second. But the point is we humans know what
they're going to do. Like we have a good theory of mind. We have a good mental model of what
they're doing. And we have a good model of the model that they have of you, and the model of the
model of the model. Like we're able to kind of reason about this kind of the social game of it.
The hope is that it's quite simple, actually, that it could be learned. That's why I just talked
to Waymo. I don't know if you know of that company. It's the Google self-driving car effort.
I talked to their CTO on this podcast, and I rode in their car, and it's quite
aggressive and it's quite fast and it's good and it feels great. Also, just like Tesla,
Waymo made me change my mind about, like, maybe driving is easier than I thought. Maybe I'm just
being a speciesist human. Maybe it's a speciesist argument. Yeah, so I don't know. But it's
fascinating to think about like the same as with reading, which I think you just said,
you avoided the question, but I still hope you answer it at some point. You avoided it brilliantly.
It is, there are blind spots that artificial intelligence
researchers have about what it actually takes to learn to solve a problem. That's fascinating.
Have you had Anca Dragan on? Yeah. Okay. She's one of my favorites. So much energy.
She's amazing. And in particular, she thinks a lot about this kind of,
I know that you know that I know kind of planning. And the last time I spoke with her,
she was very articulate about the ways in which self-driving cars are not solved,
like what's still really, really hard. But even her intuition is limited. We're all
like new to this. So in some sense, the Elon Musk approach of being ultra confident and just like
plowing. Putting it out there. Putting it out there. Like some people say it's reckless and dangerous
and so on. But partly it seems to be one of the only ways to make progress in artificial
intelligence. So these are difficult things. Democracy is messy. Implementation of artificial
intelligence systems in the real world is messy. So many years ago, before self-driving cars were
an actual thing you could have a discussion about. Somebody asked me, what if we could use
that robotic technology and use it to drive cars around? Like, aren't people going to be killed?
And then it's, you know, blah, blah, blah. I'm like, that's not what's going to happen. I said with
confidence incorrectly, obviously. What I think is going to happen is we're going to have a lot more,
like a very gradual kind of rollout where people have these cars in like closed communities,
right? Where it's somewhat realistic, but it's still in a box, right? So that we can really get
a sense of what, what are the weird things that can happen? How do we, how do we have to change
the way we behave around these vehicles? Like it's, it's obviously requires a kind of co-evolution
that you can't just plop them in and see what happens. But of course, we're basically popping
them in and see what happens. So I was wrong, but I do think that would have been a better plan.
So that's, but your intuition is funny, just zooming out and looking at the forces of capitalism.
And it seems that capitalism rewards risk-takers, it rewards and punishes risk-takers, like,
it rewards trying it out. Versus the academic approach of let's try a small thing and try to understand
slowly the fundamentals of the problem, and let's start with one and do two and then see that and
then do three. You know, the capitalist, like, startup entrepreneurial dream is let's build
a thousand, right? And 500 of them fail, but whatever, the other 500 we learn from.
But if you're good enough, I mean, one thing is like your intuition would say like,
that's going to be hugely destructive to everything. But actually it's kind of the,
the forces of capitalism. Like, it's quite easy for people to be critical. But if you actually
look at the data at the, at the way our world has progressed in terms of the quality of life,
it seems like the competent good people rise to the top. This is coming from me from the Soviet
Union and so on. It's like, it's interesting that somebody like Elon Musk is the way you push
progress in artificial intelligence. Like, he's forcing Waymo to step their stuff up,
and Waymo is forcing Elon Musk to step up. It's fascinating. Cause I have this tension in,
in my heart and just being upset by the lack of progress in autonomous vehicles within academia.
So there's huge progress in the early days of the DARPA challenges. And then it just kind of stopped
like at MIT, but it's true everywhere else with, with an exception of a few sponsors here and there
is like, it's not seen as a sexy problem. Autonomous, like the moment artificial intelligence
starts approaching the problems of the real world, academics kind of go, all right,
let, let the... They get really hard in a different way. In a different way, that's right. I
think, yeah, right, some of us are not excited about that other way, but I still think there's
fundamental problems to be solved in those difficult things. It's not, it's still publishable,
I think, like we just need to... It's the same criticism you could have of all these conferences,
NeurIPS, CVPR, where application papers are often as powerful and as important as, like,
theory papers, even though theory just seems much more respectable and so on. I mean,
the machine learning community is changing that a little bit, I mean, at least in its statements,
but it's, it's still not seen as the sexiest of pursuits, which is like, how do I actually make
this thing work in practice as opposed to on this toy data set? All that to say, are you still
avoiding the three books question? Is there something on audiobook that you can recommend?
Oh, I've, yeah, I mean, I've, yeah, I've read a lot of really fun stuff. In terms of books that I
find myself thinking back on that I read a while ago, like the test of time to some degree, I find
myself thinking of Program or Be Programmed a lot, by Douglas Rushkoff, which, it basically
put out the premise that we all need to become programmers in one form or another. And it was
in analogy to once upon a time, we all had to become readers, we had to become literate. And
there was a time before that when not everybody was literate, but once literacy was possible,
the people who were literate had more of a say in society than the people who weren't. And so we
made a big effort to get everybody up to speed. And now it's, it's not 100% universal, but it's
quite widespread. Like the assumption is generally that people can read. The analogy that he makes
is that programming is a similar kind of thing that, that we need to have a say in, right? So
being a reader, being literate, being a reader means you can receive all this information,
but you don't get to put it out there. And programming is the way that we get to put it
out there. And that was the argument he made. I think he specifically has now
backed away from this idea. He doesn't think it's happening quite this way. And that might be true
that it didn't, society didn't sort of play forward quite that way. I still believe in the
premise. I still believe that at some point, the relationship that we have to these
machines and these networks has to be one where each individual has the wherewithal to
make the machines help them do the things that that person wants done. And as, you know,
as software people, we know how to do that. And when we have a problem, we're like, okay, I'll just,
I'll hack up a Perl script or something to make it work. So if we lived in a world where everybody
could do that, that would be a better world. And computers would, I think, have less sway over
us. And other people's software would have less sway over us as a group.
Yeah, in some sense, software engineering, programming is power.
Programming is power, right? It's, yeah, it's like magic. It's like magic spells. And, and it's not
out of reach of everyone. But at the moment, it's just a sliver of the population who can,
who can commune with machines in this way. So I don't know. So that book had a big,
big impact on me. Currently, I'm reading The Alignment Problem actually by Brian Christian.
So I don't know if you've seen this out there yet.
Is this similar to Russell's work with the control problem?
It's in that same general neighborhood. I mean, they take, they have different
emphases that they're, they're concentrating on. I think, I think Stuart's book
did a remarkably good job. Like a, just a celebratory good job at describing AI technology
and sort of how it works. I thought that was great. It was really cool to see that in a book.
Yeah. I think he has some experience writing some books.
That's, you know, that's probably a possible thing. He's maybe thought a thing or two about
how to explain AI to people. Yeah. Yeah. That's a really good point. This book so far has been
remarkably good at telling the story of the, sort of the history, the recent history
of some of the things that have happened. This, I'm in the first third. He said this book is in
three thirds. The first third is essentially AI fairness and, you know, implications of AI on
society that we're seeing right now. And that's been great. I mean, he's telling those stories
really well. He's, he went out and talked to the frontline people who, whose names were associated
with some of these ideas and it's been terrific. He says the second half of the book is on
reinforcement learning. So maybe that'll be fun. And then the third half, third third, is on the
superintelligence alignment problem. And I suspect that that part will be less fun for me to read.
Yeah. Yeah. It's, it's an interesting problem to talk about. I find it to be the most interesting,
just like thinking about whether we live in a simulation or not, as a, as a thought experiment
to think about our own existence. So in the same way, talking about alignment problem with AGI
is a good way to think, similar to like the trolley problem with autonomous vehicles.
It's a useless thing for engineering, but it's the, it's a nice little thought experiment for
actually thinking about our own human ethical systems, our moral systems.
By thinking about how we engineer these things, you start to understand yourself.
So sci-fi can be good at that too. So one sci-fi book to recommend is Exhalation by Ted Chiang,
a bunch of short stories. This, Ted Chiang is the guy who wrote the short story that became the
movie Arrival. And all of his stories, just from a, he was a computer scientist, actually he studied
at Brown. They all have this sort of really insightful bit of science or computer science that
drives them. And so it's just a romp, right? To just like, he creates these artificial worlds with
these, by extrapolating on these ideas that, that we know about, but hadn't really thought through
to this kind of conclusion. And so his stuff is, it's really fun to read. It's mind-warping.
So I'm not sure if you're familiar, as I seem to mention this every other word,
I'm from the Soviet Union and I'm Russian. I talk about it way too much.
My roots are Russian too, but a couple of generations back.
Well, it's probably in there somewhere. So maybe we can, we can pull at that thread a little bit
of the existential dread that we all feel. You mentioned that you, I think somewhere in the
conversation you mentioned that you don't particularly like the idea of dying. I forget in which context
it might have been a reinforcement learning perspective. I don't know. I know, you know,
what it was, it was in teaching my kids to drive. That's, that's how you face your mortality. Yes.
From a human beings perspective, or from a reinforcement learning researchers perspective,
let me ask you the most absurd question. What's, what do you think is the meaning of this whole
thing, the meaning of life on this spinning rock? I mean, I think reinforcement learning
researchers maybe think about this from a science perspective more often than a lot of other people.
Right. As a supervised learning person, you're probably not thinking about the sweep of
a lifetime, but reinforcement learning agents are having little lifetimes, little weird little
lifetimes. And it's, it's hard not to project yourself into their world sometimes. But, you know,
as far as the meaning of life, so when I turned 42, you may know, from, that's a, that is a book I
read, that that is the meaning of life. So when I turned 42, I had a meaning of life party
where I invited people over and everyone shared their meaning of life. We had slides
made up. And so we had, we all sat down and did a slide presentation to each other about the
meaning of life. And mine, mine was balance. I think that life is balance. And so the activity
at the party, for a 42-year-old, maybe this is a little bit non-standard, but I found all the
little toys and devices that I had where you had to balance on them, you had to like stand on
them and balance: a pogo stick I brought, a RipStik, which is like a weird two-wheeled skateboard.
I got a unicycle, but I didn't know how to, I didn't know how to do it. I now can do it.
I would love watching you try.
Yeah, I'll send you a video. I'm not great, but I, but, but I managed. And so, so balance. Yeah. So,
so my, my wife has a really good one that she sticks to and is probably pretty accurate. And
it has to do with healthy relationships with people that you love and working hard for good causes.
But to me, yeah, balance, balance in a word. That's, that, that works for me. Not too much of
anything because too much of anything is iffy. That feels like a Rolling Stones song. I feel
like there must be. You can't always get what you want. But if you try sometimes, you can strike a
balance. Yeah, I think that's how it goes. Michael, I'll write your parody. It's a huge honor to
talk to you. This is really fun. I've been a big fan of yours. So can't, can't wait to see what you
do next in the world of education, the world of parody in the world of reinforcement learning.
Thanks for talking to me. My pleasure. Thank you for listening to this conversation with Michael
Littman. And thank you to our sponsors, Simply Safe, a home security company I use to monitor and
protect my apartment, ExpressVPN, the VPN I've used for many years to protect my privacy on the
internet, masterclass, online courses that I enjoy from some of the most amazing humans in history,
and better help online therapy with a licensed professional. Please check out these sponsors
in the description to get a discount and to support this podcast. If you enjoy this thing,
subscribe on YouTube, review it with five stars on a podcast, follow on Spotify,
support on Patreon, or connect with me on Twitter at Lex Fridman. And now let me leave you some
words from Groucho Marx. If you're not having fun, you're doing something wrong. Thank you for
listening and hope to see you next time.