The following is a conversation with Dileep George, a researcher at the intersection of neuroscience and artificial intelligence, co-founder of Vicarious with Scott Phoenix, and formerly co-founder of Numenta with Jeff Hawkins, who has been on this podcast, and Donna Dubinsky. From his early work on hierarchical temporal memory to recursive cortical networks to today, Dileep has always sought to engineer intelligence that is closely inspired by the
human brain. As a side note, I think we understand very little about the fundamental principles
underlying the function of the human brain, but the little we do know gives hints that may be more
useful for engineering intelligence than any idea in mathematics, computer science, physics,
and scientific fields outside of biology. And so the brain is a kind of existence proof that says
it's possible, keep at it. I should also say that brain-inspired AI is often overhyped and used as fodder for marketing speak, just as quantum computing is, but I'm not afraid of exploring these sometimes overhyped areas, since where there's smoke, there's sometimes fire.
Quick summary of the ads: three sponsors, Babbel, Raycon Earbuds, and Masterclass. Please consider
supporting this podcast by clicking the special links in the description to get the discount.
It really is the best way to support this podcast. If you enjoy this thing, subscribe on YouTube,
review it with five stars on Apple Podcasts, support it on Patreon, or connect with me on Twitter at Lex Fridman. As usual, I'll do a few minutes of ads now and never any ads in the middle that
can break the flow of the conversation. This show is sponsored by Babbel, an app and website that gets you speaking in a new language within weeks. Go to Babbel.com and use code Lex to get three months free. They offer 14 languages, including Spanish, French, Italian, German, and yes, Russian. Daily lessons are 10 to 15 minutes, super easy, effective, designed by over 100 language experts. Let me read a few lines from the Russian poem Noch, Ulitsa, Fonar, Apteka by Alexander Blok that you'll start to understand if you sign up to Babbel. Now, I say that you'll only start to understand this poem because Russian starts with the language and ends with the vodka. Now, the latter part is definitely not endorsed or provided by Babbel and will probably lose me this sponsorship. But once you graduate from Babbel, you can enroll in my advanced course of late-night Russian conversation over vodka. I have not yet developed enough for that. It's in progress. So get started by visiting Babbel.com and use code Lex to get three months free.
This show is sponsored by Raycon earbuds. Get them at buyraycon.com slash Lex. They've become my main method of listening to podcasts, audiobooks, and music when I run, do push-ups and pull-ups, or just live life. In fact, I often listen to brown noise with them when I'm thinking deeply about something; it helps me focus. They're super comfortable, pair easily, great sound, great bass,
six hours of playtime. I've been putting in a lot of miles to get ready for a potential ultramarathon
and listening to audiobooks on World War II. The sound is rich and really comes in clear. So again,
get them at buyraycon.com slash Lex. This show is sponsored by Masterclass. Sign up at masterclass.com slash Lex to get a discount and to support this podcast. When I first heard about Masterclass, I thought it was too good to be true. I still think it's too good to be true. For 180 bucks a year, you get an all-access pass to watch courses from, to list some of my favorites, Chris Hadfield on space exploration, Neil deGrasse Tyson on scientific thinking and communication, Will Wright, creator of SimCity and The Sims, on game design. Every time I do this read, I really want to play a city-builder game. Carlos Santana on guitar, Garry Kasparov on chess, Daniel Negreanu on poker, and many more. Chris Hadfield explaining how rockets work and the experience of being launched into space alone is worth the money. By the way, you can watch it on basically any device. Once again, sign up at masterclass.com slash Lex to get a discount and to support this podcast. And now, here's my
conversation with Dileep George. Do you think we need to understand the brain in order to build it?
Yes, if you want to build the brain, we definitely need to understand how it works. So the Blue Brain project, Henry Markram's project, is trying to build the brain without understanding it, like, you know, just trying to put details of the brain from neuroscience experiments into a giant simulation by putting in more and more neurons, more and more details. But that is not going to work, because when it doesn't perform as you expect it to, then what do you do? You just keep adding more details. How do you debug it? So unless you understand, unless you have a theory about how the system is supposed to work, how the pieces are supposed to fit together, what they're going to contribute, you can't build it.
At the functional level, understand. So can you actually linger on it and describe the Blue Brain project? It's kind of a fascinating principle and idea to try to simulate the brain. We're talking about the human brain, right? Right. Human brains and rat brains or cat brains have lots in common, in that the cortex, the neocortex structure, is very similar. So initially, they were trying to just simulate a cat brain. And to understand the nature of evil.
To understand the nature of evil. Or as it happens in most of these simulations,
you easily get one thing out, which is oscillations. Yeah, if you simulate a large number of neurons, they oscillate. And you can adjust the parameters and say that, oh, the oscillations match the rhythms that we see in the brain, et cetera. Oh, I see. So the idea is,
is the simulation at the level of individual neurons?
Yeah. So the Blue Brain project, the original idea as proposed was, you put very detailed
biophysical neurons, biophysical models of neurons, and you interconnect them according to the
statistics of connections that we have found from real neuroscience experiments. And then
turn it on and see what happens. And these neural models are incredibly complicated in themselves, right? Because these neurons are modeled using this idea called Hodgkin-Huxley models, which are about how signals propagate in a cable. And there are active dendrites, all those phenomena, and those phenomena themselves we don't understand that well.
And then we put in connectivity, which is part guesswork, part observed. And of course,
if you do not have any theory about how it is supposed to work, we just have to take whatever
comes out of it as, okay, this is something interesting. But in your sense, these models of the way the signal travels along axons, all the basic models, are they too crude? Oh, well, actually, they are pretty detailed and pretty sophisticated.
And they do replicate the neural dynamics. If you take a single neuron, and you try to
turn on the different channels, the calcium channels and the different receptors,
and see what the effect of turning on or off those channels are in the neuron's spike output,
people have built pretty sophisticated models of that. And they are, I would say,
you know, in the regime of correct.
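As a concrete illustration of the kind of single-neuron biophysical model being discussed, here is a minimal sketch of the classic Hodgkin-Huxley equations with the standard textbook parameters and simple Euler integration. This is far simpler than the multi-compartment, many-channel models used in Blue Brain-style simulations; it is only meant to show the flavor of the mechanism-level modeling.

```python
import math

# Minimal single-compartment Hodgkin-Huxley neuron (standard 1952 parameters),
# integrated with forward Euler. A sketch only; real biophysical models add
# multi-compartment cables, active dendrites, and many more channel types.

C = 1.0                                 # membrane capacitance (uF/cm^2)
g_Na, g_K, g_L = 120.0, 36.0, 0.3       # peak conductances (mS/cm^2)
E_Na, E_K, E_L = 50.0, -77.0, -54.387   # reversal potentials (mV)

# Standard voltage-dependent rate functions for the K (n) and Na (m, h) gates.
def alpha_n(V): return 0.01 * (V + 55) / (1 - math.exp(-(V + 55) / 10))
def beta_n(V):  return 0.125 * math.exp(-(V + 65) / 80)
def alpha_m(V): return 0.1 * (V + 40) / (1 - math.exp(-(V + 40) / 10))
def beta_m(V):  return 4.0 * math.exp(-(V + 65) / 18)
def alpha_h(V): return 0.07 * math.exp(-(V + 65) / 20)
def beta_h(V):  return 1.0 / (1 + math.exp(-(V + 35) / 10))

def count_spikes(I_ext, T=50.0, dt=0.01):
    """Simulate T ms under constant injected current I_ext (uA/cm^2) and
    count spikes as upward crossings of 0 mV."""
    V, n, m, h = -65.0, 0.317, 0.053, 0.596   # approximate resting state
    spikes, above = 0, False
    for _ in range(int(T / dt)):
        I_ion = (g_Na * m**3 * h * (V - E_Na)
                 + g_K * n**4 * (V - E_K)
                 + g_L * (V - E_L))
        V += dt * (I_ext - I_ion) / C
        n += dt * (alpha_n(V) * (1 - n) - beta_n(V) * n)
        m += dt * (alpha_m(V) * (1 - m) - beta_m(V) * m)
        h += dt * (alpha_h(V) * (1 - h) - beta_h(V) * h)
        if V > 0 and not above:
            spikes += 1
        above = V > 0
    return spikes

print(count_spikes(0.0))    # no drive: the neuron stays at rest
print(count_spikes(10.0))   # constant 10 uA/cm^2 drive: repetitive spiking
```

This is the "single neuron in the regime of correct" level: the spiking mechanism is reproduced from ion-channel dynamics, yet nothing here says how to wire neurons into a functioning system.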
Well, see, the correctness, that's interesting, because you mentioned several levels. Is the correctness measured by looking at some kind of aggregate statistics? It would be more the spiking dynamics of a single neuron.
Yeah. And yeah, these models, because they are going to the level of mechanism, right?
So they are basically looking at, okay, what is the effect of turning on an ion channel?
And you can model that using electric circuits. So it is not just function fitting; people are looking at the mechanism underlying it, and putting that in terms of electric circuit theory, signal propagation theory,
and modeling that. And so those models are sophisticated, but getting a single neuron's model 99% right still does not tell you how the whole system works. It would be the analog of getting a transistor model right and then trying to build a microprocessor. If you did not understand how a microprocessor works, but you said, oh, I can model one transistor well, and now I will just interconnect the transistors according to whatever I could guess from the experiments and try to simulate it, then it is very unlikely that you will produce a functioning microprocessor.
When you want to produce a functioning microprocessor, you want to understand Boolean logic, how the gates work, all those things, and then understand how those gates get implemented using transistors.
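The transistor-to-gate analogy can be sketched in a few lines: model a transistor as an idealized switch, wire two into a NAND gate, and build higher logic from NAND. This is a toy illustration, not a real circuit simulation; the point is that composing the switch model into useful computation requires the Boolean, gate-level theory, not a more accurate switch model.

```python
# Toy levels-of-abstraction demo: an idealized transistor-as-switch model,
# a NAND gate built from two of them, and higher logic built from NAND.

def nmos(gate, source):
    """Idealized NMOS transistor: passes the source signal when the gate is 1."""
    return source if gate else 0

def nand(a, b):
    # Two NMOS transistors in series can pull the output low; a pull-up
    # (modeled implicitly) keeps it high otherwise.
    pulled_low = nmos(a, nmos(b, 1))
    return 0 if pulled_low else 1

# Gate-level theory: NAND is functionally complete, so everything else follows.
def not_(a):    return nand(a, a)
def and_(a, b): return not_(nand(a, b))
def or_(a, b):  return nand(not_(a), not_(b))

print([nand(a, b) for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]])  # [1, 1, 1, 0]
```

Knowing `nmos` perfectly tells you nothing about `and_` or `or_`; those only make sense once you understand the gate level, which is the argument being made about neurons and brains.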
Yeah, this reminds me, there's a paper, maybe you're familiar with it, that I remember going through in a reading group, that approaches a microprocessor from the perspective of a neuroscientist. It basically uses all the tools that we have in neuroscience to try to understand a microprocessor, as if aliens just showed up to study computers, and to see if those tools could be used to get any kind of sense of how the microprocessor works. And I think the takeaway from at least this initial exploration is that we're screwed. There's no way that the tools of neuroscience would be able to get us to anything, not even Boolean logic. I mean, any aspect of the architecture, of the function of the processes involved, the clocks, the timing, all that, you can't figure that out from the tools of neuroscience. Yeah, so I'm very familiar with this particular paper. I think it was called "Could a Neuroscientist Understand a Microprocessor?" or something like that.
Following the methodology in that paper, even an electrical engineer would not understand microprocessors. So I don't think it is that bad, in the sense that neuroscientists do find valuable things by observing the brain. They do find good insights, but those insights cannot be put together just as a simulation. You have to investigate: what are the computational underpinnings of those findings? How do all of them fit together from an information processing perspective? Somebody has to painstakingly put those things together and build hypotheses. So I don't want to diss all of neuroscience and say, oh, they're not finding anything. That paper almost went to the level of saying neuroscientists will never understand. No, that's not true. I think they do find lots of useful things, but it has to be put together in a computational framework. Yeah, but the AI systems will be listening to this
podcast a hundred years from now, and there's some non-zero probability they'll find your words laughable. Like, I remember humans thought they understood something about the brain, but they were totally clueless. There's a sense about neuroscience that we may be in the very, very early days of understanding the brain. But that's one perspective. In your perspective, how far are we into understanding any aspect of the brain? From the dynamics of individual neuron communication, to how, in a collective sense, they're able to store information, transfer information, to how intelligence then emerges, all that kind of stuff. Where are we on that timeline?
Yeah. So timelines are very, very hard to predict, and you can, of course, be wrong, and be wrong on either side. We know that when we look back, the first flight was in 1903. In 1900, there was a New York Times article on flying machines that do not fly, and it stated that humans might not fly for another hundred years. But no, they flew three years after that. So it is very hard to... Well, and on that point, one of the Wright brothers, I think two years before, said something like it would take 50 years; he had become convinced that it was impossible. Even during their experimentation. Yeah. Yeah. I mean, that's a tribute to the entrepreneurial battle, the depression of going through just thinking that this is impossible. But yeah, there's something that even the person who's in it is not able to see, to estimate correctly. Exactly. But I can tell, objectively,
what are the things that we know about the brain and how that can be used to build AI models,
which can then go back and inform how the brain works. So my way of understanding the brain would be to basically say: look at the insights neuroscientists have found, understand those from a computational angle, an information processing angle, and build models using that. And the model we build is a functional model, doing the task that we want the model to do; it is not just trying to model a phenomenon in the brain, it is trying to do what the brain is trying to do at the whole functional level. Building that model will help you fill in the missing pieces, because biology just gives you the hints, and building the model fills in the rest of the pieces of the puzzle. And then you can go and connect that back to biology and say, okay, now it makes sense that this part of the brain is doing this, or this layer in the cortical circuit is doing this. And then continue this iteratively, because now
that will inform new experiments in neuroscience. And of course, building the model and verifying
that in the real world will also tell you more about whether the model actually works. And you can refine the model, find better ways of putting these neuroscience insights together. So I would say that neuroscientists alone, just from experimentation, will not be able to build a model of the brain, a functional model of the brain. So there are lots of efforts, very impressive efforts, in collecting more and more connectivity data from the brain: how are the microcircuits of the brain connected with each other? Those are beautiful, by the way. Those are beautiful. And at the same time, those do not, by themselves, convey the story of how it works. Somebody has to understand, okay, why are they connected like that? And what are those things doing? And we do that by building models in AI, using hints from neuroscience, and repeating the cycle. So what aspects of the brain are useful in this whole endeavor, which, by the
way, I should say, you're both a neuroscientist and AI person. I guess the dream is to both
understand the brain and to build AGI systems. So it's like an engineer's perspective of trying
to understand the brain. So what aspects of brain function, like you said, do you find interesting? Yeah, quite a lot of things. So one is, if you look at the visual cortex... the visual cortex is a large part of the brain. I forget the exact fraction, but a huge part of our brain area is occupied by just vision. And the visual cortex is not just a feed-forward
cascade of neurons. There are a lot more feedback connections in the brain compared to the feed-forward
connections. And it is surprising, the level of detail to which neuroscientists have actually studied this. If you go into the neuroscience literature and poke around and ask, have they studied what the effect would be of poking a neuron in level IT on a neuron in level V1? You will find that, yes, they have studied that. Every possible combination. I mean, it's not random exploration at all. It's very hypothesis-driven. Experimental neuroscientists are very systematic
in how they probe the brain because experiments are very costly to conduct. They take a lot of
preparation. They need a lot of control. So they are very hypothesis driven in how they probe the
brain. And often what I find is that when we have a question in AI about, has anybody probed how lateral connections in the brain work? And when you go and read the literature, yes, people have probed it, and probed it very systematically, and they have hypotheses about how those lateral connections are supposedly contributing to visual processing. But of course, they haven't built very functional, detailed models of it. By the way, how do those studies work, sorry to interrupt? Do they stimulate a neuron in one particular area of the visual cortex and then see how the signal travels, that kind of thing? Fascinating, very, very fascinating experiments. So I can give you one example I was impressed with. But before going to that, let me give you an overview of how the layers in the cortex are organized.
Right. The visual cortex is organized into roughly four hierarchical levels. Okay, so V1, V2, V4, IT. And in V1... What happened to V3? Well, yeah, there's another pathway; I'm talking about just the object recognition pathway. And then in V1 itself, there is a very detailed microcircuit, that is, there is organization within a level itself. The cortical sheet is organized into multiple layers, and there is a columnar structure. And this layer-wise and columnar structure is repeated in V1, V2, V4, and IT, all of them. And within a level, in V1 itself, there are roughly six layers, and there is a particular structure to the connections between them.
And now, one example of an experiment people did: when you present a stimulus which, let's say, requires separating the foreground from the background of an object, so a textured triangle on a textured background, you can check: does the surface settle first, or does the contour settle first? Settle? Settle in the sense that when you finally form the percept of the triangle, you understand where the contours of the triangle are, and you also know where the inside of the triangle is, right? That's when you form the final percept. Now, you can ask, what are the dynamics of forming that final percept? Do the neurons first find the edges and converge on where the edges are, and then find the inner surfaces? Or does it go the other way around? So what's the answer?
In this case, it turns out that it first settles on the edges. It converges on the edge hypothesis first, and then the surfaces are filled in, from the edges to the inside. And the detail to which you can study this is amazing: you can not only find the temporal dynamics of when this happens, you can also find which layer in V1 is encoding the edges, which layer is encoding the surfaces, which layer is encoding the feedback, which layer is encoding the feed-forward, and what combination of them produces the final percept. And these kinds of experiments stand out when you try to explain illusions. One example of a favorite illusion of mine is the Kanizsa triangle. I don't know whether you are familiar with this one. This is an example where it's a triangle, but only the corners of the triangle are shown in the stimulus. So they look like a kind of Pac-Man.
Oh, the black Pac-Man. Yeah.
And then you start to see... your visual system hallucinates the edges. Yeah. And when you look at it, you will see a faint edge, right? And you can go inside the brain and look: do neurons actually signal the presence of this edge? And if they signal, how do they do it? Because they are not receiving anything from the input. The input is black for those neurons, right? So how do they signal it? When does the signaling happen? So if a real contour is present in the input, then the neurons immediately signal, okay, there is an edge here. When it is an illusory edge, it is clearly not in the input; it is coming from the context. So those neurons fire later, and you can say that, okay, it's the feedback connections that are causing them to fire. They happen later, and you can find the dynamics of them. So these studies are pretty impressive and very detailed.
So by the way, just to step back, you said that there may be more feedback connections than feed-forward connections. Yeah. First of all, just for the machine learning folks, that's crazy, that there are all these feedback connections. I mean, we often think, I think thanks to deep learning, of the human brain as a kind of feed-forward mechanism. Right. So what the heck are these feedback connections? What are their dynamics? What are we supposed to think about them?
Yeah. So this fits into a very beautiful picture of how the brain works, right? The beautiful picture is that our brain is building a model of the world. So our visual system is building a model of how objects behave in the world, and we are constantly projecting that model back onto the world. So what we are seeing is not just a feed-forward thing that gets interpreted in a forward pass. We are constantly projecting our expectations onto the world, and the final percept is a combination of what we project onto the world combined with what the actual sensory input is. Almost like trying to calculate the difference and then trying to interpret the difference? Yeah... I wouldn't put it as calculating the difference. It's more like: what is the best explanation for the input stimulus, based on the model of the world I have? Got it. Got it. And that's where all the illusions come in.
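One common way to make "the best explanation for the input, given my model" concrete is Bayesian evidence fusion. The sketch below is a generic scalar Gaussian example with invented numbers, not Dileep's actual model: the percept is a precision-weighted blend of the top-down prediction and the bottom-up observation.

```python
# Toy illustration of combining a top-down prediction with bottom-up sensory
# evidence. In the scalar Gaussian case, the posterior is a precision-weighted
# average of prior and observation; every number here is invented.

def fuse(prior_mean, prior_var, obs, obs_var):
    """Posterior mean/variance for a scalar latent with Gaussian prior and likelihood."""
    prior_prec, obs_prec = 1.0 / prior_var, 1.0 / obs_var
    w = prior_prec / (prior_prec + obs_prec)      # weight given to the model's prediction
    post_mean = w * prior_mean + (1.0 - w) * obs
    post_var = 1.0 / (prior_prec + obs_prec)
    return post_mean, post_var

# The model confidently predicts "edge at 0.0"; a noisy sensor reports 1.0.
mean, var = fuse(prior_mean=0.0, prior_var=0.1, obs=1.0, obs_var=0.4)
print(mean, var)   # the percept sits close to the confident top-down prediction
```

When the prior is precise and the sensor is noisy, the projection dominates, which is one simple way to see how a strong internal model can produce illusory percepts from ambiguous input.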
And that's... but that's an incredibly efficient process. So the feedback mechanism just helps you constantly hallucinate how the world should be, based on your world model, and then look at the novelty and try to explain it. Like, that's why we would detect movement really well. There are all these kinds of things. And this happens, you're saying, at all different levels of the cortex, the lowest level and the highest level? Yes. Yeah. In fact, feedback connections are prevalent everywhere in the cortex. And so one way to think about it, and there's a lot
of evidence for this, is inference. So basically, if you have a model of the world, and some evidence comes in, what you are doing is inference, right? You are trying to explain this evidence using your model of the world. And this inference includes projecting your model onto the evidence and taking the evidence back into the model, in an iterative procedure. And this iterative procedure is what happens via the feed-forward and feedback propagation. The feedback affects what you see in the world, and it also affects the feed-forward propagation. And examples are everywhere; we see these kinds of things everywhere. The idea is that there can be multiple competing hypotheses in our model trying to explain the same evidence. And then you have to kind of make them compete, and one hypothesis will explain away the other hypotheses through this competition process. Wait, what? So you have competing models of the world that try to explain... what do you
mean by explain away? So this is a classic example in graphical models, probabilistic models. What are those? Okay. I think it's useful to mention them, because we'll talk about them more. Yeah. So neural networks are one class of machine learning models: you have a distributed set of nodes, which are called neurons, each one doing a dot product, and you can approximate any function using this multi-level network of neurons. So that's a class of models which is useful for function approximation. There is another class of models in machine
learning called probabilistic graphical models. And you can think of them as: each node in that model is a variable which is talking about something. It can be a variable representing whether an edge is present in the input or not. And at the top of the network, a node can represent whether an object is present in the world or not. So it is another way of encoding knowledge. And once you encode the knowledge, you can do inference in the right way: what is the best way to explain some set of evidence using this model that you encoded? When you encode the model, you are encoding the relationships between these different variables. How is the edge connected to my model of the object? How is the surface connected to the model of the object? And of course, this is a very distributed, complicated model, and inference is how you explain a piece of evidence when a set of stimuli comes in. If somebody tells me there is a 50% probability that there is an edge here in this part of the model, how does that affect my belief on whether a square is present in the image? So this is the process of inference. So one example of inference is having this
explaining-away effect between multiple causes. So, graphical models can be used to represent causality in the world. Let's say your alarm at home can be triggered by a burglar getting into your house, or it can be triggered by an earthquake. Both can be causes of the alarm going off. So now, you're in your office, you hear the burglar alarm going off, and you head home thinking that there's a burglar. But while driving home, if you hear on the radio that there was an earthquake in the vicinity, now the strength of your evidence for a burglar getting into the house is diminished, because that piece of evidence is explained by the earthquake being present. So you can think of these two causes as explaining a lower-level variable, which is the alarm. What we're seeing is that evidence is coming in from below for the alarm being present, and initially it was flowing to the burglar being present. But now, since there is side evidence for this other cause, it explains away the evidence, and the evidence will now flow to the other cause. It is two competing causes trying to explain the same evidence. And the brain has a similar kind of mechanism for doing so. That's kind of interesting. I mean, how is that all encoded in the brain? Like, where's
the storage of information? Are we talking, just maybe to get a little bit more specific, is it in the hardware of the actual connections? Is it in chemical communication? Is it electrical communication? Do we know?
So, this is, you know, a paper that we are bringing out soon.
Which one is this?
This is the cortical microcircuits paper that I sent you a draft of. Of course, a lot of it is still hypothesis. One hypothesis is that you can think of a cortical column as encoding a concept. An example of a concept is: is an edge present or not? Or: is an object present or not? So you can think of it as a binary variable, a binary random variable: the presence of an edge or not, or the presence of an object or not. So each cortical column can be thought of as representing that one concept, one variable. And then the connections between these cortical columns are basically encoding the relationships between these random variables.
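The burglar-and-earthquake explaining-away story from earlier can be run as a tiny graphical model with exact inference by enumeration. All the probabilities below are invented for illustration; the point is only that observing the earthquake pulls belief away from the burglar:

```python
from itertools import product

# The classic burglar/earthquake network: both causes can trigger the alarm.
# All numbers are made up for illustration.
P_B = 0.01    # prior: burglary tonight
P_E = 0.02    # prior: earthquake tonight

# P(alarm = 1 | burglar, earthquake)
ALARM_TABLE = {(0, 0): 0.001, (0, 1): 0.3, (1, 0): 0.9, (1, 1): 0.95}

def posterior_burglar(alarm=1, earthquake=None):
    """Exact P(burglar | evidence) by enumerating the tiny joint distribution."""
    num = den = 0.0
    for b, e in product([0, 1], repeat=2):
        if earthquake is not None and e != earthquake:
            continue  # clamp the earthquake variable to the observed value
        p_a = ALARM_TABLE[(b, e)]
        joint = ((P_B if b else 1 - P_B)
                 * (P_E if e else 1 - P_E)
                 * (p_a if alarm else 1 - p_a))
        den += joint
        if b:
            num += joint
    return num / den

print(posterior_burglar(alarm=1))                 # alarm alone: burglar fairly likely
print(posterior_burglar(alarm=1, earthquake=1))   # earthquake observed: belief drops sharply
```

The second query shows explaining away: the same alarm evidence that was flowing to "burglar" is rerouted to "earthquake" once the competing cause is supported.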
And then there are connections within the cortical column?
There are. Each cortical column is implemented using multiple layers of neurons, with very rich structure there. You know, there are thousands of neurons in a cortical column.
But that structure is similar across the different cortical columns.
Correct. Yeah. Correct. And also, these cortical columns connect to a substructure called the thalamus; all cortical columns pass through this substructure. So our hypothesis is that the connections between the cortical columns implement... that's where the knowledge is stored about how these different concepts connect to each other. And then the neurons inside the cortical column and in the thalamus, in combination, implement the actual computations needed for inference, which include explaining away and competition between the different hypotheses. And what is amazing is that neuroscientists have actually done experiments to the tune of showing these things. They might not be putting it in the overall inference framework, but they will show things like: if I poke this higher-level neuron, it will inhibit this other column through this complicated loop through the thalamus. So they will do such experiments.
But do they use the terminology of concepts, for example? I mean, is it something where it's easy to anthropomorphize and think about concepts? Like, you start moving into logic-based kinds of reasoning systems. Should I just think of concepts in that kind of way, or is it a lot messier, a lot more gray area? You know, even more gray, even more messy than the artificial neural network kinds of abstractions?
The easiest way to think of it is as a variable, right? A binary variable which is showing the presence or absence of something.
But I guess what I'm asking is: is it something we're supposed to think of as human-interpretable?
It doesn't need to be. It doesn't need to be human-interpretable. There's no need for it to be human-interpretable. But it's almost like you will be able to find some interpretation of it, because it is connected to the other things.
Yeah. And the point is it's useful somehow.
Yeah.
It's useful as an entity in the graph, in connecting to the other entities that are, let's call them, concepts.
Right.
Okay. So, by the way, are these the cortical microcircuits?
Correct. These are the cortical microcircuits. That's the term neuroscientists use to talk about the circuits within a level of the cortex. So think of it in artificial neural network terms: people talk about the architecture of the network, how many layers they build, what the fan-in and fan-out are, et cetera. That is the macro-architecture. And then within a layer of the neural network... the cortical network is much more structured within a level; there is a lot more intricate structure there. But even within an artificial neural network, you can think of feature detection plus pooling as one level, and so that is kind of a microcircuit. It's much more complex in the real brain. And so, within a level, whatever circuitry is within a column of the cortex and between the layers of the cortex, that's the microcircuitry.
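The "feature detection plus pooling as one level" analogy can be sketched in a few lines of numpy. The image and edge filter below are toy values chosen for illustration; this is the artificial-network caricature of a microcircuit, far simpler than the real cortical one:

```python
import numpy as np

# One "level": feature detection (convolution with an edge filter) followed
# by pooling (2x2 max). Toy values throughout.

def conv2d_valid(img, kernel):
    """Naive valid-mode 2D cross-correlation."""
    kh, kw = kernel.shape
    h = img.shape[0] - kh + 1
    w = img.shape[1] - kw + 1
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def max_pool(x, size=2):
    """Non-overlapping max pooling: tolerance to small shifts in the feature."""
    h, w = (x.shape[0] // size) * size, (x.shape[1] // size) * size
    x = x[:h, :w]
    return x.reshape(h // size, size, w // size, size).max(axis=(1, 3))

image = np.zeros((6, 6))
image[:, 3:] = 1.0                      # a vertical edge in the middle
vertical_edge = np.array([[-1.0, 1.0],  # responds to left-to-right increases
                          [-1.0, 1.0]])

features = conv2d_valid(image, vertical_edge)  # "simple cell" responses
pooled = max_pool(features)                    # "complex cell" responses
print(pooled)
```

The detection stage responds only where the edge is; the pooling stage keeps that response while discarding its exact position, and that detect-then-pool pair is the one-level microcircuit being described.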
Yeah, I love that terminology. Machine learning people don't use the circuit terminology, but they should. It's a nice term.
So, okay. So that's the cortical microcircuit.
So what's interesting about these, what can we say? What does the paper that you're
working on propose about the ideas around these cortical microcircuits?
So this is a fully functional model for the microcircuits of the visual cortex.
So the paper, your ideas, and our discussion now are focusing on vision.
Yeah.
The, uh, visual cortex.
Okay. Yeah.
This is a model. This is a full model. This is, this is how vision works.
But this is, this is a, yeah.
A hypothesis.
A hypothesis.
Okay. So let me, let me step back a bit.
Um, so we looked at neuroscience for insights on how to build a vision model.
Right.
And we synthesized all those insights into a computational model.
This is called the recursive cortical network model, which we used for breaking captchas,
and we are using the same model for robotic picking and tracking of objects.
And that again is a vision system.
That's a vision system.
Computer vision system.
That's a computer vision system.
Takes in images and outputs.
What?
On one side, it outputs the class of the image, uh, and also segments the image.
And you can also ask it further queries.
Where is the edge of the object?
Where is the interior of the object?
So, so it's a, it's a model that you build to answer multiple questions.
So you're not trying to build a model for just classification or just segmentation,
et cetera.
It's a, it's a, it's a joint model that can do multiple things.
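Since the discussion describes one joint model answering several queries (class, segmentation, edges, interior), here is a minimal sketch of what such a joint output could look like. This is purely illustrative: the `SceneExplanation` class and its methods are hypothetical names invented here, not Vicarious's actual API.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class SceneExplanation:
    """Hypothetical joint output: one parse of the scene, many queries."""
    label: str         # answer to "what class is this?"
    mask: np.ndarray   # boolean segmentation mask, answer to "where is it?"

    def edge_pixels(self) -> np.ndarray:
        """Object pixels that touch the background (4-neighborhood)."""
        padded = np.pad(self.mask, 1, constant_values=False)
        interior = (padded[:-2, 1:-1] & padded[2:, 1:-1] &
                    padded[1:-1, :-2] & padded[1:-1, 2:])
        return self.mask & ~interior

    def interior_pixels(self) -> np.ndarray:
        """Object pixels fully surrounded by the object."""
        return self.mask & ~self.edge_pixels()


# Toy scene: a 3x3 "object" inside a 5x5 image.
mask = np.zeros((5, 5), dtype=bool)
mask[1:4, 1:4] = True
scene = SceneExplanation(label="square", mask=mask)

# One model output, several questions answered:
assert scene.label == "square"
assert scene.edge_pixels().sum() == 8      # ring of boundary pixels
assert scene.interior_pixels().sum() == 1  # single center pixel
```

The design choice the sketch illustrates is that classification, segmentation, and the edge/interior queries are all answered from a single underlying scene parse rather than by separate single-purpose models.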
Um, and, um, so, so that's the model that we built using insights from neuroscience.
And some of those insights are what is the role of feedback connections?
What is the role of lateral connections?
So all those things went into the model.
The model actually uses feedback connections.
All these ideas from, from neuroscience.
Yeah.
So what the heck is a recursive cortical network? Like, what are the architectural approaches,
the interesting aspects here? This is essentially a brain-inspired approach to computer vision.
Yeah.
So there are multiple layers to this question.
I can go from the very, very top and then zoom in.
Okay.
So one important constraint that went into the model is that you should not think of
vision as something in isolation.
We should not think of perception as just a pre-processor for cognition.
Perception and cognition are interconnected.
And so you should not think of one problem in separation from the other problem.
Um, and so that means, if you finally want to have a system that understands concepts
about the world, can learn a very conceptual model of the world, can reason, and can connect
to language, all of those things, then you need to think all the way through
and make sure that your perception system is compatible with your cognition system and
language system and all of them.
And one aspect of that is top down controllability.
Um, what does that mean?
So that means, you know, you can close your eyes and think about the details of one object.
Right.
I can zoom in further and further. So think of the bottle in front of me.
Right.
And now you can think about, okay, what the cap of that bottle looks like.
You can think about the texture on the cap of that bottle.
You can think about what will happen if something hits it.
Uh, so you can manipulate your visual knowledge in cognition-driven ways.
Yes.
Uh, and so this top down controllability, uh, and being able to simulate scenarios in the world.
So you're not just a passive, uh, player in this perception game.
You, you can, you can control it.
You can, you, you have imagination.
Correct.
Correct.
So basically, you know, having a generative network, which is a model, and it is not just
some arbitrary generative network. It has to be built in a way that it is controllable top-down.
It is not just trying to generate a whole picture at once.
You know, it's not trying to generate photorealistic things of the world.
You don't have good photorealistic models of the world.
Human brains do not. If I, for example, ask you the question:
what is the color of the letter E in the Google logo?
You have no idea.
No idea.
Although I have seen it millions of times, or hundreds of times.
So our model is not photorealistic, but it has other properties: we can manipulate it,
and you can think about filling in a different color in that logo.
You can think about expanding the letter E.
So you can imagine the consequences of actions that you have never performed.
So these are the kinds of characteristics the generative model needs to have.
So this is one constraint that went into our model.
Like, you know, when you read just the perception side of the paper,
it is not obvious that this was a constraint that went into the model,
this top-down controllability of the generative model.
Uh, so what does top-down controllability in a model look like?
That's a really interesting concept, a fascinating concept.
What is that?
Is it the recursiveness that gives you that, or how do you do it?
Quite a few things.
It's like, what does the model factorize? What is the model representing as different
pieces in the puzzle? So in the RCN network, the background of an image is modeled
separately from the foreground of the image.
Got it. So, so the objects are separate from the background.
They are different entities.
So there's a kind of segmentation that's built in fundamentally.
And then even that object is composed of parts. And another one: the shape of the object
is modeled separately from the texture of the object.
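As a toy illustration of the factorization being described here (background versus foreground, shape versus texture), below is a minimal sketch of a render function where each factor can be manipulated independently. The `render` function and its factor names are hypothetical stand-ins, not the actual RCN generative model.

```python
import numpy as np


def render(shape_mask: np.ndarray, texture: float, background: float) -> np.ndarray:
    """Compose a scene from independently controllable factors."""
    image = np.full(shape_mask.shape, background, dtype=float)
    image[shape_mask] = texture   # texture is painted onto the shape
    return image


# A fixed shape: a 2x4 bar inside a 4x6 image.
shape = np.zeros((4, 6), dtype=bool)
shape[1:3, 1:5] = True

a = render(shape, texture=1.0, background=0.0)
b = render(shape, texture=0.5, background=0.0)   # "fill in a different color"

# Top-down controllability: changing the texture factor leaves the
# shape (and hence the segmentation) untouched.
assert ((a > 0) == (b > 0)).all()
```

The point of the separation is exactly the controllability discussed above: you can swap one factor, like the texture, without disturbing the others.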
Got it. So there's, you know who François Chollet is. He developed this IQ-test type of thing,
the ARC challenge, and it's kind of cool that there are these concepts, priors, that he defines
that you bring to the table in order to be able to reason about basic shapes and things in IQ
tests. So here you're making it quite explicit: here are the distinct things that you should
be able to model.
Keep in mind that you can derive this from much more general principles.
You don't need to explicitly put it in as, oh, objects versus foreground versus
background, or surface versus texture. No, these are derived from
more fundamental principles, like: what's the property of continuity of natural signals?
What's the property of continuity of natural signals?
Yeah.
By the way, that sounds very poetic, but yeah. So you're saying there are some low-level
properties from which emerges the idea that shapes are different, that there should be
parts of an object, there should be... I mean, exactly. Kind of like François Chollet's priors.
I mean, there's objectness, there's all these things. It's kind of crazy that we humans, I guess,
evolved to have them because it's useful for perceiving the world. Correct. Correct. And it derives mostly from
the properties of natural signals. And so, natural signals. So natural signals are the
kinds of things we perceive in the natural world. Correct. I don't know why that
sounds so beautiful. Natural signals. Yeah.
As opposed to a QR code, right? Which is an artificial signal that we created.
Humans are not very good at classifying QR codes. We are very good at saying something
is a cat or a dog, whereas computers are very good at
classifying QR codes. Um, so our visual system is tuned for natural signals. Uh,
and there are fundamental assumptions in the architecture that are derived from natural
signal properties. I wonder, when you take, uh, psychedelic drugs,
does that go into natural, or is that closer to the QR code?
It's still natural. It's still natural. Yeah. Because it is still operating using our
brains. By the way, on that topic, I mean, I haven't been following, but I think they're
becoming legalized in certain places. I can't wait until they become legalized to a degree that
vision science researchers could study it. Yeah. Just, like, through medical, chemical
ways, modify the brain. There could be ethical concerns, but modify it. That's another way to study the brain,
to be able to chemically modify it. There's probably a very long
way to go to figure out how to do it ethically. Yeah. But I think there are studies on that
already. Yeah. I think so. Uh, because it's not unethical to give it to rats.
Oh, that's true. That's true. There's a lot of drugged-up rats out there. Okay.
Yeah. Sorry. Sorry to, it's okay. So there's these low-level properties of natural
signals from which these properties will
emerge. Yes. Uh, but it is still a very hard problem, how to encode that. Again, you know,
you mentioned the priors François wanted to encode in the abstract reasoning challenge,
but it is not straightforward how to encode those priors. Um, so some of those challenges,
like the object completion challenges, are things that we purely use our visual system to do.
It looks like abstract reasoning, but it is purely an output of the vision system.
For example, completing the corners of a Kanizsa triangle, completing the lines of a
Kanizsa triangle: it's purely a visual system property. There is no abstract
reasoning involved. It uses all these priors, but it is stored in our visual system in a particular
way that is amenable to inference. And that is one of the things that we tackled, specifically
saying, okay, here is the prior knowledge, which will be derived
from the world. But then how is that prior knowledge represented in the model such that
inference, when some piece of evidence comes in, can be done very efficiently and in a very
distributed way? Because there are so many ways of representing knowledge
which are not amenable to very quick inference, you know, quick lookups. Uh, and so that's one
um, core part of what we tackled in, uh, the RCN model. Um, uh, how do you encode visual
knowledge to, uh, do very quick inference and yeah.
Can you maybe comment on, so folks listening to this may be familiar with
different kinds of neural network architectures. What are we talking about with the
RCN? What does the architecture look like? What are the different components?
Is it close to neural networks? Is it far from neural networks? What does it look like?
Yeah. So you can think of the delta between the model and a convolutional neural
network, if people are familiar with convolutional neural networks. Convolutional
neural networks have this feed-forward processing cascade of what are called feature detectors
and pooling, and that is repeated in a hierarchy, a multi-level system.
Um, and if you want an intuitive idea of what is happening: feature detectors are
detecting interesting co-occurrences in the input. It can be a line, a corner,
an eye, or a piece of texture, et cetera. And the pooling neurons are doing some local
transformation of that and making it invariant to local transformations. So that is the
structure of a convolutional neural network. Um, the recursive cortical network has a similar structure
when you look at just the feed-forward pathway. But in addition to that, it is also structured
in a way that it is generative, so that you can run it backward and combine the forward with the
backward. Another aspect is that it has lateral connections. So if you have an edge here and an
edge here, there are connections between the nodes representing these edges, not just
feed-forward connections, and these lateral connections enforce compatibility between them.
It's basically a constraint: if you do just feature detection followed by pooling, then the
transformations in different parts of the visual field are not coordinated. And so, when you
generate from the model, you will create jagged, uncoordinated transformations. So these lateral
connections are enforcing coordination of the transformations.
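For readers who want the convolutional baseline being contrasted here in concrete form, below is a tiny numpy sketch of feature detection followed by pooling, the "microcircuit" described above. The edge-detecting kernel and the pooling scheme are illustrative choices only; RCN's actual circuitry, with its lateral connections, is far richer than this.

```python
import numpy as np


def detect(image, kernel):
    """Feature detection: respond where the kernel's pattern occurs."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (image[i:i + kh, j:j + kw] * kernel).sum()
    return out


def pool(features, size=2):
    """Pooling: max over local windows, giving invariance to small shifts."""
    h, w = features.shape
    return np.array([[features[i:i + size, j:j + size].max()
                      for j in range(0, w - size + 1, size)]
                     for i in range(0, h - size + 1, size)])


kernel = np.array([[-1.0, 1.0]])      # responds to a dark-to-light step

image = np.zeros((6, 6))
image[:, 3:] = 1.0                    # vertical edge at column 3
shifted = np.zeros((6, 6))
shifted[:, 4:] = 1.0                  # same edge, shifted one pixel

# The raw feature maps differ, but after pooling they agree:
assert not (detect(image, kernel) == detect(shifted, kernel)).all()
assert (pool(detect(image, kernel)) == pool(detect(shifted, kernel))).all()
```

The final two lines show the point of the pooling neurons: each pooled unit is unchanged by a small shift of the input, which is the local transformation invariance mentioned above.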
Is the whole thing still differentiable? Uh, no. Okay. No. It's not, it's not trained using, uh,
back prop. Okay. That's really important. So, uh, so there's these feed forward, there's feedback
mechanisms. There's some interesting connectivity things. It's still layered like, uh, there are
multiple levels, multiple layers. Okay. Very, very interesting. Uh, and yeah. Okay. So the
interconnections between adjacent nodes, the lateral connections, serve as constraints that
keep the thing stable. Correct. Okay. So what else? Uh, and then there's this idea of doing
inference. A neural network does not do inference on the fly. So an example of why this inference
is important is, you know, so one of the first applications, uh, that we showed in the paper
was to crack, uh, text-based captchas. What are captchas, by the way? Uh, yeah. By the way,
one of the most awesome terms, people don't use this term anymore: human computation, I think.
Uh, I love this term. The guy who created captchas, I think, came up with this term. Yeah.
I love it. Anyway. Uh, yeah. So what are captchas? Captchas are those strings that you
fill in when you're, you know, opening a new account in Google. They show
you a picture, usually it used to be a set of garbled letters, and you have to kind
of figure out what that string of characters is and type it in. And the reason captchas
exist is because, you know, Google or Twitter do not want automatic creation of accounts. You
can use a computer to create millions of accounts and use that for, you know, those
purposes. Uh, so you want to make sure, to the extent possible, that the interaction
their system is having is with a human. So it's called a human interaction proof.
A captcha is a human interaction proof. Um, so captchas are, by design,
things that are easy for humans to solve, but hard for computers, hard for robots. Yeah.
Um, so text-based captchas were the ones prevalent around 2014,
because at that time, text-based captchas were hard for computers to crack. Even now they are,
actually, in the sense that an arbitrary text-based captcha will be unsolvable even now. But with
the techniques that we have developed, you can quickly develop a mechanism that
solves the captcha. Uh, they've probably gotten a lot harder too. They've been getting cleverer
and cleverer at generating these text captchas. Yeah. So okay. So that was one of the things you
tested it on, these kinds of captchas, in 2014, '15, that kind of stuff. So what,
I mean, why, by the way, why captchas? Yeah. Yeah. Even now, I would say the captcha is a very,
very good challenge problem, uh, if you want to understand how human perception works and if you
want to build, uh, systems that work like the human brain. Uh, and I wouldn't say captcha is a solved
problem. We have cracked the fundamental defense of captchas, but it is not solved in the way
that humans solve it. Um, so I can give you an example. I can take a five-year-old child
who has just learned characters, and show them any new captcha that we create;
they will be able to solve it. Uh, I can show you pretty much any new captcha from any new website.
You'll be able to solve it without getting any training examples from that particular style
of captcha. You're assuming I'm human. Yeah. Yes. Yeah. Uh, that's right. So if you are human.
Otherwise I will be able to figure that out using this one. But, uh,
this whole podcast is just a Turing test. That's a long, a long Turing test.
Anyway, I'm sorry. So yeah. Humans can figure it out with very few examples
or no training examples, like, no training examples from that particular style of captcha. Um,
and so, you know, even now this is unreachable for the current deep learning
system. So basically, I don't think a system exists where you can basically say:
train on whatever you want, and then now say, hey, I will show you a new captcha
which I did not show you in the training setup; will the system be able to
solve it? Um, it still doesn't exist. So that is the magic of human perception. Yeah. And
Doug Hofstadter, uh, put this very beautifully in one of his talks: the central
problem in AI is, what is the letter A? If you can build a system that reliably
can detect all the variations of the letter A, you don't even need to go to the B and
the C. Yeah. Yeah. You don't even need to go to the B and the C, or the strings of characters.
And, uh, so that is the spirit with which we tackled that problem.
What does he mean by that? I mean, is it, uh, like, without training examples,
try to figure out the fundamental elements that make up the letter A in all of its forms?
In all of its forms. An A can be made with two humans standing,
leaning against each other, holding hands. It can be made of leaves. It can be...
Yeah. You might have to understand everything about this world in order to understand the letter
A. Yeah. So it's common sense reasoning, essentially. Yeah. Right. So, finally,
to really say that you have solved captchas, you have to solve the whole
problem. Yeah. Okay. So how does, uh, this kind of RCN architecture help us do
a better job of that kind of thing? Yeah. So, uh, as I mentioned, one of the important
things was being able to do inference, being able to dynamically do inference. Can you,
uh, clarify what you mean? Because you said neural networks don't
do inference? Yeah. So what do you mean by inference in this context then? So, okay.
So in captchas, what they do to confuse people is to make the characters crowd together.
Yes. Okay. And when you make the characters crowd together, what happens is that you will now
start seeing combinations of characters as some other new character, or an existing character.
So you would put an R and an N together, and it will start looking like an M. Uh, and so
locally there is very strong evidence for it being some incorrect
character. But globally, the only explanation that fits together is something that is different
from what you find locally. Yes. So this is inference. You are basically taking local
evidence and putting it in the global context, and often coming to a conclusion that
conflicts with the local information. So actually, so you mean inference, like, uh, in the way it's
used when you talk about reasoning, for example, uh, as opposed to, like, the inference
of artificial neural networks, which is a single pass through the network.
Okay. So you're basically doing some basic form of reasoning,
like integrating how local things fit into the global picture.
And things like explaining away come into this, because you are
explaining that piece of evidence as something else, uh, because globally that's the only thing
that makes sense. Um, so now, uh, you can amortize this inference in a neural network:
if you want to do this, you can brute-force it. You can just show it all the
combinations of things that you want your reasoning to work over.
And you can, you know, just train the hell out of that neural network and it will
look like it is doing, uh, you know, inference on the fly, but it is really just
doing amortized inference, because you have shown it a lot of these combinations
during training time. Um, so what you want to do is be able to do dynamic inference rather
than just being able to show all those combinations at training time. And that's something we
emphasized in the model. What does dynamic inference mean? Does that have to do with the feedback
thing? Yes. Like, what is dynamic? I'm trying to visualize what dynamic inference would be in
this case. Like, what is it doing with the input? It's shown the input the first time. Yeah. And
what's changing temporally? What's the dynamics of this inference process?
So you can think of it as: at the top of the model you have the characters that you are
trained on. They are the causes; you're trying to explain the pixels using the characters as the
causes. The characters are the things that cause the pixels. Yeah. So there's
this causality thing. So the reason you mention causality, I guess, is because there's a temporal
aspect to this whole thing. In this particular case, the temporal aspect is not important. It is
more like: if I turn the character on, the pixels will turn on. Yeah. It will be after;
there's a little bit, but yeah. So that is causality in the sense of a logical causality,
like hence inference. Okay. The dynamics is that, uh, even though locally it will look like, okay,
this is an A, and locally, when I look at just that patch of the image, it looks
like an A, when I look at it in the context of all the other causes, it might not be
something that makes sense. So that is something we have to kind of, you know, recursively
figure out. Yeah. So, okay. And, uh, this thing performed pretty well on the captchas.
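The crowded-characters example above (an R and an N read locally as an M) can be caricatured in a few lines of code: choose the parse that explains all of the evidence, even when a locally stronger detector disagrees. The stroke counts and detector scores below are invented purely for illustration; real RCN inference is belief propagation in a graphical model, not this brute-force enumeration.

```python
# Hypothetical stroke counts and local detector confidences.
CHAR_STROKES = {"r": 2, "n": 2, "m": 3}
LOCAL_SCORE = {"r": 0.6, "n": 0.6, "m": 0.9}  # 'm' wins locally


def parses(n_strokes):
    """All character sequences whose strokes exactly cover the evidence."""
    if n_strokes == 0:
        return [[]]
    out = []
    for ch, k in CHAR_STROKES.items():
        if k <= n_strokes:
            out += [[ch] + rest for rest in parses(n_strokes - k)]
    return out


def global_score(parse):
    score = 1.0
    for ch in parse:
        score *= LOCAL_SCORE[ch]
    return score


# An 'r' crowded against an 'n' produces 4 strokes of evidence.
candidates = parses(4)
best = max(candidates, key=global_score)

# No parse containing 'm' covers all 4 strokes (3 used, 1 left dangling),
# so the strong local 'm' evidence is explained away by two-letter parses.
assert "m" not in best
assert all("m" not in p for p in candidates)
```

The locally best-scoring hypothesis never appears in the winning explanation because it cannot account for all the evidence, which is the "conclusion that conflicts with the local information" described above.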
Correct. And, uh, I mean, is there some kind of interesting intuition you can provide for why it
did well? Like, what did it look like? Are there visualizations that could be human-interpretable
to us humans? Yes. Yeah. So the good thing about the model is that it is not just doing
a classification, right? It is providing a full
explanation for the scene. So when it operates on a scene, it comes back
and saying: look, this part is the A, these are the pixels that turned on, these
are the pixels in the input that make me think it is an A, and also these are the
portions I hallucinated. It provides a complete explanation of that form. And then: these are
the contours, this is the interior, and this is in front of this other object.
So that's the kind of, um, explanation the inference network provides. So
that is useful and interpretable. Um, and then the kind of errors it makes,
I don't want to read too much into it, but the kind of errors the network
makes are very similar to the kinds of errors humans would make in a similar
situation. So there's something about the structure that feels reminiscent of the way
the human visual system works. Well, I mean, uh, how hard-coded is this to the captcha problem,
this idea? Uh, not really hard-coded, because the assumptions, as I mentioned,
are general, right? And those themselves can be applied in many situations
which are natural signals. Um, so it's the foreground-versus-background factorization
and the factorization of surfaces versus contours. These are all generally
applicable assumptions in all of vision. So why captchas, why attack the captcha problem,
which is quite unique in the computer vision context, versus, like, the traditional benchmarks
of ImageNet and all those kinds of image classification or even segmentation tasks,
all of that kind of stuff? I mean, what's your thinking
about those kinds of benchmarks in, um, in this context?
I mean, those benchmarks are useful for deep-learning kinds of algorithms, where, you know,
the settings that deep learning works in are: here is my huge training set
and here is my test set. The training set is almost, uh, a hundred X,
a thousand X bigger than the test set in many, many cases. Uh, what we wanted to do was
invert that: the training set is way smaller than the test set. Uh, and,
you know, the captcha is a problem that is by definition hard for computers, and it has these
good properties of strong generalization, strong out-of-training-distribution generalization.
If you are interested in studying that, and in having your model have that property,
then it's a good data set to tackle. So have you attempted, which I think,
I believe there's quite a growing body of work on looking at MNIST and ImageNet with little
or no training data. So, like, the basic challenge is: what tiny fraction of the training set
can we take and still do a reasonable job on the classification task? Have you explored
that angle on these classic benchmarks? Yes. So we did do MNIST. So, um, you know, it's not
just captchas. There were also multiple versions of MNIST,
including the standard version, where we inverted the problem, which is basically saying:
rather than train on 60,000 training examples, how quickly can you get
to high accuracy with very little training data?
Was, is there some, uh, performance that you remember? Like how well,
how well did it do? How many examples did it need?
Yeah. You know, I remember that it was on the order of, uh,
tens or hundreds of examples to get to 95% accuracy. And it was definitely better
than the other systems out there at that time. At that time. Yeah. Yeah. They're
really pushing. I think that's a really interesting space, actually. Uh, I think there's an actual
name for MNIST variants with different sizes of training sets.
I mean, people are attacking this problem. I think it's super interesting. Yeah. It's funny
how MNIST will probably be with us all the way to AGI, as the data set that just sticks
around. It is. It's a clean, simple data set to study the fundamentals of learning with,
just like captchas. It's interesting. Not enough people, I don't know, maybe you can correct me,
but I feel like captchas don't show up as often in papers as they probably should.
That's correct. Yeah. Because, you know, usually these things have a momentum. Once
something gets established as a standard benchmark, there is a dynamics
of how graduate students operate and how the academic system works that
pushes people to track that benchmark. Yeah. Nobody wants to think outside the box. Okay.
Yeah. Uh, okay. So, good performance on the captchas. What else is there that's interesting
on the RCN side before we talk about the cortical microcircuits? Yeah. So the same model,
the important part of the model, is that it trains very quickly with very little
training data and it's, uh, you know, quite robust to out-of-distribution perturbations. Um,
and we are using that very fruitfully at Vicarious in many of the robotic
tasks we are solving. Uh, well, let me ask you this kind of touchy question. I have to,
I've spoken with your friend and colleague Jeff Hawkins too. I mean, I have to kind
of ask: whenever you have brain-inspired stuff and you make big, sexy claims,
there are, you know, critics. I mean, the machine learning subreddit,
don't get me started on those people. They're harsh. I mean, criticism is good,
but they're a bit over the top. Um, there is quite a bit of skepticism
and criticism: is this work really as good as it promises to be? Yeah. Do you have
thoughts on that kind of skepticism? Do you have comments on the kind of criticism you might have
received? You know, is this approach legit? Is this a promising approach,
or at least as promising as it's advertised to be?
Yeah, I can comment on it. Um, so, you know, our paper is published in Science,
which I would argue is a very high-quality journal, very hard to publish in, and,
you know, usually that is indicative of the quality of the work. And, um,
I am very, very certain that the ideas that we brought together in that paper, in terms of
the importance of feedback connections, recursive inference, lateral connections,
coming to the best explanation of the scene as the problem to solve, trying to solve recognition
and segmentation all jointly in a way that is compatible with higher-level cognition and
top-down attention, all those ideas that we brought together into something, you know,
coherent and workable, tackling a challenging
problem. I think that will stay, and that contribution I stand by, right?
Now, uh, I can tell you a story which is funny in this context, right?
So if you read the abstract of the paper, the argument we are putting in is:
look, current deep learning systems take a lot of training data.
They don't use these insights. And here is our new model, which is not a deep neural network;
it's a graphical model; it does inference. That is how the paper reads, right? Now,
once the paper was accepted and everything, it went to the press department at Science,
the Science press office. We didn't do any press release when it was published.
It went to the press department. And what was the press release that they wrote up?
A new deep learning model.
Solves captchas.
Solves captchas. And, uh, so you can see what was being hyped in that
thing, right? So there is a dynamic in the community
that especially happens when there are lots of new people coming into the field
and they get attracted to one thing, and some people are trying to think
differently compared to that. So there is some... I think skepticism in science is important
and it is very much required. But often it's not really skepticism; it's mostly a bandwagon
effect that is happening rather than
Well, it's not even that. I'll tell you what they react to, which is something I'm sensitive to as well. If you look at the companies, OpenAI, DeepMind, Vicarious, there's a little bit of a race to the top in hype, right?
Right. It's like it doesn't pay off to be humble.
And the press is often just irresponsible. Don't get me started on the state of journalism today. It seems like the people who write articles about these things have literally not even spent an hour on the Wikipedia article about what a neural network is. They haven't invested even that much; it's laziness. It's like "robots beat humans", they write that kind of stuff. And then of course the researchers are quite sensitive to that, because it gets a lot of attention. They're like: why did this work get so much attention? That's over the top. And people get really sensitive. The same kind of criticism hit the work OpenAI did with the Rubik's Cube robot; people criticized it. Same with GPT-2 and 3. Same thing with DeepMind's AlphaZero.
I mean, yeah, I'm sensitive to it. And of course, with your work, you mentioned deep learning, but there's something super sexy to the public about "brain inspired". That immediately grabs people's imagination: not just neural networks, but really brain-inspired, brain-like neural networks. That seems really compelling to people, and to me as well, and to the world, as a narrative. And so people hook on to that, and sometimes the skepticism engine turns on in the research community and they're skeptical. But putting aside the actual performance on CAPTCHAs or on any data set, I mean, to me all these data sets are useless anyway. It's nice to have them, but in the grand scheme of things they're silly toy examples. The point is the intuition behind the ideas: just like you mentioned, bringing the ideas together in a unique way. Is there something there? Is there some value there? And is it going to stand the test of time?
Yes. And that's the hope. That's the hope.
My confidence there is very high. I don't treat "brain inspired" as a marketing term. I am looking into the details of biology, I am puzzling over those things, I am grappling with those things. So it is not a marketing term at all. You can use it as a marketing term, and people often do, and you can get lumped in with them. And when people don't understand how we are approaching the problem, it is easy to be misunderstood, to be thought of as purely marketing, but that's not the way we are.
So as a scientist, you really believe that if we just stick to understanding the brain, that's the right path: you should constantly meditate on how the brain does this, because that's going to be really helpful for engineering intelligent systems.
Yes, you need to. So I think it is one input, and it is helpful, but you should also know when to deviate from it. An example is convolutional neural networks. Convolution is not an operation the brain implements; the visual cortex is not convolutional. The visual cortex has local receptive fields, local connectivity, but there is no translational invariance in the network weights in the visual cortex. That is a computational trick, a very good engineering trick, that we use for sharing the training between the different nodes. And that trick will be with us for some time. It will go away when we have robots with eyes and heads that move; then that trick will no longer be useful.
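The distinction being drawn here, a shared convolutional kernel versus cortex-like local receptive fields with position-specific weights, can be sketched in a few lines of NumPy. This is a toy illustration, not anything from the conversation; the sizes and data are invented.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(16)   # a 1-D stand-in for a row of an image
k = 3                          # receptive field size
n_out = len(x) - k + 1

# Convolution: ONE weight vector shared across every position,
# the translation-invariance "engineering trick" for sharing training.
w_shared = rng.standard_normal(k)
conv_out = np.array([w_shared @ x[i:i + k] for i in range(n_out)])

# Locally connected layer: identical local receptive fields, but a
# SEPARATE weight vector per position (no sharing), closer to how
# visual cortex is wired, where center and periphery weights differ.
w_local = rng.standard_normal((n_out, k))
local_out = np.array([w_local[i] @ x[i:i + k] for i in range(n_out)])

print(conv_out.shape, local_out.shape)  # same output shape: (14,) (14,)
```

Both layers have exactly the same connectivity pattern; only the weight sharing differs, which is why dropping it costs parameters but matches the biology more closely.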
So the brain doesn't have translational invariance. It has a focal point, a thing it focuses on.
Correct. It has a fovea, and because of the fovea the receptive fields are not copies of the same weights; the weights in the center are very different from the weights at the periphery.
Yes, at the periphery. I mean, I actually wrote a paper on this, really studying peripheral vision, which is a fascinating and very poorly understood thing: what the brain, at every level, does with the periphery. It does some funky stuff. Yeah. So it's another kind of trick than convolution. Convolution in neural networks is a trick for efficiency, an efficiency trick, and the brain does a whole other kind of thing.
Correct, correct. So you need to understand the principles of the processing so that you can still apply engineering tricks where you want them. You don't want to be slavishly mimicking all the things the brain does. So yes, it should be one input, and I think it is extremely helpful, but the point is really understanding, so that you know when to deviate from it.
Okay, that's really cool. That's work from a few years ago. You did work at Numenta with Jeff Hawkins, with hierarchical temporal memory. If you could give a brief history: how has your view of the models of the brain changed over the past few years, leading up to now? Are there interesting aspects where there was an adjustment to your understanding of the brain, or is it all just building on top of each other?
In terms of the higher-level ideas, especially the ones Jeff wrote about in the book On Intelligence: if you blur out the details and just zoom out to the higher-level idea, things are, I would say, consistent with what he wrote about. But many things will be consistent with that, because it's a blur; deep learning systems are also multilevel, hierarchical, all of those things, right? In terms of the details, though, a lot of things are different, and those details matter a lot. So one point of difference I had with Jeff was how much biological plausibility and realism you want in the learning algorithms. When I was there, and this was almost ten years ago now, so I don't know what Jeff thinks today, the difference was that I did not want to be constrained by saying my learning algorithms need to be biologically plausible based on some filter of biological plausibility available at that time. To me, that is a dangerous cut to make, because we are discovering more and more things about the brain all the time. New biophysical mechanisms, new channels, are being discovered all the time. So I don't want to kill off a learning algorithm up front just because we don't yet understand the full biophysics of how the brain learns.
Exactly. But let me ask, on that topic: what's your sense, what's our best understanding of how the brain learns?
Things like backpropagation, credit assignment?
So, many of these learning algorithms have things in common, right? Backpropagation is one way of doing credit assignment. There is another algorithm called expectation maximization, which is another weight-adjustment algorithm.
But is it your sense that the brain does something like this?
It has to; there is no way around it, in the sense that you do have to adjust the connections.
And you're saying, with credit assignment, you have to reward the connections that were useful in making a correct prediction, but it doesn't have to be backprop.
Yeah, it doesn't have to be differentiable. But you have a model that you start with, data comes in, and you have to have a way of adjusting the model so that it better fits the data. That is all of learning, right? Some of it can use backprop to do that; some of it can use very local graph changes to do that. Many of these learning algorithms have similar update properties
in terms of what the neurons need to do locally. I wonder if small differences in learning
algorithms can have huge differences in the actual effect. The dynamics of it: things like spiking; whether credit assignment is like a lightning strike versus a rainstorm; whether there's a looping, local kind of situation in the credit assignment; whether there is regularization; how it injects robustness into the whole thing; whether it's chemical or electrical or mechanical. All those kinds of things. I feel like those differences could be essential.
It could be. It's just that, on the learning side, we don't know enough to say that is definitely not the way the brain does it.
Got it. So you don't want to be stuck to it; you've been open-minded on that side of things.
On the inference side, on the recognition side, I am much more amenable to being constrained, because it's much easier to do experiments. It's like: okay, here's the stimulus; how many steps did it take to get the answer? I can trace it back, I can understand the speed of that computation, et cetera, much more readily on the inference side.
Got it. And you can't do good experiments on the learning side.
Correct.
So let's go right back to the cortical microcircuits: what are the ideas beyond the recursive cortical network that you're looking at now?
So we have made a pass through multiple of the steps. As I mentioned earlier, we were looking at perception from the angle of cognition; it was not just perception for perception's sake. How do you connect it to cognition? How do you learn concepts? How do you learn abstract reasoning, similar to some of the things François talked about? So we have taken one pass through it, basically asking: what is the basic cognitive architecture you need, one that has a perceptual system, a system that learns the dynamics of the world, and then something like a program learning system on top of it to learn concepts? We have built version 0.1 of that system. This was another Science Robotics paper; the title was something like "cognitive programs": how do you build cognitive programs?
And the application there was on manipulation, robotic manipulation?
It was. So think of it like this. Suppose you want to instruct a new person you just met, and you don't know the language that person uses, but you want to communicate to that person to achieve some task. I want to say: hey, you need to pick up all the red cups from the kitchen counter and put them here. How do you communicate that? You can show pictures. You can basically say: look, this is the starting state, the things are here; this is the ending state. And what does the person need to understand from that? The person needs to understand what conceptually happened in those pictures from the input to the output. So we are looking at pre-verbal conceptual understanding, without language. How do you have a set of concepts that you can manipulate in your head, so that from a set of input and output images you can infer what is happening in those images?
Got it: with concepts that are pre-language. Okay, so what does it mean for a concept to be pre-language? Why is language so important here?
So I want to make a distinction between concepts and things that are just learned from text, by brute-force feeding of text. You can start extracting things like: okay, a cow is likely to be on grass. Those kinds of things you can extract purely from text. But that is a simple association, rather than a concept as an abstraction of something that happens in the real world, in a grounded way, such that I can simulate it in my mind and connect it back to the real world.
And you think the concepts in the visual world are somehow lower level than the language ones?
"Lower level" kind of makes it sound unimportant. It's more that, I would say, the concepts in the visual and motor systems, in the concept learning system, what we learn by interacting with the world if you cut off the language part, and the abstractions from that, are a prerequisite for any real language understanding.
So you disagree with Chomsky, because he says language is at the bottom of everything.
Yeah, I disagree with Chomsky completely, from universal grammar on down.
So that was a paper in Science Robotics. Beyond the recursive cortical network, what other interesting problems, what open problems in brain-inspired approaches, are you thinking about?
I mean, everything is open, right? No problem is solved. I think of perception as the first thing that you have to build, but the last thing that will actually be solved, because if you do not build the perception system in the right way, you cannot build the concept system in the right way. So you have to build a perception system, however wrong it might be; you still have to build it, learn concepts from there, and keep iterating. Perception will finally get solved fully only when perception, cognition, and language all work together.
Great, so we've talked a lot about perception. On the concept side, the common sense or just general reasoning side, is there some intuition you can draw from the brain about how we could do that?
So I have this classic example I give. Suppose I give you a few sentences and then ask you a question about them. This is a natural language processing problem, right? So here goes. I'm telling you: Sally pounded a nail on the ceiling. Okay, that's the sentence. Now I'm asking you a question: was the nail horizontal or vertical?
Vertical.
Okay. How did you answer that?
Well, I imagined Sally. It was kind of hard to imagine what the hell she was doing, but I imagined the visual of the whole situation.
Exactly. So here I posed a question in natural language, and the answer to that question came from you actually simulating the scene. Now I can go into more and more detail: okay, was Sally standing on something while doing this? Could she have been standing on a light bulb to do it? I can ask more and more questions, and make you simulate the scene in more and more detail, right? Where is all that knowledge you're accessing stored? It is not in your language system, and it was not just by reading text that you got that knowledge. It is stored from the everyday experiences you have had; by the age of five you have pretty much all of it. And it is stored in your visual system and motor system in a way such that it can be accessed through language.
Got it. So language almost serves as the query into the whole visual cortex, which does the whole feedback thing. But is all reasoning connected to the perception system in some way?
You can still do a lot of it by quick associations, without having to go into that depth, and most of the time you will be right. But I can easily create tricky situations for you where the quick association is wrong and you have to actually run the simulation.
So figuring out how these concepts connect: do you have a good idea of how to do that?
That's exactly one of the problems we are working on. And the way we are approaching it is basically saying: okay, the
takeaway is that language is simulation control. And your perceptual plus motor system is building
a simulation of the world. And so that's basically the way we are approaching it. And
The first thing we built was a controllable perceptual system. Then we built schema networks, which was a controllable dynamics system. Then we built a concept learning system that puts all
these things together into programs as abstractions that you can run and simulate. And now we are
taking the step of connecting it to language. And it will be very simple examples initially,
it will not be the GPT three like examples, but it will be grounded simulation based language.
And the querying would be question answering, that kind of thing?
Correct. And it will be in some simple world initially, but it will be about: okay, can the system connect the language, ground it in the right way, and run the right simulations to come up with the answer?
And the goal is to try to do things that, for example, GPT-3 couldn't do.
Speaking of which, can we talk about GPT-3 a little bit? I think it's an interesting, thought-provoking set of ideas that OpenAI is pushing forward, and I think it's good for us to talk about the limits and the possibilities of these networks. In general, what are your thoughts about this recently released, very large, 175-billion-parameter language model?
So I haven't directly evaluated it yet. From what I have seen on Twitter, and
from other people evaluating it, it looks very intriguing. I am very intrigued by some of the properties it is displaying. Of course, the text generation part was already evident in GPT-2: it can generate coherent text over long distances. But the weaknesses are also pretty visible, in that it is not really carrying a world state around. Sometimes you get sentences like "I went up the hill to reach the valley", completely incompatible statements; or, when you're traveling from one place to another, it doesn't take into account the time of travel, things like that. Those things, I think, will happen less in GPT-3, because it is trained on even more data, so it can do even longer-distance coherence. But it will still have the fundamental limitation that it doesn't have a world model, and it can't run simulations in its head to find out whether something
is true in the world or not.
Do you think, since it's taking a huge amount of text from the internet and forming a compressed representation, that something could emerge from that which is an approximation of a world model, and which could essentially be used for reasoning? And I'm not talking about GPT-3; I'm talking about GPT-4, 5, GPT-10.
Yeah, they will look more impressive than GPT-3. But take it to the extreme in the other direction: a Markov chain of just first order. If you read Shannon's book, he has a model of English text based on first-order Markov chains, second-order Markov chains, third-order Markov chains, and third-order Markov chains look better than first-order ones. So does that mean a first-order Markov chain has a model of the world? Yes, it does, at that level. When you go to higher-order models, or more sophisticated structure in the model, like transformer networks have, yes, they have a model of the text world. But that is not a model of the world; it's a model of the text world. It will have interesting properties and it will be useful, but just scaling it up is not going to give us AGI, or natural language understanding, or meaning.
The question is whether being forced to compress a very large amount of text forces you to construct things that are very much like concepts, because the ideas of concepts and meaning, it's a spectrum.
Sure, yeah.
So in order to achieve that kind of compression, maybe it will be forced to figure out abstractions which look an awful lot like the kinds of things we think of as concepts, as world models, as common sense.
Is that possible?
No, I don't think it is possible, because the information is not there. The information is behind the text, right? Unless somebody has written down all the details about how everything works in the world, to absurd levels, that it is easier to walk forward than backward, that you have to open the door to go out of the room, that doctors wear underwear; unless somebody has written all these things down somewhere, or the program somehow found them useful for compression from some other text, the information is not there.
So that's an argument that text is much lower fidelity than the experience of our physical world.
Right, yeah. A picture is worth a thousand words, that kind of thing.
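Shannon's Markov-chain model of English that came up a moment ago is easy to reproduce, and it makes the "model of the text world" point concrete: counting n-grams captures associations like cow-and-grass, but nothing behind the text. A minimal sketch, with an invented toy corpus:

```python
import random
from collections import defaultdict

corpus = ("the cow is on the grass and the cow eats the grass "
          "the dog is on the hill and the dog sees the cow").split()

order = 2  # second-order Markov chain, as in Shannon's examples
counts = defaultdict(list)
for i in range(len(corpus) - order):
    state = tuple(corpus[i:i + order])       # last `order` words
    counts[state].append(corpus[i + order])  # observed continuation

def generate(seed, n, rng=random.Random(0)):
    """Sample up to n words by following observed continuations."""
    out = list(seed)
    for _ in range(n):
        nxt = counts.get(tuple(out[-order:]))
        if not nxt:          # dead end: this state only ends the corpus
            break
        out.append(rng.choice(nxt))
    return " ".join(out)

print(generate(("the", "cow"), 8))
```

The output is locally plausible word salad: the chain "knows" which words follow which, but it cannot know that a cow can't be in two places at once, which is exactly the gap being described.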
Well, in this case it isn't really just pictures. The richest aspect of the physical world isn't even pictures; it's the interactivity of the world.
Exactly, yeah. It's being able to interact.
Maybe I agree with you that a picture is worth a thousand words, and a thousand words, you could say, could be captured with GPT-X. So I wonder if there's some interactive element, where a system could live in the text world: it could be part of the chat, be part of talking to people. It's interesting. So you're making a statement about the limitation of text. Okay, let's say we have a text corpus that includes basically every experience we could possibly have, a very large corpus of text, plus interactive components. I guess the question is whether the neural network architecture, these very simple transformers, but with hundreds of trillions, or whatever comes after a trillion, parameters, could store the information needed. That's the architecture side. Do you have thoughts about the limitation on that side of things, with neural networks?
I mean, a transformer is still a feed-forward neural network. It has a very interesting architecture, which is good for text modeling and probably some aspects of video modeling, but it is still a feed-forward architecture.
And you believe in the feedback mechanism, the recursion.
Yes, and also causality: being able to do counterfactual reasoning, being able to do interventions, which are actions in the world. All those things require different kinds of models to be built. I don't think a transformer captures that family. It is very good at statistical modeling of text, and it will become better and better with more data and bigger models, but that is only going to get so far. I had this joke on Twitter, saying: here is a model that has read all of quantum mechanics and the theory of relativity, and we are asking it to do text completion, or asking it to solve simple puzzles. When you have AGI, that's not what you ask a system to do. We'll ask the system to do experiments, come up with hypotheses, and revise the hypotheses based on evidence from experiments. Those are the things we want the system to do when we have AGI, not solve simple puzzles or impressive demos, like somebody generating a red button in HTML.
Which are all useful. There's no dissing the usefulness of it. And by the way, I'm playing a little bit of devil's advocate here. So calm down, internet.
So I'm curious in which ways a dumb but large neural network will surprise us. I completely agree with your intuition; it's just that I don't want to dogmatically, like a hundred percent, put all the chips there. We've been surprised so much. Even the current GPT-2 and 3 are so surprising. The self-play mechanisms of AlphaZero are really surprising. The fact that reinforcement learning works at all is, to me, really surprising. The fact that neural networks work at all is quite surprising: given how nonlinear the space is, the fact that they're able to find local minima that are all reasonable is very surprising. So I wonder sometimes whether us humans just want AGI not to be such a dumb thing. Because exactly what you're saying, the ideas of concepts, being able to reason with those concepts, to connect those concepts in hierarchical ways, and then to have world models, everything we're describing in human language, in this poetic way, seems to make sense as what intelligence and reasoning are like. I wonder if at the core of it, it can be much dumber.
Well, in the end it is still connections and messages passing over them. So in that way it is dumb.
So I guess the recursion, the feedback mechanism, does seem to be a fundamental kind of thing. The idea of concepts, also memory.
Correct, having an episodic memory. That seems to be an important thing. So how do we get memory?
Yeah, we have another piece of work, which came out recently, on how you form episodic memories and form abstractions from them. We haven't figured out all the connections of that to the overall cognitive architecture yet.
But what are your ideas about how you could have episodic memory?
So at least it's very clear that you need to have two kinds of memory. That's very, very clear. There are things that happen as statistical patterns in the world. But then there is the one timeline of things that happen only once in your life; this day is not going to happen ever again, and it needs to be stored as one stream: this is my experience. And then the question is how you take that experience and
connect it to the statistical part of it? How do you now say that, okay, I experienced this thing.
Now I want to be careful about similar situations. So you need to be able to index that similarity using the other, giant statistical model of the world that you have learned. Although the situation came from the episode, you need to be able to index into the other one.
So the episodic memory is implemented as an index over the other model that you're building. The memories remain, and they're an index into the statistical thing that you formed.
Yeah, the statistical, or causal, structural model that you built over time. So basically the idea is that the hippocampus is just storing a sequence, a set of pointers of what happens over time. Then, whenever you want to reconstitute that memory and evaluate different aspects of it, whether it was good or bad, whether I need to encounter that situation again, you need the cortex to re-instantiate, to replay, that memory.
So how do you find that memory? Which direction is the important direction?
Both; it is again bi-directional.
I mean, how do you retrieve the memory?
So this is again a hypothesis; we're making this up. When you come to a new situation, your cortex is doing inference over the new situation. The hippocampus is connected to different parts of the cortex, and you get this deja vu situation: okay, I have seen this thing before. In the hippocampus you can have an index of when it happened, as a timeline, and you can use the hippocampus to drive similar timelines: now, rather than being driven by my current input stimuli, I am going back in time, rewinding my experience from there, and putting it back into the cortex. And putting it back into the cortex, of course, affects what you're going to see next in your current situation.
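The hippocampus-as-index hypothesis described here can be caricatured in a few lines: a fixed "cortical" encoder maps situations into a feature space, and the "hippocampal" timeline stores only (key, episode) pointers, retrieved by similarity and then replayed. Everything in this sketch, the encoder matrix and the episodes, is invented for illustration.

```python
import numpy as np

# "Cortex": a fixed toy encoder standing in for the learned
# statistical/causal model of the world (values are arbitrary).
W = np.array([[1., 0., 0., 0.],
              [0., 1., 0., 0.],
              [0., 0., 1., 0.],
              [0., 0., 0., 1.],
              [1., 1., 0., 0.],
              [0., 0., 1., 1.]])

def encode(situation):
    return W @ situation

# "Hippocampus": an ordered timeline of (key, episode) pairs,
# each experience stored once, as a pointer into cortical
# feature space rather than as raw data.
timeline = []

def store(situation, episode):
    timeline.append((encode(situation), episode))

def recall(situation):
    # "Deja vu": index the new input against stored keys and
    # rewind to the most similar past episode.
    key = encode(situation)
    sims = [key @ k / (np.linalg.norm(key) * np.linalg.norm(k))
            for k, _ in timeline]
    return timeline[int(np.argmax(sims))][1]

store(np.array([1., 0., 0., 0.]), "burned hand on the stove")
store(np.array([0., 0., 1., 0.]), "found coins in the couch")

# A new situation resembling the first episode retrieves it.
print(recall(np.array([0.9, 0.1, 0., 0.])))  # burned hand on the stove
```

The point of the design is that the hippocampus never needs to store the content itself; it stores keys into the cortical model, and replay is the cortex re-instantiating the episode from those keys.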
Got it. Yeah, so that's the whole thing: having a world model and then connecting it to the perception. It does seem that that's what's happening. It's interesting to think how we'd actually do that on the neural network side, to have a knowledge base.
Yes. It is possible that you can put many of these structures into neural networks, and we will find ways of combining properties of neural networks and graphical models. It has already started happening: graph neural networks are kind of a merger between them, and there will be more of that. But to me, looking at biology and the evolutionary history of intelligence, the direction is pretty clear: what is needed is more structure in the models, modeling of the world, and support for dynamic inference.
Well, let me ask you: there's a guy named Elon Musk, there's a company called Neuralink,
and there's a general field called Brain-Computer Interfaces. It's kind of an interface between
your two loves, the brain and the intelligence. So, there's like very direct applications of
brain-computer interfaces for people with different conditions, more in the short term.
But there's also these sci-fi, futuristic kinds of ideas of AI systems being able to
communicate in a high bandwidth way with the brain, bi-directional. What are your thoughts
about Neuralink and BCI in general as a possibility?
So I think BCI is a cool research area. In fact, when I first got interested in brains, when I was enrolled at Stanford, it was through a brain-computer interface talk that Krishna Shenoy gave. That's when I even started thinking about the problem. So it is definitely a fascinating research area, and the applications are enormous. There's the science fiction scenario of brains directly communicating; let's keep that aside for the time being. Even the intermediate milestones they are pursuing, which are very reasonable as far as I can see, being able to control an external limb using direct connections from the brain, being able to write things into the brain, those are all good steps to take, and they have enormous applications: people losing limbs being able to control prosthetics, quadriplegics being able to control something. So there are the therapeutics, and I also know about another company working in the space, called Paradromics, based on a different electrode array but attacking some of the same problems. So I think it's a very…
Also, surgery?
Correct, surgically implanted. So yeah, I think of it as a very, very promising field, especially when it is helping people overcome limitations. Now, at some point, of course, it will advance to the level of being able to communicate.
How hard is that problem, do you think? Okay, let's say we magically solve what I think is the really hard problem of doing all of this safely, being able to connect not just thousands but millions of electrodes to the brain.
Yeah, I think it's very, very hard, because you also do not know what will happen to the brain with that, right? In the sense of: how does the brain adapt to something like that? And, as we were saying, the brain is quite…
In terms of neuroplasticity, it's pretty malleable.
Correct.
So it's going to adjust.
Correct.
So the machine learning side, the computer side, is going to adjust, and then the brain is going to adjust.
Exactly. And then what soup this lands us in is…
The kind of hallucinations you might get from this.
That might be pretty intense.
Yeah.
Just connecting to all of Wikipedia.
It's interesting whether we need to be able to figure out the basic protocol of the brain's
communication schemes in order to get the machine and the brain to talk, because another
possibility is the brain actually just adjusts to whatever the heck the computer is doing.
Exactly. That's the way, I think... I find that to be the more promising way.
It's basically saying, okay, attach electrodes to some part of the cortex.
Maybe if it is done from birth, the brain will adapt. Say that part is not
damaged, it was not used for anything, and these electrodes are attached there.
And now you train that part of the brain to do this high-bandwidth communication with
something else. And if you do it like that, then it is the brain adapting to…
And of course, your external system is designed so that it is adaptable.
Just like we design computers, the mouse, the keyboard, all of them to interact with humans.
So, of course, that feedback system is designed to be human compatible.
But now you are not trying to record from all of the brain, and you don't have two systems trying to
adapt to each other. It's the brain adapting in one direction.
That's fascinating. The brain is connected to the internet.
Just imagine connecting it to Twitter and just taking in that stream of information.
Yeah. But again, if we take a step back, I don't know what your intuition is.
I feel like that is not as hard of a problem as doing it safely. There's a huge barrier to
surgery because the biological system, it's a mush of weird stuff.
Correct. So the surgery part of it, the biology part of it, the long-term repercussions part of
it. Again, I don't know what else will… We often find after a long time in biology that,
okay, that idea was wrong. People used to cut out the gland called the thymus or something,
and then they found that, oh, no, that actually causes cancer.
So, yeah. And then there are subtle, like, millions of variables involved.
But the nice thing about this whole process, and just like again with Elon, just like colonizing Mars:
it seems like a ridiculously difficult idea, but in the process of doing it, we might learn a lot
about the neurobiology of the brain, the neuroscience side of things.
It's like, if you want to learn something, do the most difficult version of it.
Yeah. See what you learn.
The intermediate steps they are taking have always sounded reasonable to me.
Yeah. It's great. Well, but like everything with Elon, the timelines seem insanely fast.
So, that's the only open question. Well, we've been talking about cognition a little bit,
so like reasoning. We haven't mentioned the other C word, which is consciousness.
Do you ever think about that one? Is that useful at all in this whole context of what it takes to
create an intelligent reasoning being? Or is that completely outside of your, like the engineering
perspective of intelligence? So, it is not outside the realm, but it doesn't, on a day-to-day
basis, inform what we do. But in many ways, the company name is connected to
this idea of consciousness. What's the company name?
Vicarious. So, Vicarious is the company name. And so, what does Vicarious mean?
At the first level, it is about modeling the world, internalizing external actions.
You interact with the world and learn a lot about it. And now, after having learned
a lot about the world, you can run those things in your mind without actually having to act in
the world. You can run things vicariously, just in your brain. And similarly, you can
experience another person's thoughts by having a model of how that person works
and putting yourself in that person's shoes. So, that is being
vicarious. Now, it's the same modeling apparatus that you're using to model the external world
or some other person's thoughts. You can turn it on yourself. If that same
modeling apparatus is applied to your own modeling apparatus, then that is what gives rise to
consciousness, I think. Well, that's more like self-awareness. There's the hard problem of
consciousness, which is when the model feels like something,
when you really are in it. You feel like an entity in
this world; not just that you know you're an entity, but it feels like something to be that entity.
And thereby, it starts to be where something
that has consciousness can suffer. You start to have these kinds of things to reason about
that are much, much heavier. It seems like there's a much greater cost to your decisions. And like,
mortality is tied up into that, like, the fact that these things end. Right. First of all,
I end at some point, and then other things end. And that somehow seems to be,
at least for us humans, a deep motivator. Yes. And that idea of motivation in
general, we talk about goals in AI, but goals aren't quite the same thing as our
mortality. It feels like, first of all, humans don't have a goal. They just kind
of create goals at different levels. They make up goals, because we're terrified by the
mystery of the thing that gets us all. So we're like a goal
generation machine, as opposed to a machine which optimizes the trajectory towards a singular goal.
So it feels like that's an important part of cognition, that whole mortality thing.
Well, it is a part of human cognition. But there is no reason for that mortality to
come into the equation for an artificial system, because we can copy the artificial system.
The problem with humans is that I can't clone you.
Even if I clone you as hardware, your experience that was stored in
your brain, your episodic memory, all those will not be captured in the new clone.
But that's not the same with an AI system, right? So, but it's also possible that
the thing that you mentioned with us humans is actually of fundamental importance
for intelligence. So the fact that you can copy an AI system means that that AI system
is not yet an AGI. If you look at existence proofs, if we reason
based on existence proofs, you could say that it doesn't feel like death is a fundamental
property of an intelligent system, but you can't give me an example of an immortal
intelligent being. We don't have those. It's very possible that
that is a fundamental property of intelligence: a thing that has a deadline
for itself. So you can think of it like this. Suppose you invent a way to freeze people
for a long time. It's not dying, right? You can be frozen and woken up thousands of years
from now. So there's no fear of death. Well, no, you're still... it's not about time,
it's about the knowledge that it's temporary. That aspect of it, the finiteness of it,
I think creates a kind of urgency. Correct. For us, for humans.
Yeah, for humans. Yes. And that is part of our drives. And that's why I'm not too
worried about AI having motivations to kill all humans and those kinds of things. Why
not just wait, you know? Why do you need to do that? I've never heard that before.
That's a good point. Because, yeah, murder seems like a lot of work. Just
wait it out. They'll probably hurt themselves. Let me ask you, people often kind of
wonder about world-class researchers such as yourself: what kind of books, technical, fiction, philosophical,
had an impact on you in your life? And maybe ones you could possibly recommend
that others read. Maybe three books that pop into mind.
Yeah. So I definitely liked Judea Pearl's book, Probabilistic Reasoning in Intelligent Systems.
It's a very deep technical book. But what I liked is that there are many places where
you can learn about probabilistic graphical models from, but throughout this book, Pearl
sprinkles his philosophical observations and connects it to how the brain
thinks, attention, resources, all those things. So that whole thing makes it more
interesting to read. He emphasizes the importance of causality, but that was in his later book.
This was the first book, Probabilistic Reasoning in Intelligent Systems. He mentions
causality, but he hadn't really sunk his teeth into how you actually formalize it.
Got it. Yeah. And the second book, Causality, the one from 2000, that one is really hard,
so I wouldn't recommend that. Oh, yeah. So that looks at the mathematical side, his
model of do-calculus. Yeah. It was pretty dense mathematically. Right.
The Book of Why is definitely more enjoyable. Oh, for sure. Yeah. So I would
recommend Probabilistic Reasoning in Intelligent Systems. Another book I liked was
one from Doug Hofstadter, from a long time ago. He had a book called, I think,
The Mind's I. It was by Hofstadter and Daniel Dennett together.
Yeah. And I actually bought that book. It's on my shelf. I haven't read it yet,
but I couldn't get an electronic version of it, which is annoying because I read everything
on Kindle. Oh, okay. I had to actually purchase the physical copy. It's one of the only physical
books I have, because, anyway, a lot of people recommended it highly. So, yeah.
And the third one I would definitely recommend reading is not a technical book. It is
history. The name of the book, I think, is The Bishop's Boys. It's about
the Wright Brothers and their path. There are multiple books on
this topic and all of them are great. It's fascinating how flight was treated
as an unsolvable problem, and also what aspects people emphasized.
People thought, oh, it is all about just powerful engines, you just need to have
powerful lightweight engines. And some people thought of it as how far can we just
throw the thing, you know, just throw it. Like a catapult. Yeah. So it's very fascinating,
and even after they made the invention, people not believing it,
the social aspects of it. Yeah. The social aspect. Very fascinating.
Do you draw any parallels between, you know, birds flying? So there's the natural approach
to flight and then there's the engineered approach. Do you see the same kind
of thing with the brain and trying to engineer intelligence?
Yeah, it's a good analogy to have. Of course, all analogies have their limits.
People in AI often use airplanes as an example of, hey,
we didn't learn anything from birds. The saying is,
airplanes don't flap wings, right? This is what they say. The funny and ironic thing
is that the fact that you don't need to flap to fly is something the Wright brothers found by observing
birds. Yeah. In some of these books, they show
their notebook drawings, right? They made detailed notes about buzzards just soaring over
thermals. And they basically said, look, flapping is not the important problem; propulsion is not the
important problem to solve here. We want to solve control. And once you solve control,
propulsion will fall into place. All of this they realized
by observing birds. So beautifully put. That's actually brilliant. Because people do use that
analogy a lot. I'm going to have to remember that one. Do you have advice for people interested
in artificial intelligence, like young folks today? I talked to undergraduate students all the time,
interested in neuroscience, interested in understanding how the brain works. Is there
advice you would give them about their career, maybe about their life in general? Sure. I think
every piece of advice should be taken with a pinch of salt, because
each person is different, their motivations are different. But I can definitely say,
if your goal is to understand the brain from the angle of wanting to build one, you know, then
being an experimental neuroscientist might not be the way to go about it. A better way to pursue it
might be through computer science, electrical engineering, machine learning and AI. And of
course, you have to study up on the neuroscience, but that you can do on your own. If you are more
attracted by discovering something intriguing about the brain,
then of course, it is better to be an experimentalist. So, find that motivation. What are you
interested in? And of course, find your strengths too. Some people are very good experimentalists
and they enjoy doing that. And it's interesting to see which department, if you're picking
in terms of your education path, whether to go with, at MIT it's the
BCS, brain and cognitive sciences, or the CS side of things. And actually,
the brain folks, the neuroscience folks, are more and more now embracing
learning TensorFlow and PyTorch, right? They see the power of trying to engineer ideas
that they get from the brain into and then explore how those could be used to create
intelligent systems. So, that might be the right department actually.
Yeah. So, this was a question in one of the Redwood Neuroscience Institute
workshops that Jeff Hawkins organized almost 10 years ago. This question was put to a panel:
what should be the undergrad major you take if you want to understand the brain?
And the majority opinion on that one was electrical engineering.
Interesting. Because, I mean, I'm an EE undergrad, so I got lucky in that way. But
I think it does have some of the right ingredients, because you learn about circuits,
you learn how you can construct circuits to implement functions,
you learn about microprocessors, information theory, signal processing,
continuous math. So, in that way, whether you want to go to
computer science or neuroscience, it's a good step.
The downside, you're more likely to be forced to use MATLAB.
One of the interesting things, I mean, this is changing, the world is changing, but
certain departments lagged on the programming side of things, on developing good
software engineering habits. But I think that's more and more changing,
and students can take the matter into their own hands, like learn to program. I feel like everybody
in the sciences should learn to program, because it empowers you,
it puts the data at your fingertips. You can organize it, you can find all kinds of things
in the data. And then, for the appropriate sciences, you can also build systems
based on that, like engineer intelligent systems.
We already talked about mortality, so we've hit the ridiculous points. But let me ask you,
one of the things about intelligence is that it's goal-driven,
and you study the brain. So, the question is, what's the goal the brain is operating
under? What's the meaning of it all for us humans, in your view? What's the meaning of life?
The meaning of life is whatever you construct out of it. It's completely open.
It's open. So there's nothing, like you mentioned, you like constraints,
but it's wide open. Is there some useful aspect you think about in terms of
the openness of it, the basic mechanisms of generating goals,
in studying cognition in the brain? Or is it just that
everything we've talked about, the perception system, is to understand the
environment, to be able to, like, not die. Exactly. Not fall over.
You don't think we need to think about anything bigger than that?
Yeah, I think so, because it's basically being able to understand the machinery of the world,
such that you can pursue whatever goals you want, right?
So, the machinery of the world is really ultimately what we should be striving to understand.
The rest is just whatever the heck you want to do, whatever is fun for you.
Whatever is culturally popular, you know?
I think that's, that's beautifully put. I don't think there's a better way to end it. I'm so
honored that you would show up here and waste your time with me. It's been an awesome conversation.
Thanks so much for talking today. Oh, thank you so much. This was,
this was so much more fun than I expected. Thank you.
If you enjoyed this thing, subscribe on YouTube, review it with five stars on Apple Podcast,
support it on Patreon, or connect with me on Twitter at Lex Fridman, spelled, yes, without the E,
just F-R-I-D-M-A-N. And now, let me leave you with some words from Marcus Aurelius.
You have power over your mind, not outside events. Realize this, and you will find strength.
Thank you for listening, and hope to see you next time.