The following is a conversation with Tomaso Poggio.
He's a professor at MIT and is a director of the Center
for Brains, Minds, and Machines.
Cited over 100,000 times, his work
has had a profound impact on our understanding
of the nature of intelligence in both biological
and artificial neural networks.
He has been an advisor to many highly impactful researchers
and entrepreneurs in AI, including
Demis Hassabis of DeepMind, Amnon Shashua of Mobileye,
and Christof Koch of the Allen Institute for Brain Science.
This conversation is part of the MIT course
on artificial general intelligence
and the artificial intelligence podcast.
If you enjoy it, subscribe on YouTube, iTunes,
or simply connect with me on Twitter at Lex Fridman,
spelled F-R-I-D.
And now, here's my conversation with Tomaso Poggio.
You've mentioned that in your childhood,
you've developed a fascination with physics,
especially the theory of relativity.
And that Einstein was also a childhood hero to you.
What aspect of Einstein's genius, the nature of his genius,
do you think was essential for discovering
the theory of relativity?
You know, Einstein was a hero to me,
and I'm sure to many people, because he
was able to make, of course, a major, major contribution
to physics with, simplifying a bit, just a gedanken experiment,
a thought experiment, imagining communication
with lights between a stationary observer
and somebody on a train.
And I thought the fact that just with the force of his thought,
of his thinking, of his mind, he could get to something
so deep in terms of physical reality,
how time depends on space and speed.
It was something absolutely fascinating.
It was the power of intelligence, the power of the mind.
Do you think the ability to imagine, to visualize as he did,
as a lot of great physicists do, do you think that's
in all of us human beings, or is there something special to that
one particular human being?
I think, you know, all of us can learn and have,
in principle, similar breakthroughs.
There is a lesson to be learned from Einstein.
He was one of five PhD students at ETH,
the Eidgenössische Technische Hochschule in Zurich, in physics.
And he was the worst of the five.
The only one who did not get an academic position
when he graduated, when he finished his PhD,
and he went to work, as everybody knows,
for the patent office.
So it's not so much that he worked for the patent office,
but the fact that obviously he was smart,
but he was not the top student; obviously
he was the anti-conformist.
He was not thinking in the traditional way
that probably his teachers and the other students were doing.
So there is a lot to be said about trying to do the opposite
or something quite different from what other people are doing.
That's certainly true for the stock market.
Never buy if everybody's buying.
And also true for science.
Yes.
So you've also mentioned staying on the theme of physics
that you were excited at a young age
by the mysteries of the universe that physics could uncover.
Such as, I saw mentioned, the possibility of time travel.
So, the most out-of-the-box question I'll get to ask today:
do you think time travel is possible?
Well, it would be nice if it were possible right now.
In science, you never say no.
But your understanding of the nature of time.
Yeah.
It's very likely that it's not possible to travel in time.
We may be able to travel forward in time
if we can, for instance, freeze ourselves
or go on some spacecraft traveling
close to the speed of light.
But in terms of actively traveling,
for instance, back in time, I find that probably very unlikely.
So do you still hold the underlying dream
of engineering intelligence, that we'll
build systems that are able to do such huge leaps
like discovering the kind of mechanism that
would be required to travel through time?
Do you still hold that dream or echoes of it
from your childhood?
Yeah.
I think there are certain problems
that probably cannot be solved, depending on what you believe
about the physical reality.
Like, it may be totally impossible to create energy from nothing
or to travel back in time. But about making
machines that can think as well as we do or better, or more
likely, especially in the short and mid term,
help us think better, which, in a sense,
is happening already with the computers we have,
and it will happen more and more,
that I certainly believe.
And I don't see in principle why computers at some point
could not become more intelligent than we are,
although the word intelligence is a tricky one.
And one should discuss what I mean by that.
Intelligence, consciousness, words like love: all of these
need to be disentangled.
So you've mentioned also that you believe
the problem of intelligence is the greatest problem
in science, greater than the origin of life
and the origin of the universe.
You've also, in a talk I've listened to, said
that you're open to arguments against you.
So what do you think is the most captivating aspect
of this problem of understanding the nature
of intelligence?
Why does it captivate you as it does?
Well, originally, I think one of the motivations
that I had as a teenager when I was infatuated
with the theory of relativity was really
that I found that there was the problem of time and space
and general relativity.
But there were so many other problems
of the same level of difficulty and importance
that, even if I were Einstein,
it was difficult to hope to solve all of them.
So what about solving a problem whose solution allowed
me to solve all the problems?
And this was what if we could find the key to an intelligence
10 times better or faster than Einstein?
So that's sort of seeing artificial intelligence
as a tool to expand our capabilities.
But is there just an inherent curiosity in you
in just understanding what it is in here that
makes it all work?
Yes, absolutely.
You're right.
So I was starting, I started saying
this was the motivation when I was a teenager.
But soon after, I think the problem of human intelligence
became a real focus of my science and my research.
Because I think for me, the most interesting problem
is really asking who we are.
It is asking not only a question about science,
but even about the very tool we are using to do science,
which is our brain.
How does our brain work?
Where does it come from?
What are its limitations?
Can we make it better?
And that in many ways is the ultimate question
that underlies this whole effort of science.
So you've made significant contributions
in both the science of intelligence
and the engineering of intelligence.
In a hypothetical way, let me ask,
how far do you think we can get in creating intelligent
systems without understanding the biology,
without understanding how the human brain creates intelligence?
Put another way, do you think we can build a strong AI system
without really getting at the core,
understanding the functional nature of the brain?
Well, this is a really difficult question.
We did solve problems like flying without really
using too much our knowledge about how birds fly.
It was important, I guess, to know that you could have things
heavier than air being able to fly, like birds.
But beyond that, probably, we did not learn very much.
The Wright brothers did learn a lot from observation
of birds in designing their aircraft.
But you can argue we did not use much of biology
in that particular case.
Now, in the case of intelligence,
I think that it's a bit of a bet right now.
If you ask, OK, we all agree we'll get, at some point,
maybe soon, maybe later, to a machine
that is indistinguishable from my secretary,
say, in terms of what I can ask the machine to do.
I think we'll get there.
And now, the question is, you can ask people,
do you think we'll get there without any knowledge
about the human brain or the best way
to get there is to understand better the human brain?
OK, this is, I think, an educated bet
that different people with different backgrounds
will decide in different ways.
The recent history of the progress
in AI in the last, I would say, five years or 10 years
has been that the main breakthroughs,
the main recent breakthroughs, really start from neuroscience.
I can mention reinforcement learning as one.
It is one of the algorithms at the core of AlphaGo,
which is the system that beat the kind of official world
champion of Go, Lee Sedol, two or three years ago in Seoul.
That's one.
And that started really with the work of Pavlov in 1900,
Marvin Minsky in the 60s, and many other neuroscientists
later on.
And deep learning started, which is the core again of AlphaGo
and systems like autonomous driving systems for cars,
like the systems that Mobileye, which
is a company started by one of my ex-postdocs,
Amnon Shashua, did.
So that is the core of those things, and deep learning,
really the initial ideas in terms of the architecture
of these layered hierarchical networks
started with the work of Torsten Wiesel and David
Hubel at Harvard up the river in the 60s.
So recent history suggests that neuroscience played
a big role in these breakthroughs.
My personal bet is that there is a good chance
it continues to play a big role, maybe not
in all the future breakthroughs, but in some of them.
At least in inspiration.
At least in inspiration, absolutely, yes.
So you studied both artificial and biological neural networks.
You said these mechanisms that underlie deep learning
and reinforcement learning.
But there are nevertheless significant differences
between biological and artificial neural networks
as they stand now.
So between the two, what do you find
is the most interesting, mysterious, maybe even
beautiful difference as it currently
stands in our understanding?
I must confess that until recently I
found the artificial networks too simplistic relative
to real neural networks.
But recently I've started to think that, yes,
they're a very big simplification of what
you find in the brain.
But on the other hand, they are much closer
in terms of the architecture to the brain
than other models that we had, that computer science used
as models of thinking, which were mathematical logic,
you know, Lisp, Prolog, and those kinds of things.
So in comparison to those, they're much closer to the brain.
You have networks of neurons, which
is what the brain is about.
And the artificial neurons in the models
are, as I said, caricatures of the biological neurons.
But they're still neurons, single units
communicating with other units, something
that is absent in the traditional computer type models
of mathematics, reasoning, and so on.
So what aspect would you like to see in artificial neural
networks added over time as we try to figure out ways
to improve them?
So one of the main differences and problems
between deep learning today, and it's not only
deep learning, and the brain is the need for deep learning
techniques to have a lot of labeled examples.
You know, for instance, for ImageNet,
you have a training set, which is one million images,
each one labeled by some human in terms of which object is there.
And it's clear that in biology, a baby
may be able to see millions of images
in the first years of life, but will not
have millions of labels given to him or her by parents
or caretakers.
So how do you solve that?
You know, I think there is this interesting challenge
that today, deep learning and related techniques
are all about big data, big data,
meaning a lot of examples labeled by humans.
Whereas in nature, you have... so, this big data is n
going to infinity.
That's the best, you know, n meaning labeled data.
But I think the biological world is more n going to 1.
A child can learn from a very small number
of labeled examples.
Like you tell a child, this is a car.
You don't need to say, like in ImageNet, you know,
this is a car, this is a car, this is not a car,
this is not a car, one million times.
And of course, with AlphaGo, or at least the AlphaZero
variants, because the world of Go is so simplistic,
you can actually learn by yourself through self-play.
You can play against each other.
And the real world, I mean, the visual system
that you've studied extensively is a lot more complicated
than the game of Go.
So on the comment about children, which
are fascinatingly good at learning new stuff,
how much of it do you think is hardware
and how much of it is software?
Yeah, that's a good, deep question.
In a sense, it's the old question of nurture and nature.
How much is in the gene and how much
is in the experience of an individual?
Obviously, it's both that play a role.
And I believe that the way evolution
puts in prior information, so to speak, hardwired,
is not really hardwired.
But that's essentially a hypothesis.
I think what's going on is that evolution is, almost
necessarily, if you believe in Darwin,
very opportunistic.
And think about our DNA and the DNA of Drosophila.
Our DNA does not have many more genes than Drosophila.
The fly.
The fly, the fruit fly.
Now, we know that the fruit fly does not
learn very much during its individual existence.
It looks like one of those machines
that is really mostly, not 100%, but 95%
hardcoded by the genes.
But since we don't have many more genes than Drosophila,
evolution could encode in us a kind of general learning
machinery and then had to give very weak priors.
Like, for instance, let me give a specific example, which
is recent work by a member of our Center for Brains, Minds,
and Machines.
We know because of work of other people in our group
and other groups that there are cells
in a part of our brain, neurons, that are tuned to faces.
They seem to be involved in face recognition.
Now, this face area seems to be present in young children
and adults.
And one question is whether it is there from the beginning,
whether it is hardwired by evolution,
or whether somehow it's learned very quickly.
So what's your, by the way, a lot of the questions I'm asking,
the answer is we don't really know.
But as a person who has contributed
some profound ideas in these fields,
you're a good person to guess at some of these.
So of course, there's a caveat before a lot of the stuff
we talk about.
But what is your hunch?
The part of the brain that seems
to be concentrated on face recognition:
are you born with that?
Or is it just designed to learn that quickly,
like the face of the mother and so on?
My hunch, my bias was the second one, learned very quickly.
And it turns out that Marge Livingstone at Harvard
has done some amazing experiments in which she raised
baby monkeys depriving them of faces
during the first weeks of life.
So they see technicians, but the technicians have a mask.
Yes.
And so when they looked at the area in the brain
of these monkeys where usually you find faces,
they found no face preference.
So my guess is that what evolution does in this case
is there is an area which is plastic, which
is kind of predetermined to be imprinted very easily.
But the command from the gene is not
a detailed circuitry for a face template.
Could be.
But this will require probably a lot of bits.
You have to specify a lot of connections
of a lot of neurons.
Instead, the command from the gene
is something like imprint, memorize
what you see most often in the first two weeks of life,
especially in connection with food and maybe nipples.
I don't know.
Right.
Well, source of food.
And so that area is very plastic at first and then it solidifies.
It'd be interesting if a variant of that experiment
would show a different kind of pattern associated
with food than a face pattern, whether that could stick.
There are indications that during that experiment,
what the monkeys saw quite often were
the blue gloves of the technicians that were giving
the baby monkeys the milk.
And some of the cells, instead of being face sensitive
in that area, are hand sensitive.
That's fascinating.
Can you talk about what the different parts of the brain are,
in your view, sort of loosely,
and how they contribute to intelligence?
Do you see the brain as a bunch of different modules
that come together in the human brain to create intelligence
or is it all one mush of the same kind
of fundamental architecture?
Yeah, that's an important question.
And there was a phase in neuroscience back in the 1950s
or so in which it was believed for a while
that the brain was equipotential.
This was the term.
You could cut out a piece and nothing special happened
apart from a little bit less performance.
There was a surgeon, Lashley, who did a lot of experiments
of this type with mice and rats and concluded
that every part of the brain was essentially
equivalent to any other one.
It turns out that that's really not true.
There are very specific modules in the brain, as you said.
And people may lose the ability to speak
if they have a stroke in a certain region, or may lose control
of their legs in another region. So they're very specific.
The brain is also quite flexible and redundant.
So often it can correct things and take over functions
from one part of the brain to the other.
But really there are specific modules.
So the answer that we know comes from this old work, which
was basically based on lesions, either on animals,
or very often there was a mine of very interesting data
coming from the war, from different types of injuries
that soldiers had in the brain.
And more recently, functional MRI,
which allows you to check which parts of the brain
are active when you are doing different tasks,
can replace some of this.
You can see that certain parts of the brain
are involved, are active in certain tasks.
In certain tasks.
Yeah, that's right.
But taking a step back to that part of the brain
that discovers, that specializes in the face
and how that might be learned, what's your intuition behind?
Is it possible that from a physicist's perspective,
when you get lower and lower, that it's all the same stuff
and it just, when you're born, it's plastic and it quickly
figures out this part is going to be about vision,
this is going to be about language,
this is about common sense reasoning.
Do you have an intuition that that kind of learning
is going on really quickly or is it really
kind of solidified in hardware?
That's a great question.
So there are parts of the brain like the cerebellum
or the hippocampus that are quite different from each other.
They clearly have different anatomy,
different connectivity.
Then there is the cortex, which is the most developed part
of the brain in humans.
And in the cortex, you have different regions
of the cortex that are responsible for vision,
for audition, for motor control, for language.
Now, one of the big puzzles of this
is that in the cortex, it looks like it
is the same in terms of hardware,
in terms of type of neurons and connectivity
across these different modalities.
So for the cortex, leaving aside these other parts
of the brain like spinal cord, hippocampus, cerebellum,
and so on, for the cortex, I think
your question about hardware and software and learning
and so on, I think it's rather open.
And I find it very interesting for us
to think about an architecture, computer architecture, that
is good for vision, and at the same time,
it's good for language.
They seem to be such different problem areas
that you have to solve.
But the underlying mechanism might be the same.
And that's really instructive for artificial intelligence.
For artificial neural networks.
So you've done a lot of great work in vision,
in human vision, computer vision.
And you mentioned the problem of human vision
is really as difficult as the problem of general intelligence.
And maybe that connects to the cortex discussion.
Can you describe the human visual cortex
and how the humans begin to understand the world
through the raw sensory information?
So what's, for folks who are not familiar,
especially on the computer vision side,
we don't often actually take a step back,
except saying with a sentence or two
that one is inspired by the other.
What is it that we know about the human visual cortex?
That's interesting.
So we know quite a bit. At the same time,
we don't know a lot.
But the bit we know, in a sense, we
know a lot of the details, and many we don't know.
And we know a lot of the top level,
the answers to top-level questions.
But we don't know some basic ones,
even in terms of general neuroscience, forgetting vision.
Why do we sleep?
It's such a basic question.
And we really don't have an answer to that.
So taking a step back on that.
So sleep, for example, is fascinating.
Do you think that's a neuroscience question?
Or if we talk about abstractions,
what do you think is an interesting way
to study intelligence, or most effective,
on the levels of abstraction?
Is it chemical?
Is it biological?
Is it electrophysical?
Mathematical,
as you've done a lot of excellent work on that side?
Psychology?
Which level of abstraction do you think?
Well, in terms of levels of abstraction,
I think we need all of them.
It's like if you ask me, what does it mean
to understand a computer?
That's much simpler.
But for a computer, I could say, well,
I understand how to use PowerPoint.
That's my level of understanding a computer.
It is reasonable.
It gives me some power to produce slides and beautiful slides.
And now somebody else says, well,
I know how all the transistors work that are inside the computer.
I can write the equations for transistors and diodes
and circuits, logical circuits.
And I can ask this guy, do you know how to operate
PowerPoint? No idea.
So do you think if we discovered computers walking
amongst us, full of these transistors,
that are also operating under Windows and have PowerPoint,
do you think it's digging in a little bit more,
how useful is it to understand the transistor in order
to be able to understand PowerPoint
in these higher level intelligent processes?
So I think in the case of computers,
because they were made by engineers,
by us, these different levels of understanding
are rather separate on purpose.
They are separate modules so that the engineer that
designed the circuit for the chips
does not need to know what is inside PowerPoint.
And somebody can write the software
translating from one to the other.
So in that case, I don't think understanding the transistor
helps you understand PowerPoint, or very little.
If you want to understand the computer, this question,
I would say you have to understand it at different levels
if you really want to build one.
But for the brain, I think these levels
of understanding, so the algorithms,
which kind of computation, the equivalent PowerPoint,
and the circuits, the transistors,
I think they are much more intertwined with each other.
There is not a neat level of the software
separate from the hardware.
And so that's why I think in the case of the brain,
the problem is more difficult and, more than for computers,
requires the interaction, the collaboration
between different types of expertise.
The brain is a big hierarchical mess
that you can't just disentangle levels.
I think you can, but it's much more difficult.
And it's not completely obvious.
And as I said, I think it's one of the, personally,
I think it's the greatest problem in science.
So I think it's fair that it's difficult.
That's a difficult one.
That said, you do talk about compositionality
and why it might be useful.
And when you discuss why these neural networks
in the artificial or biological sense learn anything,
you talk about compositionality.
There's a sense that nature can be disentangled.
Well, all aspects of our cognition
could be disentangled a little to some degree.
So why do you think, first of all,
how do you see compositionality
and why do you think it exists at all in nature?
I spoke about, I use the term compositionality
when we looked at deep neural networks,
multi-layers, and trying to understand
when and why they are more powerful
than more classical one-layer networks
like linear classifiers, kernel machines, so-called.
And what we found is that in terms of approximating
or learning or representing a function,
a mapping from an input to an output,
like from an image to the label in the image,
if this function has a particular structure,
then deep networks are much more powerful
than shallow networks to approximate
the underlying function.
And the particular structure is a structure of compositionality.
If the function is made up of functions of functions,
so that,
when you are interpreting an image,
classifying an image,
you don't need to look at all pixels at once,
but you can compute something from small groups of pixels
and then you can compute something
on the output of this local computation and so on.
It is similar to what you do when you read a sentence.
You don't need to read the first and the last letter,
but you can read syllables, combine them in words,
combine the words in sentences.
So this is this kind of structure.
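
As a rough illustration of the compositional structure described here, the sketch below builds a function of eight inputs out of one small two-input function applied layer by layer, the way a deep network with local connectivity would. The names `local` and `compositional`, the particular two-input function, and the input values are all hypothetical choices made for the example; they are not taken from the conversation.

```python
def local(a, b):
    """A constituent function that only ever sees two inputs at a time.

    The formula is an arbitrary placeholder chosen for illustration.
    """
    return max(a, b) + 0.5 * min(a, b)


def compositional(xs):
    """Combine adjacent pairs with `local`, layer by layer, until one value is left.

    Returns the final value and the number of layers (the depth of the tree).
    Assumes the number of inputs is a power of two.
    """
    layer = list(xs)
    if not layer or len(layer) & (len(layer) - 1):
        raise ValueError("number of inputs must be a power of two")
    depth = 0
    while len(layer) > 1:
        layer = [local(layer[i], layer[i + 1]) for i in range(0, len(layer), 2)]
        depth += 1
    return layer[0], depth


value, depth = compositional([1, 2, 3, 4, 5, 6, 7, 8])
print(value, depth)
```

No single step looks at all eight inputs at once; each layer only combines the outputs of the layer below, which mirrors reading syllables, then words, then sentences.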
So that's as part of a discussion
of why deep neural networks
may be more effective than the shallow methods.
And is your sense for most things
we can use neural networks for,
those problems are going to be compositional in nature,
like language, like vision.
How far can we get in this kind of way?
Right.
So here is almost philosophy.
Well, let's go there.
Yeah, let's go there.
So a friend of mine, Max Tegmark, who is a physicist at MIT.
I've talked to him on this thing.
Yeah, and he disagrees with you, right?
A little bit.
Yeah, we agree on most,
but the conclusion is a bit different.
His conclusion is that for images, for instance,
the compositional structure of the functions
that we have to learn to solve these problems
comes from physics,
comes from the fact that you have local interactions
in physics between atoms and other atoms,
between particles of matter and other particles,
between planets and other planets,
between stars and others.
It's all local.
And that's true,
but you could push this argument a bit further.
Not this argument, actually.
You could argue that, you know,
maybe that's part of the truth,
but maybe what happens is kind of the opposite,
is that our brain is wired up as a deep network.
So it can learn, understand, solve problems
that have this compositional structure.
And it cannot solve problems
that don't have this compositional structure.
So the problems we are accustomed to,
we think about, we test our algorithms on,
have this compositional structure
because of how our brain is made up.
And that's, in a sense, an evolutionary perspective
that we've, so the ones that weren't dealing
with the compositional nature of reality died off?
Yes, but also could be, maybe the reason
why we have this local connectivity in the brain,
like simple cells in cortex looking only
at the small part of the image, each one of them,
and then other cells looking at the small number
of these simple cells and so on.
The reason for this may be purely
that it was difficult to grow long range connectivity.
So suppose it's, you know, for biology,
it's possible to grow short range connectivity
but not long range, also because there is a limited
number of long-range connections that you can have.
And so you have this limitation from the biology.
And this means you build a deep convolutional network.
This would be something like a deep convolutional network.
And this is great for solving a certain class of problems.
These are the ones we find easy and important for our life.
And yes, they were enough for us to survive.
And you can start a successful business
on solving those problems with Mobileye.
Driving is a compositional problem.
So on the learning task, I mean, we don't know much
about how the brain learns in terms of optimization.
But stochastic gradient descent
is what artificial neural networks use for the most part
to adjust the parameters in such a way that it's able
to deal based on the labeled data,
it's able to solve the problem.
So what's your intuition about why it works at all?
How hard of a problem is it to optimize
a neural network, an artificial neural network?
Are there other alternatives?
Yeah, just in general, your intuition behind
this very simplistic algorithm
that seems to do pretty well.
Surprising.
Yes, yes.
So I find that in neuroscience, the architecture of cortex
is really similar to the architecture of deep networks.
So there is a nice correspondence there
between the biology and this kind of local connectivity
hierarchical architecture.
The stochastic gradient descent, as you said,
is a very simple technique.
It seems pretty unlikely that biology could do that
from what we know right now about cortex and neurons
and synapses.
So it's a big open question whether there are other
optimization learning algorithms
that can replace stochastic gradient descent.
And my guess is yes,
but nobody has found yet a real answer.
I mean, people are trying, still trying,
and there are some interesting ideas.
The fact that stochastic gradient descent
is so successful, it has become clear
that it's not so mysterious.
And the reason is an interesting fact:
a change, in a sense, in how people think about statistics.
And this is the following: typically,
when you had data and you had, say, a model with parameters,
you were trying to fit the model to the data,
you know, to fit the parameters.
Typically, the kind of crowd-wisdom type idea was that
you should have at least, you know, twice as many
data points as parameters.
Maybe 10 times is better.
Now, the way you train neural networks these days
is that they have 10 or 100 times more parameters
than data.
Exactly the opposite.
And which, you know, it has been one of the puzzles
about the neural networks.
How can you get something that really works
when you have so much freedom in-
From that little data you can generalize somehow.
Right, exactly.
Do you think the stochastic nature of it,
the randomness, is essential?
So I think we have some initial understanding
why this happens.
But one nice side effect of having this over parameterization
more parameters than data is that when you look
for the minima of a loss function,
like stochastic gradient descent is doing,
you find... I made some calculations based on
some old basic theorem of algebra called Bézout's theorem
that gives you an estimate of the number of solutions
of a system of polynomial equations.
Anyway, the bottom line is that there are probably
more minima for a typical deep network
than atoms in the universe.
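
To give a feel for the size of the numbers involved, here is a minimal sketch of the bound that Bézout's theorem provides: a system of n polynomial equations in n unknowns with finitely many solutions has at most as many solutions as the product of the degrees. This is not the calculation referred to in the conversation; the system size (1,000 equations of degree 2) and the figure of 10^80 atoms are assumptions chosen only for illustration.

```python
import math


def bezout_bound(degrees):
    """Upper bound on the number of isolated solutions of a polynomial system:
    the product of the degrees of its equations (Bezout's theorem)."""
    return math.prod(degrees)


# Hypothetical system: 1,000 equations, each of degree 2.
bound = bezout_bound([2] * 1000)

# Commonly quoted rough estimate, used here only as a yardstick (an assumption).
ATOMS_IN_UNIVERSE = 10 ** 80

print(bound > ATOMS_IN_UNIVERSE)
```

Even this small hypothetical system has a bound far above the yardstick, and the bound grows exponentially with the number of equations.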
Just to say there are a lot.
Because of the over parameterization.
Yes.
More global minimum, zero minimum, good minimum.
So it's not too soon.
More global minimum.
Yeah, a lot of them.
So you have a lot of solutions.
So it's not so surprising that you can find them
relatively easily.
And this is because of the over parameterization.
The over parameterization sprinkles
that entire space with solutions that are pretty good.
Yeah.
And so it's not so surprising, right?
It's like, if you have a system of linear equations
and you have more unknowns than equations,
then you have, we know,
you have an infinite number of solutions.
And the question is to pick one.
That's another story.
But you have an infinite number of solutions.
So there are a lot of values of your unknowns
that satisfy the equations.
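
The linear analogy can be made concrete with a small sketch. The system below (two equations, three unknowns) is a hypothetical example chosen for illustration: x + y + z = 6 and x - y = 0. Every value of the free parameter t gives a different solution.

```python
def solution(t):
    """One solution for each value of the free parameter t:
    x = y = t and z = 6 - 2t."""
    return (t, t, 6 - 2 * t)


def satisfies(x, y, z):
    """Check both equations of the example system."""
    return x + y + z == 6 and x - y == 0


# Integer values of t keep the arithmetic exact.
candidates = [solution(t) for t in range(-5, 6)]
print(all(satisfies(*c) for c in candidates), len(set(candidates)))
```

All eleven candidates are distinct and all satisfy both equations; picking which one to use is the separate question raised in the conversation.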
But it's possible that there's a lot of those solutions
that aren't very good.
So what's surprising is that they're pretty good.
So that's a question.
Why can you pick one that generalizes well?
Yeah.
But that's a separate question with separate answers.
Yeah.
One theorem that people like to talk about
that kind of inspires imagination
of the power of neural networks is the
universal approximation theorem,
that you can approximate any computable function
with just a finite number of neurons
in a single hidden layer.
Do you find this theorem one surprising?
Do you find it useful, interesting, inspiring?
No, this one, I never found it very surprising.
It was known since the 80s since I entered the field
because it's basically the same as the Weierstrass theorem,
which says that I can approximate any continuous function
with a polynomial of sufficiently,
with a sufficient number of terms, monomials.
Yeah.
It's basically the same and the proofs are very similar.
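The polynomial side of this comparison can be made concrete with Bernstein polynomials, one classical constructive route to the Weierstrass theorem. This is only a sketch: the test function `kink`, the degrees tried, and the evaluation grid are assumptions picked for illustration, not anything stated in the conversation.

```python
from math import comb

def bernstein(f, n, x):
    """Degree-n Bernstein polynomial of f on [0, 1], evaluated at x."""
    return sum(
        f(k / n) * comb(n, k) * x**k * (1 - x) ** (n - k)
        for k in range(n + 1)
    )

def max_error(f, n, grid=101):
    """Largest absolute gap between f and its approximation on a uniform grid."""
    points = [i / (grid - 1) for i in range(grid)]
    return max(abs(f(x) - bernstein(f, n, x)) for x in points)

def kink(x):
    """A continuous but non-smooth test function (assumed example)."""
    return abs(x - 0.5)

# More monomials (higher degree) give a smaller worst-case error.
for n in (10, 50, 200):
    print(n, max_error(kink, n))
```

The error shrinks as the degree grows, but slowly, which hints at the follow-up question in the conversation: how many terms are needed for a given accuracy.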
So your intuition was there was never any doubt
that neural networks, in theory,
could be very strong approximations.
Right.
The question, the interesting question is that
if this theorem says you can approximate fine,
but when you ask how many neurons, for instance,
or in the case of polynomial, how many monomials,
I need to get a good approximation.
Then it turns out that that depends
on the dimensionality of your function,
how many variables you have.
But it depends on the dimensionality of your function
in a bad way.
It's, for instance, suppose you want an error
which is no worse than 10% in your approximation.
You come up with a network that approximates your function
within 10%.
Then it turns out that the number of units you need
are in the order of 10 to the dimensionality, d,
how many variables.
So if you have two variables,
d is two and you have 100 units and okay.
But if you have, say, 200 by 200 pixel images,
now this is 40,000, whatever.
We again go to the size of the universe pretty quickly.
Exactly, 10 to the 40,000 or something.
Yeah.
And so this is called the curse of dimensionality,
not quite appropriately.
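The arithmetic behind this can be sketched directly, using the rule of thumb stated above: about 10 to the power d units for a 10% error. The figure of roughly 10 to the 80 atoms in the observable universe is a commonly quoted estimate and is an assumption here, as are the function names.

```python
# Assumed rough estimate: about 10**80 atoms in the observable universe.
ATOMS_IN_UNIVERSE_EXPONENT = 80

def units_needed(d, base=10):
    """Rule of thumb from the conversation: about base**d units for d variables."""
    return base ** d

def exceeds_atoms(d):
    """True when the unit count is larger than the assumed atom count."""
    return units_needed(d) > 10 ** ATOMS_IN_UNIVERSE_EXPONENT

print(units_needed(2))         # two variables
d_image = 200 * 200            # one variable per pixel of a 200 by 200 image
print(d_image)                 # the exponent d for such an image
print(exceeds_atoms(d_image))  # compare against the assumed atom count
```

The comparison is done on exact integers rather than by printing the full number, since 10 to the 40,000 has 40,001 digits.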
And the hope is with the extra layers
you can remove the curse.
What we proved is that if you have deep layers
or hierarchical architecture with the local connectivity
of the type of convolutional deep learning,
and if you're dealing with a function
that has this kind of hierarchical architecture,
then you avoid completely the curse.
You've spoken a lot about supervised deep learning.
Yeah.
What are your thoughts, hopes, views
on the challenges of unsupervised learning
with GANs, with generative adversarial networks?
Do you see those as distinct?
The power of GANs, do you see those as distinct
from supervised methods in neural networks
or are they really all in the same representation ballpark?
GANs is one way to get estimation of probability densities,
which is a somewhat new way
that people have not done before.
I don't know whether this will really play
an important role in intelligence,
or it's interesting.
I'm less enthusiastic about it than many people in the field.
I have the feeling that many people in the field
are really impressed by the ability
of producing realistic looking images
in this generative way.
Which describes the popularity of the methods,
but you're saying that while that's exciting and cool
to look at, it may not be the tool that's useful for it.
So you describe it kind of beautifully.
Current supervised methods go N to infinity
in terms of number of labeled points,
and we really have to figure out how to go to N to one.
And you're thinking GANs might help,
but they might not be the right.
I don't think for that problem,
which I really think is important.
I think they may help, they certainly have applications,
for instance, in computer graphics.
And I did work long ago,
which was a little bit similar in terms of saying,
okay, I have a network and I present images
and the input is images and the output is,
for instance, the pose of the image,
you know, a face, how much is smiling,
is rotated 45 degrees or not.
What about having a network that I train
with the same data set,
but now I invert input and output.
Now the input is the pose or the expression,
a number, certain numbers,
and the output is the image, and I train it.
And we did pretty good interesting results
in terms of producing very realistic looking images.
It was, you know, a less sophisticated mechanism than GANs,
but the output was pretty much of the same quality.
So I think for computer graphics type application,
yeah, definitely GANs can be quite useful
and not only for that,
for, but for, you know,
helping, for instance, on this problem of unsupervised
example of reducing the number of labeled examples.
I think people, it's like they think they can get out
more than they put in, you know, it's...
There's no free lunches.
Yeah, right.
So what do you think, what's your intuition?
How can we slow the growth of N to infinity
in supervised learning?
So for example, Mobileye has very successfully,
I mean, essentially annotated large amounts of data
to be able to drive a car.
Now, one thought is, so we're trying to teach machines,
school of AI, and we're trying to,
so how can we become better teachers, maybe?
That's one way.
No, you're, you know, I like that because one,
again, one caricature of the history of computer science,
you could say, is, it begins with programmers,
expensive, continue with the labelers, cheap,
and the future will be schools like we have for kids.
Yeah.
Currently, the labeling methods,
we're not selective about which examples
we teach networks with.
So I think the focus of making one,
networks that learn much faster
is often on the architecture side,
but how can we pick better examples with which to learn?
Do you have intuitions about that?
Well, that's part of the problem,
but the other one is, you know, if we look at biology,
a reasonable assumption, I think, is,
in the same spirit as I said,
evolution is opportunistic and has weak priors.
You know, the way I think the intelligence
of a child, a baby may develop is,
by bootstrapping weak priors from evolution.
For instance, in, you can assume that you have
most organisms, including human babies,
built in some basic machinery to detect motion
and relative motion.
And in fact, there is, you know,
we know all insects from fruit flies to other animals,
they have this.
Even in the retinas, in the very peripheral part,
it's very conserved across species,
something that evolution discovered early.
It may be the reason why babies tend to look
in the first few days to moving objects
and not to not moving objects.
Now moving objects means, okay,
they're attracted by motion,
but motion also means that motion gives
automatic segmentation from the background.
So because of motion boundaries,
you know, either the object is moving
or the eye of the baby is tracking the moving object
and the background is moving, right?
Yeah, so just purely on the visual characteristics
of the scene, that seems to be the most useful.
Right, so it's like looking at an object
without background, it's ideal for learning the object,
otherwise it's really difficult
because you have so much stuff.
So suppose you do this at the beginning, first weeks,
then after that, you can recognize the object.
Now they are imprinted, the number one,
even in the background, even without motion.
So that's, by the way, I just wanna ask
on the object recognition problem,
so there is this being responsive to movement
and doing edge detection, essentially.
What's the gap between being effectively,
effectively visually recognizing stuff,
detecting where it is, and understanding the scene?
Is this a huge gap in many layers or is it close?
No, I think that's a huge gap.
I think present algorithms, with all the success
that we have and the fact that a lot of them are very useful,
it's, I think we are in a golden age for applications
of low level vision and low level speech recognition
and so on, you know, Alexa and so on.
There are many more things of similar level to be done,
including medical diagnosis and so on,
but we are far from what we call understanding
of a scene, of language, of actions, of people.
That is, despite the claims, that's, I think, very far.
We're a little bit off, so in popular culture
and among many researchers, some of which I've spoken with,
Stuart Russell and Elon Musk, in and out of the AI field,
there's a concern about the existential threat of AI.
Yeah.
And how do you think about this concern?
And is it valuable to think about large scale,
long-term unintended consequences
of intelligent systems we try to build?
I always think it's better to worry first, you know,
early rather than late.
So worry is good.
Yeah, I'm not against worrying at all.
Personally, I think that it will take a long time
before there is real reason to be worried.
But as I said, I think it's good to put in place
and think about possible safety against,
what I find a bit misleading are things like
that have been said by people, I know, like Elon Musk
and what is Bostrom in particular,
what is his first name?
Nick Bostrom, right.
You know, and a couple of other people that, for instance,
AI is more dangerous than nuclear weapons.
Right.
I think that's really wrong.
And that can be misleading, right?
Because in terms of priority,
we should still be more worried about nuclear weapons
and, you know, what people are doing about it
and so on than AI.
And you're talking about Demis Hassabis
and yourself saying that you think it'll be
about a hundred years out before we have
a general intelligent system that's on par
with the human being.
Do you have any updates for those predictions?
Well, I think he said,
he said 20, I think.
He said 20, right.
This was a couple of years ago.
I have not asked him again.
So I should have.
Your own prediction.
What's your prediction about when you'll be truly surprised
and what's the confidence interval on that?
You know, it's so difficult to predict the future
and even the present.
It's pretty hard to predict.
But I would be, as I said, this is completely,
it would be more like Rod Brooks.
I think he's about 200 years.
200 years.
When we have this kind of AGI system,
artificial general intelligent system,
you're sitting in a room with her, him, it.
Do you think it will be the underlying design
of such a system is something we'll be able to understand?
It'll be simple.
Do you think it'll be explainable?
Understandable by us?
Your intuition, again,
we're in the realm of philosophy a little bit.
Well, probably no.
But it, again, it depends
what you really mean for understanding.
So I think,
you know, we don't understand
how deep networks work.
I think we are beginning to have a theory now.
But in the case of deep networks,
or even in the case of the simple,
simpler kernel machines or linear classifier,
we really don't understand the individual units or so.
We, but we understand, you know,
what the computation and the limitations
and the properties of it are.
It's similar to many things.
You know, we, what does it mean
to understand how a fusion bomb works?
How many of us, you know,
many of us understand the basic principle
and some of us may understand deeper details.
In that sense, understanding is,
as a community, as a civilization,
can we build another copy of it?
Okay.
And in that sense,
do you think there'll be,
there'll need to be some evolutionary component
where it runs away from our understanding?
Or do you think it could be engineered from the ground up?
The same way you go from the transistor to PowerPoint.
All right, so many years ago,
this was actually 40, 41 years ago,
I wrote a paper with David Marr,
who was one of the founding fathers
of computer vision, computational vision.
I wrote a paper about levels of understanding,
which is related to the question we discussed earlier
about understanding PowerPoint, understanding transistors
and so on.
And, you know, in that kind of framework,
we had a level of the hardware
and the top level of the algorithms.
We did not have learning.
Recently, I updated adding levels.
And one level I added to those three was learning.
So, and you can imagine,
you could have a good understanding
of how you construct learning machine, like we do.
But being unable to describe in detail
what the learning machines will discover, right?
Now, that would be still a powerful understanding
if I can build a learning machine,
even if I don't understand in detail every time
it learns something.
Just like our children,
if they start listening to a certain type of music,
I don't know, Miley Cyrus or something,
you don't understand why they came
to that particular preference,
but you understand the learning process.
That's very interesting.
So,
on learning, for systems to be part of our world,
it has a certain,
one of the challenging things that you've spoken about
is learning ethics, learning morals.
And how hard do you think is the problem
of, first of all, humans understanding our ethics?
What is the origin on the neural and low level of ethics?
What is it at the higher level?
Is it something that's learnable
for machines in your intuition?
I think, yeah, ethics is learnable, very likely.
I think it's one of these problems where,
I think understanding the neuroscience of ethics,
you know, people discuss,
there is an ethics of neuroscience.
Yeah, yes.
You know, how a neuroscientist should or should not behave.
I can think of a neurosurgeon and the ethics
rules he has to, or he/she has to, follow.
But I'm more interested in the neuroscience of ethics.
You're blowing my mind right now.
The neuroscience of ethics, it's very meta.
Yeah, and I think that would be important to understand
also for being able to design machines
that are ethical machines in our sense of ethics.
And you think there is something in neuroscience,
there's patterns, tools in neuroscience
that could help us shed some light on ethics,
or is it mostly on the psychology, sociology,
at a much higher level?
No, there is psychology, but there is also,
in the meantime, there is evidence,
fMRI, of specific areas of the brain
that are involved in certain ethical judgment.
And not only this, you can stimulate those areas
with magnetic fields and change the ethical decisions.
Yeah.
Okay. Wow.
So that's work by a colleague of mine, Rebecca Saxe,
and there are other researchers doing similar work.
And I think, you know, this is the beginning,
but ideally at some point,
we'll have an understanding of how this works
and why it evolved, right?
The big why question.
Yeah, it must have some purpose.
Yeah, obviously it has, you know,
some social purposes, probably.
If neuroscience holds the key
to at least illuminate some aspect of ethics,
that means it could be a learnable problem.
Yeah, exactly.
And as we're getting into harder and harder questions,
let's go to the hard problem of consciousness.
Is this an important problem for us to think about
and solve on the engineering of intelligence side
of your work, of our dream?
You know, it's unclear.
So, you know, again, this is a deep problem,
partly because it's very difficult
to define consciousness.
And there is a debate among neuroscientists,
and philosophers, of course,
about whether consciousness is something that requires flesh and blood,
so to speak, or could be, you know,
that we could have silicon devices that are conscious
or up to a statement like everything has some degree
of consciousness and some more than others.
This is like Giulio Tononi and phi.
We just recently talked to Christof Koch.
Christof was my first graduate student.
Do you think it's important
to illuminate aspects of consciousness
in order to engineer intelligent systems?
Do you think an intelligent system
would ultimately have consciousness?
Are they interlinked?
You know, most of the people working
in artificial intelligence, I think, would answer,
we don't strictly need consciousness
to have an intelligent system.
That's sort of the easier question,
because it's a very engineering answer to the question.
To pass a Turing test, we don't need consciousness.
But if you were to go, do you think it's possible
that we need to have that kind of self-awareness?
We may, yes.
So for instance, I personally think that when we test a machine
or a person in a Turing test
and an extended Turing test,
I think consciousness is part of what we require in that test,
you know, implicitly, to say that this is intelligent.
Christof disagrees.
So he does, despite many other romantic notions he holds,
he disagrees with that one.
Yes, that's right.
So, you know, we'll see.
Do you think, as a quick question,
Ernest Becker's fear of death,
do you think mortality and those kinds of things
are important for consciousness and for intelligence,
the finiteness of life, finiteness of existence,
or is that just an evolutionary side effect
that's useful for natural selection?
Do you think this kind of thing that this interview
is going to run out of time soon?
Our life will run out of time soon?
Do you think that's needed to make this conversation good
and life good?
You know, I never thought about it.
It's a very interesting question.
I think Steve Jobs in his commencement speech at Stanford
argued that, you know, having a finite life was important
for stimulating achievements, so it was a different.
Yeah, live every day like it's your last, right?
Yeah, yeah.
So, rationally, I don't think strictly
you need mortality for consciousness, but...
Who knows?
They seem to go together in our biological system, right?
Yeah, yeah.
You've mentioned before, and your students
are associated with, AlphaGo and Mobileye,
the big recent success stories in AI.
And I think it's captivated the entire world of what AI can do.
So what do you think will be the next breakthrough?
What's your intuition about the next breakthrough?
Of course, I don't know where the next breakthrough is.
I think that there is a good chance, as I said before,
that the next breakthrough would also
be inspired by, you know, neuroscience.
But which one?
I don't know.
And there's, so MIT has this quest for intelligence,
and there's a few moon shots, which in that spirit,
which ones are you excited about?
Which projects kind of?
Well, of course, I'm excited about one
of the moon shots, which is our Center for Brains, Minds,
and Machines, the one which is fully funded by NSF.
And it is about visual intelligence.
And that one is particularly about understanding?
Visual intelligence, or the visual cortex,
and visual intelligence in the sense of how we look around
ourselves and understand the world around ourselves,
you know, meaning what is going on,
how we could go from here to there without hitting obstacles.
You know, whether there are other agents, people
in the environment, these are all things that we perceive
very quickly.
And it's something actually quite close to being conscious,
not quite.
But there is this interesting experiment
that was run at Google X, which is, in a sense,
is just a virtual reality experiment,
but in which they had subjects sitting,
say, in a chair with goggles, like Oculus and so on, earphones.
And they were seeing through the eyes of a robot
nearby, two cameras, microphones for receiving.
So their sensory system was there.
And the impression of all the subjects, very strong,
they could not shake it off, was that they
were where the robot was.
They could look at themselves from the robot
and still feel they were where the robot is.
They were looking at their body.
Their self had moved.
So some aspect of scene understanding
has to have ability to place yourself,
have a self-awareness about your position in the world
and what the world is.
So we may have to solve the hard problem of consciousness
to solve it.
On their way, yes.
It's quite a moonshot.
So you've been an advisor to some incredible minds,
including Demis Hassabis, Christof Koch, Amnon Shashua,
like you said, all went on to become seminal figures
in their respective fields.
From your own success as a researcher
and from perspective as a mentor of these researchers,
having guided them in the way of advice,
what does it take to be successful in science
and engineering careers?
Whether you're talking to somebody in their teens, 20s,
and 30s, what does that path look like?
It's curiosity and having fun.
And I think it's important also having fun
with other curious minds.
It's the people you surround yourself with too, so fun and curiosity.
Is there, mentioned Steve Jobs, is there also
an underlying ambition that's unique that you saw
or is it really does boil down to insatiable curiosity
and fun?
Well, of course, it's being curious in an active and ambitious
way, yes, definitely.
But I think sometimes in science, there
are friends of mine who are like this.
There are some of the scientists
who like to work by themselves and communicate only
when they complete their work or discover something.
I think I always found the actual process of discovering
something is more fun if it's together
with other intelligent and curious and fun people.
So if you see the fun in that process,
the side effect of that process would
be that you'll actually end up discovering some things.
So as you've led many incredible efforts here,
what's the secret to being a good advisor, mentor, leader
in a research setting?
Is it a similar spirit?
Or what advice could you give to people, young faculty,
and so on?
It's partly repeating what I said
about an environment that should be friendly and fun
and ambitious.
And I think I learned a lot from some of my advisors
and friends, some of whom are physicists.
And there was, for instance, this behavior
that was encouraged of when somebody comes with a new idea
in the group, unless it's really stupid,
but you are always enthusiastic.
And then you're enthusiastic for a few minutes,
for a few hours.
Then you start asking critically a few questions,
testing this.
But this is a process that is, I think, it's very good.
You have to be enthusiastic.
Sometimes people are very critical from the beginning.
That's not.
Yes, you have to give it a chance to see it grow.
That said, with some of your ideas
that are quite revolutionary, as I've witnessed,
especially in the human vision side and neuroscience side,
there could be some pretty heated arguments.
Do you enjoy these?
Is that a part of science and academic pursuits
that you enjoy?
Is that something that happens in your group as well?
Yeah, absolutely.
I also spent some time in Germany again.
There is this tradition in which people are more
forthright, less kind than here.
So in the US, when you write a bad letter,
you still say, this guy is nice.
Yes, yes.
So here in America, it's degrees of nice.
Yes.
It's all just degrees of nice.
Right, right.
So as long as this does not become personal,
and it's really like a football game with its rules,
that's great.
It's fun.
So if you somehow find yourself in a position
to ask one question of an oracle, like a genie, maybe a god,
and you're guaranteed to get a clear answer,
what kind of question would you ask?
What would be the question you would ask?
In the spirit of our discussion, it could be,
how could I become 10 times more intelligent?
And so, but see, you only get a clear, short answer.
So do you think there's a clear, short answer to that?
No.
And that's the answer you'll get.
So you've mentioned Flowers for Algernon.
Oh, yeah.
It's a story that's inspired you in your childhood.
It's this story of a mouse, a human achieving
genius level intelligence, and then
understanding what was happening while slowly becoming
not intelligent again in this tragedy of gaining intelligence
and losing intelligence.
Do you think in that spirit, in that story,
do you think intelligence is a gift or a curse
from the perspective of happiness and meaning of life?
You try to create an intelligent system that
understands the universe, but on an individual level,
the meaning of life.
Do you think intelligence is a gift?
It's a good question.
I don't know.
As one of the people who are considered among the smartest
people in the world, in some dimension at the very least,
what do you think?
I don't know, it may be invariant to intelligence,
the degree of happiness.
It would be nice if it were.
That's the hope.
Yeah.
You could be smart and happy and clueless and happy.
Yeah.
As always, on the discussion of the meaning of life,
it's probably a good place to end.
Tomaso, thank you so much for talking today.
Thank you.
This was great.