The following is a conversation with Vladimir Vapnik.
He's the co-inventor of Support Vector Machines,
Support Vector Clustering, VC Theory,
and many foundational ideas in statistical learning.
He was born in the Soviet Union
and worked at the Institute of Control Sciences in Moscow.
Then in the United States, he worked at AT&T,
NEC Labs, Facebook Research,
and now as a professor at Columbia University.
His work has been cited over 170,000 times.
He has some very interesting ideas
about artificial intelligence and the nature of learning,
especially on the limits of our current approaches
and the open problems in the field.
This conversation is part of the MIT course
on Artificial General Intelligence
and the Artificial Intelligence podcast.
If you enjoy it, please subscribe on YouTube
or rate it on iTunes or your podcast provider of choice,
or simply connect with me on Twitter
or other social networks at Lex Fridman, spelled F-R-I-D.
And now here's my conversation with Vladimir Vapnik.
Einstein famously said that God doesn't play dice.
Yeah.
You have studied the world through the eyes of statistics.
So let me ask you in terms of the nature of reality,
fundamental nature of reality, does God play dice?
We don't know some factors, and because we don't know
some factors, which could be important,
it looks like God plays dice, but we only should describe.
In philosophy, they distinguish between two positions,
positions of instrumentalism,
where you're creating theory for prediction
and position of realism, where you're trying to understand
what could be.
Can you describe instrumentalism and realism a little bit?
For example, if you have some mechanical laws,
what is that?
Is it a law which is true always and everywhere,
or is it a law which allows you to predict the position
of a moving element?
What you believe, you believe that it is God's law
that God created the world,
which adhered to this physical law,
or it is just law for predictions.
And which one is instrumentalism?
For predictions.
If you believe that this is law of God,
and it's always true everywhere,
that means that you're a realist.
So you're trying to really understand God's thought.
So the way you see the world as an instrumentalist?
You know, I'm working for some models,
models of machine learning.
So in this model, we can see
the setting, and we try to solve,
resolve the setting, to solve the problem.
And you can do it in two different ways,
from the point of view of instrumentalists.
And that's what everybody does now,
because they say the goal of machine learning
is to find the rule for classification.
That is true, but it is instrument for prediction.
But I can say the goal of machine learning
is to learn about conditional probability.
So how God plays dice:
what is the probability for one,
what is the probability for another, in a given situation.
But for prediction, I don't need this.
I need the rule.
But for understanding, I need conditional probability.
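A minimal sketch of the distinction drawn here, assuming an estimate of the conditional probability P(y=1 | x) is already available. The function name and the 0.5 threshold are illustrative choices, not something stated in the conversation. It only shows that a classification rule throws away what the conditional probability keeps.

```python
def rule_from_probability(p_y1_given_x, threshold=0.5):
    """Turn an estimate of P(y=1 | x) into a hard classification rule.

    The threshold of 0.5 is an illustrative assumption.
    """
    return 1 if p_y1_given_x >= threshold else 0


# Two hypothetical situations with very different conditional probabilities...
confident = 0.99
borderline = 0.51

# ...collapse to the same prediction, so the rule alone hides the difference.
predictions = [rule_from_probability(p) for p in (confident, borderline)]
```

The rule is enough for prediction; telling the two situations apart requires the probabilities themselves.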
So let me just step back a little bit first
to talk about, you mentioned, which I read last night,
parts of the 1960 paper by Eugene Wigner,
The Unreasonable Effectiveness of Mathematics
in the Natural Sciences, such a beautiful paper, by the way.
It made me feel, to be honest, to confess my own work
in the past few years on deep learning, heavily applied.
Made me feel that I was missing out
on some of the beauty of nature
in the way that math can uncover.
So let me just step away from the poetry of that for a second.
How do you see the role of math in your life?
Is it a tool, is it poetry?
Where does it sit?
And does math for you have limits of what it can describe?
Some people saying that math is language, which use God.
So I believe that-
Speak to God or use God or-
Use God.
Use God.
Yeah, so I believe that this article
about the effectiveness, unreasonable effectiveness of math
is that if you're looking in mathematical structures,
they know something about reality.
And most scientists from natural science,
they're looking at equations and trying to understand reality.
So the same in machine learning.
If you're trying very carefully to look at all equations
which define conditional probability,
you can understand something about reality
more than from your fantasy.
So math can reveal the simple underlying principles
of reality, perhaps.
You know, what means simple is that
it is very hard to discover them.
But then when you discover them and look at them,
you see how beautiful they are.
And it is surprising why people did not see that before.
You're looking at equations and derive it from equations.
For example, I talked yesterday about the least squares method.
And so I think that's a good question.
And people have a lot of fantasies
about how to improve the least squares method.
But if you go step by step,
solving some equations,
you suddenly get some term which, after thinking,
you understand describes the position
of the observation points.
In the least squares method,
we throw out a lot of information.
We don't look at the positions of the observation points.
We're looking only at residuals.
But when you understood that,
that's very simple idea,
but it's not too simple to understand.
And you can derive this just from equations.
So some simple algebra,
a few steps will take you to something surprising
that when you think about, you understand.
And that is proof that human intuition
is not too rich and is very primitive.
And it does not see very simple situations.
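For reference, the least squares criterion mentioned here, in assumed notation: observations (x_i, y_i) for i = 1, ..., ℓ and a candidate function f(x, w) with parameters w. The objective touches the data only through the residuals y_i − f(x_i, w), which is the sense in which the method looks only at residuals. The additional term Vapnik alludes to is not reconstructed here.

```latex
\min_{w} \; \sum_{i=1}^{\ell} \bigl( y_i - f(x_i, w) \bigr)^2
```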
So let me take a step back in general.
Yes, right.
But what about human,
is it supposed to intuition, ingenuity,
moments of brilliance?
So are you so,
do you have to be so hard on human intuition?
Are there moments of brilliance in human intuition
that can leap ahead of math?
And then the math will catch up?
I don't think so.
I think that the best human intuition,
it is putting in axioms.
And then it is technical way.
See where the axioms take you.
Yeah, but if they correctly take axioms,
but axioms are polished over generations of scientists.
And this is integral wisdom.
So that's beautifully put.
But if you maybe look at,
when you think of Einstein and special relativity,
what is the role of imagination coming first there
in the moment of discovery of an idea?
So there's obviously a mix of math
and out of the box imagination there.
That I don't know.
Whatever I did, I exclude any imagination.
Because whatever I saw in machine learning
that come from imagination, like features,
like deep learning, they are not relevant to the problem.
When you're looking very carefully
from mathematical equations,
you're deriving very simple theory
which goes far beyond theory actually
than whatever people can imagine.
Because it is not good fantasy.
It is just interpretation, it is just fantasy,
but it is not what you need.
You don't need any imagination to derive
say main principle of machine learning.
When you think about learning and intelligence,
maybe thinking about the human brain
and trying to describe mathematically
the process of learning,
that is something like what happens in the human brain.
Do you think we have the tools currently?
Do you think we will ever have the tools
to try to describe that process of learning?
You know, it is not a description of what's going on.
It is interpretation.
It is your interpretation.
Your vision can be wrong.
You know, when a guy invented the microscope,
Leeuwenhoek, for the first time,
only he had this instrument, and nobody else;
he kept the microscope secret.
But he wrote reports to the London Academy of Science.
In his reports, when he was looking at the blood,
he looked everywhere, at the water,
at the blood, at the sperm,
but he described blood like a fight between queen and king.
So he saw blood cells, red cells,
and he imagined that they were armies fighting each other.
And it was his interpretation of the situation.
And he sent this report to the Academy of Science.
They looked very carefully because they believed
that he was right, he saw something.
But he gave the wrong interpretation.
And I believe the same can happen with the brain.
Because the most important part,
you know, I believe in human language.
In some proverbs there is so much wisdom.
For example, people say that one day
with a great teacher is better
than a thousand days of diligent study.
But if I ask you what the teacher does, nobody knows.
And that is intelligence.
But we know from history,
and now from math and machine learning,
that teacher can do a lot.
So what, from a mathematical point of view,
is the great teacher?
I don't know.
That's an open question.
I don't know what we can say about what a teacher can do.
He can introduce some invariants,
some predicate for creating invariants.
How is he doing it?
I don't know, because the teacher knows reality
and can describe from this reality a predicate, invariants.
But he knows that when you're using invariants,
he can decrease the number of observations 100 times.
That's.
So, but maybe try to pull that apart a little bit.
I think you mentioned like a piano teacher
saying to the student, play like a butterfly.
Yeah, I played piano, I played guitar for a long time.
And yeah, that's, there's maybe it's romantic, poetic,
but it feels like there's a lot of truth in that statement.
Like there is a lot of instruction in that statement.
And so can you pull that apart?
What is that?
The language itself may not contain this information.
It's not blah, blah, blah.
It does not blah, blah, blah, yeah.
Affects you.
It's what?
Affects you, affects your playing.
Yes, it does.
But it's not the language.
It's, it feels like what is the information
being exchanged there?
What is the nature of information?
What is the representation of that information?
I believe that it is sort of predicate, but I don't know.
That's exactly what, what intelligence
in machine learning should be.
Yes.
Because the rest is just mathematical technique.
I think that what was discovered recently
is that there are two types, two mechanisms of learning:
one called the strong convergence mechanism
and one the weak convergence mechanism.
Before, people used only one.
In the weak convergence mechanism,
you can use a predicate. That's what "play like a butterfly" is,
and it will immediately affect your playing.
You know, there is a great English proverb:
if it looks like a duck, swims like a duck,
and quacks like a duck, then it is probably a duck.
Yes.
But this is exactly about predicates.
Looks like a duck, what does it mean?
You saw many ducks, that is your training data.
So you have a description of how, integrally, ducks look.
Yeah, the visual characteristics of a duck.
Yeah, and you have a model
for recognizing ducks.
So you would like the theoretical description
from the model to coincide with the empirical description,
which you saw on the text there.
So about looks like a duck, it is general.
But what about swims like a duck?
You should know that ducks swim.
You can say "plays chess like a duck."
Okay, a duck doesn't play chess.
And it is a completely legal predicate, but it is useless.
So how can a teacher recognize a predicate that is not useless?
So up to now, we don't use this predicate
in existing machine learning.
And you think that.
That's why we need zillions of data.
But in this English proverb,
they use only three predicates:
looks like a duck, swims like a duck, and quacks like a duck.
So you can't deny the fact that swims like a duck
and quacks like a duck has humor in it, has ambiguity.
Let's talk about swim like a duck.
It does not say jumps, jump like a duck.
Why?
Because.
It's not relevant.
But that means that you know ducks,
you know different birds, you know animals.
And you derive from this that it is relevant
to say "swims like a duck."
So underneath, in order for us to understand
swims like a duck, it feels like we need to know
millions of other little pieces of information.
that we pick up along the way. You don't think so?
There doesn't need to be this knowledge base?
Those statements carry some rich information
that helps us understand the essence of duck?
How far are we from integrating predicates?
You know, when you consider the complete story
of machine learning,
what it does is this: you have a lot of functions,
and then you're told, it looks like a duck.
You see your training data.
From the training data, you recognize how
the expected duck should look.
Then you remove all functions which do not look
like you think it should look from the training data.
So you decrease the number of functions
from which you pick up one.
Then you give a second predicate
and again decrease the set of functions.
And after that, you pick up the best function you can find.
It is standard machine learning.
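The procedure just described can be sketched in a few lines. This is a toy illustration under stated assumptions, not Vapnik's actual algorithm: the candidate functions are simple thresholds on one feature, and a predicate psi is taken to be "satisfied" when the average of psi(x) * f(x) over the training data is close to the average of psi(x) * y. That form of the check, the tolerance `tol`, and all names are assumptions made for the example.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def predicate_gap(f, psi, xs, ys):
    """Gap between the model's and the data's average for predicate psi (assumed form)."""
    model_side = mean(psi(x) * f(x) for x in xs)
    data_side = mean(psi(x) * y for x, y in zip(xs, ys))
    return abs(model_side - data_side)


def select(candidates, predicates, xs, ys, tol=0.05):
    """Shrink the candidate set one predicate at a time, then pick the best survivor."""
    admissible = list(candidates)
    for psi in predicates:
        admissible = [f for f in admissible if predicate_gap(f, psi, xs, ys) <= tol]
    if not admissible:
        raise ValueError("no candidate satisfies the predicates")
    # Ordinary empirical risk minimisation on what is left.
    best = min(admissible, key=lambda f: mean(f(x) != y for x, y in zip(xs, ys)))
    return best, len(admissible)


# Toy data: label is 1 when x > 0.5.
xs = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
ys = [1 if x > 0.5 else 0 for x in xs]

# Ten candidate threshold rules (hypothetical function set).
thresholds = [0.05, 0.15, 0.25, 0.35, 0.45, 0.55, 0.65, 0.75, 0.85, 0.95]
candidates = [lambda x, t=t: 1 if x > t else 0 for t in thresholds]

# One general predicate: the constant function, which checks the class frequency.
predicates = [lambda x: 1.0]

best, n_admissible = select(candidates, predicates, xs, ys)
training_error = mean(best(x) != y for x, y in zip(xs, ys))
```

In this toy run a single predicate cuts ten candidates down to two before any error minimisation happens, which is the effect described above.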
That's why you don't need too many examples.
Because your predicates are very good.
Yeah, that means the predicates are very good.
Because every predicate is invented
to decrease the admissible set of functions.
So you talk about admissible set of functions
and you talk about good functions.
So what makes a good function?
So an admissible set of functions is a set of functions
which has small capacity, or small diversity,
small VC dimension, for example,
and which contains a good function inside.
By the way, for people who don't know VC,
you're the V in the VC.
So how would you describe to a layperson
what VC theory is?
How would you describe VC?
Okay, we have a machine, and the machine is capable of picking up
one function from the admissible set of functions.
But the set of admissible functions can be big.
It can contain all continuous functions, and then it is useless:
you don't have so many examples to pick up a function.
But it can be small. Small, we call it capacity.
We call it capacity, but maybe better to call it diversity.
So there are not very different functions in the set.
It's an infinite set of functions, but not very diverse.
So it has small VC dimension.
When the VC dimension is small,
you need a small amount of training data.
So the goal is to create an admissible set of functions
which has small VC dimension and contains a good function.
Then you will be able to pick up the function
using a small number of observations.
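To make the link between VC dimension and the amount of training data concrete, here is a sketch that evaluates one commonly quoted textbook form of the VC confidence term for classification: with probability at least 1 − eta, test error is at most training error plus this term. The exact constants differ between presentations, so the formula should be read as an assumption of this example rather than as the bound discussed in the conversation.

```python
import math


def vc_confidence_term(h, n, eta=0.05):
    """Width of a classical VC confidence interval (one textbook form).

    h   : VC dimension of the admissible set of functions
    n   : number of training examples (the bound assumes n > h)
    eta : the bound holds with probability at least 1 - eta
    """
    if n <= h:
        raise ValueError("the bound is only meaningful for n > h")
    return math.sqrt((h * (math.log(2 * n / h) + 1) - math.log(eta / 4)) / n)


small_set = vc_confidence_term(h=10, n=1000)      # low-diversity function set
diverse_set = vc_confidence_term(h=100, n=1000)   # same data, more diverse set
more_data = vc_confidence_term(h=100, n=100000)   # diverse set, 100x more data
```

With the same 1,000 examples, the less diverse set gives a much tighter guarantee; the more diverse set needs far more data to catch up.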
So that is the task of learning?
Yeah.
Is creating a set of admissible functions
that has a small VC dimension.
And then you figure out a clever way
of picking up the.
That is goal of learning which I formulated yesterday.
Statistical learning theory is not involved
in creating the admissible set of functions.
In classical learning theory,
everywhere, 100% of textbooks,
the admissible set of functions is given.
But this is science about nothing,
because the most difficult problem
is to create the admissible set of functions:
given, say, a lot of functions, a continuum set of functions,
create an admissible set of functions.
That means that it has finite VC dimension,
small VC dimension, and contains a good function.
So this was out of consideration.
So what's the process of doing that?
I mean, it's fascinating.
What is the process of creating
this admissible set of functions?
That is invariants.
That's invariants.
Can you describe invariants?
Yeah, you're looking at properties of the training data.
And properties means that you have some function
and you just compute the value,
the average value of the function on the training data.
You have a model, and there is the expectation
of this function on the model.
And they should coincide.
So the problem is about how to pick up functions.
It can be any function.
In fact, it is true for all functions.
But when I'm talking about, say,
a duck that does not jump,
you don't ask the question "jumps like a duck"
because it is trivial.
It does not jump, so it doesn't help you to recognize a duck.
But you know something, which question to ask.
And you're asking if it swims like a duck.
But "looks like a duck" is the general situation.
Looks like, say, a guy who has this illness,
this disease. It is legal.
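The coincidence condition described in this answer (the average of a function on the training data should match its expectation under the model) can be written down as follows. The notation is assumed for illustration: ψ is the chosen predicate, (x_i, y_i) are the ℓ training pairs with labels y_i in {0, 1}, and f(x) is the model's estimate of P(y=1 | x).

```latex
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, f(x_i)
\;\approx\;
\frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i)\, y_i
```

The left side is the model's description and the right side the empirical one. A predicate such as "jumps like a duck" would make both sides essentially zero for every candidate f, so it removes nothing.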
Yeah.
So there is a general type of predicate, "looks like,"
and a special type of predicate
which is related to the specific problem.
And that is the intelligence part of all this business.
And that is where teachers are involved.
Incorporating the specialized predicates.
Okay.
What do you think about deep learning
as neural networks, these arbitrary architectures
as helping accomplish some of the tasks
you're thinking about?
Their effectiveness or lack thereof.
What are the weaknesses and what are the possible strengths?
You know, I think that this is fantasy.
Everything which like deep learning, like features.
Let me give you this example.
One of the greatest books is Churchill's book
about the history of the Second World War.
And he starts this book by describing that in old times,
when a war was over, the great kings,
they gathered together, almost all of them were relatives.
And they discussed what should be done,
how to create peace.
And they came to agreement.
And when the First World War happened,
the general public came to power.
And they were so greedy that they robbed Germany.
And it was clear to everybody that it was not peace,
that the peace would last only 20 years,
because they were not professionals.
And the same I see in machine learning.
There are mathematicians who are looking at the problem
from a very deep point of view, a mathematical point of view.
And there are computer scientists,
who mostly do not know mathematics.
They just have an interpretation of that.
And they invented a lot of blah, blah, blah interpretations
like deep learning.
Why do you need deep learning?
Mathematics does not know deep learning.
Mathematics does not know neurons.
It is just functions.
If you like to say piecewise linear functions, say that,
and do it in the class of piecewise linear functions.
But they invent something.
And then they try to prove the advantage of that
through interpretations, which are mostly wrong.
And when that is not enough, they appeal to the brain,
which they know nothing about.
Nobody knows what's going on in the brain.
So I think it is more reliable to look at the math.
This is a mathematical problem.
Do your best to solve this problem.
Try to understand that there is not only one way
of convergence, which is the strong way of convergence.
There is a weak way of convergence,
which requires predicates.
And if you go through all this stuff,
you will see that you don't need deep learning.
Even more, I would say, one of the theorems,
which is called the representer theorem,
says that the optimal solution of the mathematical problem
which describes learning
is on a shallow network, not on deep learning.
And a shallow network, yeah.
The ultimate problem is there.
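For readers unfamiliar with the representer theorem, its usual textbook statement is sketched below in assumed notation: when a regularized empirical risk is minimized over a reproducing kernel Hilbert space with kernel K, a minimizer can be written as a weighted sum of kernel functions centered on the ℓ training points x_1, ..., x_ℓ.

```latex
f(x) \;=\; \sum_{i=1}^{\ell} \alpha_i \, K(x_i, x)
```

Read as an architecture, this is a single layer of ℓ kernel units followed by one linear combination, which is presumably the sense of "shallow network" in the remark above.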
Absolutely, so in the end,
what you're saying is exactly right.
The question is, you see no value
in throwing something on the table, playing with it,
not math, so like a neural network,
or as you said, throwing something in the bucket,
or the biological example
of looking at kings and queens
or the cells with a microscope.
You don't see value in imagining the cells
as kings and queens and using that as inspiration
and imagination for where the math
will eventually lead you.
You think that interpretation basically deceives you
in a way that's not productive.
I think that if you're trying to analyze
this business of learning
and especially discussion about deep learning,
it is discussion about interpretation,
not about things, about what you can say about things.
That's right, but aren't you surprised
by the beauty of it?
So not mathematical beauty,
but the fact that it works at all.
Or are you criticizing that very beauty,
our human desire to interpret,
to find our silly interpretations in these constructs?
Like let me ask you this.
Are you surprised and does it inspire you?
How do you feel about the success of a system
like AlphaGo at beating the game of Go?
Using neural networks to estimate
the quality of a board and the quality of the position.
And the quality of the position.
That is your interpretation, quality of the board.
Yeah, yes.
But it works.
So it's not our interpretation.
The fact is a neural network system doesn't matter.
A learning system that we don't,
I think, mathematically understand that well,
beats the best human player.
That's something that was thought impossible.
That means that it's not very difficult problem.
That's it.
So we have empirically discovered
that this is not a very difficult problem.
Yeah.
It's true.
So maybe, can't argue.
So...
Even more, I would say,
that if they use deep learning,
it is not the most effective way of learning theory.
And usually when people use deep learning,
they're using zillions of training data.
Yeah, but you don't need this.
So I describe a challenge:
can we do some problems which the deep learning method
does well with deep nets
using 100 times less training data?
Even more, some problems deep learning cannot solve,
because it does not necessarily
create an admissible set of functions.
To create a deep architecture
means to create an admissible set of functions.
You cannot say that you're creating
a good admissible set of functions.
It's just your fantasy.
It does not come from math.
But it is possible to create an admissible set of functions
because you have your training data.
Actually, for mathematicians,
when you're considering invariants,
you need to use the law of large numbers.
When you're doing training in existing algorithms,
you need the uniform law of large numbers,
which is much more difficult; it requires
VC dimension and all this stuff.
But nevertheless, if you use both
the weak and the strong way of convergence,
you can decrease the training data a lot.
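The two laws contrasted here can be stated side by side. The notation is assumed for illustration: ψ is one fixed predicate, F is the set of functions being searched, R_emp(f) is the training error of f on ℓ examples, and R(f) is its expected error.

```latex
% Law of large numbers, for one fixed predicate \psi:
\[
P\left\{ \left| \frac{1}{\ell}\sum_{i=1}^{\ell} \psi(x_i) - \mathbb{E}\,\psi(x) \right| > \varepsilon \right\}
\xrightarrow[\ell \to \infty]{} 0
\]

% Uniform law of large numbers, simultaneously over all f in F:
\[
P\left\{ \sup_{f \in F} \left| R_{\mathrm{emp}}(f) - R(f) \right| > \varepsilon \right\}
\xrightarrow[\ell \to \infty]{} 0
\]
```

The first statement concerns a single average; the second must hold for the worst function in the whole set at once, which is why it depends on the diversity (VC dimension) of F.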
Yeah, you could do the three.
This swims like a duck and quacks like a duck.
But our, so let's step back
and think about human intelligence in general.
And clearly that has evolved in a non-mathematical way.
It wasn't, as far as we know,
God or whoever, that came up with a model
and placed in our brain a set of admissible functions.
It kind of evolved.
I don't know, maybe you have a view on this,
but so Alan Turing in the 50s in his paper,
asked and rejected the question,
can machines think is not a very useful question,
but can you briefly entertain this useful, useless question?
Can machines think?
So talk about intelligence and your view of it.
I don't know that.
I know that Turing described imitation:
if a computer can imitate a human being,
let's call it intelligent.
And he understood that it is not a thinking computer.
Yes.
He completely understood what he was doing,
but he set up the problem of imitation.
So now we understand that the problem is not imitation.
I'm not sure that intelligence just inside of us.
It may be also outside of us.
I have several observations.
So when I proved some theorem, a very difficult theorem,
within a couple of years, in several places,
people proved the same theorem, say,
the Sauer lemma. After ours was done,
other guys proved the same theorem.
In the history of science, it has happened all the time.
For example, geometry, it happened simultaneously.
First Lobachevsky did it, and then Gauss and Bolyai
and then other guys, all approximately
within a 10-year period of time.
And I saw a lot of examples like that.
And many mathematicians think that
when they develop something,
they develop something in general,
which affects everybody.
So maybe our model that intelligence is only inside of us
is incorrect.
It's our interpretation, yeah.
It might be there exist some connection
with world intelligence.
I don't know that.
You're almost like plugging in into...
Yeah, exactly.
And contributing to this.
Into a big network.
Into a big, maybe a neural network.
No, no, no, not here.
On the flip side of that, maybe you can comment
on big O complexity and how you see
classifying algorithms by worst case running time
in relation to their input.
So that way of thinking about functions.
Do you think P equals NP?
Do you think that's an interesting question?
Yeah, it is an interesting question.
But let me talk about
complexity and about the worst-case scenario.
There is a mathematical setting.
When I came to the United States in 1990,
people did not know this.
This theory, they did not know:
statistical learning theory.
In Russia, two monographs were published,
our monographs, but in America, they did not know.
Then they learned.
And somebody told me that it is a worst-case theory
and they will create a real-case theory,
but till now, they did not.
Because it is a mathematical tool.
You can do only what you can do using mathematics,
which has a clear understanding and a clear description.
And for this reason, we introduce complexity.
And we need this, because using it (actually
it is diversity, I like this one more),
the VC dimension, you can prove some theorems.
But we also created a theory for the case
when you know the probability measure.
And that is the best case that can happen;
it is entropy theory.
So from a mathematical point of view,
you know the best possible case
and the worst possible case.
You can derive a different model in the middle.
But it's not so interesting.
You think the edges are interesting?
The edges are interesting.
Because it is not so easy to get a good bound, an exact bound.
There are not many cases where you have it; often the bound is not exact.
But there are interesting principles which the math discovers.
Do you think it's interesting because it's challenging
and reveals interesting principles
that allow you to get those bounds?
Or do you think it's interesting
because it's actually very useful
for understanding the essence of a function,
of an algorithm?
And I think it's very useful
for understanding the essence of a function, of an algorithm.
So it's like me judging your life as a human being
by the worst thing you did and the best thing you did
versus all the stuff in the middle.
It seems not productive.
I don't think so because you cannot describe
situation in the middle.
Or it will be not general.
So you can describe edge cases.
And it is clear it has some model.
But you cannot describe a model for every new case.
So you will never be accurate when you're using a model.
But from a statistical point of view,
the way you've studied functions
and the nature of learning and the world,
don't you think that the real world has a very long tail?
That the edge cases are very far away from the mean?
The stuff in the middle?
Or no?
I don't know that.
I think that,
but from my point of view,
if you use formal statistics,
you need the uniform law of large numbers.
If you use this invariants business,
you need just the law of large numbers.
And there's a huge difference
between the uniform law of large numbers and the law of large numbers.
Is it useful to describe that a little more?
Or should we just take it to?
No, for example, when I'm talking about the duck,
I gave three predicates and that was enough.
But if you try to do a formal distinction,
you will need a lot of observations.
Gotcha.
And so that means that information about "looks like a duck"
contains a lot of bits of information,
formal bits of information.
So we don't know
how many bits of information things
from artificial intelligence contain.
And that is the subject of analysis.
Till now,
all this business,
I don't like how people consider
artificial intelligence.
They consider it as some code which imitates the activity
of a human being.
It is not science.
It is applications.
You would like to imitate? Go ahead.
It is very useful and a good problem.
But you need to learn something more:
how people try to do it,
how people can develop, say,
a predicate like "swims like a duck"
or "play like a butterfly" or something like that.
Not what the teacher tells you, but
how it came into his mind,
how he chooses this image.
So that process.
That is problem of intelligence.
That is the problem of intelligence.
And you see that connected to the problem of learning?
Absolutely.
Are they?
Because you immediately give this predicate,
like a specific predicate, "swims like a duck"
or "quacks like a duck."
It was chosen somehow.
So what is the line of work, would you say?
If you were to formulate as a set of open problems,
that will take us there.
We'll play like a butterfly.
We'll get a system to be able to.
Let's separate two stories.
One is the mathematical story:
if you have a predicate, you can do something.
And the other story is how to get the predicate.
It is the intelligence problem,
and people have not even started to understand intelligence.
Because to understand intelligence, first of all,
try to understand what teachers do.
How does a teacher teach?
Why is one teacher better than another?
Yeah, so you think we really even haven't started
on the journey of generating the predicate.
No, you don't understand.
You even don't understand that this problem exists.
Because did you feel?
You do.
No, I just know name.
I want to understand why one teacher is better than another,
and how a teacher affects a student.
It is not because he repeating the problem
which is in textbook.
He makes some remarks.
He makes some philosophy of reasoning.
Yeah, that's a beautiful,
so it is a formulation of a question
that is the open problem.
Why is one teacher better than another?
Right, but he does better.
Yeah, why at every level?
How do they get better?
What does it mean to be better?
The whole.
Yeah, from whatever model I have,
one teacher can give a very good predicate.
One teacher can say "swims like a duck"
and another can say "jumps like a duck."
And "jumps like a duck" carries zero information.
Yeah.
So what is the most exciting problem
in statistical learning you've ever worked on
or are working on now?
I just finished this invariants story.
And I'm happy that, I believe, it is the ultimate learning story.
At least I can show that there is no other mechanism,
only two mechanisms.
But it separates the statistical part from the intelligent part.
And I know nothing about intelligent part.
And if we will know the intelligent part,
so it will help us a lot in teaching.
It will help us a lot in teaching, in learning.
In learning, do you know we'll know it when we see it?
So for example, in my talk,
the last slide was a challenge.
So you have, say, the MNIST digit recognition problem.
And deep learning claims that they did it very well,
say 99.5% correct answers.
But, say, you use 60,000 observations.
Can you do the same using 100 times less?
But incorporating invariants:
what it means, you know, digits one, two, three,
just looking at that,
explain which invariants I should keep
to use 100 examples, or say 100 times fewer examples,
to do the same job.
Yeah, that last slide, unfortunately,
you're talking it quickly,
but that last slide was a powerful open challenge
and a formulation of the essence here.
That is exactly the problem of intelligence.
Because everybody, when machine learning started
and it was developed by mathematicians,
immediately recognized
that we use much more training data than humans need.
But now again, we came to the same story: how to decrease it.
And that is the problem of learning.
It is not like in deep learning,
where they use zillions of training data.
Because maybe zillions are not enough
if you have good invariants;
maybe you will never collect that number of observations.
But now it is a question for intelligence: how to do that?
Because the statistical part is ready.
As soon as you supply us with predicates,
we can do a good job with a small number of observations.
And the very first challenge is
the well-known digit recognition.
You know digits, so please tell me the invariants.
I think about that. I can say for digit three,
I would introduce the concept of horizontal symmetry.
So the digit three has horizontal symmetry,
say, more than digit two or something like that.
But as soon as I get the idea of horizontal symmetry,
I can mathematically invent a lot of measures
of horizontal symmetry, or vertical symmetry,
or diagonal symmetry, whatever,
if I have the idea of symmetry.
But what else?
Looking at digits, I see that there is a meta-predicate,
which is not shape; it is something like symmetry,
like how dark the whole picture is, something like that,
which can itself give rise to a predicate.
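One of the "many measures of horizontal symmetry" mentioned here could look like the sketch below. It is a hypothetical example, not a measure taken from Vapnik's work: an image is a list of rows with pixel values assumed to lie in [0, 1], and the score compares the image with its reflection about the horizontal axis. The tiny 5x3 "digits" are made up for illustration.

```python
def horizontal_symmetry(image):
    """Score in [0, 1]; 1.0 means the top half mirrors the bottom half exactly.

    Assumes a non-empty rectangular image with pixel values in [0, 1].
    """
    flipped = image[::-1]  # reflect about the horizontal axis
    diffs = [
        abs(a - b)
        for row, flipped_row in zip(image, flipped)
        for a, b in zip(row, flipped_row)
    ]
    return 1.0 - sum(diffs) / len(diffs)


# Crude 5x3 bitmaps, invented for this example.
three = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
]
two = [
    [1, 1, 1],
    [0, 0, 1],
    [1, 1, 1],
    [1, 0, 0],
    [1, 1, 1],
]

score_three = horizontal_symmetry(three)
score_two = horizontal_symmetry(two)
```

On these toy bitmaps the "three" scores higher than the "two", matching the intuition in the passage. Whether such a predicate is actually useful on real digit data is exactly the open question being posed.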
You think such a predicate could rise
out of something that's not general,
meaning it feels like for me to be able to understand
the difference between a two and a three
I would need to have had a childhood of 10 to 15 years
playing with kids, going to school,
being yelled at by parents, all of that, walking, jumping,
looking at ducks, and only then would I be able
to generate the right predicate
for telling the difference between a two and a three.
Or do you think there's a more efficient way?
I don't know.
I know for sure that you must know
something more than digits.
Yes, and that's a powerful statement.
Yeah, but maybe there are several languages
of description of these elements of digits.
So I'm talking about symmetry, about some properties
of geometry, I'm talking about something abstract.
I don't know that.
But this is a problem of intelligence.
So in one of our articles, it is trivial to show
that every example can carry not more than one bit
of information, because when you show an example
and you say this is one, you can remove, say,
the functions which do not tell you one.
The best strategy,
if you can do it perfectly, is to remove half of the functions.
But when you use one predicate, which looks like a duck,
you can remove many more functions than half.
And that means that it contains a lot of bits of information
from formal point of view.
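
The counting argument can be illustrated with a small Python sketch. It is a toy under stated assumptions, not the construction from the article mentioned: the hypothesis set is taken to be all 256 Boolean functions on 3-bit inputs, and the predicate `depends_only_on_darkness` (the output depends only on how many bits are set) is a hypothetical stand-in for a real predicate.

```python
from itertools import product
from math import log2

# All 8 possible 3-bit inputs, and all 256 Boolean functions on them.
# A function is stored as a tuple of 8 outputs, one per input.
INPUTS = list(product([0, 1], repeat=3))
FUNCTIONS = list(product([0, 1], repeat=len(INPUTS)))


def bits_gained(before, after):
    """Information, in bits, gained by shrinking the set of candidates."""
    return log2(len(before) / len(after))


def consistent_with_example(f, x, y):
    """True if function f gives label y on input x."""
    return f[INPUTS.index(x)] == y


def depends_only_on_darkness(f):
    """Toy predicate: the output depends only on the number of set bits."""
    output_for_count = {}
    for x, out in zip(INPUTS, f):
        if output_for_count.setdefault(sum(x), out) != out:
            return False
    return True


# One labeled example keeps exactly half of the functions: 1 bit.
after_example = [f for f in FUNCTIONS if consistent_with_example(f, (1, 0, 1), 1)]
example_bits = bits_gained(FUNCTIONS, after_example)

# The predicate keeps 16 of the 256 functions: 4 bits.
after_predicate = [f for f in FUNCTIONS if depends_only_on_darkness(f)]
predicate_bits = bits_gained(FUNCTIONS, after_predicate)

print(example_bits, predicate_bits)
```

In this toy setting a single labeled example halves the candidate set, while the one predicate removes fifteen sixteenths of it, which is the sense in which a predicate can carry more than one bit.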
But when you have a general picture
of what you want to recognize,
and general picture of the world,
can you invent this predicate?
And that predicate carries a lot of information.
Beautifully put.
Maybe it's just me, but in all the math you show
in your work, which is some of the most profound
mathematical work in the field of learning, AI,
and just math in general, I hear a lot of poetry
and philosophy. You really kind of talk about philosophy
of science. There's a poetry and music to a lot
of the work you're doing and the way you're thinking
about it, so where does that come from?
Do you escape to poetry, do you escape to music or not?
I think that there exists ground truth.
There exists ground truth?
Yeah, and that can be seen everywhere.
Yeah.
The smart guys, philosophers, sometimes I am surprised
how deep they see, sometimes I see that some of them
are completely off the subject, but the ground truth
I see in music.
Music of the ground truth?
Yeah, and in poetry, many poets, they believe
that they take dictation.
So what piece of music as a piece of empirical evidence
gave you a sense that they're touching
something in the ground truth?
It is structure.
The structure, the math of music.
If you're listening to Bach, you see the structure.
Very clear, very classic, very simple,
and the same in math, when you have axioms in geometry,
you have the same feeling, and in poetry sometimes
you see the same.
Yeah, and if you look back at your childhood,
you grew up in Russia, you maybe were born as a researcher
in Russia, you developed as a researcher in Russia,
you came to the United States, and a few places.
If you look back, what were some of your
happiest moments as a researcher?
Some of the most profound moments,
not in terms of their impact on society,
but in terms of their impact on how damn good you feel
that day, and you remember that moment.
You know, every time when you found something,
it is great, even simple things,
but my general feeling is that most of the time I was wrong.
You should go again, and again, and again,
and try to be honest in front of yourself,
not to make interpretation, but try to understand
that it is related to ground truth, it is not my,
blah, blah, blah, interpretation, and something like that.
But you're allowed to get excited at the possibility
of discovery.
Oh yeah.
You have to double check it, but.
No, but how it is related to the other ground truth,
is it just temporary, or it is forever?
Forever, you know, you always have a feeling
when you found something, how big is that?
So 20 years ago when we discovered statistical learning,
so nobody believed, except for one guy,
Dudley from MIT, and then in 20 years it became fashion,
and the same with support vector machines,
the kernel machines.
So with support vector machines and learning theory,
when you were working on it,
you had a sense of the profundity of it,
how this seems to be right, this seems to be powerful.
Right, absolutely, immediately.
I recognize that it will last forever.
And now, when I found this invariant story,
You feel the same?
I have a feeling that it is complete logic,
because I have proved that there is no different mechanism.
You can have some, say, cosmetic improvement you can do,
but in terms of invariants, you need both invariants
and statistical learning, and they should work together.
But also, I'm happy that we can formulate
what is intelligence from that,
and separate it from the technical part.
And that is completely different.
Absolutely, well, Vladimir, thank you so much
for talking today.
Thank you.