
Itnig

Itnig is a startup ecosystem, an early-stage investment fund, a coworking space and a media outlet with the goal of building, and helping other entrepreneurs build, scalable businesses. Our aim is to lead high-growth businesses and to build an independent ecosystem and economy where our startups and teams can collaborate, strengthen each other and grow faster. The Itnig podcast is a podcast about business, technology and entrepreneurship. Every week we invite entrepreneurs and technology profiles to talk about their successful startups. We are always looking to learn from and share knowledge with the most interesting people in the ecosystem. Through the Itnig investment fund, we look to invest in teams with the talent and ambition to create scalable businesses with the potential to change markets and industries.



So, how does machine learning work?
That's the first question one might have, right?
Sorry, some logistics problems.
Can you see that it works?
So, how can we make a machine learn something?
And this is not only a technological problem, but a cognitive one.
So, what is learning? Can we define what learning is?
And that brings up new problems, right?
Because what we normally say is that machine learning is...
Well, I can tell you one thing: machine learning is nothing.
So, actually, learning doesn't really happen, if you take the word in its strict sense.
It's just that the machine is now able to make some predictions,
predictions based on certain information that you have given to it.
So, it's mainly able to repeat, at a very complex level.
For example, let's talk about this concept at a first level.
What do I have here?
A dog, a dog, three variations of a dog.
So, that was pretty easy for him, right?
What's your name?
Greg.
So, for Greg, that was very easy, but it's very difficult for my computer. Why?
Because, if you look at the first picture,
you, as humans, have abstraction.
So, you can understand not only this figure, but the idea behind it, right?
If I try... well, this is going to be a disaster,
but if I try to draw the picture I see, I see something like a bag,
I see something like this.
Well, this is a mess, but this is roughly what I see, right?
And this could be a dog or something else, right?
But you don't just see a dog; you relate this piece of information,
in the form of a figure, to a concept you already knew, right?
And thanks to that, for that reason,
you understand that the next figure is the same thing,
because you are able to understand that this dog has a body
that is not exactly this shape, but the shape of a dog.
And the same goes for the third figure,
because you know that dogs come in different colors;
color is not a very important feature to consider
when deciding whether something is a dog, right?
But the machine has a hard time doing this simple job.
Artificial intelligence.
We have talked a lot lately about this concept,
and machine learning is just a small part of it,
but artificial intelligence has been around for a long time, right?
I mean, in the 60s and 70s there were already many books, novels,
written about artificial intelligence.
Why is it only now that we talk so much about it?
Well, you could consider what I'm about to say an opinion,
but I think it's true.
What has changed is that we have a lot of data,
and there are very fast, very good ways to process it.
So, before, we could have trouble
generating millions of different versions of a dog
for the computer to learn what a dog is,
with all the different possibilities
that you know a dog can take.
Now we can do that easily.
That's why machine learning
sits inside artificial intelligence right now,
and why we talk so much about it right now.
So let's see how this learning,
this machine learning, works.
First, we need data.
That's obvious, of course.
Data is like food for the machine to learn from.
If you don't provide data,
or let's say, instead of data, experiences for the machine,
it doesn't learn.
Imagine a newborn.
A newborn starts collecting data,
experiences,
by experimenting with things.
At first, the newborn won't know
what something will do to it,
so we have to protect it from doing it.
But the experience of falling
and feeling the pain is what keeps
the newborn from doing it again.
So what we do is
get experience from data
and then build expectations.
So: I touched something that had fire in it.
I got burned,
it was painful.
Better not touch fire again.
So it's very similar
to what we are doing with the machine.
With one difference:
without the data, the machine forgets.
But you will still keep the information in your head.
And, again,
we have to go back to the conceptual process,
because how does a human learn?
Well, I'm not an expert in the field,
but I can tell you
that we learn by forming expectations.
We expect something to happen.
Then it happens in another way,
a way we did not expect.
So there is a gap
between what we expected and what happened,
and we learn.
So step 2 is very boring.
It takes a lot of time.
It's about structuring the data,
cleaning the data.
It's not interesting right now.
The third step is model building.
Okay, let's build the model.
How does that work?
Step 4:
the model will say something.
So, the model is, actually,
the little brain we have created for the machine.
And this little brain does a job.
And whether it fails or not,
whether it is accurate or not...
We can talk about errors later if you want.
But there is nothing more to it.
So, this brain
has to be given new inputs
to predict new outputs.
And these inputs will have
the same structure
as the inputs we used
to fit the model.
So, this brain
expects certain things
in certain structures.
And if we change the structure,
what does the brain do?
It will most likely fail to predict.
So, we have to fit
the model with the same kind of data
that we will have to use later
for the prediction.
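As a rough sketch of this point — assuming Python and scikit-learn, which the talk never names — a model fitted on inputs with a given structure expects exactly that structure at prediction time:

```python
# Minimal sketch (invented data): the model is fitted on two features,
# so it expects two features per row when predicting.
from sklearn.linear_model import LogisticRegression

X_train = [[25, 1200], [40, 2500], [58, 3100]]   # e.g. age, salary
y_train = [0, 1, 1]

model = LogisticRegression().fit(X_train, y_train)

print(model.predict([[33, 1800]]))   # same structure as training: works
# model.predict([[33]])              # different structure: raises an error
```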
And the 5th point,
well,
whether you agree with it or not,
it's not strictly necessary
for the machine learning process.
But what is something worth
if it doesn't give you value?
So,
let's extract some value.
And, as I said here,
the 3rd point is not that simple.
So,
I want to spend a minute
on this point.
So, imagine
that you have a certain amount of data.
Let's talk in easy terms.
So,
for example,
you have an Excel
matrix,
an Excel table
with some columns
and some rows.
I'm not going to draw it.
It's very boring.
Okay.
So,
you have this data set
or this table.
And you want to create
a model
that,
being given
some of these features,
is able to predict
the outcome
of one of the other features.
Well,
it's very easy.
We say to the machine,
okay,
get all this information.
Find some patterns.
We will go through
some more detail later.
Find some patterns
and then
try to predict
this column.
So,
at first,
the machine
will
look at the whole data set,
will be staring at it,
learning from it,
and then
the machine
will hide
itself
this column
and will try
to do the predictions.
Okay.
How can we evaluate then
how the machine performed
in doing so?
Was the machine
accurate or not?
Well,
this creates
big problems.
Why?
Because
if we train
with all the experiences
that we have at disposal,
then the machine
will probably be successful
at
foreseeing the outcome
of these experiences.
It has seen everything.
So,
it will probably
predict accurately
these things
that it has already seen.
In order to
be able to evaluate
what will happen
with new
instances,
with new experiences,
we need to save
a tiny bit
of our data set
in order to create
what we call
a test set.
So,
we test with data
that the machine
has not used
to train the model.
Okay.
So,
some part of my data set
will be saved
for testing.
Which percentage
and how am I going to do that
varies a lot.
Unfortunately,
it depends on a lot of factors.
How large was my primary data set
is the most important factor.
But also,
how am I going to choose this,
this test set,
it's important.
Imagine that I have this table
sorted by year.
My first column, for example, is year.
So,
here I am at 2010,
and at the end
I am at 2017.
What will be the problem?
If I make this cut artificially,
I'm going to train
only with cases
that happened
up to a certain point in time.
So,
the new behavior
will be excluded
from the training.
Right?
So, we don't want that.
Or yes,
depends on the situation.
But,
let's say we don't want that.
We should do a random sample.
But,
what happens if we have
different proportions
on the classes
that we want to
predict?
Well,
we will need to do
different selections.
So, for example,
a stratified selection or whatever.
So, this again is very complex.
But let's say we can do a random sample.
Okay?
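A minimal sketch of the split just described, assuming scikit-learn's train_test_split (the talk itself names no library); the stratify option covers the case of uneven class proportions:

```python
from sklearn.model_selection import train_test_split

X = [[i] for i in range(100)]        # toy features
y = [0] * 90 + [1] * 10              # imbalanced classes

# Plain random split, keeping 10% aside as the test set.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.1, random_state=0)

# Stratified split: preserves the 90/10 class proportions in both parts.
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.1, stratify=y, random_state=0)
```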
Now I have this training set
and the machine
excludes
the column I call P
as prediction
and starts
first looking at the whole picture,
trying to create some rules
and then
builds the model
with the information
that it has here.
And now we have to predict
to do the predictions
to compare with the test set
and we can check
if the outcomes were correct or not.
So,
imagine my test set had,
I don't know,
class A, class B,
class A
and the machine says,
well,
this will be class A,
this will be class B,
this will be class B.
So,
we just made one mistake.
Okay?
It's quite accurate.
Well,
it's a 33% error rate,
which is not so nice,
but this is an example.
Okay?
Well,
probably you are seeing
a problem here
or more than one problem.
First problem,
I lose data,
how boring.
I don't want to lose data,
if possible.
Second problem,
what happens with
2018?
I don't have 2018 yet.
Well,
I'm not a magician,
so who cares,
we will train again in 2018.
In any case,
the problem of losing data is real.
It's real.
So,
there are methods
to do estimations of error
without losing data.
The problem is that
the model will not be evaluated.
Let me explain it
with a tiny bit more detail.
So,
imagine I have the data set
that I had before,
and I want it to be
all training set.
All of it,
so 100%
training set.
I have a problem.
I will have to test with itself,
right?
As I said,
this can bias
our
forecasting of error,
because we are
testing with instances
that we have already used
to train.
Well, no problem.
What I'm going to do
is I'm going to train
with 90% of the data set,
and I'm going to save 10%
as a test set.
Okay?
I'm going to build here model 1.
Model 1,
90% of my data set.
Okay?
And then I'm going to test here,
and I will get
some error results.
I will say error 1.
Okay?
Now,
I do the same,
but I use this 90%
to train,
and this 10%
to test.
So,
actually what I'm doing
is exactly what I was doing here,
but saving another part
of the data set.
So now,
this other part of the data set
will be the test set.
Before it was a training set,
now it's going to be the test set.
Then,
I will have another model,
tiny bit different,
and
my error metrics
will be different.
Well,
I already did the math, right?
If I do this 10 times,
my whole data set
will have been used to create models,
and my whole data set
will have been tested.
Okay?
So, let's say that
at the end,
I will expect
that
the sum of the errors,
averaged over all the small models
that I created,
gives me the expected error,
and we say
that we expect this model
to have these error metrics.
Got it?
Quite simple.
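The ten-part rotation described above is k-fold cross-validation; a minimal sketch, assuming scikit-learn and invented data:

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))                 # toy data set
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# 10 folds: each tenth of the data is the test set exactly once.
scores = cross_val_score(DecisionTreeClassifier(), X, y, cv=10)
print("mean accuracy:", scores.mean(), "-> expected error:", 1 - scores.mean())
```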
But still,
we must
validate these
expectations
on what we call
validation set.
So, a new set.
And this validation set
needs to have new experiences
as different as possible
from the experiences
that we used to train.
So, if we go back to the dog,
imagine that I have
a data set built
with the first two pictures,
and some other animals.
And my classifier
is finding the dogs,
and it's telling me
it's very accurate.
But now, in the validation set,
I have the third picture as well,
because some funny guy
added it to the data set.
And now my model fails miserably
and says this is a cat.
So, the problem is that,
as we didn't have
different colors of dogs,
the classifier never learned
how it should treat color
when identifying
an animal.
And that's a problem.
Okay, let's move on.
So, predictions.
I'm talking about predicting
a class all the time.
So, classification
is to classify something
that is non-numeric
into different bins,
different boxes.
So, for example,
I have a data set
with parachutes.
Weird as it sounds.
Tennis balls, and dogs again.
And I want to classify
these data points
in these categories.
So, the problem is that
I can't numerically operate
with them.
So, two parachutes plus one ball
is just two parachutes
and one ball.
I can't do any mathematical operation.
While in regression,
you have probably seen this graph
a lot of times already,
and you hate it.
In regression,
my data
can take
any numeric value
possible.
So you have an infinite number
of
possible values.
So, that creates a problem.
How do I measure the error?
Imagine, in this situation,
I try to classify
and I've added one ball
to the parachutes box.
So, I count how many
misplaced items were in the boxes
and I know that I've made
one mistake out of,
I don't know, 10.
So, I have a 10%
of misclassified instances
or a 10% of error, right?
But how do I deal with this here?
Because I can predict that,
I don't know,
a car costs 30,000 euros
and it costs 29,500.
Am I accurate or not?
What do you think?
I'm quite accurate, right?
So, in this case,
in this case,
I can't measure the error
in terms of being correct or not
because that will be very depressing.
It's very difficult
to be exactly correct
with a number, right?
Instead, I will need
to use different metrics, okay?
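A hedged sketch of the two kinds of metric being contrasted, assuming scikit-learn: counting misclassified items for classes, and a distance-based metric for numbers:

```python
from sklearn.metrics import accuracy_score, mean_absolute_error

# Classification: one misplaced item out of ten.
y_true = ["ball"] * 5 + ["dog"] * 5
y_pred = ["ball"] * 4 + ["dog"] * 6
print("error rate:", 1 - accuracy_score(y_true, y_pred))              # 0.1

# Regression: a 30,000-euro car predicted at 29,500 is wrong but close.
print("mean absolute error:", mean_absolute_error([30000], [29500]))  # 500.0
```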
If you want to talk about
metrics of error,
we can do it later
because there is a lot of stuff
and we can go back, okay?
So, I'm talking a lot
about building a model,
building a model,
but I have not said yet
how to build a model,
how a model is built, right?
So, there is this magic word
which is the algorithm,
also called black box,
also called no one knows
what happens inside
because if someone tells you
that they know, they don't.
I mean, it's possible
that you have created
some algorithms, okay?
Then you probably know it,
but you are one out of one million
and it's nice.
I will be glad to meet you.
There is another possibility.
You have tuned a model,
which is I had an algorithm
made by a genius.
I tried to understand
how it works
and I changed some parameters.
Well, you have not done
an algorithm,
you have just tuned it
a tiny bit
to adapt this method
to your dataset.
So, be skeptical about
creating algorithms
and when you hear
this word
or companies that say
we create your algorithm,
bullshit.
I have to say it
because it happens a lot
and it disappoints me
a tiny bit.
As you probably saw.
So, what we do mainly
is to choose models
that have been built
by clever people
through a lot of time,
through a lot of time
and usually
they are many years old,
because neural networks,
21st century?
Bullshit.
In the 60s
they were already developed.
Support vector machines,
back in the 70s,
in the 70s
they were already designed.
Not used until the 90s
because the Russians
are a tiny bit crazy.
They had these genius
inside their country
and they said,
well,
talk to a wall,
we don't care
what you say.
Then the American said,
come,
we are going to use
your algorithm
and they succeed using it.
So,
as you can see,
what has changed
is the possibility
of using these algorithms,
these methods
with a lot of data
and all the methods
I'm talking about
are nonparametric algorithms.
Let's go into this point.
So, parametric algorithms,
the thing that you hate,
me too.
So,
these algorithms
are functions
that we already know their shape.
So, for example,
linear regression.
Sorry to use the blackboard
but I thought it was
better than creating
infinite amount of slides.
So,
imagine I have,
I don't know,
two variables.
Let's say something
that is easy
to interpret.
For example,
let's say salary
and age.
Let's imagine
that our data points
have this shape.
I'm very optimistic
for the elderly,
but...
So,
let me change it,
because like this
I don't know
how to see it.
So, age here,
salary here.
Okay,
so the older someone gets,
it looks like
the more revenue
is getting, right?
So,
now,
I can tell you exactly
how much money
this guy is earning
and this guy is,
I don't know, 18
and earning
10 euros.
He lives in Barcelona,
clearly.
I can tell you exactly
how this guy,
how much money this guy earns
and I can tell you
exactly the same
for this guy.
Let's say it's 15
and this guy is 22.
What happens
with the guys
in the middle?
So,
we need to choose a way
to be able to represent
any point
of this space, right?
So,
I go
a tiny bit far away,
I choose a
line
and I try to draw it.
So, I failed miserably
in some points, right?
So, now the problem
is that
for some points
I'm quite accurate,
but for some others,
I'm not.
But hey,
I'm more intelligent
than only straight lines.
Let's choose another function.
I can try to use
something like an exponential function.
Hey,
this time I had better success.
Why?
Because I took a second
look at the data, right?
So,
the idea is to choose
a function
that has a known shape.
So, for example,
here I added the formula
of the linear function
with a slope
and a constant.
We can choose
any other function
that we know.
And then,
the good thing
is that these coefficients,
so A and K
are going to be defined
by the machine.
I'm not going to find
these parameters
because if not,
why are we using machines?
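A small sketch of a parametric fit with invented numbers (numpy assumed): we choose the shape, a straight line y = a·x + k, and the machine finds the coefficients a and k:

```python
import numpy as np

age = np.array([18, 22, 30, 40, 50, 60])
salary = np.array([10, 15, 22, 30, 39, 47])     # invented data points

a, k = np.polyfit(age, salary, deg=1)           # least-squares straight line
print(f"salary ~ {a:.2f} * age + {k:.2f}")
print("prediction for age 35:", a * 35 + k)
```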
So,
parametric methods
will be good then
when you already know
the behavior of something, right?
For example,
how many of you
have worked with
parabolic movements
in high school?
Well, we know that
if I throw this thing
like this,
it will have a certain movement,
right?
I know it.
That's why I can
aim, actually,
because if I didn't know
this function,
I would not be able to aim.
So, again,
knowing this function
has nothing to do
with the knowledge of mathematics.
It's experience, okay?
So,
I can fail.
For example,
if I try to hit
Gerard from here,
it's possible
that I fail
because
the parameters
are not going to be
perfectly chosen.
What do I have to do
in order to
aim at him quite accurately?
Throw some pens.
So, have some data,
not only one.
But not that much.
With a few trials,
I'm going to hit him.
Do you have more pens?
So,
that's the idea, okay?
So,
it's not the best way
to build a model,
especially now
that we can deal
with large data sets.
But,
if we don't have
a lot of data,
so,
not a lot of pens,
not a lot of trials.
But we know
something about the
field,
so we have some expertise.
That's the algorithm to choose.
Nonparametric algorithms.
So,
this time,
I'm not going to think
that I'm clever,
the smart cookie
who knows how to draw lines.
And,
I'm going to think
that the machine
knows better.
Now,
I'm going to find
a method
that uses
a way of thinking
that makes sense
for my data.
But I'm not going to
choose a function
or a predefined line.
Ok?
That is the main difference.
In order to do that,
I just need more data.
Why?
Because,
if we collect
only a few data points,
they can have an artificial shape.
For example,
here I collected
only 12 or 13 data points.
But,
what happens if I collect more?
I'm sorry about that,
but
there is a certain point
where people
stop earning more.
So,
hey,
I didn't know that
before.
I didn't have enough data.
So,
I collect as much data
as I can.
Actually,
I stopped doing samples,
thank God,
and I can work
with all the data
that I have
in order to find
the function
that suits best
the shape
of the data points.
Also,
it's easier
for our brains
to deal
with a higher dimension
problem
if we use
this kind of methods.
Because,
here the regression
was quite simple, right?
I have
two possibilities,
two variables.
But,
what happens if I have
ten variables?
So,
what happens when I have
a multi-dimensional space?
Well,
my brain
starts to get
exhausted,
and it's really difficult
to think
on this space.
I mean,
think about four dimensions.
So,
non-parametric algorithms
will help us
in finding patterns
and shapes
in forms
that we don't
even understand.
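As a contrast, a non-parametric sketch of the same salary example (k-nearest-neighbours regression from scikit-learn, assumed here, not named in the talk): no predefined line; the prediction at a point comes from the data around it, so with enough data it can flatten out where salaries stop growing:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

age = np.array([[18], [22], [30], [40], [50], [55], [60], [65], [70]])
salary = np.array([10, 15, 22, 30, 38, 40, 41, 41, 41])   # plateaus at the end

model = KNeighborsRegressor(n_neighbors=3).fit(age, salary)
print(model.predict([[35], [68]]))   # follows the local shape of the data
```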
So,
here you have
a tiny chart
comparing both methods,
okay?
I said almost everything,
but it's good
to go through them.
So,
parametric algorithms,
more simple to understand,
faster to build,
of course,
and you can
work
with small datasets.
Working with
small datasets
seems not beneficial,
but,
hey,
if you are
doing experiments,
you are probably happy
not to test on
1,000 humans.
You can test in 100 only.
Limitations,
yeah,
of course,
they are constrained
by the shape.
So,
even if I know
that here
I should do this again
and probably here
I should do this too.
I chose
a function
and I go with it
until the end.
So, simplicity,
yeah,
they are a tiny bit simplistic,
I would say.
And also,
poor fit.
What does poor fit mean?
It means again the same,
that I know
that there are some errors here,
some distances
from the expected value
to the real value.
And I can do nothing about it.
Nonparametric algorithms,
benefits.
They are more flexible
as I already said.
The performance,
then it's better
because they can
adapt better
to my data points.
Limitations,
they require more data
to learn the shapes
because if not,
they will do movements
that are completely arbitrary
and they will fail.
Computational power,
of course,
to deal with these large
datasets,
I need more powerful computers.
And finally,
risk of overfitting.
The demo came already.
So,
overfitting
for those
who are not familiar
with the term,
happens
when we try to
to adapt the function,
so it goes through
the whole data points.
So, to all the data points.
So, let's imagine this function.
Let's say same problem.
If I do this,
I have a high risk
of biasing the model
because this guy
was probably
an outlier.
Not normal behavior, right?
So,
trying to
go through
all the data points
with the function
is dangerous
because
new experiences
will not be explained
with,
they will be explained
by
the coincidence
of the data points
that we had before.
So,
trying to
push and turn
the model
until the end
will create great performances
when we test
with our train set,
but they will fail
when we go
to predict new cases.
Okay?
So, this would be overfit.
This would be
underfit.
Like, hey,
you could have done better.
And this will be like
in between a tuned fit
or something like that.
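A small numeric sketch of under-, tuned and over-fitting (numpy assumed, data invented): the more flexible the curve, the smaller the training error, and the degree-9 polynomial chases every point, including the outlier:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 15)
y = 2 * x + 1 + rng.normal(scale=2, size=x.size)   # noisy straight line
y[7] += 15                                         # one outlier

for degree in (0, 1, 9):                           # underfit, tuned, overfit
    coeffs = np.polyfit(x, y, degree)
    train_error = np.mean((np.polyval(coeffs, x) - y) ** 2)
    print(f"degree {degree}: training error {train_error:.1f}")
# The degree-9 curve gets the lowest training error but will fail on new data.
```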
Questions?
Yeah,
when you talk about
nonparametric algorithms,
are you referring to
nonparametric statistics
also?
Because I know that,
for example,
Spearman correlation is the
nonparametric version
of the
other kinds of correlations.
And
I'm not sure
whether this limitation
of more data
applies also.
So,
parameters exist
in both
models.
It's just
that some of them
are pretty defined.
So, in parametric
algorithms,
you define the parameters
and then
the algorithm
chooses the coefficients.
So, for example,
in the linear function,
you define that
there will be an X
and
there will be a constant,
and of course
the constant
can be zero.
Yeah.
Then the model
will define
the coefficient.
So, the slope,
for example,
or how much
this constant is.
In this case,
the parameters exist,
but you do not define them.
You do not give it to the machine.
So, you don't know
if you are going to
find an exponential function,
logarithmic function
or whatever function
you can imagine,
or a very, very
complex polynomial
which is where
you get the danger
of getting overfitting.
I don't know
if it answers that.
Yeah, but the question is then,
do you think
Spearman correlation,
as a non-parametric
statistical method,
also requires
a big data set?
Yeah, I would say so.
Yeah.
Yeah.
So, it is a non-parametric
way
and it will require
a bigger data set.
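Side note on the question: Spearman's rank correlation is indeed the usual example of a non-parametric statistic; a minimal sketch, assuming scipy:

```python
from scipy.stats import spearmanr

x = [1, 2, 3, 4, 5, 6]
y = [2, 1, 4, 3, 6, 5]          # toy, roughly monotonically related data
rho, p_value = spearmanr(x, y)  # rank-based, no distribution assumed
print(rho, p_value)
```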
The question is sometimes
from an exploratory point of view.
Yeah.
So,
the difference,
a small data set
is enough
at least
for identifying trends.
So,
which kind of data set,
like belonging to what
the real...
We work, for example,
in healthcare.
Okay.
Let me tell you
one thing.
So,
human behavior
is really changing, right?
But
healthcare,
actually,
so,
the health of a human
is something
that can be predicted
easily, let's say,
because the components
are known to us.
So, that's why
you can perform better
because actually
the functions
are kind of simple,
right?
So, for example,
let's say,
if you want to know
if someone has diabetes
with certain variables,
not a lot of them,
you will be able to know,
right?
So,
the components
can be reduced.
It's like having
principal component analysis
already done
to your data set,
whereas
other kind of behavior,
like, for example,
clicks on a website,
they have a lot of
different reasons, right?
And it's
more difficult to know.
Imagine,
I go to another dimension
of a space
and I throw an object.
The behavior
will be unexpected.
I don't know
what's the gravity,
I don't know
if it will go
higher,
lower,
or what will happen,
but
once I identify it,
it will be
a physical function.
So,
there are some limits
on how complex
can this be, right?
That would be
the point.
So,
even with non-normal data,
I would guess
that with a large enough data set,
we would end up with
a normal distribution.
So,
wouldn't every non-parametric test
become normal with
a big enough data set?
Why
would they have a normal distribution
all the time?
No.
I go to the web, I collect data from different websites, and I get this distribution.
What is happening here is that in 98% of the cases,
those websites don't talk about the iPhone at all.
This is the sentiment: this side is positive and this side is negative.
And I have this distribution of data.
I have some people here, some people here, most of the websites here, and it's symmetric.
It won't really be symmetric, because Apple has better sentiment than average,
but let's say it's symmetric for the sake of the argument.
The first thing I see is that I have a lot of data points at zero,
and I have to recognize that this does not mean neutral sentiment,
it means absence of sentiment.
Because imagine that I create five categories.
I say: this is a numeric variable that doesn't make much sense in numeric terms,
because 20 sentiment points, minus 17 sentiment points...
I'd rather count the people who like it or not.
And I say: you have the lovers, you have the positives,
you have the neutrals, you have the negatives,
and you have the haters.
If I classify — and the method I'm going to use to classify this
is the one I'm going to talk about in a minute if time allows,
the method I'm going to choose is a decision tree, for example —
I like vertical lines.
Here I have a problem. Imagine I do this
and my classifier tells me: hey, I'm 98% accurate.
Eureka.
Well, what the algorithm is doing is:
I know that 98% of the points are here,
so I put all the data points in this column,
and I'm so clever that I almost never fail.
That's why we have the kappa statistic,
which in this case would be 0.
For those who don't know the kappa statistic,
think of it as the part of the success
that does not depend on chance, on randomness.
So, if the algorithm is so "clever" that it says,
hey, I'm going to put everyone here,
because most people are here,
that success is happening by chance, because of the probability distribution.
So, the first thing I would have to do is
fix this distribution.
And if I still want to identify neutral sentiment,
one way is to stratify my sample.
So, I end up with an artificial distribution,
because non-parametric algorithms
actually don't care about the distributions.
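A sketch of the 98% problem, assuming scikit-learn: a classifier that dumps everything into the majority class looks very accurate, but its kappa is 0:

```python
from sklearn.metrics import accuracy_score, cohen_kappa_score

y_true = ["no_mention"] * 98 + ["positive", "negative"]
y_pred = ["no_mention"] * 100          # "so clever that it almost never fails"

print("accuracy:", accuracy_score(y_true, y_pred))     # 0.98
print("kappa:", cohen_kappa_score(y_true, y_pred))     # 0.0
```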
So, I'm talking about the decision tree
as a classification method;
it's called C5.0,
and this is the output the machine will give.
So, it builds a tree.
It's hard to picture it like this;
it's better to picture it like this,
because it builds a tree where the branches are logical decisions.
So, in a way, what the algorithm is doing
is drawing a grid over the data set,
because the rules are parallel to the axes.
And what we find at the end
are something like small classes,
if I can put it that way.
So, we have groups of data points
that are similar to each other
and that get classified together.
So, it's also a good way
to examine our data set,
and that's why this model is not only
used for predictions,
but also for data comprehension,
for understanding the reasons behind the data.
So, 0 or 1, as I said, one path or another;
I used it for that reason.
And it works very well with multidimensional problems,
because it simplifies them into the main logical nodes.
Imagine I had a data set of customers.
For example, the region, the number, the important features
are used to build the nodes along the path.
So each path will be a combination of these features, for example.
And it will also discard other features,
so some possible paths will be dropped,
because it doesn't need them to get the best possible result.
So your question was actually about that.
The problem here, with the 98% distribution,
is that all the other categories fail,
because most of its success comes from putting everyone in the same leaf.
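A minimal decision-tree sketch with invented customer data. Note the assumption: scikit-learn implements CART rather than the C5.0 mentioned in the talk, but the axis-parallel, feature-by-feature splits are the same idea:

```python
from sklearn.tree import DecisionTreeClassifier, export_text

X = [[25, 1], [30, 1], [45, 0], [50, 0], [23, 0], [60, 1]]   # age, region code
y = ["buys", "buys", "no", "no", "no", "buys"]

tree = DecisionTreeClassifier(max_depth=2).fit(X, y)
print(export_text(tree, feature_names=["age", "region"]))    # the logical paths
```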
This is a neighbors algorithm.
It's a nearest-neighbors algorithm, because you compare a point with the data points around it.
Imagine, let's set up a problem.
Imagine you have this data,
and we want to know whether you are red or blue.
For example, whether you like pineapple pizza or not.
And actually, you probably know whether you like pineapple pizza or not.
So imagine this guy is a guy who likes pineapple pizza.
If we compare you with the guy next to you,
we could say that you like pineapple pizza,
so we will classify you as a red dot.
But that's not very democratic.
That's quite authoritarian.
So we decide on a different number of neighbors to compare with,
with some more data points, so we can check
whether there is one possibility or two possibilities that we have to count.
We do this mathematically with the Euclidean distance
between the points, but it's easy to understand it with this circle.
If I take 1, 2, 3, 4, 5, 6, 7 data points, instead of 1,
the outcome will be the opposite, because now you have 5 guys
who don't like pineapple pizza right next to you.
So the number of neighbors you decide on
is critical to the success of this method.
The good thing is that the machines do this for us,
because they iterate until they find
the best possible number of neighbors.
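A sketch of the neighbors idea with invented points, assuming scikit-learn: with k=1 the single closest neighbor decides, with k=7 the five opposite neighbors outvote it:

```python
from sklearn.neighbors import KNeighborsClassifier

X = [[0, 0], [1, 0], [2, 2], [2, 3], [3, 2], [3, 3], [4, 3]]
y = ["likes pineapple", "likes pineapple",
     "no pineapple", "no pineapple", "no pineapple", "no pineapple", "no pineapple"]

new_point = [[1, 1]]                       # the person we want to classify
for k in (1, 7):
    knn = KNeighborsClassifier(n_neighbors=k).fit(X, y)
    print(f"k={k}:", knn.predict(new_point))
```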
What would be a good use case for this?
When you have behavior
based on similar characteristics.
It's used for regression and classification,
but in my experience it works better
for classification than for regression.
It's better at knowing whether this data point is blue or red
than at knowing the distance to a certain position.
Finally, the support vector machine,
which is the most complex of the ones I'm presenting to you now.
In this case, what we do is
try to find a position,
a position in the space that creates
a division of the data points
into different parts, into different zones.
And then what we do is
create a hyperplane, or
it can be just a two-dimensional plane,
if we are in a three-dimensional space.
And support vectors, which are these zones.
And, actually, the problem in this method
is optimizing the function that creates the hyperplanes,
so that these hyperplanes are
at the maximum distance,
so that the support vectors
are at the maximum distance from the hyperplane,
and symmetric with respect to the two classes.
Let's start from the beginning,
but in detail.
We have two classes.
One is on the left and the other is on the right.
What we do now is try to find...
I have an idea.
Oops, that was a very bad idea.
Hopefully it's fine if...
Sorry, I'm a slow thinker.
So the idea is I try to find the data points
that are the most close to each other.
And I do something like this.
I find the point in between,
but now I do not draw a perpendicular line,
but I find another line that is not perpendicular to this vector,
but to these and these other vectors.
Ok?
And then I draw this line.
And I have to draw this line,
where the hyperplane is going to divide the spaces.
Optimizing this function.
So that this vector has the smallest modulus,
or module... I don't know the word in English.
Better now.
Of course there is the possibility of having a circle here,
if not the method would be perfect.
But the good thing is that it has this so-called kernel trick
that what it does is it checks the space
in order to see a position of the space
that will make this data have sense.
So it can rotate the space in order to find a position
where this is visible.
Ok?
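A sketch of the support vector machine with the kernel trick, assuming scikit-learn and invented data: a linear kernel cannot separate a blob surrounded by a ring, while an RBF kernel in effect looks at the space from another position where the two become separable:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 50)
inner = rng.normal(scale=0.3, size=(50, 2))              # class 0: central blob
outer = np.c_[3 * np.cos(angles), 3 * np.sin(angles)]    # class 1: ring around it
X = np.vstack([inner, outer])
y = np.array([0] * 50 + [1] * 50)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(kernel, "training accuracy:", clf.score(X, y))
```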
So I think that was it.
As I said, do not be afraid of computers learning.
But the problem is that even though they don't learn,
they have taken already over the world.
So that was it from my side.
I hope you got to know some things that you wanted to know
by coming here.
And as this is a big, wide topic,
I missed a lot of algorithms like probabilistic algorithms
or random forest,
which is a very lazy way of building models
because random forest will create a lot of decision trees for you.
So if you don't know what algorithm to choose,
choose random forest.
That's a tip, if you didn't catch it.
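A sketch of that "lazy" default, assuming scikit-learn and generated toy data: a random forest builds many decision trees and lets them vote:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=300, n_features=10, random_state=0)
forest = RandomForestClassifier(n_estimators=100, random_state=0)
print("cross-validated accuracy:", cross_val_score(forest, X, y, cv=5).mean())
```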
But there are a lot of methods right now
and it's a really, really wide topic.
I would have loved to talk about how to evaluate errors
on the classifier, not only by these raw proportions.
But I don't think I had time to do that.
So let's hope I come here a third time.
Thank you.