[Opening remarks largely unintelligible. The speaker introduces
working with data, even if we don't have the data in the first place,
and getting it from the web.]
So right now, we're going to define big data with V
as a lot of data, OK?
And we're going to start from the beginning,
because what is the purpose of storing data,
of having any data?
So, the purpose is to analyze it, right?
And this data analysis is something
inherent to human beings.
It's not something that we invented
with the creation of the computer.
So, let's start from the beginning.
And the beginning is really the beginning.
So, we have this man who, probably,
is not alive anymore, a hunter from ancient times,
and he is more or less a wild boar hunter.
The problem is that he was tired of eating
the same thing every day.
And he decided to start trading his product,
his wild boar, for other products.
In this case, let's take the example of an apple.
After some weeks, he started to feel tired,
and he didn't understand why.
So, he starts an analytical process
with a question, because it always starts with a question.
And the question in this case is, is this good for me?
Is this good for me?
What would be your answer to trading a wild boar
for an apple? Is it positive for him? What do you think?
It depends on how many apples.
It depends on how many apples, yes?
Exactly, that's the point.
So, we're going to need some data
to analyze whether the trade is good or not.
Because, without data, we are blind when analyzing it.
So, "it depends" is the right answer in this case.
So, let's see: a boar is difficult to get,
difficult to hunt, right?
You have to hunt it.
[unintelligible], right?
It spoils quickly.
So, you couldn't store it for a long time.
The good point is that when you eat it,
you are full for a long time.
However, an apple is easier to get.
You can go to the trees and grab it.
You can store it for much longer than the wild boar,
but after you eat it,
you get hungry again.
So, now that we have some data,
we can start the process of analyzing it.
This, of course, starts with a hypothesis,
which is, is the trade good for me or not?
Then the data collection or, in this case,
even the data creation,
then the data processing, which is,
in this case, let's see, how long
am I full after eating a wild boar?
Let's say, 24 hours.
And how long am I full after eating an apple?
Let's say, 3 hours.
So, let's do the math.
A fair trade would be how many apples?
8 apples, right.
So, our conclusion is that the fair deal
for us is to make this trade of one wild boar
for 8 apples, OK?
That is the recommendation.
So, if you are trading a wild boar
for 1 apple, please, ask for more apples, you can do that.
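The arithmetic behind this recommendation can be sketched in a few lines of Python. This is only an illustration: the function name `fair_exchange_rate` and its parameter names are made up here, and the 24-hour and 3-hour figures are the speaker's "let's say" estimates, not measurements.

```python
def fair_exchange_rate(hours_full_per_boar: float, hours_full_per_apple: float) -> float:
    """Return how many apples keep you full for as long as one wild boar.

    Hypothetical helper for illustration; it compares the two products
    only by how long each one keeps the hunter full.
    """
    if hours_full_per_apple <= 0:
        raise ValueError("hours_full_per_apple must be positive")
    return hours_full_per_boar / hours_full_per_apple


# Example figures from the talk: 24 hours per boar, 3 hours per apple.
apples_per_boar = fair_exchange_rate(24, 3)
print(apples_per_boar)  # 8.0
```

With different estimates the recommendation changes, which is the point of "it depends": the answer comes from the data, not from the question alone.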
This guy is very smart.
He created his business.
Of course, he is thinking of better ways
to make the trades, as you can see.
He created his business, and now he has a lot of wild boars.
OK?
He not only hunted them, but [unintelligible].
And now he starts distributing the boars
to different areas of his hometown.
What he sees is that in each trade,
he has new parameters and new data
to collect and investigate.
So, the amount of data that he is building,
and in this case, memorizing (he doesn't have any platform
to put the data on, maybe a [unintelligible]),
is growing exponentially.
Each new parameter brings new data
to collect: more columns, more features, OK?
And also more rules.
Each exchange would be a rule.
Each new parameter, each new product, each new place
would be a parameter.
That is exactly the problem that we have right now.
Let's jump to our time right now.
The problem is that data, or access to data,
is growing really, really, really fast.
I don't know if you know Moore's Law, which is about speed.
So it's about processing data, not about storing data.
But actually, the exponential progression is the same one.
So Moore's Law says every two years,
and I'm going to bring it to the data side.
Every two years, we double the amount of data
that we have access to.
This is provided by Oracle, so it's a good source.
And as you can see, in 2008, we did not even
have one zettabyte, which, by the way
(you have the conversion here), is 1 billion terabytes,
to put it in numbers that we understand.
So right now, we have 20 times more data
than we had in 2008.
So let's change the sentence.
Half of the data that we own will always
have been created in the last two years, which is kind of amazing.
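Why doubling every two years implies that half of the data is always less than two years old can be checked with a short Python sketch. The function names are invented for this illustration, and the model assumes perfectly steady doubling, which is a simplification of the chart the speaker refers to.

```python
import math

DOUBLING_PERIOD_YEARS = 2  # figure from the talk: data doubles every two years


def relative_volume(years_elapsed: float) -> float:
    """Data volume relative to the starting point, assuming steady doubling."""
    return 2 ** (years_elapsed / DOUBLING_PERIOD_YEARS)


def share_created_in_last_period(years_elapsed: float) -> float:
    """Fraction of the current total that did not exist one doubling period ago."""
    now = relative_volume(years_elapsed)
    before = relative_volume(years_elapsed - DOUBLING_PERIOD_YEARS)
    return (now - before) / now


def years_to_grow(factor: float) -> float:
    """Years needed for the volume to grow by `factor` under steady doubling."""
    return DOUBLING_PERIOD_YEARS * math.log2(factor)


print(share_created_in_last_period(10))  # 0.5, whatever the starting year
print(round(years_to_grow(20), 1))       # 8.6
```

Under this simplified model, the 20-fold growth mentioned above corresponds to a little more than four doublings.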
So of course, to process this huge amount of data,
we need some new techniques.
This laptop is not enough anymore to process and store
all this data, of course.
So what we do is we shift to cloud computing.
Of course, we humans have this good thing:
we create solutions to problems.
But we have the other, bad thing: we are lazy.
Until we have the requirement,
we don't create a solution, of course.
So what's happening right now is that data and the technical
solutions are having a battle to see which one
gets ahead of the other.
So we have to design solutions while the amount of data
that we have and own is also increasing exponentially.
So the solutions have to adapt really quickly
to the changing environment.
Just to give you an example of what
you can do with cloud computing.
So how much would a computer like this cost,
one that has 128 gigabytes of RAM and 32 cores?
How much did you say?
5,000 at least, yeah?
So it's quite expensive.
You can actually rent it from Amazon for $6 per hour.
So if you have to launch any process that
would be impossible to do with this computer,
or that would take weeks on this computer,
you can rent a virtual machine for $6 per hour.
So it does it for you.
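The rent-versus-buy comparison can be made concrete with a small Python sketch. The two constants are the figures quoted in the talk (the purchase price is the audience's rough estimate, assumed here to be in dollars); the function names are hypothetical, and the sketch ignores electricity, maintenance, and data transfer costs.

```python
PURCHASE_PRICE_USD = 5000      # audience estimate from the talk ("5,000 at least")
RENTAL_RATE_USD_PER_HOUR = 6   # rental price quoted in the talk


def rental_cost(hours: float) -> float:
    """Cost of renting the virtual machine for the given number of hours."""
    return hours * RENTAL_RATE_USD_PER_HOUR


def break_even_hours() -> float:
    """Hours of rental after which buying the machine would have been cheaper."""
    return PURCHASE_PRICE_USD / RENTAL_RATE_USD_PER_HOUR


print(rental_cost(48))            # 288, for a job that runs two full days
print(round(break_even_hours()))  # 833
```

For occasional heavy jobs, renting stays far below the purchase price; only a machine kept busy for hundreds of hours reaches the break-even point under these assumptions.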
So in case you are, for example, launching a startup
and thinking, do I need to invest in computers right now?
Yeah, of course.
You have to invest in computers, user computers,
that will give you access, as a console, to the computers
that you will rent, which are actually in Nevada
or wherever it is.
You don't care.
For storing the data, it's actually the same.
It has almost no cost.
That's what happens when we hear that now, hey,
there is no problem.
You can store millions of data points without cost,
and you can process them really fast.
What happens is that the first reaction is: perfect,
just give me more data.
And this only leads to disaster, because getting data
is still quite expensive.
And by accumulating it without any purpose,
you are wasting your money.
It's much more effective to do a different process.
In order to do this process, we are trying to generate new
professions related to big data.
Now I'm going to focus on the first guy there
and on this guy with the time.
The first guy is what we call a data analyst.
And this guy is the one who will give you the relation
between the data and the questions of the business.
And this guy here is just the other guy five years older.
He's just a senior, and he's what we call a data manager.
So all the other guys are more technical positions.
For example, this is the famous data scientist.
And between the data scientist and the data analyst,
the main difference is that the scientist
is more software developer oriented,
whereas the analyst is more business oriented.
So that's the main difference, because you will see
that there is a lot of confusion or a lot of information
being given right now about these two professions.
The other guys, they don't interest us actually right now.
And just for your information, those are average salaries
in the United States in 2014.
As you can see, the salaries are high,
and the reason is very simple.
When a data-based department is working quite well,
a big company comes and buys out the whole department.
So of course, the salary is increasing all the time.
Let's focus on the data analyst.
By the way, this information comes from DataCamp,
which is a nice platform to do some online courses,
to teach you some R, for example.
So this guy, as we can see here,
he's also called Data Detective.
This is a cool way to say the same thing.
His role will be to understand
(I'm going to give quite a different explanation)
which data is required by the business.
Because, as I said, collecting a huge amount of data
and storing it may be affordable,
but it's wasting your money.
Whereas the good thing is to know beforehand what data
you have to get in order to answer the questions.
And of course, before doing the analysis, you will not know.
So first, you have to have a lot of different data,
process it, understand which data you want to focus on,
and collect the specific data that you want to focus on,
the data that is relevant for your business.
So that would be the process when
you have a data analyst working with you.
So now, the mountain has a climber,
which is a data analyst.
And what he's looking for is minerals
to extract from the data.
So maybe you have heard about data mining,
which is exactly that: checking in the big mountain of data
which data is good, clean, and valuable enough
to provide insights for the business.
As you can see, he's looking for some gold mines.
And that means, in terms of data,
finding out which features are relevant and useful
for your business, and knowing how clean your data is.
Because sometimes you have a lot of data,
but it has a lot of missing values.
It's corrupted somehow.
And of course, when you know which data is relevant for you,
you're ready to start machine learning processes,
or, as I called it here, pattern discovery.
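A first check of how clean a dataset is, in the sense of missing values described above, might look like the following Python sketch. The records, the feature names, and the helper `missing_share` are all hypothetical, invented only to show the idea.

```python
# Hypothetical records; None marks a missing value.
records = [
    {"age": 34, "purchases": 3, "income": None},
    {"age": None, "purchases": 1, "income": None},
    {"age": 51, "purchases": 7, "income": 42000},
    {"age": 29, "purchases": None, "income": None},
]


def missing_share(rows, feature):
    """Return the fraction of rows in which `feature` is missing."""
    if not rows:
        raise ValueError("rows must not be empty")
    missing = sum(1 for row in rows if row.get(feature) is None)
    return missing / len(rows)


# One entry per feature, using the keys of the first record as the feature list.
report = {feature: missing_share(records, feature) for feature in records[0]}
print(report)  # {'age': 0.25, 'purchases': 0.25, 'income': 0.75}
```

In this made-up example, `income` is missing in three out of four records, so it would be a poor candidate for pattern discovery until better data is collected.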
So machine learning: I'm going to take some time here,
because I don't have any slide to explain it.
Machine learning is just teaching a machine,
that is, programming a machine, to do an automated process
based on statistical tools that some data scientists,
together with mathematicians, have
perfected under the name of algorithms.
And with these tools, we can mainly
get which data is relevant for us,
and also build predictive models.
So of course, for a business, being able to somehow foresee
the future is great.
Imagine you could know what's going to happen in one year,
or which articles to send to which people in a newsletter.
So that's what we're doing right now with machine learning.
And I'm going to go over it again on the next slides.
So big data with V. Why did I start with V?
Does anyone have a clue?
Volume, volume.
Yeah, so, yeah, yeah, exactly.
She's getting to the point: there are a lot of articles
right now talking about the three Vs,
four Vs of big data, even five Vs.
And I'm going with the last one, five Vs.
And these five Vs are, actually,
I was going to start clockwise, but I'm
going to start from the bottom.
So volume, we have the guy here,
which means that we can now process
a big volume of data, which means
that what you said at the beginning is now possible.
You can store data, even data that you
don't know what to do with, high volumes of data,
and then start processing with the whole data
at the same time.
Now it's possible.
Before, you needed to create samples,
which brings me to validity.
Why does big data provide validity to the results?
Well, because you're not working with samples,
and you don't have to test all the time
whether this sample is representative of this population,
as in, let's do a t-test, please.
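What working with a sample instead of the whole dataset means can be illustrated with a small Python sketch. Everything here is an assumption made for the illustration: the population is synthetic, the sizes are arbitrary, and the comparison of means stands in for the representativeness checks the speaker alludes to.

```python
import random
import statistics

random.seed(0)  # fixed seed so the run is repeatable

# Synthetic "full dataset": 100,000 values around 100 with spread 15.
population = [random.gauss(100, 15) for _ in range(100_000)]

# The older approach: analyze a sample of 1,000 values instead of everything.
sample = random.sample(population, 1_000)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)
sampling_error = abs(sample_mean - population_mean)

print(f"mean of the full data: {population_mean:.2f}")
print(f"mean of the sample:    {sample_mean:.2f}")
print(f"sampling error:        {sampling_error:.2f}")
```

The sample mean differs slightly from the mean of the full data, and that gap is what has to be tested and justified when only a sample can be processed.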
And V for velocity for speed, of course,
now we can do real-time analysis,
which was impossible before.
Because now we have this machine
that we rented for $6 or less, collecting the data,
doing the process, so we put the software also on the cloud,
and giving back the results.
And this Swiss knife represents variety.
Because, as you know already, this
is applicable to almost any field.
You can do retail, you can do health care,
you can do anything that you imagine,
even if you don't own data, which
is a very important point.
What is the question mark, though?
We're missing one V. So this is the most important one,
and this is what changes the costs and the success
of the process of data mining, of big data,
of data analysis, which is focus on value.
And I will quote this guy, because it's very important.
I can't stress enough how important it is
to start the process thinking about which value you want to add,
in this case, to your startup or to your business
or to whatever process you want to engage in,
from the beginning.
It's really, really tempting in the big data world
to focus on how much cool stuff we are doing.
But what is it providing for us?
Lately, there is this Harvard newspaper
that publishes really interesting articles.
There was an article with a headline
that was "150 data scientists and still no business value."
Maybe some of you read it.
It was an interesting approach to see how it's possible,
even in big companies, and probably I should say,
especially in big companies, to have a huge department
analyzing data and not extracting any value,
because everyone is thinking about how to solve the technical problems
of analyzing data, but who cares about the results
of this analysis?
So we need to focus on the value we want to extract
in order to reduce costs and reduce
the time of this analysis.
And that's all from my side.
Thank you very much.