Gustav Soderstrom: Spotify | Lex Fridman Podcast #29

Lex Fridman Podcast

Conversations about science, technology, history, philosophy and the nature of intelligence, consciousness, love, and power. Lex is an AI researcher at MIT and beyond.

The following is a conversation with Gustav Söderström.
He's the chief research and development officer at Spotify,
leading their product design, data, technology, and engineering teams.
As I've said before, in my research and in life in general,
I love music, listening to it and creating it,
and using technology, especially personalization through machine learning,
to enrich the music discovery and listening experience.
That is what Spotify has been doing for years, continually innovating,
defining how we experience music as a society in a digital age.
That's what Gustav and I talk about among many other topics,
including our shared appreciation of the movie True Romance,
in my view, one of the great movies of all time.
This is the Artificial Intelligence podcast.
If you enjoy it, subscribe on YouTube, give five stars on iTunes,
support on Patreon, or simply connect with me on Twitter at Lex Fridman,
spelled F-R-I-D-M-A-N.
And now here's my conversation with Gustav Söderström.
Spotify has over 50 million songs in its catalog.
So let me ask the all-important question.
I feel like you're the right person to ask.
What is the definitive greatest song of all time?
It varies for me personally.
So you can't speak definitively for everyone?
I wouldn't believe very much in machine learning if I did, right?
Because that would mean everyone had the same taste.
So for you, what is... you have to pick. What is the song?
All right, so it's pretty easy for me.
There's this song called "You're So Cool"
by Hans Zimmer, from the soundtrack to True Romance.
It was a movie that made a big impression on me,
and it's kind of been following me through my life.
Actually, I had it played at my wedding.
I sat with the organist and helped him play it on an organ,
which was a pretty interesting experience.
That is probably, I would say, one of my top three movies of all time.
Yeah, this is an incredible movie.
Yeah, and it came out during my formative years.
And as I've discovered in music,
you shape your music taste during those years.
So it definitely affected me quite a bit.
Did it affect you in any other kind of way?
Well, the movie itself affected me back then.
It was a big part of culture.
I didn't really adopt any characters from the movie,
but it was a great story of love, fantastic actors.
And really, I didn't even know who Hans Zimmer was at the time,
but fantastic music.
And so that song has followed me,
and the movie actually has followed me throughout my life.
That was Quentin Tarantino, actually, I think,
who directed or produced that.
So it's not Stairway to Heaven or Bohemian Rhapsody.
Those are great.
They're not my personal favorites,
but I've realized that people have different tastes,
and that's a big part of what we do.
Well, for me, I'd have to stick with Stairway to Heaven.
So 35,000 years ago, and I looked this up on Wikipedia,
flute-like instruments started being used in caves
as part of hunting rituals and primitive cultural gatherings,
things like that.
This is the birth of music.
Since then, we've had a few folks: Beethoven, Elvis, the Beatles,
Justin Bieber, and of course Drake.
So in your view, let's start high-level philosophical.
What is the purpose of music on this planet of ours?
I think music has many different purposes.
I think there's certainly a big purpose,
which is the same as much of entertainment,
which is escapism and to be able to live in some sort
of other mental state for a while.
But I also think you have the opposite of escaping,
which is to help you focus on something you are actually doing.
I think people use music as a tool to tune the brain
to the activities that they are actually doing.
And it's kind of like, in one sense,
maybe it's the rawest signal.
If you think about the brain as neural networks,
it's maybe the most efficient hack we can do
to actually actively tune it into some state that you want to be in.
You can do it in other ways.
You can tell stories to put people in a certain mood,
but music is probably very effective
to get you to a certain mood very fast, I think.
You know, there's a social component historically to music,
where people listen to music together.
I was just thinking about this. You mentioned machine learning,
but to me, personally, music is a really private thing.
I'm speaking for myself.
I listen to music.
Almost nobody knows the kind of things I have in my library,
except people who are really close to me,
and they really only know a certain percentage.
There's some weird stuff that I'm almost probably embarrassed by.
They're called guilty pleasures, right?
Everyone has that.
The guilty pleasures, yeah.
Hopefully, they're not too bad.
But for me, it's personal.
Do you think of music as something that's social
or as something that's personal?
Or does it vary?
So I think it's the same answer: you use it for both.
We've thought a lot about this during these 10 years at Spotify,
obviously.
In one sense, as you said, music is incredibly social.
You go to concerts and so forth.
On the other hand, it is your escape,
and everyone has these things that are very personal to them.
So what we've found is that most people claim
that they have a friend or two that they are heavily inspired by
and that they listen to.
So I actually think music is very social,
but in a smaller group setting, it's an intimate form of...
It's an intimate relationship.
It's not something that you necessarily share broadly.
Now, at concerts, you can argue you do,
but then you've gathered a lot of people
that you have something in common with.
I think this broadcast sharing of music is something we tried
on social networks and so forth,
but it turns out that people aren't super interested in
what their friends listen to.
They're interested in understanding
if they have something in common, perhaps, with a friend,
but not just as information.
Right, that's really interesting.
I was just thinking of it this morning, listening to Spotify.
I really have a pretty intimate relationship with Spotify
with my playlists.
I've had them for many years now,
and they've grown with me together.
There's an intimate relationship you have
with a library of music that you've developed,
and we'll talk about different ways we can play with that.
Can you do the impossible task
and try to give a history of music listening
from your perspective, from before the internet
and after the internet,
and just kind of everything leading up to streaming
with Spotify and so on?
I'll try. It could be a 100-year podcast.
I'll try to do a brief version.
There are some things that I think are very interesting
during the history of music, which is that
before recorded music, to be able to enjoy music,
you actually had to be where the music was produced,
because you couldn't record it and time-shift it.
Creation and consumption had to happen at the same time,
basically concerts.
And so you had to get to the nearest village
to listen to music. While that was cumbersome,
and it severely limited the distribution of music,
it also had some different qualities,
which was that the creator could always interact
with the audience.
It was always live.
And also there was no time cap on the music.
So I think it's not a coincidence
that these early classical works
are much longer than three minutes.
The three minutes came in as a restriction
of the first wax disc that could only contain
a three-minute song on one side, right?
So actually the recorded music severely limited the...
Or put constraints.
I won't say limit.
I mean, constraints are often good,
but it put very hard constraints on the music format.
So you kind of said, like, instead of doing this opus
of like many, you know, tens of minutes or something,
now you get three and a half minutes
because then you're out of wax on this disc.
But in return, you get an amazing distribution.
Your reach will widen, right?
Just on that point real quick,
without the mass-scale distribution,
there's a scarcity component where you kind of look forward to it.
We had that, it's like the Netflix versus HBO Game of Thrones.
You like wait for the event because you can't really listen to it.
So you like look forward to it and then it's...
You derive perhaps more pleasure
because it's more rare for you to listen to a particular piece.
Do you think there's value to that scarcity?
Yeah, I think that that is definitely a thing.
And there's always this component of,
if you have something in infinite amounts,
will you value it as much?
Probably not.
Humanity is always seeking some...
It's relative, so you're always seeking something you didn't have
and when you have it, you don't appreciate it as much.
So I think that's probably true.
But I think that's why concerts exist.
So you can actually have both.
But I think net, if you couldn't listen to music
in your car driving,
that'd be worse, that cost would be bigger
than the benefit of the anticipation, I think, that you would have.
So, yeah, it started with live concerts.
Then, you know, the phonograph was invented, right?
You start to be able to record music.
Exactly. So then you got this massive distribution
that made it possible to create two things.
First of all, I think, cultural phenomena.
They probably need distribution
to be able to happen.
But it also opened access for a new kind of artist.
So you started to have these phenomena,
like Beatles and Elvis and so forth,
that were really a function of distribution, I think.
Obviously, of talent and innovation,
but there was also a technical component.
And of course, the next big innovation to come along
was radio, broadcast radio.
And I think radio is interesting
because it didn't start as a music medium.
It started as an information medium for news.
And then radio needed to find something to fill the time with
so that they could honestly play more ads and make more money.
And music was free.
So then you had this massive distribution
where you could program to people.
I think those things, that ecosystem,
is what created the ability for hits.
But it was also a very broadcast medium.
So you would tend to get these massive, massive hits,
but maybe not such a long tail.
In terms of choice, everybody listening to the same stuff.
Yeah, and as you said, I think there are some social benefits to that.
I think, for example, there is a high statistical chance
that if I talk about the latest episode of Game of Thrones,
we have something to talk about, just statistically.
In the age of individual choice, maybe some of that goes away.
So I do see the value of shared cultural components,
but I also obviously love personalization.
And so let's catch this up to the internet.
So maybe Napster.
Well, first of all, there's like MP3s.
Exactly.
There's tape, CDs.
There was a digitization of music with the CD, really.
It was physical distribution, but the music became digital.
And so they were files, but basically boxed software,
to use a software analogy.
And then you could start downloading these files.
And I think there are two interesting things that happen.
Back to the point that music used to be longer before it was constrained
by the distribution medium.
I don't think that was a coincidence.
And then really the only music genre to have developed mostly after
music was a file again on the internet is EDM.
And EDM is often much longer than the traditional music.
I think it's interesting to think about the fact
that music is no longer constrained in minutes per song or something.
It's a legacy of an old distribution technology.
And you see some of this new music that breaks the format.
Not so much as I would have expected actually by now,
but it still happens.
So first of all, I don't really know what EDM is.
Electronic dance music, you could say.
Avicii was one of the biggest in this genre.
So the main constraint is time,
the three-, four-, or five-minute song.
So you could have songs that were eight minutes,
10 minutes, and so forth.
Because it started as a digital product
that you downloaded.
So you didn't have this constraint anymore.
So I think it's something really interesting
that I don't think has fully happened yet.
We're kind of jumping ahead a little bit to where we are.
But I think there's tons of format innovation in music
that should happen now that couldn't happen
when you needed to really adhere to the distribution constraints.
If you didn't adhere to that, you would get no distribution.
So Björk, for example, an Icelandic artist,
she made a full iPad app as an album.
That was very expensive, and even though the App Store
has great distribution, she got nowhere near the distribution
she would have had by staying within the three-minute format.
So I think now that music is fully digital
inside these streaming services,
there is the opportunity to change the format again
and allow creators to be much more creative
without limiting their distribution ability.
That's interesting.
You're right, it's surprising that we don't see
people taking advantage of that more often.
It's almost like the constraints of the distribution
from the 50s and 60s have molded the culture
to where we want the three to five-minute song.
That anything else, not just...
So we want the song as consumers and as artists.
Like, because I write a lot of music
and I never even thought about writing something
longer than 10 minutes.
It's really interesting that those constraints...
Because all your training data
has been three and a half minutes long, right?
That's right.
Okay, so, yeah, digitization of data led to then MP3s.
Yeah, so I think you had this file then
that was distributed physically,
but then you had the components of digital distribution
and then the internet happened and there was this vacuum
where you had a format that could be digitally shipped
but there was no business model.
And then all these pirate networks happened.
Napster and in Sweden, Pirate Bay,
which was one of the biggest.
And I think, from a consumer point of view,
which leads up to the inception of Spotify,
consumers for the first time had this access model to music
where they could, without any marginal cost,
they could try different tracks.
You could use music in new ways.
There was no marginal cost.
And that was a fantastic consumer experience.
They had access to all the music ever made.
I think it was fantastic.
But it was also horrible for artists
because there was no business model around it.
So they didn't make any money.
So the user need almost drove the user interface
before there was a business model.
And then there were these download stores
that allowed you to download files.
Which was a solution.
But it didn't solve the access problem.
There was still a marginal cost of 99 cents
to try one more track.
And I think that heavily limits how you listen to music.
The example I always give is in Spotify,
a huge amount of people listen to music while they sleep,
while they go to sleep and while they sleep.
If that cost you 99 cents per three minutes,
you probably wouldn't do that.
And you would be much less adventurous
if there was a real dollar cost to explore music.
So the access model is interesting in that
it changes your music behavior.
You can take much more risk
because there's no marginal cost to it.
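To put a number on the marginal-cost point, here is the arithmetic at the quoted price of 99 cents per three-minute track. The eight-hour night of sleep is an illustrative assumption, not a figure from the conversation.

```latex
\text{tracks per hour} = \frac{60\ \text{min}}{3\ \text{min/track}} = 20,
\qquad
\text{cost per hour} = 20 \times \$0.99 = \$19.80,
\qquad
\text{cost per 8-hour night} = 8 \times \$19.80 = \$158.40.
```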
Maybe let me linger on piracy for a second.
Because I find, especially coming from Russia,
piracy is something that's very interesting.
Not me, of course, ever.
But I have friends who have partaken in piracy
of music, software, TV shows, sporting events.
And usually, to me, what that shows
is that they can actually pay the money
and they're not trying to save money.
They're choosing the best experience.
So what to me piracy shows
is a business opportunity in all these domains.
And that's where, I think you're right,
Spotify stepped in. Basically,
piracy is an experience.
You can explore, find music you like.
And actually the interface of piracy is horrible.
Because it's, I mean...
Bad metadata, long download times, all kinds of stuff.
And what Spotify does is basically
first rewards artists,
and second makes the experience
of exploring music much better.
I mean the same is true, I think, for movies and so on.
Piracy reveals, in the software space for example,
I'm a huge user and fan of Adobe products.
And there was much more incentive
to pirate Adobe products
before they went to a monthly subscription plan.
And now all of said friends
that used to pirate Adobe products
that I know now actually pay,
gladly for the monthly subscription.
Yeah, I think you're right.
I think it's a sign of an opportunity
for product development.
And that sometimes there's a product market fit
before there's a business model fit
in product development.
I think that's a sign of it.
In Sweden, I think it was a bit of both.
There was a culture where we even had
a political party called the Pirate Party.
And this was during the time when people said
that information should be free.
It was somehow wrong to charge for ones and zeros.
So I think people felt that artists
should probably make some money somehow else
in concerts or something.
So at least in Sweden,
it was really part of social acceptance,
even at the political level.
But that also forced Spotify to compete with free,
which I don't think could actually
have happened anywhere else in the world.
The music industry needed to be doing bad enough
to take that risk.
And Sweden was like the perfect testing ground.
It had government funded high bandwidth,
low latency broadband,
which meant that the product would work.
And there was also no music revenue anyway.
So they were kind of like,
I don't think this is going to work, but why not?
So this product is one that I don't think
could have happened in America,
the world's largest music market, for example.
So how do you compete with free?
Because that's an interesting world of the internet
where most people don't like to pay for things.
So Spotify steps in and tries to, yes, compete with free.
How do you do it?
So I think two things.
One is people are starting to pay for things on the internet.
I think one way to think about it
was that advertising was the first business model
because no one would put their credit card on the internet.
Transactional, with Amazon, was the second.
And maybe subscription is the third.
And if you look offline,
subscription is the biggest of those.
So that may still happen.
I think people are starting to pay,
but definitely back then,
we needed to compete with free.
And the first thing you need to do
is obviously to lower the price to free.
And then you need to be better somehow.
And the way that Spotify was better
was on the user experience,
on the actual performance,
the latency of, you know,
even if you had high-bandwidth broadband,
it would still take you 30 seconds to a minute
to download one of these tracks.
So the Spotify experience of starting within the
perceptual limit of immediacy,
about 250 milliseconds,
meant that the whole trick was,
it felt as if you had downloaded all of Pirate Bay.
It was on your hard drive.
It was that fast, even though it wasn't.
And it was still free,
but somehow you were actually still being a legal citizen.
That was the trick that Spotify managed to pull off.
So, yeah, I've actually heard you say this or write this,
and it was a surprise, because I wasn't aware of it,
because I just took it for granted.
You know, whenever an awesome thing comes along,
you're just like, oh, of course it has to be this way.
That's exactly right.
That it felt like the entire world's libraries
at my fingertips because of that latency being reduced.
What was the technical challenge in reducing the latency?
So there was a group of really, really talented engineers,
one of them called Ludvig Strigeus.
He's actually from Gothenburg, and he wrote the initial µTorrent client,
which is kind of an interesting backstory to Spotify,
you know, that we have one of the top developers
behind a BitTorrent client as well.
So he wrote µTorrent, the world's smallest BitTorrent client.
And then he was acquired very early by Daniel and Martin,
who founded Spotify,
and they actually sold the µTorrent client to BitTorrent,
but kept Ludwig.
So Spotify had a lot of experience within peer-to-peer networking.
So the original innovation was a distribution innovation,
where Spotify built an end-to-end media distribution system.
Up until only a few years ago,
we actually hosted all the music ourselves.
So we had both the server side and the client,
and that meant that we could do things
such as having a peer-to-peer solution
to use local caching on the client side,
because back then the world was mostly desktop.
But we could also do things like hack the TCP protocols,
things like Nagle's algorithm
for kind of exponential back-off or ramp-up
and just go full throttle
and optimize for latency at the cost of bandwidth.
And all of this end-to-end control
meant that we could do an experience
that felt like a step change.
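Nagle's algorithm (RFC 896) holds back small TCP segments until earlier data has been acknowledged, which saves packets at the cost of delay; exponential back-off and ramp-up are governed separately by TCP congestion control. The conversation doesn't spell out what Spotify actually changed, so the sketch below only shows the standard, widely available knob for the Nagle part: setting `TCP_NODELAY` on a socket so small writes go out immediately. It is a generic illustration of trading efficiency for latency, not Spotify's client code, and `make_low_latency_socket` is a hypothetical helper name.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (hypothetical helper).

    With Nagle's algorithm on, the kernel may delay small writes to batch
    them; TCP_NODELAY=1 asks it to send small writes right away instead.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock


if __name__ == "__main__":
    s = make_low_latency_socket()
    # A nonzero value means Nagle's algorithm is off for this socket.
    print("TCP_NODELAY enabled:", s.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0)
    s.close()
```

The option is set before connecting so it applies from the first byte; the socket is otherwise an ordinary TCP client socket.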
These days, we actually are on GCP.
We don't host our own stuff,
and everyone is really fast these days.
So that was the initial competitive advantage,
but then obviously you have to move on over time.
And that was over 10 years ago, right?
That was in 2008.
The product was launched in Sweden.
It was in a beta, I think, 2007.
And it was on the desktop, right?
It was desktop only.
There's no phone.
There was no phone.
The iPhone came out in 2007,
but the App Store came out one year later, I think.
So the writing was on the wall, but there was no phone yet.
You've mentioned that people would use Spotify
to discover the songs they like,
and then they would torrent those songs
so they could copy them to their phone.
Just hilarious.
Exactly.
Oh, not torrent, pirate.
Seriously, piracy does seem to be
a good guide for business models.
Video content.
As far as I know,
Spotify doesn't have video content.
Well, we do have music videos,
and we do have videos on the service,
but the way we think about ourselves is that
we're an audio service,
and we think that if you look at the amount of time
that people spend on audio,
it's actually very similar to the amount of time
that people spend on video.
So the opportunity should be equally big,
but today it's not at all valued.
Video is valued much higher.
So we think it's basically completely undervalued.
So we think of ourselves as an audio service,
but within that audio service,
I think video can make a lot of sense.
I think for when you're discovering an artist,
you probably do want to see them
and understand who they are,
to understand their identity.
You won't see that video every time.
No, 90% of the time the phone is going to be in your pocket.
For podcasters using video,
I think that can make a ton of sense.
So we do have video,
but we're an audio service, and we think of it
as what we internally call backgroundable video:
video that is helpful
but isn't the driver of the narrative.
I think also, if you look at YouTube,
there are quite a few folks who listen to music on YouTube.
So in some sense, YouTube is a bit of a competitor
to Spotify,
which is very strange to me
that people use YouTube to listen to music.
They essentially play the music videos,
right, but don't watch the videos, and put the phone in their pocket.
Well, strangely, maybe it's similar to what we were
for the piracy networks,
where YouTube, for historical reasons,
has a lot of music videos.
So people use YouTube for a lot of the discovery part
of the process, I think.
But then it's not a really good,
sort of quote unquote, MP3 player,
because it doesn't even play in the background.
You have to keep the app in the foreground.
So it's not a good consumption tool,
but it's a decently good discovery tool.
I mean, I think YouTube is a fantastic product,
and I use it for all kinds of purposes,
education also.
If I were to admit something,
I do use YouTube a little bit
for the discovery,
to assist in the discovery process of songs.
And then if I like it, I'll add it to Spotify.
But that's okay, that's okay with us.
Okay, so sorry, we're jumping around a little bit.
So it's kind of incredible,
if you look at Napster,
look at the early days of Spotify.
One fascinating point is,
how do you grow a user base?
So you're there in Sweden,
you have an idea,
I saw the initial sketches that looked terrible.
How do you grow a user base
from a few folks to millions?
I think there are a bunch of tactical answers.
So first of all, I think you need a great product.
I don't think you take a bad product
and market it to be successful.
So you need a great product.
But sorry to interrupt,
but it's a totally new way to listen to music too.
So it's not just,
did people realize immediately
that Spotify is a great product?
I think they did.
So back to the point of piracy,
it was a totally new way to listen to music legally,
but people had been used to the access model in Sweden
and the rest of the world for a long time through piracy.
So one way to think about Spotify,
it was just legal and fast piracy.
And so people had been using it for a long time.
So they weren't alien to it.
They didn't really understand how it could be legal
because it seemed too fast and too good to be true,
which I think is a great product proposition
if you can be too good to be true.
But what I saw again and again
was people showing each other,
clicking the song,
showing how fast it started and say,
can you believe this?
So I really think it was about speed.
Then we also had an invite program
that was really meant for scaling
because we hosted our own servers.
We needed to control scaling,
but that built a lot of expectation.
I don't want to say hype,
because hype implies that it wasn't true.
Expectation, excitement around the product.
And we replicated that when we launched in the US.
We also built up an invite-only program first.
There are lots of tactics,
but I think you need a great product
that solves some problem.
And basically the key innovation,
there was technology,
but on a meta level,
the innovation was really the access model
versus the ownership model.
And that was tricky.
A lot of people said that they wanted to own their music.
They would never kind of rent it or borrow it.
But I think the fact that we had a free tier,
which meant that you get to keep this music for life as well,
helped quite a lot.
So this is an interesting psychological point
that maybe you can speak to.
It was a big shift for me.
It was almost like I had to go to therapy for this.
I think I would describe my early listening experience,
and I think a lot of my friends do,
is basically hoarding music.
You're slowly, one song by one song,
or maybe album by album,
gathering a collection of music that you love.
And you own it.
It's like often, especially with CDs or tape,
you physically had it.
And with Spotify,
what I had to come to grips with,
and what's kind of liberating actually,
is to throw away all the music.
I've had this therapy session with lots of people.
And I think the mental trick is,
so actually we've seen the user data.
When Spotify started,
a lot of people did the exact same thing.
They started hoarding,
as if the music would disappear.
Almost the equivalent of downloading.
And so we had these playlists that had limits
of like a few hundred thousand tracks,
and we figured no one would ever, like, well, they do.
Hundreds and hundreds of thousands of tracks.
And to this day,
some people want to actually save,
quote unquote, the entire catalog in a playlist.
But I think that the therapy session goes something like,
instead of throwing away your music,
if you took your files,
and you stored them in a locker at Google,
it'd be a streaming service.
It's just that in that locker,
you have all the world's music now for free.
So instead of giving away your music,
you got all the music.
It's yours.
You could think of it as having a copy
of the world's catalog there forever.
So you actually got more music instead of less.
It's just that you just took that hard disk
and you sent it to someone who stored it for you.
And once you go through that mental journey
of, like, these are still my files, they're just over there,
and I just have 40 million or 50 million or something now,
then people are like, okay, that's good.
The problem, I think, is that you pay us a subscription.
That's where the free tier mattered:
you would feel like, even if I don't want to pay anymore,
I still get to keep them.
You keep your playlists forever.
They don't disappear even though you stop paying.
I think that was really important.
If we would have started as,
you know, you can put in all this time,
but if you stop paying, you lose all your work.
I think that would have been a big challenge
and was the big challenge for a lot of our competitors.
That's another reason why I think the free tier
is really important,
that people need to feel the security
that the work they put in,
it will never disappear,
even if they decide not to pay.
I like how you put the work you put in.
I actually stopped even thinking of it that way.
I just, actually Spotify taught me to just enjoy music.
That's great.
As opposed to what I was doing before,
which is like, in an unhealthy way, hoarding music.
I found that because I was doing that,
I was listening to a small selection of songs way too much,
to the point where I was getting sick of them.
With Spotify's more liberating approach,
I was just enjoying. Of course,
I listen to Stairway to Heaven over and over,
but because of the extra variety,
I don't get as sick of it.
There's an interesting statistic I saw.
Spotify has, maybe you can correct me,
but over 50 million songs, tracks,
and over three billion playlists.
So 50 million songs and three billion playlists.
60 times more playlists than songs.
What do you make of that?
Yeah, so the way I think about it is that
from a statistician or machine learning point of view,
you have all these, if you want to think about
reinforcement learning, you have this state space
of all the tracks, and you can take different journeys
through this world.
I think of these as people helping themselves
and each other, creating interesting vectors
through this space of tracks.
And then it's not so surprising
that across many tens of millions of atomic units,
there will be billions of paths that make sense.
And we're probably quite far away
from having found all of them.
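Treating each playlist as a path through the space of tracks suggests a very simple model: count which track tends to follow which. The sketch below is a first-order, Markov-chain-style count over toy, hypothetical playlists; it illustrates the "paths through a state space" idea and is not how Spotify's recommender actually works. The function names and track IDs are made up for illustration.

```python
from collections import Counter, defaultdict


def build_transitions(playlists):
    """Count how often each track is immediately followed by another."""
    transitions = defaultdict(Counter)
    for playlist in playlists:
        # Pair every track with the track right after it in the same playlist.
        for current, following in zip(playlist, playlist[1:]):
            transitions[current][following] += 1
    return transitions


def suggest_next(transitions, track, k=1):
    """Return up to k tracks that most often follow `track` (empty if unseen)."""
    return [t for t, _ in transitions.get(track, Counter()).most_common(k)]


# Toy, hypothetical playlists: each is an ordered list of track IDs.
playlists = [
    ["a", "b", "c"],
    ["a", "b", "d"],
    ["x", "a", "c"],
]
model = build_transitions(playlists)
print(suggest_next(model, "a"))  # → ['b']
```

Only adjacent pairs are counted here; billions of playlists would give far richer signal (co-occurrence anywhere in a playlist, listener context), which this sketch deliberately leaves out.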
So, for users, when Spotify started,
it was really a search box that was,
for the time, pretty powerful.
And then I'd like to refer to this programming language
called playlisting, where if you,
as you probably were pretty good at music,
you knew your new releases, you knew your back catalog,
you knew your Stairway to Heaven,
you could create a soundtrack for yourself
using this playlisting tool,
that's like a meta-programming language for music,
to soundtrack your life.
And for people who are good at music,
and this is back to how you scale the product,
that was actually enough.
If you had the catalog and a good search tool,
you could create your own sessions,
you could create a really good soundtrack
for your entire life, probably perfectly personalized
because you did it yourself.
But the problem was that most people,
many people, aren't that good at music,
or they just can't spend the time.
Even if you're very good at music,
it's gonna be hard to keep up.
So what we did to try to scale this
was essentially to try to build,
you can think of them as agents,
like this friend that some people had
who helped them navigate the music catalog.
That's what we're trying to do for you.
But also, there are something like 200 million active
users on Spotify, so, okay.
So from the machine learning perspective,
you have these 200 million people plus,
they're creating, it's really interesting to think of
a playlist as, I mean, I don't know if you meant it
that way, but it's almost like a programming language.
It's at least a trace of exploration
of those individual agents.
The listeners.
And you have all these new tracks coming in.
So it's a fascinating space that is ripe
for machine learning.
So is there, is it possible,
how can playlists be used as data
in terms of machine learning
to help Spotify organize the music?
So we found in our data, not surprisingly,
that people who playlisted a lot retained much better.
They had a great experience.
And so our first attempt was to playlist for users.
And so we acquired this company called Tunigo
of editors and professional playlisters
and kind of leveraged the maximum of human intelligence
to help build kind of these vectors
through the track space for people.
And that brought in the product.
Then the obvious next... And we used statistical means
so that when they created a playlist,
they could see how that playlist performed.
They could see skips of the songs,
they could see how the songs performed,
and they manually iterated the playlist
to maximize performance for a large group of people.
But there were never enough editors
to playlist for you personally.
So the promise of machine learning
was to go from kind of group personalization
using editors and tools and statistics
to individualization.
And then what's so interesting
about the three billion playlists we have is,
and the truth is we lucked out.
This was not an a priori strategy, as is often the case.
It looks really smart in hindsight, but it was dumb luck.
We looked at these playlists,
and we had some people in the company,
a person named Erik Bernhardsson,
who was really good at machine learning already back then,
in like 2007, 2008.
Back then it was mostly collaborative filtering and so forth,
but we realized that what this is,
is people are grouping tracks for themselves
that have some semantic meaning to them,
and then they actually label it with a playlist name as well.
So in a sense, people were grouping tracks
along semantic dimensions and labeling them.
And so could you use that information
to find that latent embedding?
And so we started playing around with collaborative filtering,
and we saw tremendous success with it,
basically trying to extract some of these dimensions.
And if you think about it, it's not surprising at all.
It'd be quite surprising if playlists were actually random,
if they had no semantic meaning.
For most people, they grouped these tracks for some reason.
We just happened upon this incredible data set
where people had taken these tens of millions of tracks
and grouped them along different semantic vectors.
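A minimal sketch of the idea described here: treat playlists as co-occurrence evidence and factor a track-by-playlist matrix to recover latent track embeddings. The playlists are hypothetical toy data, and plain truncated SVD stands in for whatever collaborative-filtering method Spotify actually used; the playlist names (labels) are ignored for simplicity.

```python
import numpy as np

# Hypothetical toy data: each playlist is a list of track ids.
playlists = [
    ["stairway", "kashmir", "black_dog"],
    ["stairway", "kashmir", "whole_lotta_love"],
    ["clair_de_lune", "gymnopedie", "nocturne"],
    ["gymnopedie", "nocturne", "clair_de_lune"],
    ["black_dog", "whole_lotta_love", "kashmir"],
]

tracks = sorted({t for p in playlists for t in p})
index = {t: i for i, t in enumerate(tracks)}

# Track-by-playlist incidence matrix: 1.0 if the track appears in the playlist.
m = np.zeros((len(tracks), len(playlists)))
for j, playlist in enumerate(playlists):
    for t in playlist:
        m[index[t], j] = 1.0

# Truncated SVD: keep the top-k singular directions as latent dimensions.
k = 2
u, s, _ = np.linalg.svd(m, full_matrices=False)
embeddings = u[:, :k] * s[:k]


def similarity(a: str, b: str) -> float:
    """Cosine similarity between two tracks' latent embeddings."""
    va, vb = embeddings[index[a]], embeddings[index[b]]
    return float(va @ vb / (np.linalg.norm(va) * np.linalg.norm(vb)))
```

Tracks that keep showing up in the same playlists land close together in the latent space, while tracks that never co-occur end up nearly orthogonal, which is the "semantic dimension" the playlisters implicitly defined.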
And the semantics being outside the individual user,
so it's some kind of universal...
Is there a universal embedding
that holds across people on this earth?
Yes, I do think that the embeddings you find
are going to be reflective of the people who playlisted.
So if you have a lot of indie lovers who playlist,
your embedding is going to perform better there.
But what we found was that, yes,
there were these latent similarities.
They were very powerful.
And it was interesting because
I think that the people who playlisted the most initially
were the so-called music aficionados
who were really into music,
and they often had a certain...
their taste was often geared towards a certain type of music.
And so what surprised us,
if you look at the problem from the outside,
you might expect that
the algorithms would start performing best with mainstreamers first
because it somehow feels like an easier problem
to solve mainstream taste than really particular taste.
It was the complete opposite for us.
The recommendations performed fantastically
for people who saw themselves as having very unique taste.
That's probably because all of them playlisted.
And they didn't perform so well for mainstreamers;
they actually thought they were a bit too particular
and unorthodox.
So we had the complete opposite of what we expected.
We succeeded with the hardest problem first
and then had to try to scale to more mainstream recommendations.
So you've also acquired
the Echo Nest,
which analyzes song data.
So
in your view, maybe you can talk about
what kind of data is there
from a machine learning perspective.
There's a huge amount of,
what we were talking about with playlisting,
just user data
of what people are listening to,
the playlist they're constructing
and so on.
And then there's the actual data within a song.
What makes a song,
I don't know, the actual waveforms.
Is there any...
How do you mix the two?
How much value is there in each?
To me, it seems like user data
is...
It's a romantic notion
that the song itself would contain useful information,
but if I were to guess,
user data would be much more powerful.
Playlists would be much more powerful.
Yeah, so we use both.
Our biggest success
initially was with
playlist data without understanding
anything about the structure of the song.
But when we acquired the Echo Nest,
they had the inverse problem.
They actually didn't have any play data.
They were a provider of recommendations,
but they didn't actually have any play data.
So they looked at the structure of songs
sonically
and they looked at Wikipedia
for cultural references and so forth,
and did a lot of NLU and so forth.
So we got that skill into the company
and combined
our user data
with their
content-based approach.
So you can think of it as we were user based
and they were content based in their recommendations.
And we combined those two.
And for some cases where you have a new song
that has no play data, obviously
you have to try to go by
either who the artist is
or the
sonic information in the song
or what it's similar to.
So there's definitely value in both
and we do a lot in both, but I would say yes.
The user data captures things
that have to do with culture
in the greater society
that you would never see
in the content itself.
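One simple way to combine the two signals, including the cold-start case of a new song with no play data, is a weighted blend that shifts from content-based to user-based scores as plays accumulate. This is a hypothetical sketch: the scores, the `min_plays` ramp, and the linear blend are illustrative assumptions, not how Spotify actually mixes the two.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    play_count: int        # how much listening data exists for the track
    collab_score: float    # hypothetical similarity from user/playlist data
    content_score: float   # hypothetical similarity from audio/metadata


def hybrid_score(c: Candidate, min_plays: int = 1000) -> float:
    """Blend user-based and content-based scores by how much play data exists.

    With zero plays the weight is 0, so a brand-new track is scored purely on
    content; at or above min_plays it is scored purely on user data.
    """
    weight = min(c.play_count / min_plays, 1.0)
    return weight * c.collab_score + (1.0 - weight) * c.content_score


candidates = [
    Candidate("new_release", play_count=0, collab_score=0.0, content_score=0.8),
    Candidate("catalog_hit", play_count=5000, collab_score=0.9, content_score=0.2),
    Candidate("rising_track", play_count=500, collab_score=1.0, content_score=0.0),
]
ranked = sorted(candidates, key=hybrid_score, reverse=True)
```

The new release still gets a reasonable score from its sonic profile alone, while the established track is judged by the cultural signal in its listening data.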
But that said,
we have a research lab in Paris,
and we can talk more about that,
doing machine learning on the creator side:
what it can do for creators, not just for consumers.
And there we looked at
how the structure of a song actually affects
listening behavior.
And it turns out that
we can predict things like skips
based on the song itself.
We could say that
maybe you should move that chorus a bit
because your skip rate is going to go up here.
There is a lot of latent structure in the music,
which is not surprising.
There should be structure.
That's probably what we respond to.
You just blew my mind actually
from the creator perspective.
So that's a really interesting topic
that probably most creators
aren't taking advantage of.
So I've recently gotten to interact
with a few folks,
YouTubers,
who are obsessed
with this idea of
what do I do
so that people keep watching
the video?
And then you look at the analytics
of the point at which people turn it off,
and so on.
First of all, I don't think that's healthy
because you can do it a little too much.
But it is a really powerful tool
for helping the creative process.
You just made me realize you could do
the same thing for
creation of music.
Is that something you've looked into?
Can you speak
to how much opportunity there is
for that kind of thing?
I listened to the podcast with Siraj
and I thought it was fantastic,
and I reacted to the same thing, where he said
he posted something in the morning,
immediately watched the feedback to see
where the drop-off was, and then responded to that
in the afternoon.
Which is quite different from how
people make podcasts for example.
The feedback loop is almost non-existent.
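The drop-off analysis described here can be sketched in a few lines: given how long each listener stayed, build a retention curve and find the interval in which the most listeners left. The listen durations, bucket size, and function names are hypothetical; this is not how Spotify for Podcasters or YouTube actually compute their analytics.

```python
def retention_curve(listen_seconds: list[int], length: int, bucket: int = 10) -> list[float]:
    """Fraction of listeners still playing at the start of each bucket."""
    n = len(listen_seconds)
    return [sum(1 for s in listen_seconds if s >= t) / n for t in range(0, length, bucket)]


def biggest_drop(curve: list[float], bucket: int = 10) -> int:
    """Start time (in seconds) of the bucket in which the most listeners stopped."""
    drops = [curve[i] - curve[i + 1] for i in range(len(curve) - 1)]
    worst = max(range(len(drops)), key=lambda i: drops[i])
    return worst * bucket


# Hypothetical data: seconds each of ten listeners played of a 60-second clip.
listens = [5, 12, 14, 15, 18, 60, 60, 60, 60, 60]
curve = retention_curve(listens, length=60)
```

Here `biggest_drop(curve)` points at the 10 to 20 second window, which is the kind of signal a creator could react to the same afternoon.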
So if we back out one level,
I think
both for music and podcasts,
which we also do at Spotify,
I think there's a tremendous opportunity
just for the creation workflow.
I think it's really interesting
speaking to you because you're a
musician, a developer and a podcaster.
If you think about those three different roles,
if you make the leap as a musician,
if you think about it
as a software tool chain, really,
your DAW with the stems,
that's the IDE.
That's where you work in source code format
with what you're creating.
Then you sit around and you play with that
and when you're happy, you compile that thing
into some sort of AAC or MP3
or something. You do that because
you get distribution. There are so many runtimes
for that MP3 across the world, in car stereos and stuff.
So you kind of compile this executable
and you ship it out, in kind of an old-fashioned
boxed-software
analogy, and then you hope for the best.
Right?
But as a software developer,
you would never do that.
First, you go on GitHub and you collaborate with other
creators, and then
you'd think it'd be crazy to just ship one version
of your software without doing an A/B test,
without any feedback loop.
And issue tracking.
Exactly. And then you would look
at the feedback loops and try to optimize that
thing. So I think
if you think about it as a very specific
software tool chain, it looks
quite arcane:
compare the tools that a music creator has
with what a software developer has.
So that's kind of how we think about it.
Why wouldn't
a music
creator have something like GitHub where you could collaborate
much more easily? So we
bought this company called Soundtrap
which has a kind of
Google Docs for music approach where
you can collaborate with other people on the
source code format with stems.
And I think introducing things
like AI tools there
to help you as you're creating music
both
in helping
you
put accompaniment to your music,
like drums or something,
helping you master
and mix automatically,
helping you understand how this track will perform.
That's exactly what you would expect as a
software developer. I think it makes a lot
of sense. And I think the same goes
for a podcaster. I think podcasters
will expect to have the same kind of feedback loop
that Siraj has.
Why wouldn't you? Maybe it's not
healthy but... Sorry.
I wanted to criticize that, because you can
overdo it. Because
we're in a new era
of that,
you can
become addicted to it and,
as people say,
become a slave to the
YouTube algorithm.
It's always a danger
with a new technology:
say, if you're creating a song,
becoming too obsessed
with
the intro riff to the song
that keeps people listening, as opposed to
the entirety of the creation process.
It's a balance. But the fact that there's
zero... I mean you're blowing my mind
right now because you're
completely right that there's no
signal whatsoever. There's
no feedback whatsoever on the creation
process in music or podcasting
almost at all.
And are you
saying that Spotify
is hoping to help create tools
to, not tools
but... No, tools actually.
Actually tools for creators.
Absolutely. So we have
we've made some acquisitions
the last few years around music creation.
One is a company called Soundtrap, which is
a digital audio workstation
that is browser based.
And their focus was really the Google Docs approach.
You can collaborate with people much more easily
than you could in previous
tools. So we have some of these tools
that we're working with that we want to make accessible
and then we can connect
it with our consumption data.
We can create this feedback loop where
we could help you understand
we could help you create
and help you understand how you will perform.
We also acquired this other company
within podcasting called Anchor
which is one of the biggest podcasting tools
mobile focused. So really
focused on simple creation
or easy access to creation.
But that also gives us this feedback loop.
And even before that
we invested in something
called Spotify
for Artists and Spotify for Podcasters
which is an app that you can download, you can verify
that you are that creator
and then you get
things that
software developers have had for years. You can see,
if you look at your podcast,
for example, on Spotify,
or a song you released,
how it's performing, which cities it's performing in,
who's listening to it, what the demographic
breakdown is. So it's similar
in the sense that you can understand
how you're actually doing on the platform.
So we definitely
want to build tools. I think
you also interviewed
the Head of Research for Adobe
and I think that's
back to Photoshop that you like.
I think that's an interesting analogy as well.
Photoshop
I think has been very innovative in helping
photographers and
artists
and I think there should be the same kind of tools
for music creators
where you could get AI assistance for example
as you're creating music
as you can do with Adobe
where you can say, I want a sky over here, and get help
creating that sky.
The really fascinating thing is
what Adobe doesn't have
is distribution
for the content you create.
So you don't have the data
if I create
whatever creation I make
in Photoshop or Premiere
I can't get immediate feedback
like I can on YouTube for example
about the way people are responding
and if Spotify is creating those
tools that's a really exciting
actually world
but
let's talk a little about podcasts.
So I have
trouble talking to one person
so it's a bit terrifying
and kind of hard to fathom
but on average
60 to 100,000
people will listen to this episode.
It's intimidating.
So I host it on Blubrry.
I don't know
if I'm pronouncing that correctly, actually.
It looks like most people listen to it
on Apple Podcasts, Castbox,
and Pocket Casts, and only about
a thousand
listen on Spotify
just my podcast.
So
do you see a time when
Spotify will dominate this?
So Spotify is relatively new
in podcasting
What's the deal with podcasting
in Spotify?
How serious is Spotify about podcasting?
Do you see a time where
everybody would listen to
probably a huge amount of people
majority perhaps listen to music
on Spotify?
Do you see a time when the same is true
for podcasting?
Well, I certainly hope so.
That is our mission. Our mission as a company
is actually to enable a million creators
to live off of their art
and a billion people to be inspired by it
and what I think is interesting about that mission is
it actually puts the creators first
even though it started as a consumer focused company
and it says to be able to live off of their art
not just make some money off of their art
as well.
It's a very ambitious project.
So we think about creators of all kinds
and
we kind of expanded our mission
from being music to being audio
a while back.
And that's not
so much because
we think we made that decision.
We think that
decision was made for us;
we think the world made that decision,
whether we like it or not.
When you put in your headphones
and make a choice between
music
and a new episode of
your podcast or something else,
we're in that world whether we like it or not,
and that's how radio works.
So we decided that
we think it's about audio
you can see the rise of audio books and so forth
we think audio is a great opportunity
so we decided to enter it
and obviously
Apple podcast is
absolutely dominating in
podcasting and we didn't have a single
podcast only like two years ago
what we did though was
we
we looked at this and said
can we bring something to this
we want to do this but back to
the original Spotify we had to do something that consumers actually value
to be able to do this and
the reason we've gone from not existing at all
to being,
by quite a wide margin, the second-largest
platform for podcast consumption,
still with a wide gap to iTunes,
but we're growing quite fast
I think it's because when we looked
at the consumer problem
people said surprisingly that they wanted
their podcast and music
in the same application
so what we did was we took a little bit
of a different approach what we said instead of building
a separate podcast app
we thought is there a consumer problem to solve here
because the others are very successful already
and we thought there was in making
a more seamless experience where you can have
your podcast and your music
in the same application
because we think it's audio to you
and that has been successful and that meant
that we actually had 200 million people
to offer this to instead of starting from zero
so I think we have a good chance
because we're taking a different approach
than the competition and back to
the other thing I mentioned about
creators
because we're looking at the end-to-end flow
I think there's a tremendous amount of innovation
to do around podcast as a format
when we have creation tools and consumption
I think we could
start improving what podcasting is
I mean podcast is this
opaque big like one-two-hour
file that you're streaming
which it really doesn't make that much
sense in 2019 that
it's not interactive, there's no feedback loops
nothing like that so I think
if we're gonna win it's gonna have to be because
we build a better product for creators
and for consumers
so we'll see but it's certainly our goal
we have a long way to go
well the creators part is really exciting
you got me hooked there
it's the only stats I have:
Blubrry just recently added stats on
whether it's
listened to the end or not
and that's like a huge improvement
but that's still
nowhere to where you could possibly
go in terms of statistics
you just download Spotify for Podcasters
and verify and then you'll know where people dropped out
in this episode
the moment I started talking
I might be depressed by this
but okay so
one other question
the original
Spotify for music
and I have a question about
podcasting in this line
is the idea of albums
I have
music aficionados
friends who are really
big fans of music
often really enjoy albums
listening to entire albums
of an artist
correct me if I'm wrong but I feel like
Spotify has helped
replace the idea of an album with
playlists so you create
your own albums
it's kind of the way at least I've
experienced music and I've really enjoyed it that way
one of the things that was
missing in podcasting for me
I don't know if it's missing
I don't know it's an open question for me
but the way I listen to podcasts is the way
you listen to albums
so I take Joe Rogan Experience
and that's an album
I put that on
and I listen one episode after the next
and there's a sequence and so on
is there room for
doing what
Spotify did for music
but creating playlists
sort of this kind of playlisting
idea of breaking podcasting apart
from individual
podcasts and creating kind of
this interplay
or have you thought about that space
it's a great question so I think
in music
you're right basically you bought an album
so it was like you bought a small catalog
of like ten tracks right it was
again, it was actually a lot of
consumption, you think it's about
what you like, but it's based on the business model:
you paid for this ten-track
service and then you listened to that for a while,
and then when everything was
flat-priced, you tended to listen differently.
Now, I think
the album is still tremendously important
that's why we have it and you can save albums and so forth
and you have a huge amount of people who really listen
according to albums and I like that
because it is a creator format you can tell a longer
story over several tracks
and so some people listen to just one track
some people actually want to hear that whole
story
now in podcast I think
I think it's
different. You can argue that podcasts
might be more like shows on Netflix, where you
have a full season of
Narcos and you're probably not going to do
like one episode of Narcos and then one of
House of Cards
there's a narrative there
and you love the cast
and you love these characters so I think
people love shows
and I think they will
listen to those shows
I do think you follow a bunch of shows at the same time
so there's certainly an opportunity to bring you the
latest episode of
the 5, 6, 10 things that you're into
but
I think people are going to listen
to specific hosts
and love those hosts
for a long time because I think there's something
different with podcasts,
where in this format
the audience
experiences it
as if they were standing right here between us,
whereas if you look at something on TV,
you would sit over there and the audio would come to you
from both of us, as if you were watching,
not as if you were part of the conversation.
so my experience of listening to podcasts
like yours and Joe Rogan
I feel like I know all of these people
they have no idea who I am but I feel like
I've listened to so many hours of them
it's very different from me watching
like a TV show or an interview
so I think you kind of
fall in love with people
and experience in a different way
so I think
shows and hosts are going to be very important
I don't think that's going to go away into some sort of
thing where you don't even know who you're listening to
I don't think that's going to happen
what I do think is I think there's a tremendous
discovery opportunity
in podcast because
the catalog is growing quite quickly
and
I think podcasting is only
something like 500,000 to
600,000 shows right now
if you look back to YouTube as another
analogy for creators
no one really knows if you would lift the lid on
YouTube but it's probably billions
of episodes
and so I think the podcast catalog would probably
grow tremendously because the creation
tools are getting easier
and then you're going to have this
discovery opportunity that I think is really big
so a lot of people tell me that they love their
shows but
discovering podcasts kind of sucks,
it's really hard to get into a new show,
they're usually quite long, it's a big time investment
so I think there's plenty of opportunity
in the discovery part
yeah for sure 100%
and even the dumbest stuff,
there's so much low-hanging fruit too,
for example
just knowing
what episode to listen to
first to try out a podcast
exactly because most
podcasts don't have an order to them
they can be listened
to out of order and
sorry to say,
some episodes are better than others.
Some episodes
of Joe Rogan are better than others,
and it's nice to know
which you should listen to
to try it out and there's
as far as I know almost no information
in terms of
like upvotes
on how good an episode is
exactly so I think part of the problem is
it's kind of like music
there isn't one answer people use music
for different things and there's actually many different types of music
there's workout music and there's classical piano music
and focus music and
so forth
I think the same with podcasts some podcasts are sequential
they're supposed to be listened to
in order
it's actually telling a narrative
some podcasts are one topic
kind of like yours but
different guests so you could jump in anywhere
some podcasts actually have completely different topics
and for those podcasts it might be that
we should recommend
one episode because it's about
AI from someone
but then they talk about something that you're not interested in
the rest of the episode
so I think what we're spending a lot of time on now
is just first understanding the domain
and creating kind of the knowledge
graph of
how do these objects relate
and how do people consume and I think we'll find that it's going to be
it's going to be different
I'm excited
because Spotify
is the first company I'm aware of that is
trying to do this
for podcasting
podcasting has been like a wild west
until now
we want to be very careful though
because it's been a very good
wild west I think it's this fragile
ecosystem
we want to make sure that
you don't barge in
and say like oh we're going to
internetize this thing and
you have to think about the creators
you have to understand
how they get distribution today
how they make money today
make sure that their business model works
understand it.
I think it's back to
improving their products,
like feedback loops and distribution.
so jumping back
into this fascinating world
of recommender systems,
listening to music and using machine learning
to analyze things
do you think it's better to
what currently
correct me if I'm wrong but
currently Spotify lets people pick
what they listen to
there's a discovery process
but you kind of organize playlists
is it better
to let people pick what they listen to
or recommend what they should listen to
something like stations
by Spotify that I saw
that you're playing around with
maybe you can tell me what's the status of that
this is a Pandora style
app that just kind of
as opposed to you select the music you listen to
it kind of feeds
you the music you listen to
what's the status of stations by Spotify
what's its future
the story of Spotify as
we have grown has been that we made it more accessible
to different audiences
and
stations is another one of those where
the question is
some people want to be very specific
they actually want to hear Stairway to Heaven right now
that needs to be very easy to do
and some people or even the same person
at some point might say
I want to feel upbeat
or I want to feel happy
or I want songs to sing in the car
so they put in the information
at a very different level
and then we need to translate that into what that means
musically
so stations is a test to
create like a consumption input vector that is much simpler
where you can just tune it a little bit
and see if that increases the overall reach
but we're trying to kind of serve
the entire gamut of super advanced
so-called music aficionados
all the way to
to people who
they love listening to music
but it's not their number one priority in life
they're not going to sit and follow every new release
from every new artist
they need to be able to influence music
at a different level
so we're trying
you can think of it as different products
and I think one of the interesting things
to answer your question on
if it's better to let the user choose
or to play I think the answer is
the challenge when
when machine learning kind of came along
there was a lot of thinking about
what does product development mean
in a machine learning context
people like Andrew Ng, for example,
when he went to Baidu
he started doing a lot of practical machine learning
went from academia and you know
he thought a lot about this and he had this notion that
you know, product managers,
designers, and engineers used to work around a wireframe
kind of describe what the product should look like
or something to talk about
when you're doing like a chat bot or a playlist
what are you going to say like it should be good
that's not a good product description
so how do you do that and he came up with this notion
that
the test set is the new wireframe
the job of the product manager is to source
a good test set that is representative of what
like if you say, I want a playlist
that is songs to sing in the car,
job of the product manager is to go and source
like a good test set of what that means
then you can work with engineering to have algorithms
to try to produce that right
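The "test set is the new wireframe" idea can be made concrete: the product manager's curated test set becomes the spec, and candidate algorithms are compared by how many of their top picks land in it. This is a hypothetical sketch; precision@k is an assumed metric (the conversation does not name one), and the track ids and algorithm outputs are invented for illustration.

```python
def precision_at_k(recommended: list[str], test_set: set[str], k: int) -> float:
    """Share of the top-k recommendations that appear in the curated test set."""
    top = recommended[:k]
    return sum(1 for track in top if track in test_set) / k


# Hypothetical editor-sourced test set for "songs to sing in the car".
car_singalongs = {"bohemian_rhapsody", "dont_stop_believin", "livin_on_a_prayer", "mr_brightside"}

# Hypothetical outputs from two candidate algorithms.
algo_a = ["bohemian_rhapsody", "ambient_drone", "mr_brightside", "livin_on_a_prayer"]
algo_b = ["clair_de_lune", "dont_stop_believin", "ambient_drone", "white_noise"]

scores = {
    name: precision_at_k(recs, car_singalongs, k=4)
    for name, recs in {"algo_a": algo_a, "algo_b": algo_b}.items()
}
```

The spec "it should be good" becomes a number the engineers can iterate against, and the editor's cultural knowledge lives in the test set rather than in the algorithm.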
so we try to think a lot about
how to structure product development
for machine learning
age and what we discovered
was that a lot of it is actually in the expectation
and you can go
you can go two ways
so
let's say that
if you set the expectation with the user
that this is a discovery product like Discover Weekly
you're actually
setting the expectation that most of what we show you
will not be relevant
when you're in the discovery process you're going to accept that
actually if you find one gem
every Monday that you totally
love you're probably going to be happy
even though, statistically speaking,
a hit rate of 1 out of 10, or 1 out of
20, is terrible, from a user point of view,
because the setting was discovery, it's fine
can I just interrupt real quick?
I just actually learned about Discover Weekly,
which is a Spotify,
I don't know,
a feature of Spotify that
shows you cool songs to listen to.
I, uh,
to do issue tracking, I couldn't find it
on my Spotify app.
it's in your library
it's in the library? because I was like
whoa, this is cool, I didn't know this existed
and I tried to find it
I'll show it to you and feed that back
to our product teams
there you go
but yeah so sorry
just to mention
the expectation there is basically
you're going to discover new songs
so then you can be quite adventurous
in the recommendations you do
but if you're
we have another product called Daily Mix
which kind of implies that these are only
going to be your favorites
so if you have 1 out of 10 that is good
and 9 out of 10 that doesn't work for you
you're going to think it's a horrible product
so actually a lot of the product development
we learned over the years is about setting
the right expectations so for Daily Mix
you know algorithmically we would pick
among things that feel very safe
in your taste space
because the expectation is
most of this is not going to
so, to answer your question there,
whether you should let the user pick
or not, it depends.
we have some products where the whole point is
the user can click play put the phone in the pocket
and it should be really good music for like
an hour we have other products where
you probably need to say like no
no save no no
and it's very interactive
that makes sense and then the radio product
the stations product is one of these like click play
put in the pocket for hours
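The expectation-setting point can be sketched as two selection policies over the same candidates: a "safe" policy that only takes tracks close to the listener's established taste (Daily Mix-style) and a "discovery" policy that deliberately reaches further out for unheard tracks (Discover Weekly-style). The similarity scores and the 0.8/0.4 thresholds are hypothetical assumptions for illustration, not Spotify's actual logic.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    track: str
    taste_similarity: float  # hypothetical 0..1 match with the user's taste
    already_heard: bool


def pick(candidates: list[Candidate], n: int, mode: str) -> list[str]:
    """Select up to n tracks under a 'safe' or 'discovery' expectation."""
    if mode == "safe":
        # Daily Mix-style: only tracks near the center of the user's taste.
        pool = [c for c in candidates if c.taste_similarity >= 0.8]
    elif mode == "discovery":
        # Discover Weekly-style: unheard tracks further out, where misses are accepted.
        pool = [c for c in candidates
                if not c.already_heard and 0.4 <= c.taste_similarity < 0.8]
    else:
        raise ValueError(f"unknown mode: {mode}")
    pool.sort(key=lambda c: c.taste_similarity, reverse=True)
    return [c.track for c in pool[:n]]


candidates = [
    Candidate("favorite", 0.95, True),
    Candidate("close_match", 0.85, False),
    Candidate("heard_before", 0.7, True),
    Candidate("adjacent_genre", 0.6, False),
    Candidate("stretch_pick", 0.45, False),
    Candidate("unrelated", 0.1, False),
]
```

The same candidate list yields very different products depending on which promise the surface makes to the listener.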
that's really interesting so you're
thinking of different test sets
for different
users and trying to create products that sort of
optimize
optimize for those test sets
that represent a specific
set of users yes I think
one thing that I think is
interesting is
we invested quite heavily in editorial
in people creating
playlists using statistical
data and that was successful for us and then
we also invested in machine learning
and for the longest time
within Spotify and within the
rest of the industry there was always this narrative of
humans versus the machine
algo versus editorial
and editors would say like well
if I had that data if I could see
your play listing history
and I made a choice for you I would have made a better choice
and they would have because they
they're much smarter than these algorithms
a human is incredibly smart compared to
our algorithms
they can take culture into account and so forth
the problem is that they can't make
200 million decisions
you know per hour for every user
that logs in so the algo may be
not as sophisticated but much more efficient
so there was this
contradiction but then a few years ago
we started
focusing on this kind of human in the loop
thinking around machine learning
and we actually coined an internal
term for it called algo-torial
the combination of algorithms
and editors where
if we take a concrete example
you think of the editor
this
paid expert that we have who's really good at something like
soul, hip-hop,
EDM, something, right? They're true
experts, known in the industry,
so they have all the cultural knowledge
you think of them as the product manager
and
let's say that you want to create
you think that there's
a product need in the world for something like songs to sing
in the car or songs to sing in the shower
I'm taking that example because it exists
people love to scream songs in the car
when they drive right
so you want to create that product and you have this product
manager who's a musical expert
they create they come up with a concept
like I think this is a missing thing in humanity
like a playlist called songs to sing in the car
they
create the framing, the image,
the title, and they create a test
set; they create a group of songs
like a few thousand songs out of
the catalog that they manually curate
that are known songs that are great to sing in the car
and they can take, like,
True Romance into account; they understand things that
our algorithms do not at all
so they have this huge set of tracks
then when we deliver that to you
we look at your taste vectors and you
get the 20 tracks that are songs to sing
in the car in your taste
so you have personalization
and editorial input
in the same process
if that makes sense
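The algotorial flow described above (an editor curates a pool of a few thousand tracks, then each user gets the 20 from that pool that best match their taste vector) can be sketched as follows. This is a minimal illustration, not Spotify's implementation: the track embeddings, the use of cosine similarity, and every name below are assumptions made for the example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def personalize_pool(editor_pool, track_vectors, taste_vector, k=20):
    """Rank an editor-curated pool by similarity to one user's taste vector.

    editor_pool: track ids chosen by the human editor (hypothetical ids).
    track_vectors: hypothetical per-track embeddings, same dimension as taste_vector.
    Returns the top-k ids; only tracks the editor approved can appear.
    """
    scored = [(cosine(track_vectors[t], taste_vector), t) for t in editor_pool]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [t for _, t in scored[:k]]

pool = ["a", "b", "c"]
vectors = {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [0.7, 0.7]}
print(personalize_pool(pool, vectors, [1.0, 0.1], k=2))
```

The design point is the split of responsibilities: the editor decides what belongs in "songs to sing in the car", and the algorithm only chooses among those tracks, so a personalization error can never pull in something the expert rejected.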
it makes total sense and I have several questions around that
this is like
fascinating
so first it is a little bit
surprising to me
that the world expert
humans are
outperforming machines
at
specifying songs to sing in the car
so
maybe you could talk to that a little bit
I don't know if you can put it into words but
what is it
how difficult is this problem
do you really
I guess what I'm trying to ask
is how difficult is it to encode
the cultural references
the context
of the song, the artists
all those things together
can machine learning really not do that
I mean I think machine learning
is great at replicating patterns
if you have the patterns
but if you try to write
a spec of what song
great song to sing in the car definition is
is it loud
does it have many choruses, has it been in movies
it quickly gets incredibly
complicated right
and a lot of it may not be
in the structure of the song or the title
it could be cultural references because
of its history
so the definition problems
quickly get hard, and I think that was
the insight of Andrew Ng
when he said that the job of the product manager is to
understand these things that
algorithms don't and then
define what that looks like
and then you have something to train towards
then you have kind of the test set
and then so today the editors create
this pool of tracks and then we personalize
you could easily imagine that
once you have this set you could have some automatic exploration
of the rest of the catalog because then you understand
what it is
and then the other side of it, where machine learning
does help, is this
taste vector
to construct
a vector that represents the things
an individual human likes
this human preference
so you can
you know music isn't like
it's not like Amazon
like things you usually buy
music seems more amorphous
like it's this
thing that's hard to specify like
what is
if you look at my playlist
what is the music that I love it's harder
it seems to be
much more difficult to specify
concretely so how hard is it
to build a taste vector
it is very hard in the sense that you need
a lot of data and
I think what we found was that
so it's not a stationary
problem it changes over time
and so
we've gone through the journey of
if you've done a lot of computer vision
obviously I've done a bunch of computer vision
in my past and we started
kind of with the handcrafted heuristics
for you know
this is kind of in the music this is this
and if you consume this you probably like this
so we have
we started there and we have some of that still
then what was interesting about the playlist
data was that you could find these latent things that
wouldn't necessarily even make sense
to you
that could even capture maybe cultural references
because they co-occurred, things that
wouldn't have appeared
mechanistically
either in the content or so forth
so
I think that
I think the core assumption is that
there are patterns
in almost everything
and if there are patterns
these embedding techniques are getting better
and better; now, like everyone else,
we're also using
kind of deep embeddings where you can encode
binary values and so forth
and what I think
is interesting is this process
to try to find things that
do not necessarily
you wouldn't actually have
guessed so it is very hard
in an engineering
sense to find the right dimensions
it's an incredible scalability
problem to do for hundreds of millions
of users and to update it every day
but in theory
embeddings
aren't that complicated,
the fact that you try to find some principal components
or something like that,
dimensionality reduction and so forth,
so the theory, I guess, is simple; the practice is very, very hard
and it's a huge
engineering challenge but fortunately
we have some amazing both research
and engineering teams in this space
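The point about playlist data revealing latent relationships, where tracks are related because they co-occur and not because of anything in the audio, can be illustrated with a toy co-occurrence model. This is a stand-in for illustration only. The description above says production systems use deep embeddings and dimensionality reduction; here two tracks are compared by the overlap of their co-occurrence rows, and all playlist data and names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def cooccurrence(playlists):
    """Count how often each pair of tracks appears in the same playlist."""
    counts = defaultdict(lambda: defaultdict(int))
    for playlist in playlists:
        for a, b in combinations(sorted(set(playlist)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts

def similarity(counts, a, b):
    """Cosine similarity of two tracks' co-occurrence rows (their 'context')."""
    row_a, row_b = counts.get(a, {}), counts.get(b, {})
    dot = sum(v * row_b.get(k, 0) for k, v in row_a.items())
    norm = (math.sqrt(sum(v * v for v in row_a.values()))
            * math.sqrt(sum(v * v for v in row_b.values())))
    return dot / norm if norm else 0.0

# "x" and "y" never share a playlist, but both share one with "hub".
counts = cooccurrence([["x", "hub"], ["y", "hub"], ["x", "z"]])
print(similarity(counts, "x", "y"))
```

Because "x" and "y" both appear next to "hub", they end up related even though no single playlist contains both, which is the kind of non-mechanistic pattern described in the conversation.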
yeah, I guess
the question is, all
I mean, it's similar, I deal with the autonomous
vehicle space, and the question is
how hard is driving,
and here
basically the question is one of
edge cases
so
embedding probably works
not probably
but I would imagine works well
in a lot of cases
so there's a bunch of questions that arise
then: do
song preferences, does your taste vector,
depend on context
like mood
right
so there's different moods
absolutely
so how does that take
is it possible
to take that into consideration
or do you just leave that as
an interface problem that allows the user
to just control it so when I'm looking
for a workout music I kind of
specify it by choosing certain
playlists doing certain search
yeah so that's a great point
to the product development
you could try to spend a few years trying to predict
which mood you're in automatically
when you open Spotify
or you create a tab which is happy and sad
and you're going to be right 100% of the time with one click
now it's probably much better
to let the user tell you if they're happy or sad
or if they want to work out
on the other hand if your user interface becomes
2000 tabs you're introducing so much
friction so no one will use the product
so then you have to get better
so it's this thing where
maybe it was, I don't remember who coined it,
but it's called fault-tolerant UIs
to build a UI that is tolerant to being
wrong and then you can be
much less right in your
algorithms
so we've had to learn a lot
of that, building the right UI that fits
where the machine learning
is and a great
discovery there which was
by the teams during
one of our hack days
was this thing of taking discovery and packaging it
into a playlist and saying that
these are new tracks
that we think you might like
based on this and setting the right expectation
made it a great product
so I think we have this benefit that
for example
Tesla doesn't have that
we can change the expectation
we can build a fault-tolerant setting;
it's very hard to be fault-tolerant when you're driving at
100 miles per hour or something
and we have the luxury
of being able to
be wrong if we have the right
UI which gives us
different abilities to take more risk
so I actually think the self-driving
problem is much harder
for sure
it's much less fun
because
people die
and in Spotify
it's
such a more fun problem because
failure
I mean failure is beautiful in a way
that leads to exploration
so it's a really fun reinforcement learning problem
and the worst case scenario is you get these WTF tweets
like how did I get this
which is a lot better than the self-driving
failure case
so
what's the feedback that a user provides
what's the signal
that a user provides
into the system
so you mentioned skipping
what is like the strongest signal
you didn't mention clicking like
so
so we have a few signals that are important
obviously playing
playing through so one of the benefits
of music actually even compared to
podcast or
movies is the object itself
is really only about three minutes
so you get a lot of chances to recommend
and the feedback loop is
every three minutes instead of every two hours
or something so you actually get
kind of noisy but
quite fast feedback
so you can see if people played through
or, the inverse really, if they skipped;
that's an important signal
on the other hand much of the consumption
happens when your phone is in your pocket
maybe you're running or driving or you're playing on a speaker
and so you not skipping doesn't mean
that you love that song it might be that it wasn't bad enough
that you would walk up and skip
so it's a noisy signal
then we have the equivalent of the like
which is you save it to your library
that's a pretty strong signal of affection
and then
an explicit signal of playlisting:
like, you took the time to create a playlist,
you put it in there; there's a very
small chance that
if you took all that trouble this is not a really important
track to you and then we understand
also
what are the tracks it relates to; so we have
the playlisting, we have the like, and then we have
the listening or skip,
and you have to
have very different approaches to all of them
because they're at different levels of noise:
one is very voluminous but noisy,
another is very rare but
you can probably trust it
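One way to picture combining signals with very different volume and noise (plays and skips are plentiful but noisy, saves and playlist adds are rare but trustworthy) is a smoothed play-through rate plus bonuses for the deliberate actions. This is a hypothetical sketch: the weights, the prior, and the `TrackSignals` structure are invented for illustration and are not Spotify's scoring.

```python
from dataclasses import dataclass

@dataclass
class TrackSignals:
    plays: int = 0             # times the track started
    skips: int = 0             # times it was skipped before finishing
    saved: bool = False        # added to the library (the "like")
    playlisted: bool = False   # put into a playlist the user made

# Hypothetical weights: rarer, more deliberate actions earn more trust.
SAVE_BONUS = 0.5
PLAYLIST_BONUS = 1.0
PRIOR_PLAYS = 5     # pseudo-counts that damp the noisy play/skip ratio
PRIOR_RATE = 0.5    # assumed play-through rate before any data

def affinity(s: TrackSignals) -> float:
    """Combine noisy implicit signals and rare explicit ones into one score."""
    completed = s.plays - s.skips
    # Smoothed play-through rate: with few plays it stays near PRIOR_RATE,
    # since not skipping may only mean the phone was in a pocket.
    rate = (completed + PRIOR_RATE * PRIOR_PLAYS) / (s.plays + PRIOR_PLAYS)
    return rate + SAVE_BONUS * s.saved + PLAYLIST_BONUS * s.playlisted

print(affinity(TrackSignals(plays=10)))
print(affinity(TrackSignals(plays=1, playlisted=True)))
```

With these made-up weights, one playlist add outweighs ten unskipped plays, mirroring the idea that the rare explicit signal can be trusted more than the voluminous implicit one.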
yeah it's interesting because
I think between them those signals capture
all the information you'd want to capture
I mean there's a feeling
a shallow feeling for me that there's sometimes
that I'll hear a song that's like yes
this is you know this was the right song
for the moment but there's really no way
to express that fact
except by listening through it all the way
and maybe playing it again
at that time or something
there's no button that says
this was the best song
I could have heard at this moment
well we're playing around with that
with kind of the thumbs up concept saying like I really like this
just kind of talking to the algorithm
it's unclear if that's the best way
for humans to interact maybe it is
maybe they should think of Spotify
as a person an agent sitting there
trying to serve you and you can say like
that's Spotify, good Spotify
right now the analogy we've had is more
you shouldn't think of us
we should be invisible
and the feedback is if you save it
it's kind of you work for yourself
you do a playlist because you think it's great
and we can learn from that
it's kind of back to Tesla
how they kind of have this shadow mode
they sit in while you drive;
we kind of took the same analogy:
we sit in while you playlist
and then maybe we can offer you an autopilot
we can take over for a while or something like that
and then back off if you say like
that's not good enough
whether Spotify is an AI
that you talk to,
which I think might be a bit too abstract
for many consumers,
or whether you still think of it as my music app,
just more helpful,
depends on the device
it's running on
which brings us to smart speakers
so a lot
of the Spotify listening I do is
on devices I can talk to
whether it's from Amazon, Google
or Apple
what's the role of Spotify in those devices
how do you think of it differently than
on the phone or on the desktop
there are a few things
to say about that; first of all,
it's incredibly exciting they're growing like crazy
especially here in the
in the US
and
it's solving
a consumer need that I think is
you can think of it as
just remote interactivity
you can control this thing
from across the room
and it may feel like a small thing but it turns out
that friction matters to consumers
being able to say play
pause and so forth from across the room
is very powerful
so basically you made the living room
interactive now
and
what we see in our data is that
the number one use case for these speakers
is music
and podcast
so fortunately for us
it's been important to these companies
to have those use cases covered,
so they want Spotify on these; we have very good relationships
with them
and we're seeing tremendous
success with them
what I think
is interesting about them is
it's already working
we kind of
had this epiphany
many years ago
when we started using Sonos:
if you went through all the trouble of setting up your Sonos system,
you had this magical experience
where you had all the music ever made
in your living room
and we made this assumption that
the home, everyone used to have a CD player at home
but they never managed to get their files
working in the home
having this network attached storage was too cumbersome
for most consumers
so we made the assumption that the home would skip from the CD
all the way to the streaming box
where you would get
all the music built in
that took longer than we thought
but with the voice speakers that was the unlocking
that made kind of the connected speaker
happen in the home
so it really exploded
and we saw this engagement
that we predicted would happen
what I think is interesting though is where it's going
from now
right now you think of them as voice speakers
but I think if you look at
Google I/O, for example,
they just added a camera
where when the alarm goes off
instead of saying
hey Google stop
you can just wave your hand
so I think they're going to think more of it
as an agent
or as an assistant
truly an assistant and an assistant that can see you
it's going to be much more effective than
a blind assistant
so I think these things will morph
and we won't necessarily think of them as
quote unquote voice speakers anymore
just as
we have access to the internet
in the home
but I still think that
the biggest use case for those
will be audio
so for that reason we're investing heavily in it
and we built our own NLU stack
to be able to
the challenge here is
how do you innovate in that world
it lowers friction for consumers
but it's also much more constrained
you have no pixels to play with
in an audio only world
so we started investing
and playing around quite a lot with that
trying to understand what the future will be
of you speaking and gesturing
and waving at your music
and you're actually nudging
closer to the autonomous vehicle space
because from everything I've seen
the level of frustration people experience
upon failure of natural
language understanding is much higher
than failure in other interfaces;
people get frustrated really fast
so if you screw that
experience up even just a little bit
they give up really quickly
and I think you see that in the data
while it's tremendously
successful the most common
interactions are play, pause
and you know next
the things where, if you compare it to
picking up your phone, unlocking it, bringing up the app,
and clicking skip,
it's much lower friction;
but then
for longer, more complicated things,
like "can you find me that song about...", people still end up
bringing up the phone, searching, and then playing it on their speaker
so we tried again to build a fault tolerant UI
where for the more
for the more complicated things
you can still pick up your phone, have powerful
full keyboard search
and then try to optimize for
where there is actually lower friction
it's kind of like the Tesla Autopilot thing:
you have to be at the level where
you're helpful
if you're too smart and just in the way
people are going to get frustrated
and first of all, I'm not obsessed with Stairway to Heaven,
it's just a good song,
but let me mention that as a use case
because it's an interesting one
I've literally told,
I don't want to say the name of the speaker
because when people are listening to it
it'll make their speaker go off,
but I talk to the speaker and I say "play
Stairway to Heaven,"
and not every time,
but a large percentage of the time, it plays
the wrong Stairway to Heaven;
it plays, like, some cover of the...
and
that part of the experience, I actually wonder
from a business perspective,
does Spotify control
that entire experience
or not? It seems like the NLU,
the natural language
stuff is controlled by the speaker
and then Spotify stays
at a layer below that
it's a good and complicated question
some of which is dependent
on the
on the partner so it's hard to comment on the
specifics but
the question is the right one
the challenge is
if you can't use any of the personalization
I mean we know which Stairway to Heaven
and the truth is maybe for
for one person it is exactly the cover that they want
and they would be very frustrated
if it didn't;
I think we default to the right version,
but you actually want to be able to do the cover
for the person that just played the cover 50 times
or Spotify is just going to seem stupid
so you want to be able to leverage the personalization
but you have this stack
where you have the
the ASR and this thing called the n-best
list, so the n-best guesses are
here, and then the personalization comes in at the end;
you actually want the personalization to be here,
when you're guessing about what they actually meant
so we're working with these partners
and it's a complicated
it's a complicated thing
where you want to
you want to be able to, so first of all
you want to be very careful with your users data
you don't want to share your users' data without their permission,
but you want to share some data so that their experience gets better
so that these partners can understand
enough but not too much and so forth
so
the trick is that
it's like a business-driven relationship
where you're doing product development across
companies together
which is really complicated
but this is exactly why we built our own NLU
so that we actually
can make personalized guesses
because this is the biggest frustration
from a user point of view they don't understand
about ASRs and n-best lists
and business deals, they're like how hard can it be
I've told this thing 50 times
this version and still it plays the wrong thing
it can't be hard
so we try to take that user approach
if the user is not going to understand
the complications of business
we have to solve it
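The fix described here, bringing personalization into the step where the system guesses what the user meant instead of applying it after the n-best list is fixed, can be sketched as a simple rerank. This is an assumption-laden illustration: the track ids, the recognizer scores, the log-combination, and the `weight` parameter are all hypothetical and not Spotify's NLU stack.

```python
import math

def rerank_nbest(nbest, play_counts, weight=1.0):
    """Pick a track from an ASR/NLU n-best list using the user's history.

    nbest: list of (track_id, recognizer_score) pairs, scores in (0, 1].
    play_counts: hypothetical dict track_id -> times this user played it.
    weight: how strongly listening history may override the recognizer.
    """
    def score(candidate):
        track_id, asr_score = candidate
        # log1p keeps heavy history influential without making it unbounded.
        return math.log(asr_score) + weight * math.log1p(play_counts.get(track_id, 0))
    return max(nbest, key=score)[0]

nbest = [("stairway_original", 0.6), ("stairway_cover", 0.4)]
print(rerank_nbest(nbest, {}))                        # no history: recognizer wins
print(rerank_nbest(nbest, {"stairway_cover": 50}))    # played the cover 50 times
```

With no history the default (most likely) version wins; for the person who has played the cover 50 times, the cover wins, which is exactly the case the conversation says makes the product "seem stupid" when personalization arrives too late in the stack.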
let's talk about a complicated subject
that I myself
I'm quite
torn about
the idea of
paying artists
I saw
as of August 31st
2018
over $11 billion
were paid to rights holders
and further distributed
to artists from Spotify
so a lot of money is being
paid to artists, first of all
the whole time as a consumer
for me when I look at Spotify
I'm not sure
if I'm remembering correctly, but I think you said exactly
how I feel, which is this is too good
to be true;
when I started using
Spotify I assumed you guys would go bankrupt
in like a month
this is too good
a lot of people did
this is amazing
so one question I have
is sort of the bigger question
how do you make money in this complicated world
how do you deal
with a relationship with record labels
who
are
complicated
these big
you essentially have the task
of herding
cats, but, like, rich
and powerful cats,
and also
have the task of paying artists enough
and paying those labels enough
and still making money in the internet
space where people are not willing to pay
hundreds of dollars a month
so
how do you navigate the space
that's a beautiful description
herding rich cats
it is very complicated
and I think
certainly
actually betting against Spotify has been statistically
a very smart thing to do
just looking at the
line of roadkill in music streaming services
it's kind of
I think if I understood the complexity
when I joined Spotify
fortunately I didn't know enough
about
the music industry to understand the complexities
because then I would have made a more rational guess
that it wouldn't work
ignorance is bliss
but I think
there have been a few distinct challenges
I think as I said
one of the things that made it work at all
was that Sweden and the Nordics
was a lost market
so there was
no risk for labels to try this
I don't think it would have worked
if the market was
healthy
so that was the initial condition
then we had
this tremendous challenge with the model itself
so
now most people were pirating
but for the people who bought a download or a CD
the artist would get
all the revenue for all the future plays
then
so you got it all up front
whereas the streaming model was, like, almost nothing day one,
almost nothing day two, and eventually
the curve of incremental revenue
would intersect with your
day-one payment,
and that took a long time to play out
before
the music labels they understood that
but on the artist side it took a lot of time to understand
that actually if I have a big hit
that is going to be played for many years
this is a much better model
because I get paid based on how much people use the product
not how much they thought they would use
day one or so forth
so it was a complicated model
but time helped
and now
the revenues to the music industry are bigger again
the industry
went through this incredible dip, and now revenues are back up
so we are very
proud of having been
a part of that
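The crossing-curves argument (a download or CD pays everything up front, streaming pays a little each year, and for a long-lived hit the cumulative streaming revenue eventually overtakes the one-time payment) is simple arithmetic. The sketch below uses invented numbers and assumes a constant yearly payout; real payouts depend on play counts and contracts that are not described here.

```python
def breakeven_year(upfront, per_year, max_years=50):
    """First year in which cumulative streaming payouts reach a one-time sale.

    upfront: hypothetical one-time revenue from a download or CD sale.
    per_year: hypothetical streaming payout per year for the same listener,
              assumed constant here for simplicity.
    Returns None if the curves never cross within max_years.
    """
    total = 0.0
    for year in range(1, max_years + 1):
        total += per_year
        if total >= upfront:
            return year
    return None

# With a 10.0 up-front sale and 1.5 per year from streaming,
# cumulative streaming revenue passes the sale in year 7 (10.5 >= 10.0).
print(breakeven_year(10.0, 1.5))
```

Every year after the crossing point is extra revenue under streaming, which is why a track that keeps being played for many years favors the usage-based model.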
so there have been distinct problems
I think when it comes to
the labels
we have taken the painful approach
some of our competition at the time
they kind of
looked at other companies
and said if we just ignore the rights
we get really big really fast
we are going to be too big
for the labels to fail
they are not going to kill us
we didn't take that approach, we went legal from day one
and we
negotiated and negotiated and negotiated
it was very slow, very frustrating
we were angry at seeing other companies taking shortcuts
and seeming to get away with it
it was this game theory thing
where over many rounds of playing the game
this would be the right strategy
and even though
clearly there is a lot of frustrations
at times during renegotiations
there is this weird trust
where we have been
honest and fair
we have never screwed them, they have never screwed us
it is ten years but
there is this trust in like
they know that if music doesn't get really big
if lots of people
do not want to listen to music and want to pay for it
Spotify has no business model
and we actually are incredibly
aligned
other companies have other business models
where even if they made
no money on music
they would still be profitable companies,
but Spotify wouldn't
I think the industry sees that
we are actually aligned business wise
so there is this trust
that allows us to
do product development
even if it is scary
taking risks
in itself was an incredible risk
for the music industry
to take, that they should get credit for
now some of it was that they had nothing to lose in Sweden
but frankly a lot of the labels also took risk
and so I think we built up that trust
with the
I think herding of cats
sounds a bit
what is the word? it sounds like
dismissive of the cats? dismissive
every cat matters, they are all beautiful
and very important
exactly, they have taken a lot of risks
a lot of good
so it is really like playing
it is game theory, if you play the
if you play the game many times
then you can have this statistical outcome
that you bet on
and it feels very painful when you are in the middle of that thing
I mean there is risk
there is trust, there is relationships
just having read the biography
of Steve Jobs
similar kinds of relationships were discussed
around iTunes,
the idea of selling a song for a dollar
is not affordable
for labels
and there was no
it was the same kind of thing
it was trust, it was game theory
a lot of relationships that had to be built,
and it is really a terrifyingly
difficult process
that Apple could go through a little bit
because they could afford
for that process to fail
for Spotify it seems terrifying
because
you can't
a lot of it comes down to
honestly Daniel and his tenacity
in negotiating
which seems like an impossible
task
because he was completely unknown
but maybe that was also the reason
that it worked
but I think
I think game theory is probably the best way to think about it
you could go straight for this Nash equilibrium
that someone is going to defect
or you play many times
and you try to actually go for the top left,
the cooperation cell
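The game-theory framing here is the repeated prisoner's dilemma: in a single round, defecting is the Nash equilibrium, but over many rounds a reciprocating strategy that starts by cooperating keeps both players in the top-left cooperation cell. The sketch uses standard textbook payoffs and the classic tit-for-tat strategy as illustrations; it is not a model of Spotify's actual label negotiations.

```python
# Payoffs as (row player, column player); "C" = cooperate, "D" = defect.
# Textbook values chosen for illustration: temptation 5 > reward 3 > punishment 1 > sucker 0.
PAYOFF = {
    ("C", "C"): (3, 3),  # top-left: the cooperation cell
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),  # the one-shot Nash equilibrium
}

def always_defect(their_history):
    return "D"

def tit_for_tat(their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return their_history[-1] if their_history else "C"

def play(strategy_a, strategy_b, rounds):
    """Return total payoffs for two strategies over repeated rounds."""
    moves_a, moves_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(moves_b)
        b = strategy_b(moves_a)
        pa, pb = PAYOFF[(a, b)]
        score_a += pa
        score_b += pb
        moves_a.append(a)
        moves_b.append(b)
    return score_a, score_b

print(play(tit_for_tat, tit_for_tat, 10))      # sustained cooperation
print(play(always_defect, always_defect, 10))  # mutual defection
print(play(tit_for_tat, always_defect, 10))    # defector gains once, then both lose
```

Over ten rounds, mutual cooperation earns 30 each versus 10 each for mutual defection. A defector can beat a cooperator head-to-head (14 to 9), but it earns far less than two players who build trust, which matches the "many rounds of playing the game" argument above.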
is there any magical reason
why Spotify seems to have won
so a lot of people have tried to do
what Spotify tried to do
and Spotify has come out
well so the answer is that
there is no magical reason because I don't believe in magic
but I think there are
there are reasons
and I think some of them are that
people have
misunderstood
a lot of what we actually do
the actual
Spotify model is very complicated
they've looked at the premium model
and said it seems like you can charge
$9.99 for music
and people are going to pay
but that's not what happened
actually when we launched the original mobile product
everyone said they would never pay
what happened was they started on the free product
and then their engagement grew so much
actually they said maybe it is worth
$9.99;
your propensity to pay grows
with your engagement
so we have this super complicated business model
where you operate two different business models
advertising and premium at the same time
and I think that is hard to replicate
I struggle to think of other companies
that run large scale advertising
and subscription products at the same time
so I think the business model
is actually much more complicated
than people think it is
so some people went after just the premium part
without the free part and ran into a wall
where no one wanted to pay
some people went after just
music should be free just ads
which doesn't give you enough revenue
and doesn't work for the music industry
so I think that combination is
kind of opaque from the outside
so maybe I shouldn't say it here and reveal the secret
but that turns out to be hard
to replicate
than you would think
so there is a lot of brilliant business strategy here
brilliance or luck
probably more luck but it doesn't really matter
it looks brilliant in retrospect
so let's call it brilliant
yeah when the books are written it will be brilliant
you've
mentioned that your philosophy is
to embrace change
so
how will the music streaming
and music listening world
change over the next 10 years
20 years
you look out into the far future
what do you think
I think that music
and for that matter audio
podcast audio books
I think it's one of the few core human needs
I think it
there is no good reason to me why it shouldn't be
at the scale of something like
messaging or social networking
I don't think it's a niche thing
to listen to music or news or something
so I think scale is obviously one of the things that
I really hope for
I hope that it's going to be
everyone in the world gets access
to all the world's music ever made
so obviously I think it's going to be a much bigger business
otherwise we wouldn't be betting this big
now if you look more
at how it is consumed
what I'm hoping is back
to this analogy
of the software tool chain
where
I think sometimes
internally I make this analogy to
text messaging
text messaging was also based
on standards
in the era of mobile carriers
you had the SMS
140 characters
SMS
and it was great because everyone agreed on the standard
so as a consumer you got a lot of distribution
and interoperability
but it was a very constrained format
and when the industry wanted to add pictures
to that format to do the MMS
I looked it up and I think it took from the late 80s
to early 2000s
it was a very complex cycle
to bring pictures into that
now once that entire
value chain
of creation and consumption got wrapped in one software stack
within something like Snapchat
or WhatsApp
the first week they added disappearing messages
then two weeks later they added stories
the pace of innovation
when you're on one software stack
you can affect both creation and consumption
I think it's going to be rapid
so with these streaming services we,
for the first time in history,
have enough, I hope,
people on one of these services,
actually whether it's Spotify
or Amazon or Apple or YouTube
and hopefully enough creators
that you can actually start working with the format again
and that excites me
I think being able to change these constraints from 100 years
that could really do something
interesting I really hope
it's not just going to be the iteration
on the same thing
for the next 10 to 20 years as well
yeah changing the creation of music
the creation of audio, the creation of podcasts
is a really fascinating possibility
I myself don't understand
what it is about podcasts
that's so intimate
it just is, I listen to a lot of podcasts
I think it touches
on a human
on a deep human need
for connection that people do feel like
they're connected
to when they listen
I don't understand what the psychology of that is
but in this world
that's becoming more and more disconnected
it feels like
this is fulfilling a certain kind of need
and
empowering the creator
as opposed to just the listener
is really interesting
I'm really excited that you're working on this
yeah I think one of the things that is inspiring
for our teams to work on podcasts
is exactly that
whether you think like I probably do
that it's something biological about
perceiving to be in the middle of the conversation
that makes you listen in a different way
it doesn't really matter people seem to perceive it differently
and
there was this narrative for a long time that
you know if you look at video
everything kind of in the foreground it got shorter and shorter and shorter
because of financial
pressures and monetization and so forth
and eventually at the end there's almost like
20 seconds clip
people just screaming something
and
I'm really, I feel really good about
the fact that
you could have interpreted that as
people have no attention span anymore
they don't want to listen to things
they're not interested in deeper stories
like you know people are getting dumber
but then podcast came along and it's almost like
no, no, the need still existed,
but maybe
it was the fact that you're not prepared to look at your phone
like this for two hours
but if you can drive at the same time
it seems like people really want to dig deeper
and they want to hear like the more complicated version
so to me that is very inspiring
that podcast is actually long form
it gives me a lot of hope
for humanity that people seem
really interested in hearing deeper
more complicated conversations
I don't understand it
it's fascinating; the majority of listeners
for this podcast listen to the whole thing,
this whole conversation
we've been talking for an hour and
45 minutes
and somebody will, I mean most people
will be listening to these words I'm speaking right now
you wouldn't have thought that
10 years ago with where the world seemed to go
so that's very positive I think
that's really exciting and empowering the creator
there is really exciting
last question
you also have a passion for
just mobile in general
how do you see the
smartphone world
the digital space
of
smartphones
and just everything that's on the move
whether it's
internet of things and so on
changing
over the next 10 years and so on
I think that one way to think about it
is that computing
might be moving out
of these multi-purpose devices
the computer we had in the phone
into specific
specific purpose devices
and it will be ambient that
at least in my home
you just shout something
and there's always one of these speakers close enough
and so
you start behaving differently
it's as if you have the internet
ambiently around you and you can ask
it things
so I think computing
will kind of get more integrated and we won't
necessarily think of it as
connected to a device in the same
way that we do today
I don't know the path to that maybe
we used to have these
desktop computers
and then we partially replaced that with
the laptops
we had desktop at home and at work and then
we got these phones and we started leaving
the laptop at home for a while and maybe
for
stretches of time you're going to start using the watch
and you can leave your phone at home
for a run or something and
we're on this progressive path where
I think what is happening with
the voice is that
you have an
interaction paradigm
that doesn't
require as large physical devices
so I definitely think there's a future where
you can have your
AirPods and your watch
and you can do a lot of computing
and
I don't think it's going to be this binary
thing, I think it's going to be like
many of us still have a laptop, we just use it
less and so you shift
your consumption over
and
I don't know about
AR glasses and so forth, I'm excited about it
I spend a lot of time in that area but I still think
it's quite far away.
AR VR
VR is happening and working, I think
the recent Oculus Quest
is quite impressive, I think AR
is further away, at least that type of AR
I think
but I do think
your phone
or watch or glasses understanding where you are
and being able to give you audio cues about
or you can say what is this and it tells you
what it is, that
I think might happen, you use
your watch or your glasses
as a mouse pointer on reality
I might be wrong,
I hope I'm wrong, but I think it might happen
well before we walk around with these big
glasses that project things
I agree with you. It's actually really difficult when you have to understand the physical world enough to project onto it.
I lied about the last question, because I just thought of audio and my favorite topic, which is the movie Her.
Do you think, whether it's part of Spotify or not, we will have... I don't know if you've seen the movie Her?
Absolutely.
And there, audio is the primary form of interaction and connection with another entity
that you can actually have a relationship with, or fall in love with, based on voice alone, audio alone.
Do you think that's possible, first of all, to fall in love with somebody based on audio alone?
Somebody or... let's go with somebody. Just have a relationship based on audio alone.
And a second question to that: can we create an artificial intelligence system
that allows one to fall in love with it, and her or him with you?
So this is my personal answer, speaking for me as a person: the answer is quite unequivocally yes, on both.
We just talked about podcasts and the feeling of being in the middle of a conversation.
If you could have an assistant where... and we just said that feels like a very personal setting.
So if you walk around with these headphones and this thing all of the time, it feels like it's in your brain.
I think it's going to be much easier to fall in love with than something that would be on your screen.
I think that's entirely possible.
And then, you can probably answer better than me, but from the concept of whether it's going to be possible to build a machine that can achieve that,
whether you think of it as being able to fake it, the philosophical zombie that simulates it well enough, or it somehow actually is,
I think it's only a question... If you ask me about time, I'd have to be a financier, but if you say, given some infinite time, absolutely.
I think it's just atoms and arrangement of information.
Well, I personally think that love is a lot simpler than people think.
So we started with True Romance and ended in love. I don't see a better place to end.
Beautiful. Gustav, thanks so much for talking today.
Thank you so much, it was a lot of fun.
It was fun.