Tuomas Sandholm: Poker and Game Theory | Lex Fridman Podcast #12

Lex Fridman Podcast

Conversations about science, technology, history, philosophy and the nature of intelligence, consciousness, love, and power. Lex is an AI researcher at MIT and beyond. Conversations about science, technology, history, philosophy and the nature of intelligence, consciousness, love, and power. Lex is an AI researcher at MIT and beyond.

Transcribed podcasts: 442
Time transcribed: 44d 12h 13m 31s

results.

Mention graph

This graph shows how many times the word ______ has been mentioned throughout the history of the program.

The following is a conversation with Thomas Sanholm. He's a professor at SameU and co-creator of
Labratus, which is the first AI system to beat top human players in the game of Heads Up No Limit
Texas Hold'em. He has published over 450 papers on game theory and machine learning,
including a best paper in 2017 at NIPS, now renamed to New Rips, which is where I caught up with him
for this conversation. His research and companies have had wide-reaching impact in the real world,
especially because he and his group not only propose new ideas, but also build systems to
prove that these ideas work in the real world. This conversation is part of the MIT course on
Artificial Journal Intelligence and the Artificial Intelligence Podcast. If you enjoy it, subscribe
on YouTube, iTunes, or simply connect with me on Twitter at Lex Friedman, spelled F-R-I-D.
And now here's my conversation with Thomas Sanholm. Can you describe at the high level the game of
poker, Texas Hold'em, Heads Up Texas Hold'em, for people who might not be familiar with this
card game? Yeah, happy to. So Heads Up No Limit Texas Hold'em has really emerged in the AI community
as a main benchmark for testing these application-independent algorithms for imperfect information
game solving. And this is a game that's actually played by humans. You don't see that much on TV
or casinos because, well, for various reasons, but you do see it in some expert-level casinos,
and you see it in the best poker movies of all time. It's actually an event in the World Series
of Poker, but mostly it's played online, and typically for pretty big sums of money. And
this is a game that usually only experts play. So if you go to your home game on a Friday night,
it probably is not going to be Heads Up No Limit Texas Hold'em. It might be
No Limit Texas Hold'em in some cases, but typically for a big group, and it's not as
competitive. Well, Heads Up means it's two players, so it's really like me against you,
am I better, or are you better, much like chess or go in that sense. But an imperfect
information game, which makes it much harder because I have to deal with issues of you knowing
things that I don't know, and I know things that you don't know instead of pieces being
nicely laid on the board for both of us to see. So in Texas Hold'em, there's two cards that you
only see that belong to you, and they gradually lay out some cards that add up overall to five
cards that everybody can see. So the imperfect nature of the information is the two cards that
you're holding up front. Yeah. So as you said, you know, you first get two cards in private each,
and then there's a betting round. Then you get three cards in public on the table,
then there's a betting round. Then you get the fourth card in public on the table,
there's a betting round. Then you get the fifth card on the table, there's a betting round. So
there's a total of four betting rounds and four tranches of information revelation, if you will.
The only the first tranche is private, and then it's public from there.
And this is probably by far the most popular game in AI and just the general public in terms of
imperfect information. So it's probably the most popular spectator game to watch, right? So
which is why it's a super exciting game tackle. So it's on the order of chess, I would say,
in terms of popularity, in terms of AI setting it as the bar of what is intelligence. So in 2017,
Lebrados, how do you pronounce it? Libratos.
Lebrados beats. A little Latin there. A little bit Latin. Lebrados beats a few four expert human
players. Can you describe that event? What you learned from it? What was it like? What was the
process in general for people who have not read the papers and studied? Yeah. So the event was that
we invited four of the top 10 players with these specialist players in Heads Up No Limit Texas
Holden, which is very important because this game is actually quite different than the multiplayer
version. We brought them in to Pittsburgh to play at the reverse casino for 20 days. We wanted to
get 120,000 hands in because we wanted to get statistical significance. So it's a lot of hands
for humans to play, even for these top pros who play fairly quickly normally. So we couldn't just
have one of them play so many hands. 20 days, they were playing basically morning to evening,
and you raised 200,000 as a little incentive for them to play. And the setting was so that
they didn't all get 50,000. We actually paid them out based on how they did against the AI each.
So they had an incentive to play as hard as they could, whether they're way ahead or way behind
or right at the mark of beating the AI. And you don't make any money, unfortunately. Right. No,
we can't make any money. So originally, a couple of years earlier, I actually explored whether we
could actually play for money, because that would be, of course, interesting as well, to play against
the top people for money. But the Pennsylvania gaming board said no. So we couldn't. So this
is much like an exhibit, like for a musician or a boxer or something like that. Nevertheless,
you were keeping track of the money and brought us one close to $2 million, I think. So if it was
for real money, if you were able to earn money, that was a quite impressive and inspiring achievement.
Just a few details. What were the players looking at? I mean, were they behind a computer? What was
the interface like? Yes, they were playing much like they normally do. These top players, when
they play this game, they play mostly online. So they used to playing through a UI. Yes. And
they did the same thing here. So there was this layout, you could imagine, there's a table on a
screen. There's the human sitting there. And then there's the AI sitting there. And the screen shows
everything that's happening. The cards coming out and shows the bets being made. And we also had
the betting history for the human. So if the human forgot what had happened in the hand so far,
they could actually reference back and so forth. Is there a reason they were given access to the
betting history for? Well, we just, it didn't really matter that they wouldn't have forgotten
anyway, these are top quality people. But we just wanted to put out there so it's not a question
of a human forgetting and the AI somehow trying to get that advantage of better memory.
So what was that like? I mean, that was an incredible accomplishment. So what did it feel like
before the event? Did you have doubt, hope? Where was your confidence at?
Yeah, that's great. Great question. So 18 months earlier, I had organized a similar brains versus
AI competition with our previous AI called Cloudical. And we couldn't beat the humans.
So this time around, it was only 18 months later. And I knew that this new AI Liberatus was way
stronger. But it's hard to say how you'll do against the top humans before you try. So I thought we
had about a 50 50 shot. And the international betting sites put us as a us as a four to one
or five to one underdog. So it's kind of interesting that people really believe in people and get over
AI. Not just people, people don't just believe over believing themselves, but they have over
confidence in other people as well, compared to the performance of AI. And yeah, so we were a four
to one or five to one underdog. And even after three days of beating the humans in a row,
we were still 50 50 on the international betting sites.
Do you think there's something special and magical about poker and in the way people
think about it? In the sense you have, I mean, even in chess, there's no Hollywood movies,
poker is the star of many movies. And there's this feeling that certain human facial expressions
and body language, eye movement, all these tells are critical to poker. Like you can look into
somebody's soul and understand their betting strategy and so on. So that's probably why
possibly do you think that is why people have a confidence that humans will outperform because AI
systems cannot in this construct perceive these kinds of tells, they're only looking at betting
patterns and nothing else, the betting patterns and statistics. So
what's more important to you if you step back and human players, human versus human,
what's the role of these tells of these ideas that we romanticize?
Yeah. So I'll split it into two parts. So one is why do humans trust humans more than AI and
have overconfidence in humans? Yes, I think that's that's not really related to the tell
question. It's just that they've seen these top players how good they are and they're really
fantastic. So it's just hard to believe that the AI could beat them. So I think that's where that
comes from. And that's actually maybe a more general lesson about AI that until you've seen it
overperform a human, it's hard to believe that it could. But then the tells, a lot of these top
players, they're so good at hiding tells that among the top players, it's actually not really
worth it for them to invest a lot of effort trying to find tells in each other, because there's
so good at hiding them. So yes, at the kind of Friday evening game, tells are going to be a huge
thing. You can read other people and if you're a good reader, you'll read them like an open book.
But at the top levels of poker now, the tells become a less much smaller and smaller
aspect of the game as you go to the top levels. The amount of strategies, the amounts of possible
actions is very large, 10 to the power of 100 plus. So there has to be some, I've read a few
of the papers related, it has to form some abstractions of various hands and actions.
So what kind of abstractions are effective for the game of poker?
Yeah, so you're exactly right. So when you go from a game tree that's 10 to the 161,
especially in an imperfect information game, it's way too large to solve directly, even with our
fastest equilibrium finding algorithms. So you want to abstract it first. And abstraction
in games is much trickier than abstraction in MDPs or other single agent settings.
Because you have these abstraction pathologies that if I have a finer grained abstraction,
the strategy that I can get from that for the real game might actually be worse than the strategy
I can get from the coarse grained abstraction. So you have to be very careful.
Now the kinds of abstractions just to zoom out, we're talking about, there's the hands
abstractions, and then there's betting strategies, betting actions. So there's information
abstraction to talk about general games, information abstraction, which is the abstraction
of what chance does. And this would be the cards in the case of poker. And then there's
action abstraction, which is abstracting the actions of the actual players, which would be
bets in the case of poker yourself, any other players, yes, yourself and other players. And
for information abstraction, we were completely automated. So these were these are algorithms,
but they do what we call potential aware abstraction, where we don't just look at the
value of the hand, but also how it might materialize into good or bad hands over time. And it's a
certain kind of bottom up process with integer programming there and clustering and various
aspects, how do you build build this abstraction. And then in the action abstraction, there,
it's largely based on how humans other and other AI's have played this game in the past.
But in the beginning, we actually use an automated action abstraction technology,
which is provably convergent, that it finds the optimal combination of bet sizes, but it's not
very scalable. So we couldn't use it for the whole game, but we use it for the first couple
of betting actions. So what's more important, the strength of the hand, so the information
abstraction or the how you play them, the actions, does it, you know, the romanticized notion again
is that it doesn't matter what hands you have, that the actions, the betting, maybe the way you
win, no matter what hands you have. Yeah, so that's why you have to play a lot of hands,
so that the role of luck gets smaller. So you could otherwise get lucky and get some good hands,
and then you're going to win the match. Even with thousands of hands, you can get lucky.
Because there's so much variance in no limit, Texas hold them, because if we both go all in,
it's a huge stack of variance. So there are these massive swings in no limit Texas hold them.
So that's why you have to play not just thousands, but over 100,000 hands to get statistical
significance. So let me ask another way this question. If you didn't even look at your hands,
but they didn't know that the opponents didn't know that, how well would you be able to do?
Oh, that's a good question. There's actually, I heard the story that there's this Norwegian
female poker player called Annette Oberstad, who's actually won a tournament by doing exactly that.
But that would be extremely rare. So you cannot really play well that way.
Well, okay. So the hands do have some role to play. So Lebrados does not use,
as far as I understand, use learning methods, deep learning. Is there room for learning in the,
you know, there's no reason why Lebrados doesn't, you know, combine with an AlphaGo type approach
for estimating the quality for a function estimator. What are your thoughts on this?
Maybe as compared to another algorithm, which I'm not that familiar with deep stack,
the engine that does use deep learning that is unclear how well it does, but nevertheless uses
deep learning. So what are your thoughts about learning methods to aid in the way that Lebrados
plays the game of poker? Yeah. So as you said, Lebrados did not use learning methods and played
very well without them. Since then, we have actually, actually here, we have a couple of papers on
things that do use learning techniques. Excellent. So and deep learning in particular. And sort of
the way you're talking about, where it's learning an evaluation function. But in imperfect information
games, unlike, let's say, in Go or now also in chess and shogi, it's not sufficient to learn
an evaluation for a state, because the value of an information set depends not only on the
exact state, but it also depends on both players beliefs. Like if I have a bad hand, I'm much
better off if the opponent thinks I have a good hand. And vice versa, if I have a good hand,
I'm much better off if the opponent believes I have a bad hand. So the value of a state is not
just a function of the cards. It depends on if you will, the path of play, but only to the extent
that it's captured in the belief distributions. So that's why it's not as simple as it is in
perfect information games. And I don't want to say it's simple there either. It's, of course,
very complicated computationally there too. But at least conceptually, it's very straightforward.
There's a state, there's an evaluation function, you can try to learn it. Here,
you have to do something more. And what we do is in one of these papers, we're looking at
allowing where we allow the opponent to actually take different strategies at the leaf of the
search tree, if you will. And that is a different way of doing it. And it doesn't assume, therefore,
a particular way that the opponent plays. But it allows the opponent to choose from a set of
different continuation strategies. And that forces us to not be too optimistic in a look ahead
search. And that's, that's one way you can do sound look ahead search in imperfect information
games, which is very different, difficult. And in US, you were asking about deep stack,
what they did, it was very different than what we do, either in Libertadores or in this new work.
They were generally randomly generating various situations in the game. Then they were doing
the look ahead from there to the end of the game, as if that was the start of a different game.
And then they were using deep learning to learn those values of those states, but the states
were not just the physical states, they include belief distributions. When you talk about look
ahead, or deep stack, or with Lebrados, does it mean considering every possibility that the game
can evolve? Are we talking about extremely sort of this exponentially growth of a tree?
Yes. So we're talking about exactly that. Much like you do in alpha beta search or
want to crawl to research, but with different techniques. So there's a different search algorithm.
And then we have to deal with the leaves differently. So if you think about what Lebrados did,
we didn't have to worry about this, because we only did it at the end of the game.
So we would always terminate into a real situation, and we would know what the payout is.
It didn't do these depth limited lookaheads. But now in this new paper, which is called depth
limited, I think it's called depth limited search for imperfect information games, we can actually
do sound depth limited look ahead. So we can actually start to do the look ahead from the
beginning of the game on, because that's too complicated to do for this whole long game.
So in Lebrados, we were just doing it for the end. So and then the other side,
this belief distribution. So is it explicitly modeled what kind of beliefs that the opponent
might have? Yeah, it is explicitly modeled, but it's not assumed. The beliefs are actually output,
not input. Of course, the starting beliefs are input, but they just fall from the rules of the
game because we know that the dealer deals uniformly from the deck. So I know that every pair of cards
that you might have is equally likely. I know that for a fact, that just follows from the rules
of the game, of course, except the two cards that I have, I know you don't have those.
You have to take that into account. That's called card removal. And that's very important.
Is the dealing always coming from a single deck in heads up? Yes. So you can assume
single deck. So you know that if I have the ace of spades, I know you don't have an ace of spades.
Great. So in the beginning, your belief is basically the fact that it's a fair
dealing of hands. But how do you start to adjust that belief? Well, that's where this beauty of
game theory comes. So Nash equilibrium, which John Nash introduced in 1950, introduces what
rational play is when you have more than one player. And these are pairs of strategies where
strategies are contingency plans, one for each player. So neither player wants to deviate to a
different strategy, given that the other doesn't deviate. But as a side effect, you get the beliefs
from Bayes rule. So Nash equilibrium really isn't just deriving in these imperfect information
games. Nash equilibrium, it doesn't just define strategies. It also defines beliefs for both of
us. And the refines beliefs for each state. So the each state, each, if they call information
sets at each information set in the game, there's a set of different states that we might be in.
But I don't know which one we're in. Nash equilibrium tells me exactly what is a probability
distribution over those real world states in my mind. How does Nash equilibrium give you that
distribution? So why I'll do a simple example. So you know, the game Rock Peace paper scissors.
So we can draw it as player one moves first, and then player two moves. But of course,
it's important that player two doesn't know what player one moved. Otherwise, player two would win
every time. So we can draw that as an information set where player one makes one or three moves first.
And then there's an information set for player two. So player two doesn't know
which of those nodes the world is in. But once we know the strategy for player one,
Nash equilibrium will say that you play one third rock, one third paper, one third scissors.
From that, I can derive my beliefs on the information set that they're one third, one third,
one third. So Bayes gives you that basic issue. But is that specific to a particular player?
Or is it? Is it something you quickly update with those? No, the game theory isn't really
player specific. So that's also why we don't need any data. We don't need any history,
how these particular humans played in the past or how any AI or even had played before. It's all
about rationality. So we just think the AI just thinks about what would a rational opponent do?
And what would I do if I were I am rational and what that that's the idea of game theory.
So it's really a data free opponent free approach. So it comes from the design of the game as
opposed to the design of the player. Exactly. There's no opponent modeling per se. I mean,
we've done some work on combining opponent modeling with game theory so you can exploit
weak players even more. But that's another strand and in Libraders, we didn't turn that on.
So I decided that these players are too good. And when you start to exploit an opponent,
you typically open yourself up to exploitation. And these guys have so few holes to exploit and
they're world's leading experts in counter exploitation. So I decided that we're not going
to turn that stuff on. Actually, I saw a few of your papers exploiting opponents sound very
interesting to explore. Do you think there's room for exploitation generally outside of
Libraders? Is there a subject or people differences that could be exploited? Maybe not just in poker,
but in general interactions and negotiations, all these other domains that you're considering?
Yeah, definitely. We've done some work on that. And I really like the work that hybridizes the
two. So you figure out what would a rational opponent do. And by the way, that's safe in these
zero sum games, two player zero sum games, because if the opponent does something irrational,
yes, it might throw off my beliefs. But the amount that the player can gain by throwing
off my belief is always less than they lose by playing poorly. So it's safe. But still,
if somebody is weak as a player, you might want to play differently to exploit them more.
So you can think about it this way, a game theoretic strategy is unbeatable. But it doesn't
maximally beat other opponents. So the winnings per hand might be better with a different strategy.
And the hybrid is that you start from a game theoretic approach. And then as you gain data
from about the opponent, in certain parts of the game tree, then in those parts of the game tree,
you start to tweak your strategy more and more towards exploitation, while still staying fairly
close to the game theoretic strategy, so as to not open yourself up to exploitation too much.
How do you do that? Do you try to vary up strategies, make it unpredictable? It's like,
what is it, tit for tat strategies in Prisoner's Dilemma?
Well, that's a repeated game, kind of simple Prisoner's Dilemma, repeated games. But even
there, there's no proof that says that that's the best thing. But experimentally, it actually
does well. So what kind of games are there, first of all? I don't know if this is something that
you could just summarize. There's perfect information games, there are all the information
on the table. There is imperfect information games. There's repeated games that you play over and over.
There's zero sum games. There's non zero sum games. And then there's really important distinction,
you're making two player versus more players. So what are what other games are there? And what's
the difference, for example, with this two player game versus more players? What are the key differences
right here? So let me start from the basics. So a repeated game is a game where the same exact
game is played over and over. In these extensive form games, where you think about three form,
maybe with these information sets to represent incomplete information,
you can have kind of repetitive interactions, even repeated games are a special case of that,
by the way. But the game doesn't have to be exactly the same. It's like in sourcing options.
Yes, we're gonna see the same supply base year to year. But what I'm buying is a little different
every time and the supply base is a little different every time and so on. So it's not really
repeated. So to find a purely repeated game is actually very rare in the world. So they're really
a very coarse model of what's going on. Then if you move up from just repeated simple, repeated
matrix games, not all the way to extensive form games, but in between, there's stochastic games,
where, you know, there's these, you think about it like these little matrix games. And when you
take an action and your home takes an action, they determine not which next state I'm going to,
next game I'm going to, but the distribution over next games, where I might be going to.
So that's the stochastic game. But it's like, matrix games, repeated stochastic games,
extensive form games, that is from less to more general. And poker is an example of the last one.
So it's really in the most general setting, extensive form games. And that's kind of what the AI
community has been working on and being benchmarked on with this heads up no limit, Texas Holden.
Can you describe extensive form games? What's what's the model here?
Yeah, so if you're familiar with the tree form, so it's really the tree form, like in chess,
there's a search tree versus a matrix versus a matrix. Yeah. And that's the matrix is called the
matrix form or by matrix form or normal form game. And here you have the tree form. So you can
actually do certain types of reasoning there, that you lose the information when you go to
normal form. There's a certain form of equivalence, like if you go from three form and you say it,
every possible contingency plan is a strategy, then I can actually go back to the normal form.
But I lose some information from the lack of sequentiality. Then the multiplayer versus
two player distinction is an important one. So two player games in zero sum are conceptually
easier and computationally easier. They're still huge like this one, this one. But they're conceptually
easier and computationally easier. In that conceptually, you don't have to worry about
which equilibrium is the other guy going to play when there are multiple, because any
equilibrium strategy is a best response to any other equilibrium strategy. So I can play a different
equilibrium from you and we'll still get the right values of the game that falls apart even
with two players when you have a general sum games, even without cooperation, just even without
cooperation. So there's a big gap from two player zero sum to two player general sum, or even to
three players zero sum. That's a big gap, at least in theory. Can you maybe non mathematically
provide the intuition why it all falls apart with three or more players? It seems like you
should still be able to have a Nash equilibrium that that's instructive, that holds. Okay,
so it is true that all finite games have a Nash equilibrium. So this is what your Nash
actually proved. So they do have a Nash equilibrium. That's not the problem. The problem is that there
can be many. And then there's a question of which equilibrium to select. So and if you select your
strategy from a different equilibrium, and I select mine, then what does that mean? And in this
non zero sum games, we may lose some joint benefit where by being just simply stupid, we could actually
both be better off if we did something else. Yes. And in three player, you get other problems also
like collusion, like maybe you and I can get up on a third player, and we can do radically better
by colluding. So there are lots of issues that come up there. So no Brown, the student you work
with on this has mentioned, I looked through the AMA on Reddit, he mentioned that the ability of
poker players to collaborate would make the game. He was asked the question of, how would you make the
game of poker? Or both of you were asked the question, how would you make the game of poker
beyond being solvable by current AI methods? And he said that there's not many
ways of making poker more difficult. But collaboration or cooperation between players
would make it extremely difficult. So can you provide the intuition behind why that is,
if you agree with that idea? Yeah, yeah. So I've done a lot of work
in coalitional games. And we actually have a paper here with my other student, Gabrielle
Farina and some other collaborators on at NIPPS on that actually just came back from the
poster session where we presented this. But so when you have a collision, it's a different
problem. And it typically gets even harder than even the game representations, some of the game
representations don't really allow good computation. So we actually introduced a new game representation
for that. Is that kind of cooperation part of the model? Do you have information about the fact
that other players are cooperating? Or is it just this chaos that where nothing is known?
So there's some things unknown. Can you give an example of a collusion type game?
So like bridge. Yeah, so think about bridge. It's like when you and I are on a team,
our payoffs are the same. The problem is that we can't talk. So when I get my cards, I can't
whisper to you what my cards are. That would not be allowed. So we have to somehow coordinate
our strategies ahead of time, and only ahead of time. And then there's certain signals we
can talk about. But they have to be such that the other team also understands them. So that's
an example where the coordination is already built into the rules of the game. But in many
other situations like auctions or negotiations or diplomatic relationships, poker, it's not
really built in, but it still can be very helpful for the colluders. I've read you write somewhere
where the negotiations, you come to the table with prior like a strategy that like that you're
willing to do and not willing to do those kinds of things. So how do you start to now moving away
from poker, moving beyond poker into other applications like negotiations? How do you
start applying this to other two other domains? Yeah, even real world domains that you've worked
on. Yeah, I actually have two startup companies doing exactly that. One is called strategic machine,
and that's for kind of business applications, gaming, sports, all sorts of things like that.
Any applications of this to business and to sports and to gaming, to various types of things,
in finance, electricity markets and so on. And the other is called strategy robot, where we are
taking these to military, security, cybersecurity and intelligence applications.
I think you worked a little bit in, how do you put it, advertisement,
sort of suggesting ads kind of thing. Yes, that's another company, optimized markets.
But that's much more about a combinatorial market and optimization based technology.
That's not using these game-theoretic reasoning technologies. I see. Okay, so what sort of high
level do you think about our ability to use game-theoretic concepts to model human behavior?
Do you think human behavior is amenable to this kind of modeling outside of the poker games?
And where have you seen it done successfully in your work? I'm not sure. The goal really is
modeling humans. Like, for example, if I'm playing a zero-sum game, I don't really care
that the opponent is actually following my model of rational behavior. Because if they're not,
that's even better for me. Right. So, see, with the opponents in games, there's a,
the prerequisite is that you formalize the interaction in some way that can be amenable
to analysis. I mean, you've done this amazing work with mechanism design, designing games
that have certain outcomes. But so I'll tell you an example from my world of autonomous vehicles,
right? Okay. We're studying pedestrians and pedestrians and cars negotiate in this nonverbal
communication. There's this weird game dance of tension where pedestrians are basically saying,
I trust that you won't kill me. And so as a jay walker, I will step onto the road,
even though I'm breaking the law and there's this tension. And the question is, we really
don't know how to model that well in trying to model intent. And so people sometimes bring up
ideas of game theory and so on. Do you think that aspect of human behavior can use these kinds of
imperfect information approaches, modeling? How do we, how do you start to attack a problem
like that when you don't even know how to design the game to describe the situation in order to
solve it? Okay. So I haven't really thought about jaywalking. But one thing that I think could be
a good application in an autonomous vehicle system following. So let's say that you have
fleets of autonomous cars operating by different companies. So maybe here's the Waymo fleet and
here's the Uber fleet. If you think about the rules of the road, they define certain
legal rules, but that still leaves a huge strategy space open. Like as a simple example,
when cars merge, you know, how humans merge, you know, they slow down and look at each other and
try to try to merge wouldn't it be better if these situations would already be pre negotiated? So we
can actually merge at full speed. And we know that this is the situation. This is how we do it.
And it's all going to be faster. But there are way too many situations to negotiate manually.
So you could use automated negotiation. This is the idea, at least you could use automated
negotiation to negotiate all of these situations, or many of them in advance. And of course, it might
be that hey, maybe you're not gonna always let me go first. Maybe you said, okay, well, in these
situations, I'll let you go first. But in exchange, you're going to give me two hours, you're gonna
let me go first in this situation. So it's this huge combinatorial negotiation.
And do you think there's room in that example of merging to model this whole situation is an
imperfect information game? Or do you really want to consider it to be a perfect?
No, that's a good question. Yeah, that's a good question.
Do you pay the price of assuming that you don't know everything?
Yeah, I don't know. It's certainly much easier games with perfect information are much easier.
So if you can get away with it, you should. But if the real situation is of imperfect information,
then you're going to have to deal with it for imperfect information.
Great. So what lessons have you learned the annual computer poker competition,
an incredible accomplishment of AI, you know, you look at the history of the blue AlphaGo,
these kind of moments when AI stepped up in an engineering effort and a scientific effort combined
to beat the best human player. So what do you take away from this whole experience?
What have you learned about designing AI systems that play these kinds of games?
And what does that mean for AI in general, for the future of AI development?
Yeah, so that's a good question. So there's so much to say about it.
I do like this type of performance oriented research, although in my group, we go all the
way from like idea to theory to experiments to big system building to commercialization.
So we span that spectrum. But I think that in a lot of situations in AI, you really have to
build the big systems and evaluate them at scale before you know what works and doesn't.
And we've seen that in the computational game theory community, that there are a lot of techniques
that look good in the small, but then they cease to look good in the large. And we've also seen
that there are a lot of techniques that look superior in theory. And I really mean in terms
of convergence rates, better like first order methods, better convergence rates, like the
CFR based algorithms, yet the CFR based algorithms are the first fastest in practice.
So it really tells me that you have to test this in reality, the theory isn't tight enough,
if you will, to tell you which algorithms are better than the others. And you have to
look at these things that in the large, because any sort of projections you do from the small,
can at least in this domain be very misleading. So that's kind of from a kind of science and
engineering perspective, from a personal perspective, it's been just a wild experience
in that with the first poker competition, the first for first brains versus AI man machine
poker competition that we organized, there had been by the way for other poker games,
there had been previous competitions, but this was for heads up no limit, this was the first. And
I probably became the most hated person in the world of poker. And I didn't mean to, I saw
is that cracking the game for Yeah, it was a lot of people felt that it was a real threat to the
whole game, the whole existence of the game. If AI becomes better than humans, people would be
scared to play poker, because there are these super human AI is running around taking their
money and you know, all of that. So so I just, it was really aggressive. The comments were super
aggressive. I got everything just short of death threats. Do you think the same was true for chess?
Because right now, they just completed the world championships and chess, and humans just started
ignoring the fact that there's AI systems now that outperform humans, and they still enjoy the game
is still a beautiful game. That's what I think. And I think the same thing happened in poker.
And so so I didn't think of myself as somebody was going to kill the game. And I don't think I
did. Yeah, I've really learned to love this game. I wasn't a poker player before, but I learned so
many nuances about it from these AI's. And they've really changed how the game is played, by the way.
So they have these very Martian ways of playing poker. And the top humans are now incorporating
those types of strategies into their own play. So if anything, to me, our work has made poker
a richer, more interesting game for humans to play, not something that is going to steer humans
away from it entirely. Just a quick comment on something you said, which is, if I may say so,
in academia is a little bit rare sometimes, it's pretty brave to put your ideas to the test in
the way you described, saying that sometimes good ideas don't work when you actually try to apply
them at scale. So where does that come from? I mean, what if you could do advice for people?
What drives you in that sense? Were you always this way? I mean, it takes a brave person,
I guess is what I'm saying, to test their ideas, and to see if this thing actually works against
human top human players and so on. I don't know about brave, but it takes a lot of work.
It takes a lot of work and a lot of time to organize, to make something big and to organize
an event and stuff like that. And what drives you in that effort? Because you could still,
I would argue, get a best paper award at NIPS as you did in 17 without doing this.
That's right. Yes. And so in general, I believe it's very important to do things in the real
world and at scale. And that's really where the pudding, if you will, proof is in the pudding.
That's where it is. In this particular case, it was kind of a competition between different groups
for many years as to who can be the first one to beat the top humans at Heads Up No Limit Texas
Holden. So it became kind of like a competition who can get there.
Yeah. So a little friendly competition can do wonders for progress.
Yes, absolutely. So the topic of mechanism design, which is really interesting, also kind of new
to me, except as an observer of, I don't know, politics and any, I'm an observer of mechanisms,
but you write in your paper an automated mechanism of design that I quickly read.
So mechanism design is designing the rules of the game so you get a certain desirable outcome.
And you have this work on doing so in an automatic fashion as opposed to fine-tuning it.
So what have you learned from those efforts? If you look, say, I don't know, at complex,
it's like our political system. Can we design our political system to have in an automated
in an automated fashion to have outcomes that we want? Can we design something like
traffic lights to be smart where it gets outcomes that we want? So what are the lessons that you
draw from that work? Yeah, so I still very much believe in the automated mechanism design direction.
Yes. But it's not a panacea. There are impossibility results in mechanism design,
saying that there is no mechanism that accomplishes objective X in class C. So it's not going up,
there's no way using any mechanism design tools, manual or automated,
to do certain things in mechanism design. Can you describe that again? So meaning
it's impossible to achieve that? Yeah. There's a certain impossibility.
Impossible. So these are not statements about human ingenuity, who might come up with something
smart. These are proofs that if you want to accomplish properties X in class C, that is not
doable with any mechanism. The good thing about automated mechanism design is that we're not
really designing for a class. We're designing for specific settings at a time. So even if there's
an impossibility result for the whole class, it just doesn't mean that all of the cases in the
class are impossible. It just means that some of the cases are impossible. So we can actually
carve these islands of possibility within these known impossible classes. And we've actually
done that. So one of the famous results in mechanism design is the Meyers and Sathetway
theorem by Roger Meyers and Mark Sathetway from 1983. It's an impossibility of efficient trade
under imperfect information. We show that you can in many settings avoid that and get efficient
trade anyway. Depending on how you design the game. Depending how you design the game. And of
course, it doesn't in any way contradict the impossibility result. The impossibility result
is still there, but it just finds spots within this impossible class where in those spots you
don't have the impossibility. Sorry if I'm going a bit philosophical, but what lessons do you draw
towards like I mentioned politics or human interaction and designing mechanisms for
outside of just these kinds of trading or auctioning or purely formal games or human
interaction like a political system? Do you think it's applicable to politics or to
business, to negotiations, these kinds of things, designing rules that have certain outcomes?
Yeah. Yeah, I do think so. Have you seen success that successfully done?
They haven't really. Oh, you mean mechanism design or automated mechanism? Automated mechanism
design. So mechanism design itself has had fairly limited success so far. There are certain cases,
but most of the real world situations are actually not sound from a mechanism design
perspective. Even in those cases where they've been designed by very knowledgeable mechanism
design people, the people are typically just taking some insights from the theory and applying
those insights into the real world rather than applying the mechanisms directly. So one famous
example of is the FCC spectrum auctions. So I've also had a small role in that and very good
economists have been very excellent economists have been working on that with no game theory.
Yet the rules that are designed in practice there, they're such that
bidding truthfully is not the best strategy. Usually mechanism design, we try to make things
easy for the participants. So telling the truth is the best strategy. But even in those very
high stakes auctions where you have tens of billions of dollars worth of spectrum being auctioned,
truth telling is not the best strategy. And by the way, nobody knows even a single optimal
bidding strategy for those auctions. What's the challenge of coming up with an optimal
because there's a lot of players and there's imperfection? It's not so much there,
a lot of players, but many items for sale. And these mechanisms are such that even with just
two items or one item, bidding truthfully wouldn't be the best strategy. If you look at the history
of AI, it's marked by seminal events and AlphaGo beating a world champion, human go player,
I would put Lebrados winning the heads up no limit, Holdum is one of such event.
Thank you. And what do you think is the next such event,
whether it's in your life or in the broadly AI community that you think might be out there
that would surprise the world? So that's a great question and I don't really know the answer.
In terms of game solving, heads up no limit takes us all and really was the one remaining
widely agreed upon benchmark. So that was the big milestone. Now, are there other things?
Yeah, certainly there are, but there is not one that the community has kind of focused on.
So what could be other things? There are groups working on Starcraft, there are groups working
on Dota 2, these are video games, or you could have like Diplomacy or Hanabi, you know, things
like that. These are like recreational games, but none of them are really acknowledged as kind of
the main next challenge problem, like chess or go or heads up no limit takes us Holdum was.
So I don't really know in the game solving space what is or what will be the next benchmark.
I hope kind of hope that there will be a next benchmark, because really the different groups
working on the same problem really drove these application independent techniques forward very
quickly over 10 years. Do you think there's an open problem that excites you that you start
moving away from games into real world games like say the stock market trading?
Yeah, that's kind of how I am. So I am probably not going to work as hard on these recreational
benchmarks. I'm doing two startups on game solving technology, strategic machine and strategy robot,
and we're really interested in pushing this stuff into practice.
What do you think would be a really, you know, a powerful result that would be surprising,
that would be if you can say, I mean, it's, you know, five years, 10 years from now,
something that statistically would say is not very likely, but if there's a breakthrough,
what achieve? Yeah, so I think that overall, we're in a very different situation in game theory
than we are in, let's say, machine learning. So in machine learning, it's a fairly mature
technology and it's very broadly applied and proven success in the real world. In game solving,
there are almost no applications yet. We have just become superhuman, which machine learning,
you could argue happened in the 90s, if not earlier, and at least on supervised learning,
certain complex supervised learning applications. Now, I think the next challenge problem, I know
you're not asking about it this way, you're asking about technology breakthrough. But I think that
big breakthrough is to be able to show that, hey, maybe most of, let's say, military planning or
most of business strategy will actually be done strategically using computational game theory.
That's what I would like to see as a next five or 10-year goal. Maybe you can explain to me again,
forgive me if this is an obvious question, but machine learning methods and neural networks
are suffer from not being transparent, not being explainable. Game theoretic methods,
Nash Equilibria, do they generally, when you see the different solutions, are they,
when you talk about military operations, are they, once you see the strategies,
do they make sense, are they explainable, or do they suffer from the same problems as neural
networks do? So that's a good question. I would say a little bit yes and no. And what I mean by that
is that these game theoretic strategies, let's say, Nash Equilibrium, it has provable properties.
So it's unlike, let's say, deep learning where you kind of cross your fingers,
hopefully it'll work. And then after the fact, when you have the weights, you're still crossing
your fingers and I hope it will work. Here, you know that the solution quality is there,
there's provable solution quality guarantees. Now, that doesn't necessarily mean that the strategies
are human understandable. That's a whole other problem. So I think that deep learning and computational
game theory are in the same boat in that sense, that both are difficult to understand.
But at least the game theory techniques, they have these guarantees of
solution quality. So do you see business operations, strategic operations, even military,
in the future being at least the strong candidates being proposed by automated systems? Do you see
that? Yeah, I do. I do. But that's more of a belief than a substantiated fact. Depending
on where you land and optimism or pessimism, that's a really, to me, that's an exciting future,
especially if there's provable things in terms of optimality. So looking into the future,
there's a few folks worried about the, especially you look at the game of poker,
which is probably one of the last benchmarks in terms of games being solved. They worry about
the future and the existential threats of artificial intelligence. So the negative impact
in whatever form on society, is that something that concerns you as much? Or are you more
optimistic about the positive impacts of AI? I am much more optimistic about the positive impacts.
So just in my own work, what we've done so far, we run the nationwide kidney exchange,
hundreds of people are walking around alive today, who would it be? And it's increased employment.
You have a lot of people now running kidney exchanges and at transplant centers,
interacting with the kidney exchange. You have extra surgeons, nurses, anesthesiologists, hospitals,
all of that. So employment is increasing from that and the world is becoming a better place.
Another example is combinatorial sourcing auctions. We did 800 large scale combinatorial
sourcing auctions from 2001 to 2010 in a previous startup of mine called Combinet.
And we increased the supply chain efficiency on that $60 billion of spend by 12.6%. So that's
over $6 billion of efficiency improvement in the world. And this is not like shifting value from
somebody to somebody else, just efficiency improvement, like in trucking, less empty driving.
So there's less waste, less carbon footprint and so on. This is a huge positive impact in the near
term. But sort of to stay in it for a little longer, because I think game theory is a role to
play here. Let me actually come back on that as one thing. I think AI is also going to make the
world much safer. So that's another aspect that often gets overlooked. Let me ask this question.
Maybe you can speak to the safer. So I talked to Max Tagmark and Stuart Russell, who are very
concerned about existential threats of AI. And often the concern is about value misalignment. So AI
systems basically working, operating towards goals that are not the same as human civilization,
human beings. So it seems like game theory has a role to play there to make sure the values are
aligned with human beings. I don't know if that's how you think about it. If not, how do you think
AI might help with this problem? How do you think AI might make the world safer? Yeah, I think
this value misalignment is a fairly theoretical worry. And I haven't really seen it in because I
do a lot of real applications. I don't see it anywhere. The closest I've seen it was the following
type of mental exercise really, where I had this argument in the late 80s when we were building
these transportation optimization systems. And somebody had heard that it's a good idea to have
high utilization of assets. So they told me, hey, why don't you put that as objective? And we didn't
even put it as an objective, because I just showed him that, you know, if you had that as your
objective, the solution would be to load your trucks full and drive in circles. Nothing would
ever get delivered, you'd have 100% utilization. So yeah, I know this phenomenon. I've known this
for over 30 years. But I've never seen it actually be a problem in reality. And yes, if you have
the wrong objective, the AI will optimize that to the hilt. And it's gonna hurt more than some
human who's kind of trying to solve it in a half baked way with some human insight too. But I just
haven't seen that materializing practice. There's this gap that you've actually put your finger on
very clearly just now, between theory and reality, that's very difficult to put into words, I think.
It's what you can theoretically imagine, the worst possible case or even, yeah, I mean,
bad cases, and what usually happens in reality. So for example, to me, maybe it's something you can
comment on, having grown up and I grew up in the Soviet Union, you know, there's currently 10,000
nuclear weapons in the world. And for many decades, it's theoretically surprising to me that the
nuclear war is not broken out. Do you think about this aspect from a game theoretic perspective in
general? Why is that true? Why, in theory, you could see how things would go terribly wrong and
somehow yet they have not. Yeah, how do you think? So I do think that about that a lot. I think the
biggest two threats that we're facing as mankind, one is climate change, and the other is nuclear war.
So those are my main two worries that I worry about. And I've tried to do something about climate,
thought about trying to do something for climate change twice. Actually, for two of my startups,
I've actually commissioned studies of what we could do on those things. And we didn't really
find a sweet spot, but I'm still keeping an eye out on that if there's something where we could
actually provide a market solution or optimization solution or some other technology solution to
problems. Right now, like for example, pollution credit markets was what we were looking at then.
And it was much more the lack of political will by those markets were not so successful
rather than bad market design. So I could go in and make a better market design,
but that wouldn't really move the needle on the world very much if there's no political will. And
in the US, you know, the market, at least the Chicago market was just shut down and so on. So
then it doesn't really help how great your market design was. And on your nuclear side,
it's more so global warming is a more encroaching
problem. You know, nuclear weapons have been here. It's an obvious problem that's just been sitting
there. So how do you think about what is the mechanism design there that just made everything
seem stable? And are you still extremely worried? I am still extremely worried. So you
probably know the simple game theory of mad. So this was a mutually assured destruction.
And it doesn't require any computation with small matrices, you can actually convince yourself that
the game is such that nobody wants to initiate. Yeah, that's a very coarse grained analysis.
And it really works in a situation where you have two superpowers or small numbers of superpowers.
Now things are very different. You have a smaller nuke. So the threshold of initiating is smaller
and you have smaller countries and non-nation actors who may get a nuke and so on. So I think
it's riskier now than it was maybe ever before. And what idea, application of AI, you've talked
about a little bit, but what is the most exciting to you right now? I mean, you're here at NIPS,
NewRips. Now you have a few excellent pieces of work, but what are you thinking into the future
with several companies you're doing? What's the most exciting thing or one of the exciting things?
The number one thing for me right now is coming up with these scalable techniques for
game solving and applying them into the real world. I'm still very interested in market design
as well. And we're doing that in the optimized markets. But I'm most interested if number one
right now is strategic machine strategy robot getting that technology out there and seeing
as you're in the trenches doing applications, what needs to be actually filled, what technology
gaps still need to be filled. So it's so hard to just put your feet on the table and imagine what
needs to be done. But when you're actually doing real applications, the applications tell you
what needs to be done. And I really enjoy that interaction. Is it a challenging process to
apply some of the state of the art techniques you're working on, and having the various players in
industry or the military, or people who could really benefit from it actually use it? What's
that process like of, you know, in autonomous vehicles, we work with automotive companies and
they're in many ways are a little bit old fashioned. It's difficult. They really want to
use this technology. There's clearly will have a significant benefit. But the systems aren't quite
in place to easily have them integrated in terms of data, in terms of compute, in terms of all
these kinds of things. So do you, is that one of the bigger challenges that you're facing?
And how do you tackle that challenge? Yeah, I think that's always a challenge.
That's kind of slowness and inertia, really, of let's do things the way we've always done it.
You just have to find the internal champions at the customer who understand that, hey, things can't
be the same way in the future. Otherwise, bad things are going to happen. And it's in autonomous
vehicles, it's actually very interesting that the car makers are doing that, and they're very
traditional. But at the same time, you have tech companies who have nothing to do with cars or
transportation, like Google and Baidu, really pushing on autonomous cars. I find that fascinating.
Clearly, you're super excited about actually these ideas having an impact in the world.
In terms of the technology, in terms of ideas and research, are there directions
that you're also excited about, whether that's on the some of the approaches you talked about
for imperfect information games, whether it's applying deep learning to some of these problems?
Is there something that you're excited in the research side of things?
Yeah, yeah, lots of different things in the game solving. So solving even bigger games,
games where you have more hidden action of the player actions as well.
Poker is a game where really chance actions are hidden, or some of them are hidden,
but the player actions are public.
Multiplayer games of various sorts, collusion, opponent exploitation,
and even longer games. So games that basically go forever, but they're not repeated.
So extensive fun games that go forever. What would that even look like?
How do you represent that? How do you solve that?
What's an example of a game like that? This is some of the stochastic games that you mentioned?
Let's say business strategy. So it's not just modeling like a particular interaction,
but thinking about the business from here to eternity. Or let's say military strategy.
So it's not like war is going to go away. How do you think about military strategy that's going
to go forever? How do you even model that? How do you know whether a move was good,
that somebody made, and so on. So that's kind of one direction. I'm also very interested in
learning much more scalable techniques for integer programming.
So we had an ICML paper this summer on that, the first automated algorithm configuration paper
that has theoretical generalization guarantees. So if I see these many training examples,
and I tool my algorithm in this way, it's going to have good performance on the real distribution,
which I have not seen. So which is kind of interesting that algorithm configuration has
been going on now for at least 17 years seriously. And there has not been any generalization
theory before. Well, this is really exciting. And it's a huge honor to talk to you. Thank you
so much, Tomas. Thank you for bringing Lebrados to the world and all the great work you're doing.
Well, thank you very much. It's been fun. Good questions.