Engaging in activities like poker or bidding on a house involves dealing with imperfect information. You know your own cards or your budget, but not your opponent’s hand or their financial limit. A recent paper by MIT researchers, presented at the International Conference on Learning Representations in Rio De Janeiro, delves into imperfect-information games. These games are zero-sum, meaning one player’s win is another’s loss.
The paper’s authors include Sobhan Mohammadpour and Gabriele Farina from MIT, along with researchers from the University of Texas, University of California, Carnegie Mellon University, and New York University. They explored algorithms for training neural networks to play these games. Traditionally, it was believed that game theory-based algorithms outperformed general-purpose policy gradient methods, which date back to the 1990s. These methods help neural networks make decisions step by step towards a goal, adjusting as needed.
Despite not being designed for strategic games, the researchers examined how policy gradient methods perform in two-player settings. Farina notes these methods become complex when multiple players are involved, as optimal strategies shift with the opponent’s moves. Sokota highlights that the study reveals policy gradient methods can surpass specialized game-theoretic algorithms, challenging previous assumptions.
This study introduces a way to fairly evaluate different algorithms that train neural networks for imperfect-information games. Rudolph explains they offer a benchmark instead of a new algorithm, allowing researchers to test and compare performance. This benchmark measures performance using exploitability, assessing how well players fare against a worst-case opponent, with zero indicating perfect play.
The team tested five games, including Phantom Tic-Tac-Toe and Liar’s Dice. The challenge was applying the exploitability measure to complex games with up to 30 billion states. Mohammadpour describes it as navigating a dark room full of unseen objects. Previous studies used much smaller games.
Experiments showed that networks trained with policy gradient algorithms achieved better exploitability scores than those using game theory-based methods. In direct competitions, the policy gradient-trained networks also performed better, reinforcing confidence in their benchmark approach.
The researchers have made their benchmarking software widely accessible, requiring minimal resources to run. Farina emphasizes the broader implications of their work, noting that ‘game’ refers to any strategic interaction involving multiple agents. Vinitsky adds that hidden information impacts various fields, suggesting potential improvements beyond games.
Ian Gemp from Google DeepMind, uninvolved in the study, finds the results promising. He views the work as evidence that updating traditional tools like policy gradient methods can effectively address complex strategic challenges.
Original Source: news.mit.edu
