David Silver et al., “Mastering the game of Go with deep neural networks and tree search,” Nature 529, pp. 484–489, Jan. 28, 2016.
Nature Paper Details
from Google DeepMind
Our Nature paper, published on 28 January 2016, describes the technical details behind a new approach to computer Go that combines Monte Carlo tree search with deep neural networks trained by supervised learning from human expert games and by reinforcement learning from games of self-play.
The game of Go is widely viewed as an unsolved “grand challenge” for artificial intelligence. Despite decades of work, the strongest computer Go programs still only play at the level of human amateurs. In this paper we describe our Go program, AlphaGo. The program is based on general-purpose AI methods: it uses deep neural networks to mimic expert players, and then improves further by learning from games played against itself. AlphaGo won over 99% of games against the strongest other Go programs. It also defeated the human European champion 5–0 in an official tournament match. This is the first time that a computer program has defeated a professional Go player, a feat previously believed to be at least a decade away.
Full Author List
David Silver, Aja Huang, Chris J. Maddison, Arthur Guez, Laurent Sifre, George van den Driessche, Julian Schrittwieser, Ioannis Antonoglou, Veda Panneershelvam, Marc Lanctot, Sander Dieleman, Dominik Grewe, John Nham, Nal Kalchbrenner, Ilya Sutskever, Timothy Lillicrap, Madeleine Leach, Koray Kavukcuoglu, Thore Graepel & Demis Hassabis
The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. Without any lookahead search, the neural networks play Go at the level of state-of-the-art Monte Carlo tree search programs that simulate thousands of random games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.
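The way a search can combine policy and value networks with Monte Carlo simulation can be illustrated with a minimal Python sketch. It is not AlphaGo's implementation: the function names, the dictionary-based data layout, and the constants `c_explore` and `mix` are assumptions made for illustration, and the exploration bonus is a simplified form. The sketch shows two ideas: the policy network's prior probabilities steer which move the search tries next, and a leaf position is scored by blending a value-network estimate with the outcome of a simulated game.

```python
import math


def select_move(priors, visit_counts, q_values, c_explore=1.0):
    """Pick the move with the highest (mean value + exploration bonus).

    priors: dict mapping move -> prior probability (from a policy network).
    visit_counts: dict mapping move -> number of times the search tried it.
    q_values: dict mapping move -> mean evaluation of that move so far.
    The bonus is proportional to the prior and decays with visits, so
    high-prior moves are tried first and rarely tried moves stay attractive.
    """
    best_move, best_score = None, -math.inf
    for move, prior in priors.items():
        visits = visit_counts.get(move, 0)
        mean_value = q_values.get(move, 0.0)
        score = mean_value + c_explore * prior / (1 + visits)
        if score > best_score:
            best_move, best_score = move, score
    return best_move


def leaf_value(value_net_estimate, rollout_outcome, mix=0.5):
    """Blend a value-network estimate with a simulated game outcome.

    mix=0 trusts only the value network; mix=1 trusts only the rollout.
    """
    return (1 - mix) * value_net_estimate + mix * rollout_outcome


if __name__ == "__main__":
    priors = {"a": 0.6, "b": 0.4}  # hypothetical policy-network output
    print(select_move(priors, {}, {}))        # unvisited: highest prior wins
    print(select_move(priors, {"a": 3}, {}))  # bonus for "a" has decayed
    print(leaf_value(0.2, 1.0))               # equal-weight blend
```

With no visits recorded, the move with the larger prior is selected; once that move has been visited several times without a better mean value, its bonus shrinks and the search turns to the alternative.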