from the wait-until-they-teach-it-how-to-write-software dept.
Google says its AlphaGo Zero artificial intelligence program has triumphed at chess against world-leading specialist software within hours of teaching itself the game from scratch. The firm's DeepMind division says that it played 100 games against Stockfish 8, and won or drew all of them.
The research has yet to be peer reviewed. But experts already suggest the achievement will strengthen the firm's position in a competitive sector. "From a scientific point of view, it's the latest in a series of dazzling results that DeepMind has produced," the University of Oxford's Prof Michael Wooldridge told the BBC. "The general trajectory in DeepMind seems to be to solve a problem and then demonstrate it can really ramp up performance, and that's very impressive."
Tic-tac-toe, checkers, chess, Go, poker. Artificial intelligence rolled over each of these games like a relentless tide. Now Google's DeepMind is taking on the multiplayer space-war videogame StarCraft II. No one expects the robot to win anytime soon. But when it does, it will be a far greater achievement than DeepMind's conquest of Go—and not just because StarCraft is a professional e-sport watched by fans for millions of hours each month.
DeepMind and Blizzard Entertainment, the company behind StarCraft, just released the tools to let AI researchers create bots capable of competing in a galactic war against humans. The bots will see and do all all the things human players can do, and nothing more. They will not enjoy an unfair advantage.
DeepMind and Blizzard also are opening a cache of data from 65,000 past StarCraft II games that will likely be vital to the development of these bots, and say the trove will grow by around half a million games each month. DeepMind applied machine-learning techniques to Go matchups to develop its champion-beating Go bot, AlphaGo. A new DeepMind paper includes early results from feeding StarCraft data to its learning software, and shows it is a long way from mastering the game. And Google is not the only big company getting more serious about StarCraft. Late Monday, Facebook released its own collection of data from 65,000 human-on-human games of the original StarCraft to help bot builders.
[...] Beating StarCraft will require numerous breakthroughs. And simply pointing current machine-learning algorithms at the new tranches of past games to copy humans won't be enough. Computers will need to develop styles of play tuned to their own strengths, for example in multi-tasking, says Martin Rooijackers, creator of leading automated StarCraft player LetaBot. "The way that a bot plays StarCraft is different from how a human plays it," he says. After all, the Wright brothers didn't get machines to fly by copying birds.
Churchill guesses it will be five years before a StarCraft bot can beat a human. He also notes that many experts predicted a similar timeframe for Go—right before AlphaGo burst onto the scene.
Have any Soylentils here experimented with Deep Learning algorithms in a game context? If so how did it go and how did it compare to more traditional opponent strategies?
Google DeepMind researchers have made their old AlphaGo program obsolete:
The old AlphaGo relied on a computationally intensive Monte Carlo tree search to play through Go scenarios. The nodes and branches created a much larger tree than AlphaGo practically needed to play. A combination of reinforcement learning and human-supervised learning was used to build "value" and "policy" neural networks that used the search tree to execute gameplay strategies. The software learned from 30 million moves played in human-on-human games, and benefited from various bodges and tricks to learn to win. For instance, it was trained from master-level human players, rather than picking it up from scratch.
AlphaGo Zero did start from scratch with no experts guiding it. And it is much more efficient: it only uses a single computer and four of Google's custom TPU1 chips to play matches, compared to AlphaGo's several machines and 48 TPUs. Since Zero didn't rely on human gameplay, and a smaller number of matches, its Monte Carlo tree search is smaller. The self-play algorithm also combined both the value and policy neural networks into one, and was trained on 64 GPUs and 19 CPUs over a few days by playing nearly five million games against itself. In comparison, AlphaGo needed months of training and used 1,920 CPUs and 280 GPUs to beat Lee Sedol.
Though self-play AlphaGo Zero even discovered for itself, without human intervention, classic moves in the theory of Go, such as fuseki opening tactics, and what's called life and death. More details can be found in Nature, or from the paper directly here. Stanford computer science academic Bharath Ramsundar has a summary of the more technical points, here.
Go is an abstract strategy board game for two players, in which the aim is to surround more territory than the opponent.