from the Zeroing-in-on-AI dept.
Google DeepMind researchers have made their old AlphaGo program obsolete:
The old AlphaGo relied on a computationally intensive Monte Carlo tree search to play through Go scenarios. The nodes and branches created a much larger tree than AlphaGo practically needed to play. A combination of reinforcement learning and human-supervised learning was used to build "value" and "policy" neural networks that used the search tree to execute gameplay strategies. The software learned from 30 million moves played in human-on-human games, and benefited from various bodges and tricks to learn to win. For instance, it was trained on games from master-level human players, rather than picking the game up from scratch.
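The value/policy-guided tree search described above can be sketched as a PUCT-style Monte Carlo tree search. This is a minimal illustration, not DeepMind's implementation: the `evaluate` callable stands in for the value and policy networks, and all names and constants are illustrative.

```python
import math

class Node:
    """One board state in the search tree."""
    def __init__(self, prior):
        self.prior = prior        # P(s, a): move probability from the policy net
        self.visit_count = 0
        self.value_sum = 0.0
        self.children = {}        # action -> Node

    def value(self):
        return self.value_sum / self.visit_count if self.visit_count else 0.0

def select_child(node, c_puct=1.0):
    """PUCT rule: balance the averaged value against an exploration bonus
    that favours high-prior, rarely visited moves."""
    total = sum(ch.visit_count for ch in node.children.values())
    def score(item):
        _, ch = item
        u = c_puct * ch.prior * math.sqrt(total + 1) / (1 + ch.visit_count)
        return ch.value() + u
    return max(node.children.items(), key=score)

def run_mcts(root, evaluate, num_simulations=100):
    """evaluate(path) -> (priors: dict, value: float) stands in for the nets.
    Returns the most-visited move at the root."""
    for _ in range(num_simulations):
        node, path, visited = root, [], [root]
        while node.children:                  # descend to a leaf
            action, node = select_child(node)
            path.append(action)
            visited.append(node)
        priors, value = evaluate(path)        # expand and evaluate the leaf
        for a, p in priors.items():
            node.children[a] = Node(p)
        for n in visited:                     # back the leaf value up the path
            n.visit_count += 1
            n.value_sum += value
    return max(root.children.items(), key=lambda kv: kv[1].visit_count)[0]

# Toy evaluator standing in for the neural networks: fixed priors, and a
# value that rewards the hypothetical move "good".
def toy_evaluate(path):
    priors = {"good": 0.9, "bad": 0.1}
    return priors, (1.0 if path and path[-1] == "good" else -1.0)

best = run_mcts(Node(prior=1.0), toy_evaluate, num_simulations=50)
# best == "good": the high-prior, high-value move dominates the visit counts
```

In the real system the priors and leaf values come from the trained networks, which is what lets the tree stay far smaller than an exhaustive search.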
AlphaGo Zero did start from scratch with no experts guiding it. And it is much more efficient: it only uses a single computer and four of Google's custom TPU1 chips to play matches, compared to AlphaGo's several machines and 48 TPUs. Since Zero didn't rely on human gameplay, and played a smaller number of matches, its Monte Carlo tree search is smaller. The self-play algorithm also combined both the value and policy neural networks into one, and was trained on 64 GPU workers and 19 CPU parameter servers over a few days by playing nearly five million games against itself. In comparison, AlphaGo needed months of training and used 1,920 CPUs and 280 GPUs to beat Lee Sedol.
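The self-play loop described above can be outlined in a few lines. This is a hypothetical sketch, not DeepMind's code: `play_game` and `train` are assumed callables, and only the overall data flow — one network generating its own training data via search, then learning from the results — follows the paper.

```python
def self_play_iteration(network, play_game, train, num_games=25000):
    """One iteration of the AlphaGo Zero-style loop: generate games with the
    current combined policy/value network, then retrain it on the outcomes.
    `play_game` and `train` are illustrative stand-ins for the real system."""
    examples = []
    for _ in range(num_games):
        # play_game runs MCTS at every move, guided by `network`, and records
        # each (state, MCTS visit distribution) pair plus the final winner z
        states, search_policies, winner = play_game(network)
        for s, pi in zip(states, search_policies):
            examples.append((s, pi, winner))
    # Per the paper, training minimises (z - v)^2 - pi·log(p) plus L2
    # regularisation, so one network learns both the value and the policy.
    return train(network, examples)
```

The key difference from the original AlphaGo is that no human game records enter this loop: the search-improved move distributions are themselves the training targets.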
Through self-play, AlphaGo Zero even discovered for itself, without human intervention, classic moves in the theory of Go, such as fuseki opening tactics, and what's called life and death. More details can be found in Nature, or from the paper directly here. Stanford computer science academic Bharath Ramsundar has a summary of the more technical points, here.
Go is an abstract strategy board game for two players, in which the aim is to surround more territory than the opponent.
Google's machine learning oriented chips have gotten an upgrade:
At Google I/O 2017, Google announced its next-generation machine learning chip, called the "Cloud TPU." The new TPU no longer does only inference; now it can also train neural networks.
[...] In last month's paper, Google hinted that a next-generation TPU could be significantly faster if certain modifications were made. The Cloud TPU seems to have received some of those improvements. It's now much faster, and it can also do floating-point computation, which means it's suitable for training neural networks, too.
According to Google, the chip can achieve 180 teraflops of floating-point performance, which is six times more than Nvidia's latest Tesla V100 accelerator for FP16 half-precision computation. Even when compared against Nvidia's "Tensor Core" performance, the Cloud TPU is still 50% faster.
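The two quoted ratios can be sanity-checked against Nvidia's published V100 figures — roughly 30 teraflops for general-purpose FP16 and 120 teraflops for Tensor Core operation. These numbers come from Nvidia's spec sheets rather than the article, so treat the arithmetic as an approximation:

```python
cloud_tpu_tflops = 180.0          # Google's claimed Cloud TPU peak
v100_fp16_tflops = 30.0           # V100 general-purpose FP16 (approx.)
v100_tensor_core_tflops = 120.0   # V100 Tensor Core peak (approx.)

general_ratio = cloud_tpu_tflops / v100_fp16_tflops          # "six times more"
tensor_ratio = cloud_tpu_tflops / v100_tensor_core_tflops    # "50% faster"
```

With those inputs, `general_ratio` comes out to 6.0 and `tensor_ratio` to 1.5, matching both claims in the excerpt.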
[...] Google will also donate access to 1,000 Cloud TPUs to top researchers under the TensorFlow Research Cloud program to see what people do with them.
Previously: Google Reveals Homegrown "TPU" For Machine Learning
Google Pulls Back the Covers on Its First Machine Learning Chip
Nvidia Compares Google's TPUs to the Tesla P40
NVIDIA's Volta Architecture Unveiled: GV100 and Tesla V100
To say that AlphaGo had a great run in the competitive Go scene would be an understatement: it has just defeated the world's number 1 Go player, Ke Jie, in a three-game match. Now that it has nothing left to prove, the AI is hanging up its boots and leaving the world of competitive Go behind. AlphaGo's developers from Google-owned DeepMind will now focus on creating advanced general algorithms to help scientists find elusive cures for diseases, conjure up a way to dramatically reduce energy consumption and invent revolutionary new materials.
Before they leave Go behind completely, though, they plan to publish one more paper later this year to reveal how they tweaked the AI to prepare it for the matches against Ke Jie. They're also developing a tool that would show how AlphaGo would respond to a particular situation on the Go board with help from the world's number one player.
Google says its AlphaZero artificial intelligence program has triumphed at chess against world-leading specialist software within hours of teaching itself the game from scratch. The firm's DeepMind division says that it played 100 games against Stockfish 8, and won or drew all of them.
The research has yet to be peer reviewed. But experts already suggest the achievement will strengthen the firm's position in a competitive sector. "From a scientific point of view, it's the latest in a series of dazzling results that DeepMind has produced," the University of Oxford's Prof Michael Wooldridge told the BBC. "The general trajectory in DeepMind seems to be to solve a problem and then demonstrate it can really ramp up performance, and that's very impressive."