from the skynet-development dept.
[We] started a stealthy project at Google several years ago to see what we could accomplish with our own custom accelerators for machine learning applications. The result is called a Tensor Processing Unit (TPU), a custom ASIC we built specifically for machine learning — and tailored for TensorFlow. We've been running TPUs inside our data centers for more than a year, and have found them to deliver an order of magnitude better-optimized performance per watt for machine learning. This is roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore's Law). [...] TPU is an example of how fast we turn research into practice — from first tested silicon, the team
had them up and running applications at speed in our data centers within 22 days.
The processors are already being used to improve search and Street View, and were used to power AlphaGo during its matches against Go champion Lee Sedol. More details can be found at Next Platform, Tom's Hardware, and AnandTech.
Following Google's release of a paper detailing how its tensor processing units (TPUs) beat 2015 CPUs and GPUs at machine learning inference tasks, Nvidia has countered with results from its Tesla P40:
Google's TPU went online in 2015, which is why the company compared its performance against other chips that it was using at that time in its data centers, such as the Nvidia Tesla K80 GPU and the Intel Haswell CPU.
Google is only now releasing the results, possibly because it doesn't want other machine learning competitors (think Microsoft, rather than Nvidia or Intel) to learn about the secrets that make its AI so advanced, at least until it's too late to matter. Releasing the TPU results now could very well mean Google is already testing or even deploying its next-generation TPU.
Nevertheless, Nvidia took the opportunity to show that its latest inference GPUs, such as the Tesla P40, have evolved significantly since then, too. Some of the increase in inference performance seen by Nvidia GPUs is due to the company jumping from the previous 28nm process node to the 16nm FinFET node. This jump offered its chips about twice as much performance per Watt.
Nvidia also further improved its GPU architecture for deep learning in Maxwell, and then again in Pascal. Yet another reason for why the new GPU is so much faster for inferencing is that Nvidia's deep learning and inference-optimized software has improved significantly as well.
Finally, perhaps the main reason for why the Tesla P40 can be up to 26x faster than the old Tesla K80, according to Nvidia, is because the Tesla P40 supports INT8 computation, as opposed to the FP32-only support for the K80. Inference doesn't need too high accuracy when doing calculations and 8-bit integers seem to be enough for most types of neural networks.
Google's TPUs use less power, have an unknown cost (the P40 can cost $5,700), and may have advanced considerably since 2015.
Google Translate will be upgraded using a "Neural Machine Translation" technique, starting with Chinese-English translation today:
Google has been working on a machine learning translation technique for years, and today is its official debut. The Google Neural Machine Translation [GNMT] system, deployed today for Chinese-English queries, is a step up in complexity from existing methods. Here's how things have evolved (in a nutshell). [...] GNMT is the latest and by far the most effective to successfully leverage machine learning in translation. It looks at the sentence as a whole, while keeping in mind, so to speak, the smaller pieces like words and phrases. It's much like the way we look at an image as a whole while being aware of individual pieces — and that's not a coincidence. Neural networks have been trained to identify images and objects in ways imitative of human perception, and there's more than a passing resemblance between finding the gestalt of an image and that of a sentence.
Interestingly, there's little in there actually specific to language: The system doesn't know the difference between the future perfect and future continuous, and it doesn't break up words based on their etymologies. It's all math and stats, no humanity. Reducing translation to a mechanical task is admirable, but in a way chilling — though admittedly, in this case, little but a mechanical translation is called for, and artifice and interpretation are superfluous.
Google's machine learning oriented chips have gotten an upgrade:
At Google I/O 2017, Google announced its next-generation machine learning chip, called the "Cloud TPU." The new TPU no longer does only inference--now it can also train neural networks.
[...] In last month's paper, Google hinted that a next-generation TPU could be significantly faster if certain modifications were made. The Cloud TPU seems to have have received some of those improvements. It's now much faster, and it can also do floating-point computation, which means it's suitable for training neural networks, too.
According to Google, the chip can achieve 180 teraflops of floating-point performance, which is six times more than Nvidia's latest Tesla V100 accelerator for FP16 half-precision computation. Even when compared against Nvidia's "Tensor Core" performance, the Cloud TPU is still 50% faster.
[...] Google will also donate access to 1,000 Cloud TPUs to top researchers under the TensorFlow Research Cloud program to see what people do with them.
Previously: Google Reveals Homegrown "TPU" For Machine Learning
Google Pulls Back the Covers on Its First Machine Learning Chip
Nvidia Compares Google's TPUs to the Tesla P40
NVIDIA's Volta Architecture Unveiled: GV100 and Tesla V100