posted by CoolHand on Friday May 20 2016, @03:06PM   Printer-friendly
from the skynet-development dept.

Google has lifted the lid off of an internal project to create custom application-specific integrated circuits (ASICs) for machine learning tasks. The result is what they are calling a "TPU":

[We] started a stealthy project at Google several years ago to see what we could accomplish with our own custom accelerators for machine learning applications. The result is called a Tensor Processing Unit (TPU), a custom ASIC we built specifically for machine learning — and tailored for TensorFlow. We've been running TPUs inside our data centers for more than a year, and have found them to deliver an order of magnitude better-optimized performance per watt for machine learning. This is roughly equivalent to fast-forwarding technology about seven years into the future (three generations of Moore's Law). [...] TPU is an example of how fast we turn research into practice — from first tested silicon, the team had them up and running applications at speed in our data centers within 22 days.

The processors are already being used to improve search and Street View, and were used to power AlphaGo during its matches against Go champion Lee Sedol. More details can be found at Next Platform, Tom's Hardware, and AnandTech.

Original Submission

Related Stories

Nvidia Compares Google's TPUs to the Tesla P40 4 comments

Following Google's release of a paper detailing how its tensor processing units (TPUs) beat 2015 CPUs and GPUs at machine learning inference tasks, Nvidia has countered with results from its Tesla P40:

Google's TPU went online in 2015, which is why the company compared its performance against other chips that it was using at that time in its data centers, such as the Nvidia Tesla K80 GPU and the Intel Haswell CPU.

Google is only now releasing the results, possibly because it doesn't want other machine learning competitors (think Microsoft, rather than Nvidia or Intel) to learn about the secrets that make its AI so advanced, at least until it's too late to matter. Releasing the TPU results now could very well mean Google is already testing or even deploying its next-generation TPU.

Nevertheless, Nvidia took the opportunity to show that its latest inference GPUs, such as the Tesla P40, have evolved significantly since then, too. Some of the increase in inference performance seen by Nvidia GPUs is due to the company jumping from the previous 28nm process node to the 16nm FinFET node. This jump offered its chips about twice as much performance per Watt.

Nvidia also further improved its GPU architecture for deep learning in Maxwell, and then again in Pascal. Yet another reason why the new GPU is so much faster at inference is that Nvidia's deep learning and inference-optimized software has improved significantly as well.

Finally, perhaps the main reason why the Tesla P40 can be up to 26x faster than the old Tesla K80, according to Nvidia, is that the Tesla P40 supports INT8 computation, as opposed to the FP32-only support of the K80. Inference doesn't need very high accuracy, and 8-bit integers seem to be enough for most types of neural networks.
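The INT8-versus-FP32 trade-off can be illustrated with a toy quantization sketch (a minimal NumPy example of symmetric 8-bit quantization, not Nvidia's or Google's actual inference pipeline):

```python
import numpy as np

def quantize_int8(w):
    """Symmetric quantization: map FP32 values into the INT8 range [-127, 127]."""
    scale = np.abs(w).max() / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

rng = np.random.default_rng(0)
w = rng.standard_normal((4, 4)).astype(np.float32)  # toy weight matrix
x = rng.standard_normal((4,)).astype(np.float32)    # toy activation vector

qw, sw = quantize_int8(w)
qx, sx = quantize_int8(x)

# Multiply in integers, accumulate in int32, then rescale to float —
# the usual pattern that lets 8-bit hardware keep acceptable accuracy.
y_int8 = (qw.astype(np.int32) @ qx.astype(np.int32)) * (sw * sx)
y_fp32 = w @ x

# For typical weight distributions the quantized result tracks FP32 closely.
print(np.max(np.abs(y_int8 - y_fp32)))
```

The accumulation in int32 is the key detail: individual values fit in 8 bits, but their products and sums would overflow int8 immediately.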

Google's TPUs use less power, have an unknown cost (the P40 can cost $5,700), and may have advanced considerably since 2015.

Previously: Google Reveals Homegrown "TPU" For Machine Learning

Original Submission

Google Upgrades Chinese-English Translation with "Neural Machine Translation" 24 comments

Google Translate will be upgraded using a "Neural Machine Translation" technique, starting with Chinese-English translation today:

Google has been working on a machine learning translation technique for years, and today is its official debut. The Google Neural Machine Translation [GNMT] system, deployed today for Chinese-English queries, is a step up in complexity from existing methods. Here's how things have evolved (in a nutshell). [...] GNMT is the latest and by far the most effective to successfully leverage machine learning in translation. It looks at the sentence as a whole, while keeping in mind, so to speak, the smaller pieces like words and phrases. It's much like the way we look at an image as a whole while being aware of individual pieces — and that's not a coincidence. Neural networks have been trained to identify images and objects in ways imitative of human perception, and there's more than a passing resemblance between finding the gestalt of an image and that of a sentence.

Interestingly, there's little in there actually specific to language: The system doesn't know the difference between the future perfect and future continuous, and it doesn't break up words based on their etymologies. It's all math and stats, no humanity. Reducing translation to a mechanical task is admirable, but in a way chilling — though admittedly, in this case, little but a mechanical translation is called for, and artifice and interpretation are superfluous.

The code runs on Google's homegrown TPUs. The Google Research Blog says that the technique will be applied to other language pairs in the coming months.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

Original Submission

Google's New TPUs are Now Much Faster -- will be Made Available to Researchers 20 comments

Google's machine learning oriented chips have gotten an upgrade:

At Google I/O 2017, Google announced its next-generation machine learning chip, called the "Cloud TPU." The new TPU no longer does only inference--now it can also train neural networks.

[...] In last month's paper, Google hinted that a next-generation TPU could be significantly faster if certain modifications were made. The Cloud TPU seems to have received some of those improvements. It's now much faster, and it can also do floating-point computation, which means it's suitable for training neural networks, too.

According to Google, the chip can achieve 180 teraflops of floating-point performance, which is six times more than Nvidia's latest Tesla V100 accelerator for FP16 half-precision computation. Even when compared against Nvidia's "Tensor Core" performance, the Cloud TPU is still 50% faster.

[...] Google will also donate access to 1,000 Cloud TPUs to top researchers under the TensorFlow Research Cloud program to see what people do with them.

Also at EETimes and Google.

Previously: Google Reveals Homegrown "TPU" For Machine Learning
Google Pulls Back the Covers on Its First Machine Learning Chip
Nvidia Compares Google's TPUs to the Tesla P40
NVIDIA's Volta Architecture Unveiled: GV100 and Tesla V100

Original Submission

  • (Score: 2) by fishybell on Friday May 20 2016, @03:34PM

    by fishybell (3156) Subscriber Badge on Friday May 20 2016, @03:34PM (#348804)

    Google seems to be proving more and more that to be highly competitive in the long run you have to vertically integrate. They designed their own server racks, their own motherboards, and now their own ASICs. I expect the likes of Microsoft, Apple, Facebook, etc. to follow in their footsteps.

    • (Score: 2) by dyingtolive on Friday May 20 2016, @03:49PM

      by dyingtolive (952) on Friday May 20 2016, @03:49PM (#348810)

      Contrast that with the guy I was talking to the other day who, when I brought up the notion of using a local git repo rather than github, responded with "I make things... that aren't code repository services".

      And I mean, while setting up a git repo is braindead easy, he has a point to a certain degree. Vertical integration is great as long as you have the manpower for it. If you don't, then you never get any actual work done, because you're trying to keep your infra alive.

      Maybe my example is imperfect, but I see this as the natural extension to the constant "cloud vs. inhouse" argument, except taken to the point where it goes beyond services and into hardware. It's the natural place to go for someone who's gotten to the point that they've streamlined enough bottlenecks. Sooner or later, the most worthwhile ones to go after begin winding up in off the shelf hardware.

      Don't blame me, I voted for moose wang!
      • (Score: 2) by opinionated_science on Friday May 20 2016, @04:43PM

        by opinionated_science (4031) on Friday May 20 2016, @04:43PM (#348830)

        well git(hub) allows it to be shared, without their inertia. As a developer (sometimes), our code might go through many iterations, but relying upon external libraries/code is best kept static.

        I am interested in the details of the math/operations their TPUs specialise in. In my field (biophysics/informatics), we are always looking for ways of converting problems to exploit hardware advances!!!

      • (Score: 2) by WillR on Friday May 20 2016, @06:36PM

        by WillR (2012) on Friday May 20 2016, @06:36PM (#348859)
        If your business needs a widget, you buy one off the shelf.
        If it needs a million widgets a year, you build a dedicated widget factory to make them as efficiently as possible.
    • (Score: 2, Insightful) by Anonymous Coward on Friday May 20 2016, @04:20PM

      by Anonymous Coward on Friday May 20 2016, @04:20PM (#348822)

      > Google seems to be proving more and more that to be highly competitive in the long run you have to vertically integrate.

      Or that in a marketplace dominated by behemoths who can all afford to vertically integrate it is in their self-interest to vertically integrate because it keeps as much innovation from diffusing into the general marketplace where it could be used by your competition, including any new start-ups.

    • (Score: 2) by ledow on Friday May 20 2016, @06:19PM

      by ledow (5567) on Friday May 20 2016, @06:19PM (#348856) Homepage

      And Amazon - who in the UK basically have their own delivery services, cloud datacentres, etc. and then sell the excess.

      I'm always confused about this. I'm not sure that it works at all scales. But at a certain point, I can't fathom it.

      I work in schools (in the UK that means kids from 3-18), so let's use that as an example.

      Every school I've ever worked in uses an outside catering firm for food. And they serve hundreds of meals every day.
      To do that, the school pays for some catering company to provide food, and term-time-only staff, in facilities that the school owns.
      So, you're paying for the staff (the same staff every day, all day, and they have no jobs elsewhere), plus their pensions and legal requirements, plus the food, etc. plus the electricity and the cookers and all the food hygiene training and whatever else. And it doesn't matter whether it comes from the school or the catering company, they just don't enjoy any kind of "bulk discount" (I've seen the prices!). But for everything that the catering company provides, the school are paying for all the costs of doing that, plus having standby staff, plus all the training and legalities. PLUS A PROFIT MARGIN.

      So... why don't the school just hire their own cooks (even poach the ones working for the catering company), order in their own ingredients (a school is big enough to negotiate discounts and would it really matter if the cash-and-carry down the road did a better deal so long as the end product was the same?), and do the same thing AND KEEP THAT PROFIT THEMSELVES.

      Print management systems. Buy from them (more likely lease from them) a printer. Pay for every page and consumable. And pay a company who has to have engineers who come out and repair it (and often those engineers are nothing more than the original manufacturer's engineers anyway). Or... you could just buy the printers, buy the toner, and hire even a part-time printer engineer (large enough school, though, and you could easily keep a full-time one busy). Then no excuses, no waiting for them to turn up, often you're paying - via the service charge or via one-off repair charges - for every part anyway. PLUS PROFIT. And if your lease gives you "free" upgrades - guess, what, you could buy those free upgrades yourself for the same outlay, PLUS PROFIT, and still have the old printers lying about to use / sell!

      And printed items. We send so much off to the printers for things like brochures, fancy leaflets, mass mailings, etc. There's a point at which I think "If you just had bought one stupendously large printer and make a reprographics room, you could have done that all in-house and got so much other benefit from having that facility that it would pay for itself". And having an employed reprographics person do this all day takes ALL the printing off the rest of the staff and no more wasting time or money figuring out how to do that once-a-year thing when the guy is working with the printers every day. Some schools do this, I've seen it, but by far not all of them, and not even just the biggest.

      Silly things - DVD production. We were paying photographers to come in and video our own performances. And they were charging £20 per copy on DVD. Or we could buy a DVD duplicator which, even with on-disc printing, and custom-printed cases, and the handling and faffing time to burn them, print them, pack them all and send home - we actually made the money back for the cost of the duplicator and the ink and discs within the first two runs of discs (literally, a couple of hundred disks and we do that a dozen times a year). And we can keep all the old performances on the network, and print one-offs on demand at any time with only a moment's notice for no extra cost. We couldn't ever go back to the photographers and say "Can I have another DVD of that production 10 years ago for one of our former pupils?" And, though I'm sure the quality is good, why are we paying someone to film a performance when we have drama, theatre and photography students everywhere? Buy the good camera, lock it away from casual use, and film your own performances!

      Vehicles. Schools hire coaches. And pay for a driver. And coaches turn up on an almost daily basis for trips and whatever else. And it costs a fortune. And several staff will be licensed to drive buses and coaches anyway, for the smaller ones. Is it really more profitable to just let someone else do it and take a profit margin on it?

      Cleaning. Some contractor somewhere hires cleaners who do only a few hours a day (cleaners in my schools have rarely worked for anyone else under their contractor's agreements while working for the school). He assigns them to a school. They come in, clean up, leave. They often don't even provide their own hoovers or cleaning equipment, the school provides that. Then you pay for that, plus the contingencies, PLUS PROFIT.

      I'm sure sometimes the answer is no, and sometimes the answer is maybe, and sometimes the scale doesn't match what you might need (e.g. IT sometimes doesn't scale and often for smaller schools a small IT support contract is fine). But for many things I'm honestly NOT convinced that you're not just paying a middleman to do the job you could do yourself and then take his 10%.

      If you already have an HR department, already require health and safety training, already have a variety of contracts, already have to run payroll and tax and employer pension schemes, already have liability insurance, etc. then why not use it? And if the staff you hire - who are all paid for solely by your payments to the contractors, anyway - are crap, you sack them and find another. Rather than having to get into contractual rows with a third-party.

      And once you start down that route, I'm sure there's other things to absorb along the way. The only stopping point is scale (no, a school wouldn't make its own processors, but it might source its own IT, maybe even build the machines, do their own in-house repairs), but I'm convinced that most large corporations stop far too early. For instance, in a large school you have data managers and website people and an IT department... I'm sure an in-house coder who can customise - or even create - the applications you use would bring more than enough value to justify their existence. You have teachers spending hours looking for apps to use in the classroom and you could just have some guy knock one up, working in tandem to get exactly what you want. Or, and it needs to track each kid's progress? Well, his other project was working with the data manager on the main database with assessment data. And the IT guy is struggling to install it? The guy who wrote it is sitting next to him.

      I'm just... I get baffled sometimes by what the schools that have employed me do. I see a lot of wastage. And I see a lot of things where an outside company would do a better job, yes. But also I see lots of places where I just think: We have X million pounds in the bank, and we're leasing a dozen crap printers that are poorly supported and where we have to pay stupid cost-per-page rates and are tied in for 5 years.

      I'm sure there are reasons, but I'm equally sure they're not insurmountable while still making profit. I'm sure some of it is about liability, but I'm also sure that the school would still be ultimately responsible and at least if it was in-house, they could train people specifically and sack them when they make stupid mistakes and not have to constantly explain the same things over and over to some contracted worker who'll be someone different next week again.

      At Google's scale, or Amazon's scale, and the amount of cash they are throwing around... yeah, I'd do everything I possibly could myself.

      • (Score: 0) by Anonymous Coward on Friday May 20 2016, @09:00PM

        by Anonymous Coward on Friday May 20 2016, @09:00PM (#348895)

        why don't the school just hire their own cooks (even poach the ones working for the catering company), order in their own ingredients (a school is big enough to negotiate discounts and would it really matter if the cash-and-carry down the road did a better deal so long as the end product was the same?), and do the same thing AND KEEP THAT PROFIT THEMSELVES.

        The theory is that if it is outsourced then the school has the option to switch to another catering company, thus the catering company has an incentive to keep costs down. Thus despite the profit margin, they would still be cheaper than another catering company.

        But too often what really happens is that the catering company "captures" the contract through things like institutional memory that would make a brand new catering company have higher costs or lower quality until it develops its own institutional memory. Also buddying up (aka "a good working relationship") with the school administrators responsible for managing the contracts, making them less ruthless about switching contracts.

        In those cases the end result is typically that the private catering company shaves costs by short-changing the labor and keeps the lion's share of those savings for itself. So the peons get fucked, the school is no better off, but the owners of the catering company have vacation homes in Ibiza.

    • (Score: 0) by Anonymous Coward on Friday May 20 2016, @06:58PM

      by Anonymous Coward on Friday May 20 2016, @06:58PM (#348866)

      I would expect FB to lead a charge for open ASIC designs, that seems to be the way they work.

      Agree that Google, Amazon, and Microsoft are much more - can I say the P word? - proprietary when it comes to work they think is cutting edge. Even when they obtain the IP through an acquisition.

  • (Score: 0) by Anonymous Coward on Friday May 20 2016, @04:26PM

    by Anonymous Coward on Friday May 20 2016, @04:26PM (#348825)

    The name sounds wrong. A tensor is generally an array of vectors. So when I hear "tensor processor" I think hardcore floating point processor. But, afaik, AI is not a floating-point workload. So, is this just a case of someone picking a name that just sounds cool, or is it really doing actual tensor computations?

    • (Score: 0) by Anonymous Coward on Friday May 20 2016, @05:27PM

      by Anonymous Coward on Friday May 20 2016, @05:27PM (#348849)

      I'm of course just speculating here, but they've probably got it performing the ASIC version of a classification algorithm in multiple dimensions (n-dimensional gradients, where n is some predetermined order). If that's the case then it could legitimately be a chip that performs tensor operations.

    • (Score: 2) by jcross on Friday May 20 2016, @09:33PM

      by jcross (4009) on Friday May 20 2016, @09:33PM (#348902)

      Apparently the TPU is based on 8-bit integer math, but there's really no reason you can't have discrete tensors with 8-bit values, is there? I mean even floating point numbers are not continuous in the mathematical sense of the word, so all we're talking about is a massive reduction in range and resolution.
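      For what it's worth, a tensor contraction over 8-bit integer entries is perfectly well-defined — here's a minimal NumPy sketch (my own illustration, nothing to do with the TPU's actual design):

```python
import numpy as np

# A rank-3 array with 8-bit integer entries is still a tensor: the
# contraction T_ijk * v_k is exact integer arithmetic, no floats needed.
T = np.arange(24, dtype=np.int8).reshape(2, 3, 4)
v = np.array([1, -2, 3, -4], dtype=np.int8)

# Widen to int32 before contracting so partial sums can't overflow int8.
result = np.einsum('ijk,k->ij', T.astype(np.int32), v.astype(np.int32))
print(result)  # a 2x3 int32 matrix
```

      The only catch is range: products and sums of int8 values overflow 8 bits almost immediately, which is why accumulators are wider than the operands.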

    • (Score: 2) by Non Sequor on Friday May 20 2016, @10:35PM

      by Non Sequor (1005) Subscriber Badge on Friday May 20 2016, @10:35PM (#348920) Journal

      Well there are eigenvector based methods for machine learning. These aren't what you would call AI on their own but they capture key elements of problems of classification and inference.
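      PCA is the classic example of such an eigenvector method — a minimal sketch with NumPy (my own illustration, not from the article):

```python
import numpy as np

rng = np.random.default_rng(42)
# 200 2-D points stretched along the x-axis, so one direction
# carries far more variance than the other.
X = rng.standard_normal((200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])
X -= X.mean(axis=0)  # center the data

# Eigendecomposition of the sample covariance matrix.
cov = (X.T @ X) / (len(X) - 1)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

# The top eigenvector is the direction of maximum variance;
# projecting onto it gives a 1-D summary of the data.
top = eigvecs[:, -1]
projected = X @ top
print(eigvals)
```

      The same idea (keep the dominant eigendirections, discard the rest) underlies a lot of classification and inference pipelines, even if nobody would call it AI on its own.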

      Write your congressman. Tell him he sucks.
  • (Score: 2) by Fnord666 on Friday May 20 2016, @07:11PM

    by Fnord666 (652) on Friday May 20 2016, @07:11PM (#348870)
    So I'm curious. What does it cost to build an ASIC like this for a custom purpose?
    • (Score: 2) by Fnord666 on Friday May 20 2016, @07:17PM

      by Fnord666 (652) on Friday May 20 2016, @07:17PM (#348872)

      So I'm curious. What does it cost to build an ASIC like this for a custom purpose?

      Ok I guess this is a little outside of the hobbyist's budget.

      Krewell points out that designing a chip from scratch, even a simple one, can cost $100 million or more.

      • (Score: 2) by bob_super on Friday May 20 2016, @07:58PM

        by bob_super (1357) on Friday May 20 2016, @07:58PM (#348881)

        That's why you use an FPGA unless you have a Google-size budget.

        • (Score: 0) by Anonymous Coward on Friday May 20 2016, @08:28PM

          by Anonymous Coward on Friday May 20 2016, @08:28PM (#348890)

          well there is that guy who built a 6502 cpu clone from discrete (surface mounted...) components, so it's theoretically possible for the hobbyist...

      • (Score: 2) by Scruffy Beard 2 on Friday May 20 2016, @08:36PM

        by Scruffy Beard 2 (6030) on Friday May 20 2016, @08:36PM (#348892)

        In the Bitcoin community the cost of an ASIC run was often quoted at $2M. I wonder if $100M includes several iterations.

        • (Score: 2) by jcross on Friday May 20 2016, @09:36PM

          by jcross (4009) on Friday May 20 2016, @09:36PM (#348903)

          I believe the majority of it would be the services of the ASIC design team. Seems to me like bitcoin folks are mostly not corporate, so they would be unlikely to include that part as an explicit expense.

      • (Score: 0) by Anonymous Coward on Friday May 20 2016, @09:16PM

        by Anonymous Coward on Friday May 20 2016, @09:16PM (#348898)

        If you have to ask... you can't afford it!

  • (Score: 0) by Anonymous Coward on Friday May 20 2016, @07:57PM

    by Anonymous Coward on Friday May 20 2016, @07:57PM (#348880)

    It also runs on gogols.