Here's a machine learning chip startup:
Chip startups come and go. Generally, we cover them because of novel architectures or potential for specific applications. But in some cases, like today's, it is both for those reasons and because of the people behind the effort to bring a new architecture into a crowded, and ultimately limited, landscape.
With $100 million in "patience money" from a few individual investors who believe in the future of sparse matrix-based computing on low-power and reprogrammable devices, Austin-based Knupath has spent a decade in stealth mode designing and fabricating a custom digital signal processor (DSP) chip to target deep learning training, machine learning-based analytics workloads, and, naturally, signal processing. The company, led by former NASA administrator Dan Goldin, has already fulfilled a $20 million contract for its first-generation DSP-based system and has an interesting roadmap ahead, which will include the integration of FPGAs, among other devices.
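The pitch centers on sparse matrix computation, where most matrix entries are zero and only the nonzeros are stored and processed. For readers unfamiliar with the workload, here is a minimal sketch of a sparse matrix-vector multiply in compressed sparse row (CSR) form, in plain C; it illustrates the kind of kernel such a chip would target and assumes nothing about Knupath's actual hardware or API (the function and variable names are illustrative only):

#include <stdio.h>

/* y = A*x with A stored in compressed sparse row (CSR) form.
 * Only the nonzero entries are stored, so each row's work is
 * proportional to its nonzero count -- the access pattern a
 * sparse-matrix-oriented chip is built to handle. */
void spmv_csr(int nrows, const int *row_ptr, const int *col_idx,
              const double *vals, const double *x, double *y)
{
    for (int i = 0; i < nrows; i++) {
        double sum = 0.0;
        /* Visit only the stored (nonzero) entries of row i. */
        for (int j = row_ptr[i]; j < row_ptr[i + 1]; j++)
            sum += vals[j] * x[col_idx[j]];
        y[i] = sum;
    }
}

int main(void)
{
    /* The 3x3 matrix [[4,0,1],[0,3,0],[2,0,5]] in CSR layout. */
    int    row_ptr[] = {0, 2, 3, 5};
    int    col_idx[] = {0, 2, 1, 0, 2};
    double vals[]    = {4, 1, 3, 2, 5};
    double x[] = {1, 2, 3}, y[3];

    spmv_csr(3, row_ptr, col_idx, vals, x, y);
    for (int i = 0; i < 3; i++)
        printf("y[%d] = %g\n", i, y[i]);   /* prints 7, 6, 17 */
    return 0;
}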
(Score: 0) by Anonymous Coward on Tuesday June 07 2016, @04:19PM
Whose FPGA?
Will there be an open source toolchain?
Is there any documentation available for writing open source tools for these chips?
(Score: 1) by WillR on Tuesday June 07 2016, @06:04PM
They've had $100 million in investments and been working for 10 years and they're only breaking radio silence now? And the CEO is someone who would have invaluable contacts in Washington and years of experience negotiating big federal purchase contracts? Their first and best customer is probably the good old No Such Agency.
(Score: 2) by Dunbal on Tuesday June 07 2016, @07:53PM
100 million for 10 years? Why they could almost have made a space game [robertsspaceindustries.com] by now!
(Score: 2, Informative) by Anonymous Coward on Tuesday June 07 2016, @07:58PM
From the interesting FA,
> Such a system is scalable to 512,000 chips. Each chip has 256 tDSP cores (the “t” is for “tiny”) with a single ARM management core. The latency story is a compelling one, with rack-to-rack latency of 400 nanoseconds (as good as the fastest Ethernet today)—all with the ability to handle sparse matrix computations efficiently and effectively.
and,
> ...All of this leads to aggregate memory bandwidth of 3.7 terabytes per second across the machine. On the scalability front then, each little “cluster” has the memory shared among the DSPs so the memory bandwidth numbers scale proportionally to the number of chips (adding more means adding more memory and memory bandwidth into the system).
This is big stuff; you probably aren't going to be programming it in your basement (unless you have a pretty special basement?).
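For scale, a quick back-of-the-envelope check on the quoted figures, sketched in C. The 512,000-chip, 256-core, and 400 ns numbers come from the quoted article; the ~5 ns per meter figure for signal propagation in fiber is a standard rule of thumb, not from the article, and the rest is plain arithmetic:

#include <stdio.h>

int main(void)
{
    /* Quoted specs: max system size and cores per chip. */
    long chips          = 512000L;  /* max chips per system (quoted) */
    long cores_per_chip = 256L;     /* tDSP cores per chip (quoted)  */
    printf("max tDSP cores: %ld\n",
           chips * cores_per_chip);  /* 131,072,000 cores */

    /* 400 ns rack-to-rack vs. propagation in fiber at roughly
     * 5 ns per meter (rule of thumb): the signal alone can only
     * cover ~80 m in that budget, so the claimed latency leaves
     * little room for switching and protocol overhead. */
    double latency_ns    = 400.0;
    double fiber_ns_per_m = 5.0;
    printf("fiber reach in %g ns: ~%g m\n",
           latency_ns, latency_ns / fiber_ns_per_m);
    return 0;
}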