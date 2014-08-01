[...]. That's what it's not. So what actually is NorthPole? Some of the ideas do carry forward from IBM's earlier efforts. These include the recognition that much of the energy cost of AI comes from the separation between memory and execution units. A key component of neural networks, the weights of the connections between different layers of "neurons," is held in memory. As a result, any execution on a traditional processor or GPU burns a lot of energy simply moving those weights from memory to where they are used during execution.
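To make the data-movement argument concrete, here is a back-of-the-envelope sketch. The layer shapes, the one-byte (int8) weight size, and the assumption that a conventional processor re-fetches every weight for every inference (no caching or batching reuse) are illustrative assumptions, not figures from IBM; the function name `off_chip_weight_bytes` is hypothetical.

```python
def off_chip_weight_bytes(layer_shapes, bytes_per_weight, n_inferences, weights_on_chip):
    """Bytes of weights fetched from off-chip memory over n_inferences.

    Simplifying assumption: a conventional processor re-fetches every weight
    for every inference (no caching or batching reuse).
    """
    total_weights = sum(n_in * n_out for n_in, n_out in layer_shapes)
    if weights_on_chip:
        # Weights are loaded once, then used where they are stored.
        return total_weights * bytes_per_weight
    # Otherwise every inference pulls the full set of weights across again.
    return total_weights * bytes_per_weight * n_inferences


layers = [(784, 256), (256, 10)]  # hypothetical small classifier
print(off_chip_weight_bytes(layers, 1, 1000, weights_on_chip=False))  # 203264000
print(off_chip_weight_bytes(layers, 1, 1000, weights_on_chip=True))   # 203264
```

Under these assumptions the off-chip weight traffic grows with the number of inferences in the conventional case but stays fixed when the weights live next to the compute, which is the gap the near-memory design targets.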

So NorthPole, like TrueNorth before it, consists of a large array (16×16) of computational units, each of which includes both local memory and code execution capacity. As a result, all of the weights of the connections in the neural network can be stored exactly where they're needed.
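The article doesn't describe how NorthPole maps a network onto its cores, so the following is only a sketch of the general idea under a hypothetical scheme: the rows of one fully connected layer's weight matrix are divided into contiguous blocks across a 16×16 grid, and each core computes only the outputs whose weights it holds locally. `Core`, `place_layer`, and `run_layer` are illustrative names, not IBM's.

```python
from dataclasses import dataclass, field

GRID = 16  # the article's 16x16 array of compute units


@dataclass
class Core:
    """Hypothetical compute unit that keeps its slice of the weights locally."""
    rows: list = field(default_factory=list)  # (output_index, weight_row) pairs

    def run(self, x):
        # Compute only the outputs whose weights live in this core's memory.
        return [(i, sum(w * v for w, v in zip(row, x))) for i, row in self.rows]


def place_layer(weights):
    """Spread the rows of a weight matrix over GRID*GRID cores in contiguous blocks."""
    n_cores = GRID * GRID
    cores = [Core() for _ in range(n_cores)]
    per_core = -(-len(weights) // n_cores)  # ceiling division
    for i, row in enumerate(weights):
        cores[i // per_core].rows.append((i, row))
    return cores


def run_layer(cores, x, n_out):
    """Broadcast the input to every core and gather each core's outputs."""
    y = [0.0] * n_out
    for core in cores:
        for i, value in core.run(x):
            y[i] = value
    return y


# Usage: a hypothetical 600x3 layer spread over 256 cores (at most 3 rows each).
W = [[r + c for c in range(3)] for r in range(600)]
cores = place_layer(W)
y = run_layer(cores, [1, 2, 3], len(W))
```

The point of the layout is that no core ever reads another core's weights; only the small input vector and the outputs move between units.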

Executing neural networks on the chip is also a relatively unusual process. Once the weights and connections of the neural network are placed in buffers on the chip, execution simply requires an external controller (typically a CPU) to upload the data the network is meant to operate on, such as an image, and tell the chip to start. Everything else runs to completion without the CPU's involvement, which should also limit system-level power consumption.
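The host's side of that flow can be sketched as follows. `HypotheticalAccelerator` is a software stand-in invented for illustration, not an IBM interface; it counts host-to-chip uploads to show that the network goes over once, while each input costs a single upload plus a start command. The toy network applies a ReLU after every layer, which is an assumption of the sketch.

```python
class HypotheticalAccelerator:
    """Software stand-in for an on-chip inference engine (not a real IBM API)."""

    def __init__(self):
        self.layers = None
        self.host_transfers = 0  # number of host-to-chip uploads

    def load_network(self, layers):
        # One-time step: weights and connections are placed in on-chip buffers.
        self.layers = layers
        self.host_transfers += 1

    def run(self, data):
        # Per-input step: the host uploads the data and issues "start";
        # every layer then executes on-chip with no further host involvement.
        if self.layers is None:
            raise RuntimeError("load_network() must be called first")
        self.host_transfers += 1
        activations = data
        for layer in self.layers:
            activations = [max(0, sum(w * a for w, a in zip(row, activations)))
                           for row in layer]
        return activations


acc = HypotheticalAccelerator()
acc.load_network([[[1, -1], [2, 0]], [[1, 1]]])  # toy two-layer network
print(acc.run([3, 1]))      # [8]
print(acc.run([1, 3]))      # [2]
print(acc.host_transfers)   # 3
```

Keeping the per-input interaction down to "upload, start, read result" is what lets the CPU stay idle while the chip works.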

[...] While the tests were run with the NorthPole processor installed on a PCIe card, IBM told Ars that the chip is still viewed as a research prototype, and additional work would be needed to convert it into a commercial product. The company did not indicate whether it would be pursuing commercialization, though.

One of the potential limitations of the system is that it can only run neural networks that fit within its hardware. Put too many nodes in a single layer, and NorthPole can't handle it. However, layers could be split up, with segments executed on multiple NorthPole chips in parallel. The hardware has the capacity to handle this, but it hasn't been tested yet.
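Splitting an oversized layer can be sketched in the same spirit. Here the per-chip capacity (`max_weights_per_chip`) is a hypothetical parameter: the layer's output neurons are divided into segments that each fit under it, each segment would run on its own chip, and the partial outputs are concatenated. As noted above, IBM hasn't tested multi-chip execution, so this only illustrates the idea, not how NorthPole would do it.

```python
def split_layer(weights, max_weights_per_chip):
    """Split a layer's output neurons into segments that each fit on one chip."""
    n_in = len(weights[0])
    rows_per_chip = max_weights_per_chip // n_in
    if rows_per_chip == 0:
        raise ValueError("a single neuron's weights exceed one chip's capacity")
    return [weights[i:i + rows_per_chip] for i in range(0, len(weights), rows_per_chip)]


def run_split(segments, x):
    # On real hardware each segment would run on its own chip in parallel;
    # here the segments run sequentially and their outputs are concatenated.
    out = []
    for seg in segments:
        out.extend(sum(w * v for w, v in zip(row, x)) for row in seg)
    return out


W = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
segments = split_layer(W, max_weights_per_chip=4)  # 2 neurons per "chip"
print(len(segments))               # 3
print(run_split(segments, [1, 1])) # [3, 7, 11, 15, 19]
```

Splitting by output neuron keeps each segment independent, since every chip needs the full input but none needs another chip's weights; the cost is broadcasting the input to every chip.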

Perhaps the biggest limitation, however, is that the chip is specialized for a single category of AI task. While it's a commonly used one, the efficiency here comes largely from designing hardware that closely matches the type of execution that inference tasks require. So, while it's good to see effort put into reducing the power demands of some AI workloads, we're not yet at the point where a single accelerator works for all cases.