Intel to add AI engine to all 14th-gen Meteor Lake SoCs:
Computex Intel will bring the "VPU" tech it acquired along with Movidius in 2016 to all models of its forthcoming Meteor Lake client CPUs.
[...] Curiously, Intel didn't elucidate the acronym, but has previously said it stands for Vision Processing Unit. Chipzilla is, however, clear about what it does and why it's needed – and it's more than vision.
Intel Veep and general manager of Client AI John Rayfield said dedicated AI silicon is needed because AI is now present in many PC workloads. Video conferences, he said, feature lots of AI enhancing video and making participants sound great – and users now just expect that PCs do brilliantly when Zooming or WebExing or Teamising. Games use lots of AI. And GPT-like models, and tools like Stable Diffusion, are already popular on the PC and available as local executables.
CPUs and GPUs do the heavy lifting today, but Rayfield said they'll be overwhelmed by the demands of AI workloads.
Shifting that work to the cloud is pricey, and also impractical because buyers want PCs to perform.
Meteor Lake therefore gets VPUs and emerges as an SoC that uses Intel's Foveros packaging tech to combine the CPU, GPU, and VPU.
The VPU gets to handle "sustained AI and AI offload." CPUs will still be asked to do simple, low-latency inference jobs, usually when the cost of doing so is less than the overhead of working with a driver to shunt the workload elsewhere. GPUs will take on jobs that demand parallelism and throughput. Other AI-related work will be offloaded to VPUs.
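In practice, developers reach these engines through frameworks such as Intel's OpenVINO toolkit, which exposes each compute block as a named device. Here is a minimal sketch of that partitioning; the "NPU" device identifier (which newer OpenVINO builds use for the VPU block) and the model path are assumptions that vary by release and platform:

```python
# Sketch: steering inference at each Meteor Lake engine via OpenVINO.
# "model.xml" is a hypothetical OpenVINO IR model; check
# core.available_devices on your own install for the real device names.
import numpy as np
from openvino.runtime import Core

core = Core()
print(core.available_devices)  # e.g. ['CPU', 'GPU', 'NPU']

model = core.read_model("model.xml")

# Light, low-latency inference: stay on the CPU, no offload overhead.
cpu_model = core.compile_model(model, "CPU")

# Parallel, throughput-bound inference: compile for the integrated GPU.
gpu_model = core.compile_model(model, "GPU")

# Sustained background AI: offload to the VPU ("NPU" here, by assumption).
vpu_model = core.compile_model(model, "NPU")

dummy = np.zeros((1, 3, 224, 224), dtype=np.float32)  # shape is illustrative
result = vpu_model([dummy])  # returns the model's output tensors
```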
Intel Demos Meteor Lake's AI Acceleration for PCs, Details VPU Unit:
[...] Intel will still include the Gaussian & Neural Accelerator (GNA), the low-power AI acceleration block that already exists on its chips, marked as 'GNA 3.5' on the SoC tile in the diagram (more on this below). You can also spot the 'VPU 2.7' block, which is the new Movidius-based VPU.
Like Intel's stylized render, the patent image is also just a graphical rendering with no real correlation to the actual physical size of the dies. It's easy to see that, with so many external interfaces (memory controllers, PCIe, USB, and SATA, not to mention the media and display engines and power management), the VPU cores simply can't consume much of the die area on the SoC tile. For now, the amount of die area Intel has dedicated to this engine is unknown.
The VPU is designed for sustained AI workloads, but Meteor Lake also includes a CPU, GPU, and GNA engine that can run various AI workloads. Intel says the VPU is primarily for background tasks, while the GPU steps in for heavier parallelized work. Meanwhile, the CPU addresses light, low-latency inference work. Some AI workloads can also run on both the VPU and GPU simultaneously, and Intel has enabled mechanisms that allow developers to target the different compute layers based on the needs of the application at hand. This will ultimately result in higher performance at lower power, a key goal of the AI acceleration VPU.
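In OpenVINO terms, those mechanisms surface as virtual devices that schedule work across the physical engines. A hedged sketch, again assuming the "NPU" device name for the VPU:

```python
from openvino.runtime import Core

core = Core()
model = core.read_model("model.xml")  # hypothetical model

# MULTI fans concurrent inference requests out across both engines,
# trading latency for aggregate throughput.
multi_model = core.compile_model(model, "MULTI:NPU,GPU")

# AUTO lets the runtime pick an engine from the priority list based on
# the model's needs and the current load.
auto_model = core.compile_model(model, "AUTO:NPU,GPU,CPU")
```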
Intel's chips currently use the GNA block for low-power AI inference for audio and video processing functions, and the GNA unit will remain on Meteor Lake. However, Intel says it is already running some GNA-focused code on the VPU with better results, strongly implying that future chips will transition to the VPU entirely and drop the GNA engine.
Intel also disclosed that Meteor Lake has a coherent fabric that enables a unified memory subsystem, meaning it can easily share data among the compute elements. This key functionality is similar in concept to what other contenders in the CPU AI space offer, like Apple with its M-series and AMD with its Ryzen 7040 chips.
(Score: 5, Informative) by takyon on Tuesday May 30, @08:01AM (2 children)
The ~5-30 TOPS AI accelerators found in smartphone SoCs, and now in x86 CPUs, can be good enough to be useful (for inference). AMD added one to Phoenix, and now Intel is adding one to Meteor Lake. They must be ubiquitous to get adoption, and they will eventually spread to all desktop CPUs and even some non-AI-focused server products:
https://www.notebookcheck.net/AMD-outlines-plans-to-integrate-AI-XDNA-IPUs-across-its-entire-processor-portfolio.717919.0.html [notebookcheck.net]
https://www.notebookcheck.net/AMD-and-Microsoft-present-AI-Developer-Tools-for-Ryzen-7040-processors.719863.0.html [notebookcheck.net]
For mobile, power efficiency is key, so it should definitely be on the same package and not another chip. Rumor has it that Meteor Lake's iGPU performance boost will be big enough to push some low-end laptop dGPUs out of the market.
Although iGPUs could probably be used for inference, if you want to use them for graphics at the same time, that's not ideal. Eventually, games might be using AI accelerators for real-time voice synthesis or something.
I don't think the amount of die space being used up here is very much. We're probably talking about less than 20 mm² inside the 95 mm² "SoC tile":
https://www.semianalysis.com/p/meteor-lake-die-shot-and-architecture [semianalysis.com]
Already done:
https://www.tomshardware.com/news/new-amd-instinct-mi300-details-emerge-debuts-in-2-exaflop-el-capitan-supercomputer [tomshardware.com]
https://www.nextplatform.com/2023/05/29/nvidias-grace-hopper-hybrid-systems-bring-huge-memory-to-bear/ [nextplatform.com]
(Score: 0) by Anonymous Coward on Tuesday May 30, @10:06AM (1 child)
In most cases for "PC" usage, the AIs will be running on Microsoft/Google/your servers, while the AI developers would probably be using dedicated hardware for it.
I can somewhat see the argument for smartphones, since you might want your phone to run some AI locally, use less power, and still work in scenarios where you have zero or flaky data connectivity.
(Score: 4, Insightful) by takyon on Tuesday May 30, @10:34AM
It could be seen as a chicken-and-egg problem. They have to add the accelerator before anyone will use it. Intel showed off a Stable Diffusion plugin they wrote for GIMP, and Photoshop tools could probably use the accelerator relatively soon instead of the GPU.
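For a sense of what that plugin does under the hood, here is a rough sketch of Stable Diffusion running through OpenVINO via the optimum-intel bindings; the checkpoint and device choice are illustrative, not necessarily what Intel's GIMP plugin actually uses:

```python
# Rough sketch: Stable Diffusion inference through OpenVINO.
# Needs `pip install optimum[openvino]`; the model ID is an assumption.
from optimum.intel import OVStableDiffusionPipeline

pipe = OVStableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",  # assumed public checkpoint
    export=True,  # convert the PyTorch weights to OpenVINO IR on the fly
)
pipe.to("GPU")  # or an NPU/VPU device where the toolchain supports it
image = pipe("an astronaut riding a horse").images[0]
image.save("out.png")
```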
The power argument is the same for laptops. People want closer to 24 hours of battery life than 3.
You can see some of the other partners in these slides:
https://images.anandtech.com/doci/18878/MTL%20AI%20Deck%20for%20May%202023%20press%20brief_13.png [anandtech.com]
https://images.anandtech.com/doci/18878/MTL%20AI%20Deck%20for%20May%202023%20press%20brief_15.png [anandtech.com]
https://images.anandtech.com/doci/18878/MTL%20AI%20Deck%20for%20May%202023%20press%20brief_12.png [anandtech.com]
https://www.anandtech.com/show/18878/intel-discloses-new-details-on-meteor-lake-vpu-block-lays-out-vision-for-client-ai [anandtech.com]
I think there is less desire to use remote servers for some of this stuff than you might think. Ignoring the privacy angle, it could just be more responsive to run inference locally. Real-time video effects for webcams is one example in the slides.