https://scitechdaily.com/scientists-create-a-periodic-table-for-artificial-intelligence/
Artificial intelligence is increasingly relied on to combine and interpret different kinds of data, including text, images, audio, and video. One obstacle that continues to slow progress in multimodal AI is deciding which algorithmic approach best fits the specific task an AI system is meant to solve.
Researchers have now introduced a unified way to organize and guide that decision process. Physicists at Emory University developed a new framework that brings structure to how algorithms for multimodal AI are derived, and their work was published in The Journal of Machine Learning Research.
"We found that many of today's most successful AI methods boil down to a single, simple idea — compress multiple kinds of data just enough to keep the pieces that truly predict what you need," says Ilya Nemenman, Emory professor of physics and senior author of the paper. "This gives us a kind of 'periodic table' of AI methods. Different methods fall into different cells, based on which information a method's loss function retains or discards."
A loss function is the mathematical rule an AI system uses to evaluate how wrong its predictions are. During training, the model continually adjusts its internal parameters in order to reduce this error, using the loss function as a guide.
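For readers who want to see that loop in action, here is a minimal, illustrative sketch in Python (not the authors' code): a tiny linear model whose parameters are repeatedly nudged in the direction that shrinks a mean-squared-error loss.

```python
# Minimal illustration of training guided by a loss function (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))             # inputs
y = X @ np.array([1.5, -2.0, 0.5])        # targets generated by a known rule
w = np.zeros(3)                           # model parameters, initially all wrong

for step in range(500):
    pred = X @ w                          # the model's current predictions
    grad = 2 * X.T @ (pred - y) / len(y)  # gradient of the mean-squared-error loss
    w -= 0.1 * grad                       # adjust parameters to reduce the error

print(w)  # approaches [1.5, -2.0, 0.5] as the loss shrinks
```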
"People have devised hundreds of different loss functions for multimodal AI systems and some may be better than others, depending on context," Nemenman says. "We wondered if there was a simpler way than starting from scratch each time you confront a problem in multimodal AI."
To address this, the team developed a mathematical framework that links the design of loss functions directly to decisions about which information should be preserved and which can be ignored. They call this approach the Variational Multivariate Information Bottleneck Framework.
"Our framework is essentially like a control knob," says co-author Michael Martini, who worked on the project as an Emory postdoctoral fellow and research scientist in Nemenman's group. "You can 'dial the knob' to determine the information to retain to solve a particular problem."
"Our approach is a generalized, principled one," adds Eslam Abdelaleem, first author of the paper. Abdelaleem took on the project as an Emory PhD candidate in physics before graduating in May and joining Georgia Tech as a postdoctoral fellow.
"Our goal is to help people to design AI models that are tailored to the problem that they are trying to solve," he says, "while also allowing them to understand how and why each part of the model is working."
AI-system developers can use the framework to propose new algorithms, predict which ones might work, estimate how much data a particular multimodal algorithm will need, and anticipate when it might fail.
"Just as important," Nemenman says, "it may let us design new AI methods that are more accurate, efficient and trustworthy."
The researchers brought a unique perspective to the problem of optimizing the design process for multimodal AI systems.
"The machine-learning community is focused on achieving accuracy in a system without necessarily understanding why a system is working," Abdelaleem explains. "As physicists, however, we want to understand how and why something works. So, we focused on finding fundamental, unifying principals to connect different AI methods together."
Abdelaleem and Martini began this quest — to distill the complexity of various AI methods to their essence — by doing math by hand.
"We spent a lot of time sitting in my office, writing on a whiteboard," Martini says. "Sometimes I'd be writing on a sheet of paper with Eslam looking over my shoulder."
The process took years: they first worked out the mathematical foundations, discussed them with Nemenman, tried the equations on a computer, then repeated these steps after running down false trails.
"It was a lot of trial and error and going back to the whiteboard," Martini says.
They vividly recall the day of their eureka moment.
They had come up with a unifying principle that described a tradeoff between compression of data and reconstruction of data. "We tried our model on two test datasets and showed that it was automatically discovering shared, important features between them," Martini says. "That felt good."
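A hedged sketch of what such a compression-reconstruction tradeoff can look like for two datasets (an assumed PyTorch architecture for illustration; enc_x, enc_y, dec_xy, dec_yx, and beta are hypothetical components, not the authors' implementation): each view is squeezed into a Gaussian latent code, and each code is scored on how well it predicts the other view, so only shared, predictive features survive the compression.

```python
# Illustrative two-view variational bottleneck (assumed architecture, not the paper's code).
import torch
import torch.nn as nn
import torch.nn.functional as F

class GaussianEncoder(nn.Module):
    """Compress one view into the mean and log-variance of a Gaussian latent code."""
    def __init__(self, dim_in, dim_z):
        super().__init__()
        self.hidden = nn.Sequential(nn.Linear(dim_in, 64), nn.ReLU())
        self.mu = nn.Linear(64, dim_z)
        self.logvar = nn.Linear(64, dim_z)

    def forward(self, x):
        h = self.hidden(x)
        return self.mu(h), self.logvar(h)

def sample(mu, logvar):
    # Reparameterization trick: draw a latent sample that gradients can flow through.
    return mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)

def kl_to_standard_normal(mu, logvar):
    # Compression term: KL( N(mu, sigma^2) || N(0, 1) ), averaged over the batch.
    return 0.5 * (mu.pow(2) + logvar.exp() - 1.0 - logvar).sum(dim=1).mean()

def two_view_loss(x, y, enc_x, enc_y, dec_xy, dec_yx, beta):
    mu_x, lv_x = enc_x(x)
    mu_y, lv_y = enc_y(y)
    zx, zy = sample(mu_x, lv_x), sample(mu_y, lv_y)
    # Reconstruction terms: each latent must recover the *other* view,
    # so only features shared by both datasets are worth keeping.
    recon = F.mse_loss(dec_xy(zx), y) + F.mse_loss(dec_yx(zy), x)
    # Compression terms: penalize the latents for carrying information at all.
    compress = kl_to_standard_normal(mu_x, lv_x) + kl_to_standard_normal(mu_y, lv_y)
    return recon + beta * compress  # beta dials compression against reconstruction
```

With beta near zero the latents keep almost everything; as beta grows, they are forced to drop view-specific detail and retain mostly what the two datasets share.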
As Abdelaleem was leaving campus after the exhausting, yet exhilarating, final push leading to the breakthrough, he happened to look at his Samsung Galaxy smartwatch. It uses an AI system to track and interpret health data, such as his heart rate. The AI, however, had misunderstood the meaning of his racing heart throughout that day.
"My watch said that I had been cycling for three hours," Abdelaleem says. "That's how it interpreted the level of excitement I was feeling. I thought, 'Wow, that's really something! Apparently, science can have that effect."
The researchers applied their framework to dozens of AI methods to test its efficacy.
"We performed computer demonstrations that show that our general framework works well with test problems on benchmark datasets," Nemenman says. "We can more easily derive loss functions, which may solve the problems one cares about with smaller amounts of training data."
The framework also holds the potential to reduce the amount of computational power needed to run an AI system.
"By helping guide the best AI approach, the framework helps avoid encoding features that are not important," Nemenman says. "The less data required for a system, the less computational power required to run it, making it less environmentally harmful. That may also open the door to frontier experiments for problems that we cannot solve now because there is not enough existing data."
The researchers hope others will use the generalized framework to tailor new algorithms specific to scientific questions they want to explore.
Meanwhile, they are building on their work to explore the potential of the new framework. They are particularly interested in how the tool may help to detect patterns in biological data, leading to insights into processes such as cognitive function.
"I want to understand how your brain simultaneously compresses and processes multiple sources of information," Abdelaleem says. "Can we develop a method that allows us to see the similarities between a machine-learning model and the human brain? That may help us to better understand both systems."
Reference: “Deep Variational Multivariate Information Bottleneck – A Framework for Variational Losses” by Eslam Abdelaleem, Ilya Nemenman and K. Michael Martini Jr., 2 September 2025, arXiv.
DOI: 10.48550/arXiv.2310.03311