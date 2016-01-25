from the darmok dept.
In 2023, AI researchers at Meta interviewed 34 native Spanish and Mandarin speakers who lived in the US but didn't speak English. The goal was to find out what people who constantly rely on translation in their day-to-day activities expect from an AI translation tool. What those participants wanted was basically a Star Trek universal translator or the Babel Fish from the Hitchhiker's Guide to the Galaxy: an AI that could not only translate speech to speech in real time across multiple languages, but also preserve their voice, tone, mannerisms, and emotions. So, Meta assembled a team of over 50 people and got busy building it.
[...] AI translation systems today are mostly focused on text, because huge amounts of text are available in a wide range of languages thanks to digitization and the Internet.
[...] AI translators we have today support an impressive number of languages in text, but things are complicated when it comes to translating speech.
[...] A few systems that can translate speech-to-speech directly do exist, but in most cases they only translate into English and not in the other direction.
[...] to pull off the Star Trek universal translator thing Meta's interviewees dreamt about, the Seamless team started with sorting out the data scarcity problem.
[...] Warren Weaver, a mathematician and pioneer of machine translation, argued in 1949 that there might be a yet undiscovered universal language working as a common base of human communication.
[...] Machines do not understand words as humans do. To make sense of them, they need to first turn them into sequences of numbers that represent their meaning.
[...] When you vectorize aligned text in two languages like those European Parliament proceedings, you end up with two separate vector spaces, and then you can run a neural net to learn how those two spaces map onto each other.
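To make the "two vector spaces plus a learned mapping" idea concrete, here is a minimal sketch. It swaps the neural net for a plain least-squares linear map, and every vector and word pair in it is invented for illustration (the toy target space is just the source space rotated by 90 degrees). Nothing here comes from Meta's code.

```python
import math

# Toy 2-D "embeddings" for the source language (invented values).
src = {"dog": (1.0, 0.0), "cat": (0.8, 0.6), "house": (0.0, 1.0), "sun": (-0.6, 0.8)}

# Hypothetical aligned dictionary (source word -> target word).
pairs = {"dog": "perro", "cat": "gato", "house": "casa", "sun": "sol"}

def rotate(v, degrees):
    """Rotate a 2-D vector counter-clockwise."""
    a = math.radians(degrees)
    return (v[0] * math.cos(a) - v[1] * math.sin(a),
            v[0] * math.sin(a) + v[1] * math.cos(a))

# Assumption: the target space has the same geometry, rotated 90 degrees.
tgt = {pairs[w]: rotate(v, 90) for w, v in src.items()}

def fit_linear_map(xs, ys):
    """Least-squares 2x2 matrix W with x @ W ~= y, via the normal equations."""
    a11 = sum(x[0] * x[0] for x in xs)
    a12 = sum(x[0] * x[1] for x in xs)
    a22 = sum(x[1] * x[1] for x in xs)
    b = [[sum(x[i] * y[j] for x, y in zip(xs, ys)) for j in range(2)]
         for i in range(2)]
    det = a11 * a22 - a12 * a12
    inv = [[a22 / det, -a12 / det], [-a12 / det, a11 / det]]
    return [[sum(inv[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def apply_map(w, x):
    """Row vector x times matrix w."""
    return (x[0] * w[0][0] + x[1] * w[1][0],
            x[0] * w[0][1] + x[1] * w[1][1])

def nearest(vec, table):
    """Key in table whose vector is closest (Euclidean) to vec."""
    return min(table, key=lambda k: math.dist(vec, table[k]))

# Learn the mapping from three aligned pairs, then translate a held-out word.
train = ["dog", "cat", "house"]
W = fit_linear_map([src[w] for w in train], [tgt[pairs[w]] for w in train])
guess = nearest(apply_map(W, src["sun"]), tgt)
print(guess)
```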
But the Meta team didn't have those nicely aligned texts for all the languages they wanted to cover. So, they vectorized all texts in all languages as if they were just a single language and dumped them into one embedding space called SONAR (Sentence-level Multimodal and Language-Agnostic Representations).
[...] The team just used huge amounts of raw data—no fancy human labeling, no human-aligned translations. And then, the data mining magic happened.
SONAR embeddings represented entire sentences instead of single words. Part of the reason behind that was to control for differences between morphologically rich languages, where a single word may correspond to multiple words in morphologically simple languages. But the most important thing was that it ensured that sentences with similar meaning in multiple languages ended up close to each other in the vector space.
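The "similar sentences end up close together" property is what makes mining possible. A rough sketch of the idea follows, assuming a shared sentence-embedding space already exists. The three-dimensional vectors, the sentences, and the 0.95 threshold are all made up for the example; a real system would get its vectors from an encoder such as SONAR and would tune the cutoff.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical sentence vectors, all living in ONE shared space.
english = {
    "The train leaves at noon.": (0.9, 0.1, 0.2),
    "I would like a coffee.": (0.1, 0.9, 0.1),
    "It is raining today.": (0.2, 0.1, 0.9),
}
spanish = {
    "Quisiera un café.": (0.15, 0.85, 0.1),
    "El tren sale al mediodía.": (0.88, 0.15, 0.18),
    "¿Dónde está la biblioteca?": (0.5, 0.5, 0.5),
}

def mine_pairs(src, tgt, threshold=0.95):
    """Pair each source sentence with its closest target sentence,
    keeping the pair only if the similarity clears the threshold."""
    found = []
    for s_text, s_vec in src.items():
        best_text, best_score = max(
            ((t_text, cosine(s_vec, t_vec)) for t_text, t_vec in tgt.items()),
            key=lambda item: item[1],
        )
        if best_score >= threshold:
            found.append((s_text, best_text, round(best_score, 3)))
    return found

mined = mine_pairs(english, spanish)
for en, es, score in mined:
    print(f"{score}  {en}  <->  {es}")
```

With these toy vectors the train and coffee sentences each find a partner, while the rain sentence has no close neighbor and is dropped. That filtering step is how unaligned raw text can yield aligned pairs without human labeling.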
[...] The Seamless team suddenly got access to millions of aligned texts, even in low-resource languages, along with thousands of hours of transcribed audio. And they used all this data to train their next-gen translator.
[...] The Nature paper published by Meta's Seamless team ends at the SEAMLESSM4T models, but Nature has a long editorial process to ensure scientific accuracy. The paper published on January 15, 2025, was submitted in late November 2023. But a quick search of arXiv.org, a repository of not-yet-peer-reviewed papers, turns up the details of two other models that the Seamless team has already integrated on top of SEAMLESSM4T: SeamlessStreaming and SeamlessExpressive, which take this AI even closer to making a Star Trek universal translator a reality.
SeamlessStreaming is meant to solve the translation latency problem.
[...] SeamlessStreaming was designed to take this experience a bit closer to what human simultaneous translators do: it translates what you're saying as you speak, in a streaming fashion. SeamlessExpressive, on the other hand, is aimed at preserving the way you express yourself in translations.
[...] Sadly, it still can't do both at the same time; you can only choose to go for either streaming or expressivity, at least at the moment. Also, the expressivity variant is very limited in supported languages—it only works in English, Spanish, French, and German. But at least it's online so you can go ahead and give it a spin.
Android 6.0 "Marshmallow" takes translation one step further by automatically integrating it into popular apps such as LinkedIn, WhatsApp, and TripAdvisor.
Users will need to have Google's Translate app installed on their phone or tablet for the feature to work, but they won't need to switch back and forth between Translate and other apps to be able to understand text written in other languages. The translated text will appear right in the app being used.
If you're using WhatsApp to communicate with someone in another language, the feature will help you to read his or her messages and to compose your own – just type a response in your preferred language, and Android will convert it to the language spoken by the other person. Google says the feature will allow for translation between any of 90 languages.
Let's test it out. Can any Marshmallow users check if the feature is translating this correctly? 君達の基地は、全てCATSがいただいた。 ("All your base are belong to us")
Google Translate will be upgraded using a "Neural Machine Translation" technique, starting with Chinese-English translation today:
Google has been working on a machine learning translation technique for years, and today is its official debut. The Google Neural Machine Translation [GNMT] system, deployed today for Chinese-English queries, is a step up in complexity from existing methods. Here's how things have evolved (in a nutshell). [...] GNMT is the latest and by far the most effective to successfully leverage machine learning in translation. It looks at the sentence as a whole, while keeping in mind, so to speak, the smaller pieces like words and phrases. It's much like the way we look at an image as a whole while being aware of individual pieces — and that's not a coincidence. Neural networks have been trained to identify images and objects in ways imitative of human perception, and there's more than a passing resemblance between finding the gestalt of an image and that of a sentence.
Interestingly, there's little in there actually specific to language: The system doesn't know the difference between the future perfect and future continuous, and it doesn't break up words based on their etymologies. It's all math and stats, no humanity. Reducing translation to a mechanical task is admirable, but in a way chilling — though admittedly, in this case, little but a mechanical translation is called for, and artifice and interpretation are superfluous.
The code runs on Google's homegrown TPUs. The Google Research Blog says that the technique will be applied to other language pairs in the coming months.
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
This week, we're reporting on a startling, scholarly white paper recently issued by researchers from Yale, the Future of Humanity Institute at Oxford and the AI Impacts think tank that adumbrates the AI world to come.
The white paper – "When Will AI Exceed Human Performance?", based on a global survey of 352 AI experts – reinforces the truism that technology is always at a primitive stage. Impressive as current Big Data and machine learning innovations are, they are embryonic compared with Advanced AI in the decades to come.
High-Level Machine Intelligence (HLMI) will transform the life we know. According to the study, it's not just conceivable but likely that all human work will be automated within 120 years, and many specific jobs much sooner.
[...] The study asked respondents to forecast automation milestones for 32 tasks and occupations, 20 of which, they predict, will happen within 10 years. Some of the more interesting findings: language translator: seven years; retail salesperson: 12 years; writing a New York Times bestseller and performing surgery: approximately 35 years; conducting math research: 45 years.
The researchers point to two watersheds in the AI revolution that will have a profound impact. The first is the attainment of HLMI, "achieved when unaided machines can accomplish every task better and more cheaply than human workers."
The researchers reported that the "aggregate forecast" gave a 50 percent chance for HLMI to occur within 45 years (and a 10 percent chance within eight years). Interestingly, respondents from Asia are more sanguine about the HLMI timeframe than those from other regions – Asian respondents expect HLMI within about 30 years, whereas North Americans expect it in 75 years.
AI research will come under the power of HLMI within 90 years, and this in turn could contribute to the second major watershed, what the AI community calls an "intelligence explosion." This is defined as AI performing "vastly better than humans in all tasks," a rapid acceleration in AI machine capabilities.
[...] When I first got interested in the subject, in the mid-1970s, I ran across a letter written in 1947 by the mathematician Warren Weaver, an early machine-translation advocate, to Norbert Wiener, a key figure in cybernetics, in which Weaver made this curious claim, today quite famous:
When I look at an article in Russian, I say, "This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode."
[...] The practical utility of Google Translate and similar technologies is undeniable, and probably it's a good thing overall, but there is still something deeply lacking in the approach, which is conveyed by a single word: understanding. Machine translation has never focused on understanding language. Instead, the field has always tried to "decode"—to get away without worrying about what understanding and meaning are. Could it in fact be that understanding isn't needed in order to translate well? Could an entity, human or machine, do high-quality translation without paying attention to what language is all about? To shed some light on this question, I turn now to the experiments I made.
It is a bit on the long side, but Douglas Hofstadter very clearly lays out what language translation is and shows that Google Translate does not do it that way.
AI localization tool claims to translate your words in your voice:
Localization is a tricky issue for all content creators. It can take significant time and resources to make their work fully accessible to folks who speak different languages. One company thinks it has cracked part of that code with an artificial intelligence system that automatically translates speech into other languages in the same speaker's voice.
Resemble AI says its Localize tool can keep voices consistent in various languages in movies, games, audiobooks, corporate videos and other formats. Google is working on similar tech, but we haven't heard much about that since it published a paper on the Translatotron system last year.
[...] For now, Localize can translate speech between English, French, German, Dutch, Italian and Spanish. There are plans to add Korean, Japanese and Mandarin to the mix in the near future.
Resemble AI says Localize can translate recordings in a way that accurately reflects the speaker's words and meanings. The system, it claims, can turn the original audio into speech that uses colloquialisms and grammar structures of a certain region and language.
No mention is made about trying to lip-synch the vocalizations.
Standing 40 cm high and 70 cm wide, the semi-transparent display has translations pop up simultaneously on the screen as the station staff and a foreign tourist speak:
With more than two million visitors flocking to Japan last month in the wake of the country's post-pandemic reopening, railway companies are gearing up to warmly greet the influx of global travellers.
Seibu Railway, one of the country's large railroad companies, is implementing a new simultaneous translation system to help foreign tourists navigate Tokyo's metro, which is notorious for its complexity.
[...] this new semi-transparent display has translations pop up simultaneously on the screen as the station staff and foreign tourists communicate.
"The display we have introduced can automatically translate between Japanese and other languages. When customers speak in a foreign language, the station attendant can see it in Japanese, and when the station attendant speaks Japanese, customers can read the sentences in their own language," said Ayano Yajima, Seibu Railway Sales and Marketing supervisor.
"Google Translate isn't always available because you don't always have Wi-Fi everywhere you go, so places like this, it's also much faster than pulling up your phone, typing everything out, showing it and (there being) misunderstandings. Having it like this, clear on the screen, it's really nice," said Kevin Cometto, an Italian student visiting Japan.
The VoiceBiz UCDisplay supports Japanese and 11 other languages including English, French, and Spanish.
The station staff previously used translation apps.
But with the translation window, a face-to-face conversation through the screen is possible, complete with facial expressions and gestures.
According to Seibu Railway, the device is designed to help customers with more complex requests such as seeking directions or information about the local area.
Last week, Gizmodo parent company G/O Media fired the staff of its Spanish-language site Gizmodo en Español and began to replace their work with AI translations of English-language articles, reports The Verge.
Former Gizmodo writer Matías S. Zavia publicly mentioned the layoffs, which took place via video call on August 29, in a social media post. On August 31, Zavia wrote, "Hello friends. On Tuesday they shut down @GizmodoES to turn it into a translation self-publisher (an AI took my job, literally)."
Previously, Gizmodo en Español had a small but dedicated team who wrote original content tailored specifically for Spanish-speaking readers, as well as producing translations of Gizmodo's English articles. The site represented Gizmodo's first foray into international markets when it launched in 2012 after being acquired from Guanabee.
Newly published articles on the site now contain a link to the English version of the article and a disclaimer stating (via our translation from Google Translate), "This content has been automatically translated from the source material. Due to the nuances of machine translation, there may be slight differences. For the original version, click here."
(Score: 2) by weirsbaski on Friday January 17, @07:35AM
It's nice to see progress on stuff like this, but it's pretty far from a Star Trek *universal* translator. Basing the work on human languages means it automatically gets affected by certain things (which may not apply to a wider audience):
- the sounds are those that human mouths can make, the interesting frequencies are ones that human ears are able to discern
- the bouba/kiki effect is in play ( https://en.wikipedia.org/wiki/Bouba/kiki_effect [wikipedia.org] )
- how voice pitch intersects with status/dominance
- etc
Let the researchers demonstrate 2-way conversations with dolphins, then they'll really be on to something.