SoylentNews is people

posted by martyb on Thursday September 29 2016, @09:53AM
from the Translate-"Jabberwocky" dept.

Google Translate will be upgraded using a "Neural Machine Translation" technique, starting with Chinese-English translation today:

Google has been working on a machine learning translation technique for years, and today is its official debut. The Google Neural Machine Translation [GNMT] system, deployed today for Chinese-English queries, is a step up in complexity from existing methods. Here's how things have evolved (in a nutshell). [...] GNMT is the latest and by far the most effective to successfully leverage machine learning in translation. It looks at the sentence as a whole, while keeping in mind, so to speak, the smaller pieces like words and phrases. It's much like the way we look at an image as a whole while being aware of individual pieces — and that's not a coincidence. Neural networks have been trained to identify images and objects in ways imitative of human perception, and there's more than a passing resemblance between finding the gestalt of an image and that of a sentence.

Interestingly, there's little in there actually specific to language: The system doesn't know the difference between the future perfect and future continuous, and it doesn't break up words based on their etymologies. It's all math and stats, no humanity. Reducing translation to a mechanical task is admirable, but in a way chilling — though admittedly, in this case, little but a mechanical translation is called for, and artifice and interpretation are superfluous.

The code runs on Google's homegrown TPUs. The Google Research Blog says that the technique will be applied to other language pairs in the coming months.

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation


This discussion has been archived. No new comments can be posted.
  • (Score: 2) by KritonK on Thursday September 29 2016, @01:37PM

    by KritonK (465) on Thursday September 29 2016, @01:37PM (#407890)

    If I input

    我隻氣墊船裝滿晒鱔

    which I am assured [omniglot.com] means "my hovercraft is full of eels", I get:

    I only hovercraft filled with eel.

    It leaves a lot to be desired.

  • (Score: 3, Informative) by Francis on Thursday September 29 2016, @02:00PM

    by Francis (5544) on Thursday September 29 2016, @02:00PM (#407900)

    Looks like the original Chinese isn't right. The second character is wrong; it should be "我的氣墊船裝滿晒鱔". If you type that into the translator, you get almost exactly the translation you would expect, and arguably a completely correct one.

    • (Score: 0) by Anonymous Coward on Thursday September 29 2016, @02:56PM

      by Anonymous Coward on Thursday September 29 2016, @02:56PM (#407936)

      That's a problem with machine translations. If one character is off, a machine destroys the entire sentence. A human, like yourself, can tell it was a minor mistake and can still correctly translate the intent. So the machines still need improvement.

      • (Score: 2) by jdavidb on Thursday September 29 2016, @08:15PM

        by jdavidb (5690) on Thursday September 29 2016, @08:15PM (#408101) Homepage Journal

        That's a problem with machine translations. If one character is off, a machine destroys the entire sentence.

        On the plus side, that sounds like a very effective hashing algorithm!

        --
        ⓋⒶ☮✝🕊 Secession is the right of all sentient beings
      • (Score: 1) by Francis on Thursday September 29 2016, @09:13PM

        by Francis (5544) on Thursday September 29 2016, @09:13PM (#408121)

        Chinese is particularly problematic because they haven't discovered spaces between words. As a result word segmentation issues abound.
        Most other languages have them. It's rather inefficient to have to mentally insert white space when reading and makes it hard to identify words.

        Making things worse, the language has a ton of particles that aren't words, but are required for grammatical reasons.
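To make the segmentation problem concrete, here is a minimal sketch of greedy "forward maximum matching", a classic dictionary-based way to split unspaced text into words. The function name, the toy vocabulary, and the example sentence are all assumptions chosen for illustration; this is not a description of how Google's system segments text.

```python
def forward_max_match(text, vocab, max_len=4):
    """Greedy segmentation: at each position take the longest dictionary word."""
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            # Fall back to a single character when nothing matches.
            if candidate in vocab or size == 1:
                words.append(candidate)
                i += size
                break
    return words


# Hypothetical toy dictionary: "research", "graduate student", "life", "fate", "origin".
TOY_VOCAB = {"研究", "研究生", "生命", "命", "起源"}

# Intended reading: 研究 / 生命 / 起源 ("research the origin of life").
greedy = forward_max_match("研究生命起源", TOY_VOCAB)
print(greedy)  # ['研究生', '命', '起源']
```

The greedy pass grabs 研究生 ("graduate student") first, which leaves 命 stranded and changes the meaning. Without spaces, the word boundaries have to be inferred from context.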

      • (Score: 2) by darkfeline on Friday September 30 2016, @03:29AM

        by darkfeline (1030) on Friday September 30 2016, @03:29AM (#408223) Homepage

        That's a false premise though. Not all human Chinese speakers would have caught that error, and many Chinese speakers are functionally illiterate.

        Comparing the best of human performance against the worst of computer performance is hardly fair, and is, at best, denial of the frightening potential of machine learning.

        --
        Join the SDF Public Access UNIX System today!
  • (Score: 2) by moondrake on Thursday September 29 2016, @02:14PM

    by moondrake (2658) on Thursday September 29 2016, @02:14PM (#407905)

    Well... there is Chinese and there is Chinese. I am by no means fluent, but I would translate the sentence as Google did, even though I end up with nonsense (as if the original made much sense...). Then I followed your link and saw you copied a Cantonese sentence... Often the writing has roughly the same meaning (even though the spoken language is unintelligible to me, and probably to most native Mandarin speakers). But it is not the same.

    Google does not seem to translate Cantonese. But it does translate the Mandarin Chinese correctly.