SoylentNews is people

posted by Fnord666 on Sunday January 28 2018, @01:49PM
from the augmented-intelligence dept.

Arthur T Knackerbracket has found the following story:

Greg Kondrak, a computer scientist from University of Alberta's AI lab, claims to have begun decoding the mystery behind the unknown text with his novel algorithm, CTVNews reported.

[...] It is believed that the manuscript is somehow related to women's health but there is no solid clue, according to the report. People have made wild guesses regarding the code, with at least eight making firm claims – only to be debunked later on.

Kondrak, however, took a different approach towards solving the problem – artificial intelligence. "Once you see it, once you find out the mystery, this is a natural human tendency to solve the puzzle," the computer scientist told CTVNews. "I was intrigued and thought I could contribute something new."

He and his co-author Bradley Hauer combined novel AI algorithms with statistical procedures to identify and translate the language. The approach, which had been used to translate United Nations Universal Declaration of Human Rights in 380 languages, came in handy and suggested the language was Hebrew, albeit with critical tweaks.

They found that the letters in every word had been reordered and the vowels were dropped in the code. The first complete sentence which the AI decrypted read, "She made recommendations to the priest, man of the house and me and people." One section of the text carries words that translate into "farmer", "light", "air", and "fire".
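
A minimal sketch of what "letters reordered and vowels dropped" implies for dictionary lookup. This is not Kondrak and Hauer's actual algorithm; the toy English word list and the names `skeleton`, `build_index` and `candidates` are assumptions made purely for illustration. Each dictionary word is reduced to its sorted consonants, and a scrambled, vowel-less token is matched against every word that shares that reduction.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def skeleton(word):
    """Drop vowels, then sort the remaining letters so letter order is ignored."""
    return "".join(sorted(c for c in word.lower() if c.isalpha() and c not in VOWELS))

def build_index(dictionary):
    """Map each sorted-consonant skeleton to every dictionary word that produces it."""
    index = defaultdict(list)
    for word in dictionary:
        index[skeleton(word)].append(word)
    return index

def candidates(token, index):
    """Return all dictionary words consistent with a scrambled, vowel-less token."""
    return index.get("".join(sorted(token.lower())), [])

# Toy English word list (an assumption, not data from the paper).
words = ["farmer", "former", "framer", "light", "air", "fire", "fair", "fur"]
index = build_index(words)

print(candidates("rmfr", index))  # every word whose consonants are f, m, r, r
print(candidates("rf", index))    # every word whose consonants are f, r
```

Even on eight words, one token maps to several candidates, which is why a decoded sentence needs checking by someone who knows the language.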

The translated line could be the start of something big, but Kondrak has a long way to go and stresses the need for complementary human assistance. However, it is not clear how accurate the translation really is.

"Somebody with very good knowledge of Hebrew and who's a historian at the same time could take this evidence and follow this kind of clue," he said while highlighting the need of someone who could make sense of the translated text.

For those who may not be familiar with the manuscript, see Voynich Manuscript at Wikipedia, or read it yourself at archive.org (JavaScript required).



This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Interesting) by FatPhil on Sunday January 28 2018, @03:04PM (6 children)

    by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Sunday January 28 2018, @03:04PM (#629463) Homepage
    "The approach, which had been used to translate United Nations Universal Declaration of Human Rights in 380 languages..."

    380 *known* languages, this is a different problem domain.

    "suggested the language was Hebrew"

    Just looking at the texts, that seems unlikely. I'd like to see the entropy measurements of the letters, digrams, trigrams, and word forms, and compare them to a range of languages. I just don't imagine it having the same statistics as Hebrew.
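
A rough sketch of the kind of measurement meant here, assuming the manuscript and each comparison language are available as plain transliterated strings. The function names and the placeholder sample strings are assumptions, and nothing here reproduces any published Voynich statistics. It computes the Shannon entropy of character n-grams, plus the approximate entropy of a character given its predecessors.

```python
import math
from collections import Counter

def ngram_entropy(text, n=1):
    """Shannon entropy, in bits, of the character n-gram distribution of text."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    if not grams:
        return 0.0
    total = len(grams)
    counts = Counter(grams)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def conditional_entropy(text, n=2):
    """Approximate bits carried by a character given its n-1 predecessors."""
    return ngram_entropy(text, n) - ngram_entropy(text, n - 1)

# Placeholder strings only; real use would load a transliteration of the
# manuscript and comparable corpora in each candidate language.
samples = {
    "repetitive": "abababababababab",
    "varied": "the quick brown fox jumps over the lazy dog",
}
for name, text in samples.items():
    print(name, ngram_entropy(text, 1), ngram_entropy(text, 2), conditional_entropy(text, 2))
```

Comparing these figures across languages only means something if the samples are of similar length and transliterated in comparable ways.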

    It's hard to *disprove* such claims though. And I'm an ardent Popperite.

    They should start with /tabula rasa/, run their system over the bulk of the text (but nothing apart from the text), and then be offered snippets of text from a random page, and see if the trained system can tell you whether those words are next to a picture of a man, a woman, a beast, a plant, or whatever. If they can guess the image better than random, maybe they've got something. That presumes that the text and the images are correlated, of course. If the images were necessary for the training, then that changes things, of course, but that's not insurmountable. Their program just has to compete favourably (and statistically significantly) against a naive program that is only told which words appear near which images, and uses Bayes to estimate probabilities of images appearing near the given test words.
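
One way the naive baseline could look, as a sketch under stated assumptions: pages are given as (words, image label) pairs, the tokens and labels below are made-up placeholders rather than real transcription data, and add-one smoothing is an illustrative choice. It is a plain multinomial naive Bayes classifier that sees only which words occur near which kind of image.

```python
import math
from collections import Counter, defaultdict

def train(pages):
    """pages: iterable of (list_of_words, image_label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, label in pages:
        label_counts[label] += 1
        word_counts[label].update(words)
        vocab.update(words)
    return label_counts, word_counts, vocab

def predict(words, model):
    """Pick the label maximising log P(label) + sum of log P(word | label)."""
    label_counts, word_counts, vocab = model
    total_pages = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_pages in label_counts.items():
        n_words = sum(word_counts[label].values())
        score = math.log(n_pages / total_pages)
        for w in words:
            # Add-one smoothing; the extra +1 reserves mass for unseen words.
            score += math.log((word_counts[label][w] + 1) / (n_words + len(vocab) + 1))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up placeholder tokens and labels, for illustration only.
pages = [
    (["aaa", "bbb", "ccc"], "woman"),
    (["ddd", "eee", "fff"], "plant"),
    (["eee", "ggg", "ddd"], "plant"),
]
model = train(pages)
print(predict(["eee", "ddd"], model))
print(predict(["aaa"], model))
```

The test would then be whether the decoding system beats this baseline on held-out pages by a statistically significant margin.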
    --
    Life is a precious commodity. A wise investor would get rid of it when it has the highest value.
    • (Score: 4, Interesting) by zocalo on Sunday January 28 2018, @03:48PM (2 children)

      by zocalo (302) on Sunday January 28 2018, @03:48PM (#629476)
      I guess that depends on what they meant by being "in Hebrew", since the manuscript obviously isn't actually written in the Hebrew script but one unique to the document. It could be that the AI suggested "Hebrew" based on the breakdown of symbol counts, which are adjacent to which, etc., but it could also be one level abstracted from that and be based more on sentence structure, e.g. if you were to give it a compilation of Yoda's sayings to analyse then it *should* come back with Japanese, even though they were actually spoken in English. People have done such analyses on the manuscript before and the general consensus seems to be that there's a method behind the madness and it's not just random gibberish (it might still be well structured gibberish though), so it may well be as simple as taking the sentence structure of one language, writing it down word for word in another, then using some kind of substitution cypher to turn it into the Voynich script.

      Given the number of possible permutations though, I'm not holding out much hope that this analysis has got the right combination.
      --
      UNIX? They're not even circumcised! Savages!
      • (Score: 3, Funny) by requerdanos on Sunday January 28 2018, @05:00PM

        by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @05:00PM (#629503) Journal

        An "AI" to decode [The Voynich Manuscript | The Cosmic Microwave Background Radiation | Leetspeak | etc. ] ...


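        # initialize
        import difflib
        import random

        def decode(document, dictionary):
            if not document or not dictionary:
                return ""
            # split on spaces if they exist, else on a random portion of the input
            delimiter = " " if " " in document else random.choice(document)
            output = []
            # process
            for this_gibberish_word in document.split(delimiter):
                # word this nonsense is mathematically least dissimilar to
                this_translated_word = difflib.get_close_matches(
                    this_gibberish_word, dictionary, n=1, cutoff=0.0)[0]
                output.append(this_translated_word)
            return " ".join(output)
        # enjoy success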

        I'd be surprised if the AI in TFA differs conceptually by much.

      • (Score: 2) by FatPhil on Sunday January 28 2018, @09:52PM

        by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Sunday January 28 2018, @09:52PM (#629593) Homepage
        I was guessing that they came up with a "maybe the vowels are missing?" idea, and then just made a leap to "like Hebrew", and ran with it.

        But just look at it as a tessellation of symbols, it bears no resemblance in dynamics to Hebrew texts - I can't believe it has the same kind of statistical distribution. It looks like poetic latinate text (the amount of repetition would be unusual for prose).
        --
        Life is a precious commodity. A wise investor would get rid of it when it has the highest value.
    • (Score: 1) by tftp on Sunday January 28 2018, @06:30PM (1 child)

      by tftp (806) Subscriber Badge on Sunday January 28 2018, @06:30PM (#629529) Homepage
      I thought it's already solved. The plants are Mexican [newscientist.com], and the language is [a version of] Nahuatl [voynichms.com], written in some old Spanish font.
    • (Score: 2) by darkfeline on Tuesday January 30 2018, @04:34AM

      by darkfeline (1030) on Tuesday January 30 2018, @04:34AM (#630189) Homepage

      > 380 *known* languages, this is a different problem domain.

      This is a different problem domain for a human, not necessarily so for an AI.

      The thing that you have got to understand is that the way AI "thinks" is fundamentally different from humans. It's like the difference between proving a theorem using geometry and proving the same theorem using linear algebra. The strategy, approach, and difficulties are going to be completely different. That's why an AI might mistake a poodle for a car, but a human might mistake a jar for a human face. Both have weaknesses, they're just completely different weaknesses.

  • (Score: 2) by requerdanos on Sunday January 28 2018, @04:18PM (6 children)

    by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @04:18PM (#629485) Journal

    Greg Kondrak, a computer scientist from University of Alberta's AI lab, claims to have begun decoding [the Voynich manuscript] with his novel algorithm

    He wouldn't be the first, yet it remains untranslated.

    It is believed that the manuscript is somehow related to women's health

    Um, citation needed? believed by whom? I sort of think it's about better living through plant-based chemical supplements in association with woo-woo astrology thinking, just judging by flipping through the thing, and I am not aware of any more responsible opinion despite the low, low bar that sets.

    People have made wild guesses regarding the code, with at least eight making firm claims – only to be debunked later on.

    This is why claims are merely annoying background noise; a headline akin to "So-and-so finally decyphers Voynich Manuscript; It's about how Aliens started the History Channel" would be newsworthy.

    But, AI==for nerds, I get it. :)

    • (Score: 0) by Anonymous Coward on Sunday January 28 2018, @04:29PM (2 children)

      by Anonymous Coward on Sunday January 28 2018, @04:29PM (#629490)

      and he isn't even done! don't we usually wait until pet projects like that net something?? or is there some cool code i missed?

      • (Score: 3, Funny) by maxwell demon on Sunday January 28 2018, @04:40PM (1 child)

        by maxwell demon (1608) Subscriber Badge on Sunday January 28 2018, @04:40PM (#629494) Journal

        or is there some cool code i missed?

        Sure, the Voynich code. ;-)

        --
        The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 2) by requerdanos on Sunday January 28 2018, @04:47PM

          by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @04:47PM (#629499) Journal

          Sure, the Voynich code. ;-)

          Well, even that is unproven and very much up in the air.

          It could well be "The Voynich set of pictures and cool handwriting-looking meaningless drawings."

    • (Score: 0) by Anonymous Coward on Sunday January 28 2018, @04:46PM

      by Anonymous Coward on Sunday January 28 2018, @04:46PM (#629498)

      Um, citation needed? believed by whom?

      It just is. You don't understand!

    • (Score: 0) by Anonymous Coward on Sunday January 28 2018, @05:51PM (1 child)

      by Anonymous Coward on Sunday January 28 2018, @05:51PM (#629517)

      It is believed that the manuscript is somehow related to women's health

      Um, citation needed? believed by whom?

      Well, there's lots of images of naked women. So it's either about women's health or a tedious tome of porn.

      • (Score: 3, Insightful) by ledow on Sunday January 28 2018, @09:39PM

        by ledow (5567) on Sunday January 28 2018, @09:39PM (#629586) Homepage

        If you bought a catalogue of Greek statues, it would also have lots of images of naked women.

        Doesn't mean it has anything to do with women's health, or porn for that matter.

        Hey, it's almost like NOBODY knows what it's really about, isn't it?

  • (Score: 3, Interesting) by theluggage on Sunday January 28 2018, @04:43PM (3 children)

    by theluggage (1797) on Sunday January 28 2018, @04:43PM (#629497)

    I thought this had already been settled [xkcd.com].

    • (Score: 2) by requerdanos on Sunday January 28 2018, @04:51PM

      by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @04:51PM (#629500) Journal

      I thought this had already been settled [xkcd 593, Voynich Manuscript as Dungeons and Dragons DM Guide].

      That's as good a guess as any, and much more interesting (and just as well-founded) as the "believed... related to women's health" guess in TFS.

    • (Score: 3, Funny) by Gaaark on Sunday January 28 2018, @05:38PM (1 child)

      by Gaaark (41) Subscriber Badge on Sunday January 28 2018, @05:38PM (#629513) Homepage Journal

      Yes... i solved it: it is Harry Potter fan-fiction porn.

      --
      --- That's not flying: that's... falling... with more luck than I have. ---
      • (Score: 5, Touché) by requerdanos on Sunday January 28 2018, @06:00PM

        by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @06:00PM (#629520) Journal

        Okay, earlier when I said "That's as good a guess as any," it turns out I was wrong.

  • (Score: 4, Insightful) by Anonymous Coward on Sunday January 28 2018, @05:57PM (1 child)

    by Anonymous Coward on Sunday January 28 2018, @05:57PM (#629519)

    It is believed that the manuscript is somehow related to women's health but there is no solid clue

    Women have always been hard to figure out.

    • (Score: 3, Funny) by Gaaark on Sunday January 28 2018, @09:10PM

      by Gaaark (41) Subscriber Badge on Sunday January 28 2018, @09:10PM (#629574) Homepage Journal

      +1 BIG TIME Insightful...married 32 years now and STILL haven't seen logical consistency in any way, lol.

      I know, I know, women are never logical nor consistent....but man I loves her. :)

      "That's what ise appreciates about you"
      --Letterkenny guy I can't remember name of.

      --
      --- That's not flying: that's... falling... with more luck than I have. ---
  • (Score: 2, Insightful) by Anonymous Coward on Sunday January 28 2018, @09:15PM

    by Anonymous Coward on Sunday January 28 2018, @09:15PM (#629578)

    sigh

  • (Score: 5, Interesting) by AthanasiusKircher on Sunday January 28 2018, @09:40PM (4 children)

    by AthanasiusKircher (5291) Subscriber Badge on Sunday January 28 2018, @09:40PM (#629587) Journal

    Don't we all remember the many times before when someone has claimed to have deciphered this thing, often only to be proved a complete idiot once any random expert who knows anything about medieval languages and manuscripts looks at their "translation"? Didn't we learn anything even from the one just a few months ago? (Reported for example in Ars Technica [arstechnica.com], then conclusively debunked [arstechnica.com] immediately, once really ANY medieval expert looked at it and said, "Huh, nope -- that doesn't make any actual sense in Latin.")

    But maybe this guy's different, right? Well, I tried finding more details than in TFA, and I happened upon this piece in the Independent [independent.co.uk], which gives a few. Things aren't looking promising. Here are a few excerpted details from the Independent article:

    “It turned out that over 80 per cent of the words were in a Hebrew dictionary, but we didn’t know if they made sense together,” said Professor Kondrak.

    Hmm... what about the other 20%? How many hits did they get if they tried one of the other 380 languages they used to train their algorithm? Is this just a kind of "p-hacking" kind of thing where we just find the closest hit and assume it must be a meaningful pattern? (Devil's advocate: if the text deals with esoteric topics like alchemy, mystical knowledge about obscure herbs and botanical stuff, it's likely a lot of words would have to be coined in Hebrew to deal with it. I've encountered this problem myself reading renaissance Latin texts, where they often have to make up new Latin words to describe obscure stuff.)

    While they noted that none of their results, using any reference language, resulted in text they could describe as “correct”, the Hebrew output was most successful.

    Hmm... sounds even more suspiciously like the "p-hacking" kind of thing.

    The scientists approached fellow computer scientist and native Hebrew speaker Professor Moshe Koppel with samples of deciphered text. Taking the first line as an example, Professor Koppel confirmed that it was not a coherent sentence in Hebrew.

    Okay, so they consulted an expert, who told them it wasn't coherent Hebrew. But I don't understand -- they showed an expert only one sentence? That's all they used as an evaluation here? How about translating at least a few pages and see whether any of it makes sense?

    But surely, they listened to the expert and gave up. Oh no...

    However, following tweaks to the spelling, the scientists used Google Translate to convert it into English, which read: “She made recommendations to the priest, man of the house and me and people.”

    WHAA?!! They consulted an actual expert in Hebrew, who told them it didn't make sense. So, rather than accepting maybe the method had problems or maybe translating more and taking it to more experts, they just started randomly changing spellings and then stuck it into Google Translate??!!!

    How many spelling tweaks did they need to make? Were they just of the sort that might reasonably be encountered in a renaissance manuscript (where spelling wasn't necessarily as standardized as today)? Or did they randomly start swapping letters where it didn't make sense? Or did the words constitute actual Hebrew words, but they just tweaked the grammar and morphology through spelling to make it work? Either way, this sounds like we could easily go from "potential statistical significance" to "absolute BS I found in random characters that I forced to make sense" pretty quickly.

    But surely they took this new version back to an expert in Hebrew or some other expert to determine whether it would possibly make sense as an opening sentence in a treatise? You know, understanding conventions of the Hebrew language, the kind of rhetoric a treatise of this era might use? Well...

    “It’s a kind of strange sentence to start a manuscript but it definitely makes sense,” said Professor Kondrak.

    Oh.

    So, you got a rough sort of "hit" on Hebrew as pattern-matching a little better than other languages, then you translated only one sentence and took it to an expert, who told you it didn't make sense. Then, undeterred, you started tweaking the letters and changing spelling until Google Translate managed to come up with something that sounded vaguely like an English sentence. And then you just decided, "Yeah, I guess it 'definitely' makes sense..."

    Okay. Thanks, Prof. Kondrak. Can you come back when you've successfully "deciphered" more than a single sentence and consulted an expert who actually knows the language you're talking about and its history and declared your method came up with something that made sense??

    Too late -- numerous news stories are already claiming you successfully "decoded" it.

    By the way, I tracked down the actual technical article [transacl.org] and skimmed it too. There's a histogram on p. 83 that shows Hebrew as an "outlier" which is why they chose it. But it's not like it's an extreme outlier in the 380 languages they tested. Also, they didn't address whether there might be linguistic features of Hebrew that make it statistically predisposed to be a better match. For example, Hebrew often doesn't record vowels, which can create more ambiguity in potential words. Does that predispose it to be a more likely match to random data using their algorithm? They talk a bit about the vowel issue, but don't address this concern directly (at least that I could find in skimming the article). They do note the second-best match is Esperanto, which is obviously rejected because it was a made-up language of recent invention -- but there they note the regularity of the language makes it more likely to be a high match.

    All in all, I'm really not convinced. The fact that Hebrew was the best match according to a couple different algorithms is potentially interesting, but I'm not sure how this is getting all this attention for "deciphering" anything when it sounds like the only evidence they offered was ONE BLOODY SENTENCE that was declared incorrect by an expert and that they only managed to translate by swapping out letters and plugging it into Google Translate.

    ...

    Oh, and by the way, I should know about BS claims about translation. I owned the darn Voynich manuscript myself 350 years ago. I claimed to translate Egyptian hieroglyphics [wikipedia.org] even though I couldn't -- you can still see my faulty Latin translations of an Egyptian obelisk [wikipedia.org] in Piazza Navona in Rome. And I even claimed to translate stuff INTO Egyptian hieroglyphs, even though I had no idea what I was doing. (Actually, back then I maybe had a little too much of the communion wine and claimed I was "divinely inspired" to understand the Egyptian language and to do these "translations.")

    • (Score: 4, Interesting) by AthanasiusKircher on Sunday January 28 2018, @10:52PM (2 children)

      by AthanasiusKircher (5291) Subscriber Badge on Sunday January 28 2018, @10:52PM (#629621) Journal

      Just a couple other quick comments after reading the actual technical article more in-depth:

      (1) They did translate more than a sentence, but it's not clear how much. They said the data is "noisy." Other than the opening sentence, the only bit of data they offer is that a 72-word section assumed to be about herbs contains the Hebrew words "narrow," "farmer," "light," "air," and "fire," which they assume are vocabulary words likely to occur in a section on herbs. No other detailed evidence of translation is offered other than the opening sentence and these five words.

      (2) On p. 85 of the technical article, there's another histogram showing that if you drop vowels in other languages (as they do in Hebrew), Latin, Italian, and English all have matches for vocabulary in the 75-85% range, similar to Hebrew. (Latin appears to be even a slightly better match than Hebrew if you drop the vowels.) I don't know how many other languages they tested like this (if any), but that already begins to make the "more than 80% Hebrew words" thing sound like it's not necessarily a significant claim. And again, this leads me to wonder how much of the "Hebrew match" is just a statistical artifact due to linguistic features of Hebrew.

      • (Score: 3, Interesting) by The Archon V2.0 on Monday January 29 2018, @03:00PM (1 child)

        by The Archon V2.0 (3887) on Monday January 29 2018, @03:00PM (#629818)

        > On p. 85 of the technical article, there's another histogram showing that if you drop vowels in other languages (as they do in Hebrew), Latin, Italian, and English all have matches for vocabulary in the 75-85% range, similar to Hebrew.

        So the biggest takeaway from this is that the script might be an abjad instead of an alphabet? Hasn't that hypothesis been floating around for a decade (likely more) already?

        • (Score: 3, Interesting) by AthanasiusKircher on Thursday February 01 2018, @09:07PM

          by AthanasiusKircher (5291) Subscriber Badge on Thursday February 01 2018, @09:07PM (#631692) Journal

          Sorry for the late reply, but I'm not sure that's the takeaway. Dropping the vowels makes it easier to match multiple words to a single set of symbols, thereby increasing the apparent "match" stats for a test like they did here.

          My point is that it's likely the "high match" percentage is just due to that basic statistical fact, i.e., that an abjad is likely to have a higher "hit rate" just due to random coincidence. And since Hebrew is frequently written as an abjad, perhaps that's one of the only reasons Hebrew was ranked higher in their algorithm. (Though I'm hoping they actually realized this, since it's a pretty basic feature likely to influence the stats. I'm hoping they did take that into account and that Hebrew still stood above other languages... though it's not clear from what I read that the difference is statistically big enough to justify their confidence that Hebrew is actually the language "encoded" in the manuscript.)
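
A toy illustration of that statistical point, assuming a small made-up English word list and a simple vowel-dropping rule (neither comes from the paper). It counts how many dictionary words share one written form with and without vowels; the more readings each form has, the easier it is for an arbitrary string of symbols to "hit" some word.

```python
from collections import defaultdict

VOWELS = set("aeiou")

def drop_vowels(word):
    """Write a word the way an abjad would, keeping consonants only."""
    return "".join(c for c in word if c not in VOWELS)

def readings_per_form(words, encode):
    """Average number of dictionary words sharing one written form under encode."""
    forms = defaultdict(set)
    for w in words:
        forms[encode(w)].add(w)
    return len(set(words)) / len(forms)

# Toy word list, chosen only to make the effect visible.
words = ["bat", "bet", "bit", "but", "boat", "beat",
         "cat", "cot", "cut", "coat", "dog", "dig"]

print(readings_per_form(words, lambda w: w))  # full spelling: 1.0 reading per form
print(readings_per_form(words, drop_vowels))  # vowels dropped: 4.0 readings per form
```

Real vocabularies are far less extreme than this hand-picked list, so the size of the effect for Hebrew versus the other languages would have to be measured rather than assumed.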

    • (Score: 2) by requerdanos on Monday January 29 2018, @01:27AM

      by requerdanos (5997) Subscriber Badge on Monday January 29 2018, @01:27AM (#629664) Journal

      a kind of "p-hacking" kind of thing where we just find the closest hit and assume it must be a meaningful pattern

      That's kind of a long name for the program. Maybe they will eventually call it "Phacker" for short (pronounced "fack' er").
