SoylentNews is people

posted by Fnord666 on Sunday January 28 2018, @01:49PM   Printer-friendly
from the augmented-intelligence dept.

Arthur T Knackerbracket has found the following story:

Greg Kondrak, a computer scientist from the University of Alberta's AI lab, claims to have begun decoding the mystery behind the unknown text with his novel algorithm, CTVNews reported.

[...] It is believed that the manuscript is somehow related to women's health, but there is no solid clue, according to the report. People have made wild guesses regarding the code, with at least eight making firm claims – only to be debunked later on.

Kondrak, however, took a different approach to solving the problem – artificial intelligence. "Once you see it, once you find out the mystery, this is a natural human tendency to solve the puzzle," the computer scientist told CTVNews. "I was intrigued and thought I could contribute something new."

He and his co-author Bradley Hauer combined novel AI algorithms with statistical procedures to identify and translate the language. The approach, which had been used to translate the United Nations Universal Declaration of Human Rights into 380 languages, came in handy and suggested the language was Hebrew, albeit with critical tweaks.

They found that the letters in every word had been reordered and the vowels dropped in the code. The first complete sentence that the AI decrypted read, "She made recommendations to the priest, man of the house and me and people." One section of the text contains words that translate as "farmer", "light", "air", and "fire".

The translated line could be the start of something big, but there is a long way to go for Kondrak, who stresses the need for complementary human expertise. However, it is not clear how accurate the translation really is.

"Somebody with very good knowledge of Hebrew and who's a historian at the same time could take this evidence and follow this kind of clue," he said, highlighting the need for someone who could make sense of the translated text.

For those who may not be familiar with the manuscript, see Voynich Manuscript at Wikipedia, or read it yourself at archive.org (JavaScript required).


Original Submission

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Interesting) by FatPhil on Sunday January 28 2018, @03:04PM (6 children)

    by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Sunday January 28 2018, @03:04PM (#629463) Homepage
    "The approach, which had been used to translate United Nations Universal Declaration of Human Rights in 380 languages..."

    380 *known* languages, this is a different problem domain.

    "suggested the language was Hebrew"

    Just looking at the texts, that seems unlikely. I'd like to see the entropy measurements of the letters, digrams, trigrams, and word forms, and compare them to a range of languages. I just don't imagine it having the same statistics as Hebrew.
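
    The kind of comparison described here can be sketched quickly. This Python snippet is illustrative only: the short English sample is a throwaway stand-in for a real corpus, and a serious test would use long texts in many candidate languages. It computes the Shannon entropy of character n-grams:

```python
import math
from collections import Counter

def ngram_entropy(text, n):
    """Shannon entropy (bits per n-gram) of character n-grams in `text`."""
    grams = [text[i:i + n] for i in range(len(text) - n + 1)]
    counts = Counter(grams)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Entropy profile over letters (1-grams), digrams, and trigrams; texts
# from the same language/script should produce similar profiles.
sample = "the quick brown fox jumps over the lazy dog"
profile = [round(ngram_entropy(sample, n), 2) for n in (1, 2, 3)]
print(profile)
```

    Comparing such profiles for the Voynich text against a range of languages is exactly the sort of evidence one would want to see published.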

    It's hard to *disprove* such claims though. And I'm an ardent Popperite.

    They should start with /tabula rasa/, run their system over the bulk of the text (but nothing apart from the text), and then be offered snippets of text from a random page, and see if the trained system can tell you whether those words are next to a picture of a man, a woman, a beast, a plant, or whatever. If they can guess the image better than chance, maybe they've got something. That presumes that the text and the images are correlated, of course. If the images were necessary for the training, then that changes things, but that's not insurmountable. Their program just has to compete favourably (and statistically significantly) against a naive program that is only told which words appear near which images, and uses Bayes to estimate probabilities of images appearing near the given test words.
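
    The naive Bayes baseline proposed above might look like this sketch. The Voynich-style tokens and the image categories are made-up toy data, not drawn from any real transcription:

```python
import math
from collections import Counter, defaultdict

# Hypothetical training data: (word, nearby-image-category) pairs.
pairs = [("otol", "plant"), ("daiin", "plant"), ("otol", "plant"),
         ("qokedy", "woman"), ("daiin", "woman"), ("qokedy", "woman")]

cat_counts = Counter(cat for _, cat in pairs)
word_counts = defaultdict(Counter)  # word_counts[category][word]
for word, cat in pairs:
    word_counts[cat][word] += 1

def guess_category(words, alpha=1.0):
    """Naive-Bayes guess: which image category do these words sit next to?

    Uses Laplace smoothing (alpha) so unseen words don't zero out a class.
    """
    vocab = {w for w, _ in pairs}
    best, best_lp = None, float("-inf")
    for cat, n in cat_counts.items():
        lp = math.log(n / len(pairs))  # log prior
        denom = sum(word_counts[cat].values()) + alpha * len(vocab)
        for w in words:
            lp += math.log((word_counts[cat][w] + alpha) / denom)
        if lp > best_lp:
            best, best_lp = cat, lp
    return best

print(guess_category(["otol"]))  # 'plant' on this toy data
```

    Any "decipherment" system worth the name should beat this sort of baseline by a statistically significant margin.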
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 4, Interesting) by zocalo on Sunday January 28 2018, @03:48PM (2 children)

      by zocalo (302) on Sunday January 28 2018, @03:48PM (#629476)
      I guess that depends on what they meant by being "in Hebrew", since the manuscript obviously isn't actually written in the Hebrew script but one unique to the document. It could be that the AI suggested "Hebrew" based on the breakdown of symbol counts, which are adjacent to which, etc., but it could also be one level abstracted from that and be based more on sentence structure, e.g. if you were to give it a compilation of Yoda's sayings to analyse then it *should* come back with Japanese, even though they were actually spoken in English. People have done such analysis on the manuscript before, and the general consensus seems to be that there's a method behind the madness and it's not just random gibberish (it might still be well structured gibberish though), so it may well be as simple as taking the sentence structure of one language, writing it down word for word in another, then using some kind of substitution cypher to turn it into the Voynich script.

      Given the number of possible permutations though, I'm not holding out much hope that this analysis has got the right combination.
      --
      UNIX? They're not even circumcised! Savages!
      • (Score: 3, Funny) by requerdanos on Sunday January 28 2018, @05:00PM

        by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @05:00PM (#629503) Journal

        An "AI" to decode [The Voynich Manuscript | The Cosmic Microwave Background Radiation | Leetspeak | etc. ] ...


        # initialize
        if (spaces_exist) {
                delimiter = spaces;
        } else {
                delimiter = random(portion_of_input);
        }
        # process
        while (more_document_exists) {
                this_gibberish_word = next_word_until_delimiter();
                this_translated_word = word_this_nonsense_is_mathematically_least_dissimilar_to(this_gibberish_word);
                add_to_output(this_translated_word);
        }
        # enjoy success

        I'd be surprised if the AI in TFA differs conceptually by much.
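
        For what it's worth, the "least dissimilar word" step in that pseudocode can be made concrete with plain edit distance. This sketch uses a hypothetical five-word dictionary; nothing here is claimed to match what TFA's system actually does:

```python
def edit_distance(a, b):
    """Classic Levenshtein distance via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,            # deletion
                           cur[j - 1] + 1,          # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

# Hypothetical target dictionary; in TFA it would be a Hebrew word list.
dictionary = ["priest", "house", "people", "light", "fire"]

def least_dissimilar(gibberish):
    """Map a gibberish token to the closest dictionary word."""
    return min(dictionary, key=lambda w: edit_distance(gibberish, w))

print(least_dissimilar("priost"))  # 'priest'
```

        The catch, of course, is that a nearest-match step like this will *always* return some word, meaningful input or not.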

      • (Score: 2) by FatPhil on Sunday January 28 2018, @09:52PM

        by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Sunday January 28 2018, @09:52PM (#629593) Homepage
        I was guessing that they came up with a "maybe the vowels are missing?" idea, and then just made a leap to "like hebrew", and ran with it.

        But just looking at it as a tessellation of symbols, it bears no resemblance in dynamics to Hebrew texts - I can't believe it has the same kind of statistical distribution. It looks like poetic Latinate text (the amount of repetition would be unusual for prose).
        --
        Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 1) by tftp on Sunday January 28 2018, @06:30PM (1 child)

      by tftp (806) on Sunday January 28 2018, @06:30PM (#629529) Homepage
      I thought it was already solved. The plants are Mexican [newscientist.com], and the language is [a version of] Nahuatl [voynichms.com], written in some old Spanish font.
    • (Score: 2) by darkfeline on Tuesday January 30 2018, @04:34AM

      by darkfeline (1030) on Tuesday January 30 2018, @04:34AM (#630189) Homepage

      > 380 *known* languages, this is a different problem domain.

      This is a different problem domain for a human, not necessarily so for an AI.

      The thing that you have got to understand is that the way AI "thinks" is fundamentally different from humans. It's like the difference between proving a theorem using geometry and proving the same theorem using linear algebra. The strategy, approach, and difficulties are going to be completely different. That's why an AI might mistake a poodle for a car, but a human might mistake a jar for a human face. Both have weaknesses, they're just completely different weaknesses.

      --
      Join the SDF Public Access UNIX System today!
  • (Score: 2) by requerdanos on Sunday January 28 2018, @04:18PM (6 children)

    by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @04:18PM (#629485) Journal

    Greg Kondrak, a computer scientist from University of Alberta's AI lab, claims to have begun decoding [the Voynich manuscript] with his novel algorithm

    He wouldn't be the first, yet it remains untranslated.

    It is believed that the manuscript is somehow related to women's health

    Um, citation needed? believed by whom? I sort of think it's about better living through plant-based chemical supplements in association with woo-woo astrology thinking, just judging by flipping through the thing, and I am not aware of any more responsible opinion despite the low, low bar that sets.

    People have made wild guesses regarding the code, with at least eight making firm claims – only to be debunked later on.

    This is why claims are merely annoying background noise; a headline akin to "So-and-so finally deciphers Voynich Manuscript; It's about how Aliens started the History Channel" would be newsworthy.

    But, AI==for nerds, I get it. :)

    • (Score: 0) by Anonymous Coward on Sunday January 28 2018, @04:29PM (2 children)

      by Anonymous Coward on Sunday January 28 2018, @04:29PM (#629490)

      and he isn't even done! don't we usually wait until pet projects like that net something?? or is there some cool code i missed?

      • (Score: 3, Funny) by maxwell demon on Sunday January 28 2018, @04:40PM (1 child)

        by maxwell demon (1608) on Sunday January 28 2018, @04:40PM (#629494) Journal

        or is there some cool code i missed?

        Sure, the Voynich code. ;-)

        --
        The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 2) by requerdanos on Sunday January 28 2018, @04:47PM

          by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @04:47PM (#629499) Journal

          Sure, the Voynich code. ;-)

          Well, even that is unproven and very much up in the air.

          It could well be "The Voynich set of pictures and cool handwriting-looking meaningless drawings."

    • (Score: 0) by Anonymous Coward on Sunday January 28 2018, @04:46PM

      by Anonymous Coward on Sunday January 28 2018, @04:46PM (#629498)

      Um, citation needed? believed by whom?

      It just is. You don't understand!

    • (Score: 0) by Anonymous Coward on Sunday January 28 2018, @05:51PM (1 child)

      by Anonymous Coward on Sunday January 28 2018, @05:51PM (#629517)

      It is believed that the manuscript is somehow related to women's health

      Um, citation needed? believed by whom?

      Well, there's lots of images of naked women. So it's either about women's health or a tedious tome of porn.

      • (Score: 3, Insightful) by ledow on Sunday January 28 2018, @09:39PM

        by ledow (5567) on Sunday January 28 2018, @09:39PM (#629586) Homepage

        If you bought a catalogue of Greek statues, it would also have lots of images of naked women.

        Doesn't mean it has anything to do with women's health, or porn for that matter.

        Hey, it's almost like NOBODY knows what it's really about, isn't it?

  • (Score: 3, Interesting) by theluggage on Sunday January 28 2018, @04:43PM (3 children)

    by theluggage (1797) on Sunday January 28 2018, @04:43PM (#629497)

    I thought this had already been settled [xkcd.com].

    • (Score: 2) by requerdanos on Sunday January 28 2018, @04:51PM

      by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @04:51PM (#629500) Journal

      I thought this had already been settled [xkcd 593, Voynich Manuscript as Dungeons and Dragons DM Guide].

      That's as good a guess as any, and much more interesting (and just as well-founded) as the "believed... related to women's health" guess in TFS.

    • (Score: 3, Funny) by Gaaark on Sunday January 28 2018, @05:38PM (1 child)

      by Gaaark (41) on Sunday January 28 2018, @05:38PM (#629513) Journal

      Yes... i solved it: it is Harry Potter fan-fiction porn.

      --
      --- Please remind me if I haven't been civil to you: I'm channeling MDC. ---Gaaark 2.0 ---
      • (Score: 5, Touché) by requerdanos on Sunday January 28 2018, @06:00PM

        by requerdanos (5997) Subscriber Badge on Sunday January 28 2018, @06:00PM (#629520) Journal

        Okay, earlier when I said "That's as good a guess as any," it turns out I was wrong.

  • (Score: 4, Insightful) by Anonymous Coward on Sunday January 28 2018, @05:57PM (1 child)

    by Anonymous Coward on Sunday January 28 2018, @05:57PM (#629519)

    It is believed that the manuscript is somehow related to women's health but there is no solid clue

    Women have always been hard to figure out.

    • (Score: 3, Funny) by Gaaark on Sunday January 28 2018, @09:10PM

      by Gaaark (41) on Sunday January 28 2018, @09:10PM (#629574) Journal

      +1 BIG TIME Insightful...married 32 years now and STILL haven't seen logical consistency in any way, lol.

      I know, I know, women are never logical nor consistent....but man I loves her. :)

      "That's what ise appreciates about you"
      --Letterkenny guy I can't remember name of.

      --
      --- Please remind me if I haven't been civil to you: I'm channeling MDC. ---Gaaark 2.0 ---
  • (Score: 2, Insightful) by Anonymous Coward on Sunday January 28 2018, @09:15PM

    by Anonymous Coward on Sunday January 28 2018, @09:15PM (#629578)

    sigh

  • (Score: 5, Interesting) by AthanasiusKircher on Sunday January 28 2018, @09:40PM (4 children)

    by AthanasiusKircher (5291) on Sunday January 28 2018, @09:40PM (#629587) Journal

    Don't we all remember the many times before when someone has claimed to have deciphered this thing, often proving to be a complete idiot once any random expert who knows anything about medieval languages and manuscripts looks at their "translation"? Didn't we learn anything even from the one just a few months ago? (Reported for example in Ars Technica [arstechnica.com], then conclusively debunked [arstechnica.com] immediately, once really ANY medieval expert looked at it and said, "Huh, nope -- that doesn't make any actual sense in Latin.")

    But maybe this guy's different, right? Well, I tried finding more details than in TFA, and I happened upon this piece in the Independent [independent.co.uk], which gives a few. Things aren't looking promising. Here are a few excerpted details from the Independent article:

    “It turned out that over 80 per cent of the words were in a Hebrew dictionary, but we didn’t know if they made sense together,” said Professor Kondrak.

    Hmm... what about the other 20%? How many hits did they get if they tried one of the other 380 languages they used to train their algorithm? Is this just a kind of "p-hacking" kind of thing where we just find the closest hit and assume it must be a meaningful pattern? (Devil's advocate: if the text deals with esoteric topics like alchemy, mystical knowledge about obscure herbs and botanical stuff, it's likely a lot of words would have to be coined in Hebrew to deal with it. I've encountered this problem myself reading renaissance Latin texts, where they often have to make up new Latin words to describe obscure stuff.)

    While they noted that none of their results, using any reference language, resulted in text they could describe as “correct”, the Hebrew output was most successful.

    Hmm... sounds even more suspiciously like the "p-hacking" kind of thing.
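
    The multiple-comparisons worry is easy to illustrate: if you score ~380 candidate languages that are all equally wrong, the *best* of them will still look impressive. In this toy simulation the 0.40/0.10 chance-match distribution is invented purely for illustration; no language in the model is the true source:

```python
import random

random.seed(1)

def best_of(n_languages, trials=1000):
    """Average best score among n_languages null candidates, none of
    which is the true source language (every score is pure chance)."""
    best_rates = []
    for _ in range(trials):
        rates = [random.gauss(0.40, 0.10) for _ in range(n_languages)]
        best_rates.append(max(rates))
    return sum(best_rates) / trials

# With one candidate the expected score stays near 0.40; with 380
# candidates the expected *best* score is far higher by chance alone.
print(best_of(1), best_of(380))
```

    So "Hebrew ranked first" on its own says little without comparing against a null model like this.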

    The scientists approached fellow computer scientist and native Hebrew speaker Professor Moshe Koppel with samples of deciphered text. Taking the first line as an example, Professor Koppel confirmed that it was not a coherent sentence in Hebrew.

    Okay, so they consulted an expert, who told them it wasn't coherent Hebrew. But I don't understand -- they showed an expert only one sentence? That's all they used as an evaluation here? How about translating at least a few pages and see whether any of it makes sense?

    But surely, they listened to the expert and gave up. Oh no...

    However, following tweaks to the spelling, the scientists used Google Translate to convert it into English, which read: “She made recommendations to the priest, man of the house and me and people.”

    WHAA?!! They consulted an actual expert in Hebrew, who told them it didn't make sense. So, rather than accepting maybe the method had problems or maybe translating more and taking it to more experts, they just started randomly changing spellings and then stuck it into Google Translate??!!!

    How many spelling tweaks did they need to make? Were they just of the sort that might reasonably be encountered in a renaissance manuscript (where spelling wasn't necessarily as standardized as today)? Or did they randomly start swapping letters where it didn't make sense? Or did the words constitute actual Hebrew words, but they just tweaked the grammar and morphology through spelling to make it work? Whatever the case, this sounds like we could easily go from "potential statistical significance" to "absolute BS I found in random characters that I forced to make sense" pretty quickly.

    But surely they took this new version back to an expert in Hebrew or some other expert to determine whether it would possibly make sense as an opening sentence in a treatise? You know, understanding conventions of the Hebrew language, the kind of rhetoric a treatise of this era might use? Well...

    “It’s a kind of strange sentence to start a manuscript but it definitely makes sense,” said Professor Kondrak.

    Oh.

    So, you got a rough sort of "hit" on Hebrew as pattern-matching a little better than other languages, then you translated only one sentence and took it to an expert, who told you it didn't make sense. Then, undeterred, you started tweaking the letters and changing spelling until Google Translate managed to come up with something that sounded vaguely like an English sentence. And then you just decided, "Yeah, I guess it 'definitely' makes sense..."

    Okay. Thanks, Prof. Kondrak. Can you come back when you've successfully "deciphered" more than a single sentence and consulted an expert who actually knows the language you're talking about and its history and declared your method came up with something that made sense??

    Too late -- numerous news stories are already claiming you successfully "decoded" it.

    By the way, I tracked down the actual technical article [transacl.org] and skimmed it too. There's a histogram on p. 83 that shows Hebrew as an "outlier," which is why they chose it. But it's not like it's an extreme outlier in the 380 languages they tested. Also, they didn't address whether there might be linguistic features of Hebrew that make it statistically predisposed to be a better match. For example, Hebrew often doesn't record vowels, which can create more ambiguity in potential words. Does that predispose it to be a more likely match to random data using their algorithm? They talk a bit about the vowel issue, but don't address this concern directly (at least that I could find in skimming the article). They do note the second-best match is Esperanto, which is obviously rejected because it was a made-up language of recent invention -- but there they note the regularity of the language makes it more likely to be a high match.

    All in all, I'm really not convinced. The fact that Hebrew was the best match according to a couple different algorithms is potentially interesting, but I'm not sure how this is getting all this attention for "deciphering" anything when it sounds like the only evidence they offered was ONE BLOODY SENTENCE that was declared incorrect by an expert and that they only managed to translate by swapping out letters and plugging it into Google Translate.

    ...

    Oh, and by the way, I should know about BS claims about translation. I owned the darn Voynich manuscript myself 350 years ago. I claimed to translate Egyptian hieroglyphics [wikipedia.org] even though I couldn't -- you can still see my faulty Latin translations of an Egyptian obelisk [wikipedia.org] in Piazza Navona in Rome. And I even claimed to translate stuff INTO Egyptian hieroglyphs, even though I had no idea what I was doing. (Actually, back then I maybe had a little too much of the communion wine and claimed I was "divinely inspired" to understand the Egyptian language and to do these "translations.")

    • (Score: 4, Interesting) by AthanasiusKircher on Sunday January 28 2018, @10:52PM (2 children)

      by AthanasiusKircher (5291) on Sunday January 28 2018, @10:52PM (#629621) Journal

      Just a couple other quick comments after reading the actual technical article more in-depth:

      (1) They did translate more than a sentence, but it's not clear how much. They said the data is "noisy." Other than the opening sentence, the only bit of data they offer is that a 72-word section assumed to be about herbs contains the Hebrew words "narrow," "farmer," "light," "air," and "fire," which they assume are vocabulary words likely to occur in a section on herbs. No other detailed evidence of translation is offered other than the opening sentence and these five words.

      (2) On p. 85 of the technical article, there's another histogram showing that if you drop vowels in other languages (as they do in Hebrew), Latin, Italian, and English all have matches for vocabulary in the 75-85% range, similar to Hebrew. (Latin appears to be even a slightly better match than Hebrew if you drop the vowels.) I don't know how many other languages they tested like this (if any), but that already begins to make the "more than 80% Hebrew words" thing sound like it's not necessarily a significant claim. And again, this leads me to wonder how much of the "Hebrew match" is just a statistical artifact due to linguistic features of Hebrew.

      • (Score: 3, Interesting) by The Archon V2.0 on Monday January 29 2018, @03:00PM (1 child)

        by The Archon V2.0 (3887) on Monday January 29 2018, @03:00PM (#629818)

        > On p. 85 of the technical article, there's another histogram showing that if you drop vowels in other languages (as they do in Hebrew), Latin, Italian, and English all have matches for vocabulary in the 75-85% range, similar to Hebrew.

        So the biggest takeaway from this is that the script might be an abjad instead of an alphabet? Hasn't that hypothesis been floating around for a decade (likely more) already?

        • (Score: 3, Interesting) by AthanasiusKircher on Thursday February 01 2018, @09:07PM

          by AthanasiusKircher (5291) on Thursday February 01 2018, @09:07PM (#631692) Journal

          Sorry for the late reply, but I'm not sure that's the takeaway. Dropping the vowels makes it easier to match multiple words to a single set of symbols, thereby increasing the apparent "match" stats for a test like they did here.

          My point is that it's likely the "high match" percentage is just due to that basic statistical fact, i.e., that an abjad is likely to have a higher "hit rate" just due to random coincidence. And since Hebrew is frequently written as an abjad, perhaps that's one of the only reasons Hebrew was ranked higher in their algorithm. (Though I'm hoping they actually realized this, since it's a pretty basic feature likely to influence the stats. I'm hoping they did take that into account and that Hebrew still stood above other languages... though it's not clear from what I read that the difference is statistically big enough to justify their confidence that Hebrew is actually the language "encoded" in the manuscript.)
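
          The vowel-collapse effect is easy to demonstrate: once vowels are dropped, many distinct words share one consonant skeleton, so a dictionary "hit" becomes much cheaper. A toy sketch (the word list is invented for illustration):

```python
from collections import defaultdict

VOWELS = set("aeiou")

def skeleton(word):
    """Abjad-style consonant skeleton: drop the vowels."""
    return "".join(c for c in word if c not in VOWELS)

words = ["bard", "bread", "board", "bird", "brood", "beard"]
by_skeleton = defaultdict(list)
for w in words:
    by_skeleton[skeleton(w)].append(w)

# Six distinct English words collapse onto the single skeleton "brd":
# any garbled string matching "brd" now "hits" the dictionary six ways.
print(dict(by_skeleton))
```

          A fair comparison would therefore need to score Hebrew against other languages with their vowels stripped too, as noted above.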

    • (Score: 2) by requerdanos on Monday January 29 2018, @01:27AM

      by requerdanos (5997) Subscriber Badge on Monday January 29 2018, @01:27AM (#629664) Journal

      a kind of "p-hacking" kind of thing where we just find the closest hit and assume it must be a meaningful pattern

      That's kind of a long name for the program. Maybe they will eventually call it "Phacker" for short (pronounced "fack' er").
