
SoylentNews is people

posted by janrinok on Friday September 06 2019, @03:35PM
from the intelligent-content-might-be-significantly-lower dept.

Human speech may have a universal transmission rate: 39 bits per second

Italians are some of the fastest speakers on the planet, chattering at up to nine syllables per second. Many Germans, on the other hand, are slow enunciators, delivering five to six syllables in the same amount of time. Yet in any given minute, Italians and Germans convey roughly the same amount of information, according to a new study. Indeed, no matter how fast or slowly languages are spoken, they tend to transmit information at about the same rate: 39 bits per second, about twice the speed of Morse code.

"This is pretty solid stuff," says Bart de Boer, an evolutionary linguist who studies speech production at the Free University of Brussels, but was not involved in the work. Language lovers have long suspected that information-heavy languages—those that pack more information about tense, gender, and speaker into smaller units, for example—move slowly to make up for their density of information, he says, whereas information-light languages such as Italian can gallop along at a much faster pace. But until now, no one had the data to prove it.

Scientists started with written texts from 17 languages, including English, Italian, Japanese, and Vietnamese. They calculated the information density of each language in bits—the same unit that describes how quickly your cellphone, laptop, or computer modem transmits information. They found that Japanese, which has only 643 syllables, had an information density of about 5 bits per syllable, whereas English, with its 6949 syllables, had a density of just over 7 bits per syllable. Vietnamese, with its complex system of six tones (each of which can further differentiate a syllable), topped the charts at 8 bits per syllable.
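
The arithmetic behind those figures can be sketched roughly as follows. This is an illustration, not the study's method: it takes the rounded densities quoted above and the roughly 39 bits per second figure, then back-calculates the syllable rate each language would need to reach it. It also computes log2 of the syllable inventory, which is only an upper bound that would hold if every syllable were equally likely. The function names and the derived syllable rates are assumptions made for this example.

```python
import math

# Rounded figures quoted in the summary above. The syllable rates printed
# below are NOT from the study; they are back-calculated for illustration.
TARGET_BITS_PER_SECOND = 39.0
REPORTED_BITS_PER_SYLLABLE = {"Japanese": 5.0, "English": 7.0, "Vietnamese": 8.0}
SYLLABLE_INVENTORY = {"Japanese": 643, "English": 6949}


def max_bits_per_syllable(inventory_size: int) -> float:
    """Upper bound on information per syllable: log2(N), reached only if all
    N syllables were equally likely and independent of context."""
    return math.log2(inventory_size)


def implied_syllable_rate(bits_per_second: float, bits_per_syllable: float) -> float:
    """Syllables per second needed to reach a given information rate."""
    return bits_per_second / bits_per_syllable


if __name__ == "__main__":
    for lang, density in REPORTED_BITS_PER_SYLLABLE.items():
        rate = implied_syllable_rate(TARGET_BITS_PER_SECOND, density)
        line = f"{lang}: {density:.0f} bits/syllable -> ~{rate:.1f} syllables/s"
        if lang in SYLLABLE_INVENTORY:
            bound = max_bits_per_syllable(SYLLABLE_INVENTORY[lang])
            line += f" (uniform upper bound {bound:.1f} bits/syllable)"
        print(line)
```

The quoted densities sit well below the log2 bound, presumably because real syllables are far from equally likely and depend on their context.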

Different languages, similar encoding efficiency: Comparable information rates across the human communicative niche

From the Abstract:

"Language is universal, but it has few indisputably universal characteristics, with cross-linguistic variation being the norm. For example, languages differ greatly in the number of syllables they allow, resulting in large variation in the Shannon information per syllable. Nevertheless, all natural languages allow their speakers to efficiently encode and transmit information. We show here, using quantitative methods on a large cross-linguistic corpus of 17 languages, that the coupling between language-level (information per syllable) and speaker-level (speech rate) properties results in languages encoding similar information rates (~39 bits/s) despite wide differences in each property individually: Languages are more similar in information rates than in Shannon information or speech rate. These findings highlight the intimate feedback loops between languages' structural properties and their speakers' neurocognition and biology under communicative pressures. Thus, language is the product of a multiscale communicative niche construction process at the intersection of biology, environment, and culture."


This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by canopic jug on Saturday September 07 2019, @03:28AM (5 children)

    by canopic jug (3949) Subscriber Badge on Saturday September 07 2019, @03:28AM (#890819) Journal

    It would be interesting to see a human language designed for optimal information flow. I wonder how much junk we have in grammar could be eliminated.

    I'm not sure Esperanto would fall into the category of efficient. It is more like a simplified version of Spanish or Italian. The article places Spanish and English at about the same bit rate. Since Spanish is usually spoken much faster than English, from what I have heard, it would follow that it actually carries fewer bits per syllable. Presumably Esperanto would be in the same situation. You do raise a good question about information-dense grammar. So I would wonder more about Ithkuil than about Esperanto. However, even natural languages have not been heavily studied in this area yet.

    --
    Money is not free speech. Elections should not be auctions.
  • (Score: 2) by hemocyanin on Saturday September 07 2019, @07:36AM (4 children)

    by hemocyanin (186) on Saturday September 07 2019, @07:36AM (#890875) Journal

    What I was thinking about was waste. In some languages things have genders which affect lots of other words in the sentence. Waste. Even in English, where we can use "the" instead of a gendered alternative, we don't really need to most of the time. Also in English, we waste a lot of effort on verb conjugation because we have so many exceptions to standard forms. Another waste of cognitive effort. A simple example:

    Joe ran in the forest.

    We could lose "the" here -- it really doesn't specify anything at all -- there are many forests in the world and so without actually naming one, or having one named in the context previously, it's a useless word. Then there is "ran". Now if Joe "walked" in the forest, the grammar nazis are cool, but do not say "Joe runned in the forest" -- that's wrong. Why? Then there are those pesky double consonants, which we could eliminate just by using a "u" or "-" over "run" -- with the u-shaped bit, it sounds like "run", as in a person exercising. With a line over the top, it sounds like "rune" -- the Norse characters. Or eliminate one of the pronunciation symbols and presume that one unless told otherwise. So we get, presuming the short vowel sound since the symbol is absent:

    Joe runed in forest.

    Meh -- I'm no linguist. I'm just thinking that language has a lot of junk -- a lot of frills so people can sound pretty.

           

    • (Score: 3, Interesting) by deimtee on Saturday September 07 2019, @08:18AM (1 child)

      by deimtee (3272) on Saturday September 07 2019, @08:18AM (#890888) Journal

      Something about gendered nouns I hadn't considered until it was brought up here a few weeks ago is that they can be used to disambiguate pronouns. I don't know if that increase in efficiency compensates for the extra overhead but it is something to take into account.
      Consider: "The rock fell on the car because it was in the wrong place" - Which was in the wrong place?
      Contrast: "Le rock fell on la car because he was in the wrong place" - you know it was the rock in the wrong place.

      --
      If you cough while drinking cheap red wine it really cleans out your sinuses.
    • (Score: 3, Interesting) by maxwell demon on Saturday September 07 2019, @09:29AM

      by maxwell demon (1608) on Saturday September 07 2019, @09:29AM (#890906) Journal

      Actually in speaking, there are many places where unnecessary stuff is shortened out. Like "it's" for "it is" in English. Actually your example sentence reminded me about that:

      Joe ran in the forest.

      In English, that can't be shortened, but in German, the most common way to say this indeed includes a shortened form:

      Joe rannte im Wald.

      Here "im" is short for "in dem" (in the). But nobody would say "in dem Wald" here, unless they want to stress "dem" (which makes it essentially "Joe ran in that forest").

      On the other hand, note that some redundancy is essential in spoken language, since it has to work also in presence of some noise. Without any redundancy, the danger of misunderstandings would be too large.

      Also note that in cases where maximum efficiency is required, people won't form complete sentences anyway. If the house is burning, you wouldn't shout: "The house is burning, therefore you had better leave it now." You'll shout: "Fire! Get out!"

      --
      The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by Rupert Pupnick on Saturday September 07 2019, @12:45PM

      by Rupert Pupnick (7277) on Saturday September 07 2019, @12:45PM (#890945) Journal

      Calling it junk seems a little strong to me, but I think it is fair to ask how much value there is in all of the grammatical superstructure that permeates most languages. Again, I think of it as something like an error detection system, but how good is it really? The best you can hope for is that the listener asks the speaker to repeat because what was heard didn’t make sense. In many cases, though, you can commit grievous errors of grammar, but still get the message across.

      Presumably all grammars came about from cultural/evolutionary pressures over long periods of time, but I don’t understand how.