SoylentNews is people

posted by azrael on Tuesday August 05 2014, @01:08AM
from the ☯-♤-Ω--♤-♧-â& dept.

NPR is reporting that Google is working on a new font that will support every known written language.

Google has taken on its fair share of ambitious projects--digitizing millions and millions of books, mapping the whole world, pioneering self-driving cars. It's a company that doesn't shy away from grand plans.

But one recent effort, despite its rather lofty scope, has escaped much notice. The company is working on a font that aims to include "all the world's languages"--every written language on Earth.

"Tofu" is what the pros call those tiny, empty rectangles that show up when a script isn't supported. This is where Google's new font family, "Noto," gets its name: "No Tofu."

Right now, Noto includes a wide breadth of language scripts from all around the world--specifically, 100 scripts with 100,000 characters. That includes over 600 written languages, says Jungshik Shin, an engineer on Google's text and font team. The first fonts were released in 2012. But this month, Google (in partnership with Adobe) has released a new set of Chinese-Japanese-Korean fonts--the latest in their effort to make the Internet more inclusive.

But as with any product intended to be universal, the implementation gets complicated and not everyone for whom the product is intended is happy.

More information about Noto can be found here and here.

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 4, Interesting) by The Mighty Buzzard on Tuesday August 05 2014, @01:24AM

    by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Tuesday August 05 2014, @01:24AM (#77428) Homepage Journal

    Damned shame there aren't monospace versions. I can't deal with variable-spaced fonts for coding or IRC.

    --
    My rights don't end where your fear begins.
    • (Score: 3, Funny) by Subsentient on Tuesday August 05 2014, @05:58AM

      by Subsentient (1111) on Tuesday August 05 2014, @05:58AM (#77491) Homepage Journal

      I wish everything was monospace.

      --
      "It is no measure of health to be well adjusted to a profoundly sick society." -Jiddu Krishnamurti
      • (Score: 1, Interesting) by Anonymous Coward on Tuesday August 05 2014, @04:45PM

        by Anonymous Coward on Tuesday August 05 2014, @04:45PM (#77650)

        Can't we get rid of non-proportional text now? It was OK in the '80s, but please...

        And I don't have ASCII art in my console very often, so does anyone know a terminal program that handles proportional fonts correctly?

  • (Score: 3, Interesting) by gman003 on Tuesday August 05 2014, @01:36AM

    by gman003 (4155) on Tuesday August 05 2014, @01:36AM (#77429)

    Explicit support is listed for Esperanto. Bone farita! (Well done!) It's not the hardest language to support, as long as you have circumflex variants of c, g, j and s (and h, which is rare and often replaced by k in modern usage) and maybe a u with breve if you're really anal about it. All of those are needed for other languages as well. But they're rare enough (or difficult enough) that most users just have to deal with weird approximations like c^i tio.

    Anyways, glad to see some recognition for Esperanto, even if it's just a font's supported languages list.
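As a rough illustration of the caret approximations mentioned above, here is a minimal Python sketch that turns caret notation such as "c^i tio" into proper Esperanto letters. The function name caret_to_esperanto and the convention that "u^" stands for ŭ are assumptions made for this example, not any standard. It appends a combining circumflex (U+0302) or combining breve (U+0306) to the base letter and then lets NFC normalization compose each pair into the single precomposed letter.

```python
import re
import unicodedata

COMBINING_CIRCUMFLEX = "\u0302"
COMBINING_BREVE = "\u0306"


def caret_to_esperanto(text: str) -> str:
    """Convert caret notation (c^, g^, h^, j^, s^, u^) to Esperanto letters.

    Hypothetical helper: treating u^ as u-breve is an assumed convention.
    """
    # c^ g^ h^ j^ s^ in either case -> base letter + combining circumflex
    text = re.sub(r"([cghjsCGHJS])\^",
                  lambda m: m.group(1) + COMBINING_CIRCUMFLEX, text)
    # u^ -> base letter + combining breve (assumed convention)
    text = re.sub(r"([uU])\^", lambda m: m.group(1) + COMBINING_BREVE, text)
    # NFC composes base + combining mark into one precomposed code point
    return unicodedata.normalize("NFC", text)


print(caret_to_esperanto("c^i tio"))           # ĉi tio
print(caret_to_esperanto("S^i ankau^ venis"))  # Ŝi ankaŭ venis
```

Relying on NFC instead of a hand-written lookup table means the same code works for any base letter that has a precomposed circumflex or breve form.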

    • (Score: 2) by Geotti on Tuesday August 05 2014, @02:58AM

      by Geotti (1146) on Tuesday August 05 2014, @02:58AM (#77446) Journal

      weird approximations like c^i

      Why would you want to raise c to the power of i?
      So, you could just write it as e^(0+i log(c)), or cos(log(c)) + i sin(log(c)). But what for? Are you on to something and would like to let the world know?
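For anyone who does want to raise c to the power of i, the identity in the parent checks out numerically for a positive real c: c^i = e^(i log c) = cos(log c) + i sin(log c). Here is a quick check with Python's standard cmath and math modules. The helper name c_to_the_i and the value c = 2 are arbitrary choices for this example.

```python
import cmath
import math


def c_to_the_i(c: float) -> complex:
    """Compute c**i for positive real c via e^(i*log(c))."""
    return cmath.exp(1j * math.log(c))


c = 2.0
direct = c ** 1j  # Python's built-in complex power
euler = complex(math.cos(math.log(c)), math.sin(math.log(c)))
print(direct, c_to_the_i(c), euler)
print(abs(direct - euler) < 1e-12)  # True
```

Because the exponent is purely imaginary, the result always lies on the unit circle for positive real c.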

    • (Score: 3, Funny) by Tork on Tuesday August 05 2014, @04:08AM

      by Tork (3914) Subscriber Badge on Tuesday August 05 2014, @04:08AM (#77469)
      Bonvolu alsendi la pordiston? Lausajne estas rano en mia bideo. (Would you please send for the porter? Apparently there is a frog in my bidet.)
      --
      🏳️‍🌈 Proud Ally 🏳️‍🌈
      • (Score: 1) by gidds on Tuesday August 05 2014, @01:07PM

        by gidds (589) on Tuesday August 05 2014, @01:07PM (#77576)

        [notices parent is in Foreign™]

        [ignores parent]

        [...]

        [memory stirs]

        [reads parent]

        [laughs like a drain]

        Somebody mod the parent up [h2g2.com] please!

        --
        [sig redacted]
      • (Score: 2) by gman003 on Tuesday August 05 2014, @01:27PM

        by gman003 (4155) on Tuesday August 05 2014, @01:27PM (#77587)

        Kio rano en bideo estas farantas? La dorsfrapo?

        • (Score: 2) by gman003 on Wednesday August 06 2014, @03:44AM

          by gman003 (4155) on Wednesday August 06 2014, @03:44AM (#77891)

          For those without hope, that meant "What is a frog doing in your bidet? The backstroke?".

          Hopefully that's a proper construction; I could not find an official translation for "backstroke" so I had to construct one. I think I messed up some grammar too, but I'm quite out of practice.

    • (Score: 1, Troll) by frojack on Tuesday August 05 2014, @04:39AM

      by frojack (1554) on Tuesday August 05 2014, @04:39AM (#77475) Journal

      Couldn't care less about Esperanto, or the next pointless font that comes down the pike. Font designers all came to us on the B Ark from Golgafrincham anyway.

      The world needs fewer characters, not more; fewer languages, not more.
      I don't know how you get there, or whose ox gets gored along the way, but resuming work on the Tower of Babel just because you got a new load of digital stones is pretty counterproductive in my view.

      My lawn, son! You're matting my lawn.

      --
      No, you are mistaken. I've always had this sig.
      • (Score: 1) by tonyPick on Tuesday August 05 2014, @07:43AM

        by tonyPick (1237) on Tuesday August 05 2014, @07:43AM (#77508) Homepage Journal

        Certainly the world needs far fewer encoding options for visually identical characters, which lead to annoyingly complex normalisation rules for strings that "Look the same, but aren't, but maybe ought to be". http://unicode.org/reports/tr15/ [unicode.org] (and http://blog.golang.org/normalization [golang.org] for an idea of what this means in practice)

        Not a google problem, but more a general unicode thing. (/vents)
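To make the "look the same, but aren't" problem concrete, here is a small Python sketch using the standard unicodedata module. The helper same_text is hypothetical. It compares two strings after normalizing both to a chosen form. NFC and NFD treat precomposed and decomposed accents as equivalent. The compatibility forms (NFKC and NFKD) go further and also fold things like the ﬁ ligature.

```python
import unicodedata

precomposed = "caf\u00e9"  # "café" with U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"  # "café" as e + U+0301 COMBINING ACUTE ACCENT
ligature = "\ufb01le"      # "ﬁle" using U+FB01 LATIN SMALL LIGATURE FI


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Hypothetical helper: compare strings after Unicode normalization."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


print(precomposed == decomposed)            # False: different code points
print(len(precomposed), len(decomposed))    # 4 5
print(same_text(precomposed, decomposed))   # True under NFC
print(same_text(ligature, "file"))          # False: NFC keeps the ligature
print(same_text(ligature, "file", "NFKC"))  # True: NFKC folds it to "fi"
```

Which form to normalize to is a policy decision. NFC is a common default for storage, while NFKC is lossier and is usually reserved for things like search or identifier matching.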

  • (Score: 2) by kaszz on Tuesday August 05 2014, @01:42AM

    by kaszz (4211) on Tuesday August 05 2014, @01:42AM (#77431) Journal

    UTF-17 characters could make up a 17-byte block with 8 characters. For a selection of 131 072 characters. But why waste like 781 kByte of graphics space on a mega-font where many people will only be able to read 320 ppm of that anyway..?

    By having separate blocks for characters you don't know anyway, it's easier to find the useful information.
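The arithmetic behind the UTF-17 idea works as follows. Eight characters at 17 bits each come to 136 bits, which is 17 bytes, and 17 bits give 2^17 = 131 072 possible values. Here is a minimal Python sketch of packing eight 17-bit values into one 17-byte block. The names pack17 and unpack17 are hypothetical, and this is not a real Unicode encoding form. Note that 17 bits only reach U+1FFFF, while Unicode code points go up to U+10FFFF, which needs 21 bits.

```python
from typing import List

BITS = 17
PER_BLOCK = 8
MASK = (1 << BITS) - 1               # 0x1FFFF, the largest 17-bit value
BLOCK_BYTES = BITS * PER_BLOCK // 8  # 136 bits -> 17 bytes


def pack17(codes: List[int]) -> bytes:
    """Pack exactly eight 17-bit values, big-endian, into 17 bytes."""
    if len(codes) != PER_BLOCK:
        raise ValueError("need exactly 8 values per block")
    acc = 0
    for code in codes:
        if not 0 <= code <= MASK:
            raise ValueError(f"{code:#x} does not fit in 17 bits")
        acc = (acc << BITS) | code
    return acc.to_bytes(BLOCK_BYTES, "big")


def unpack17(block: bytes) -> List[int]:
    """Inverse of pack17: split a 17-byte block back into eight values."""
    if len(block) != BLOCK_BYTES:
        raise ValueError("need exactly 17 bytes")
    acc = int.from_bytes(block, "big")
    return [(acc >> (BITS * (PER_BLOCK - 1 - i))) & MASK for i in range(PER_BLOCK)]


text = "Noto \u262f\u03a9!"  # 8 characters, all below U+1FFFF
block = pack17([ord(ch) for ch in text])
print(len(block))                                # 17
print("".join(chr(c) for c in unpack17(block)))  # Noto ☯Ω!
```

The fixed block size is also the scheme's weak point. Text has to be padded out to a multiple of eight characters, and nothing outside the first two planes can be represented.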

    • (Score: 4, Informative) by c0lo on Tuesday August 05 2014, @03:00AM

      by c0lo (156) Subscriber Badge on Tuesday August 05 2014, @03:00AM (#77447) Journal

      UTF-17 characters could make up a 17-byte block with 8 characters. For a selection of 131 072 characters.

      What exactly is that "131 072" magic number?

      (as of June 2014, there are 252,603 characters [blogspot.com.au] assigned by the UNICODE consortium)

      --
      https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
      • (Score: 1, Interesting) by Anonymous Coward on Tuesday August 05 2014, @08:10AM

        by Anonymous Coward on Tuesday August 05 2014, @08:10AM (#77512)

        What exactly is that "131 072" magic number?

        It's two to the power of 17.

        However, I wonder more about this strange UTF-17. As far as I know, there are approximately zero machines with 17-bit words out there. So why would anyone want to use an encoding that uses 17-bit units?

  • (Score: 3, Funny) by cafebabe on Tuesday August 05 2014, @01:48AM

    by cafebabe (894) on Tuesday August 05 2014, @01:48AM (#77434) Journal

    Let me know when fontzilla supports Rongorongo [wikipedia.org].

    --
    1702845791×2
    • (Score: 2) by c0lo on Tuesday August 05 2014, @02:48AM

      by c0lo (156) Subscriber Badge on Tuesday August 05 2014, @02:48AM (#77443) Journal
      Let me know when rongorongo is deciphered [wikipedia.org]; until then it would be useless for the UNICODE consortium to assign characters to their glyphs.
      --
      https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
      • (Score: 3, Insightful) by cafebabe on Tuesday August 05 2014, @03:19AM

        by cafebabe (894) on Tuesday August 05 2014, @03:19AM (#77454) Journal

        Rongorongo is already defined in Unicode despite it being unpronounceable and undeciphered.

        --
        1702845791×2
        • (Score: 2) by c0lo on Tuesday August 05 2014, @03:41AM

          by c0lo (156) Subscriber Badge on Tuesday August 05 2014, @03:41AM (#77462) Journal
          Uh? Seriously? I'll be grateful for a link.
          --
          https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
          • (Score: 2) by cafebabe on Tuesday August 05 2014, @04:09AM

            by cafebabe (894) on Tuesday August 05 2014, @04:09AM (#77470) Journal
            • (Score: 2) by c0lo on Tuesday August 05 2014, @04:50AM

              by c0lo (156) Subscriber Badge on Tuesday August 05 2014, @04:50AM (#77477) Journal
              "Tentatively included" and "actionable proposal has not been written".
              True, that still corrects my assumption that only deciphered glyphs are considered; thanks.
              --
              https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
              • (Score: 4, Informative) by KritonK on Tuesday August 05 2014, @11:48AM

                by KritonK (465) on Tuesday August 05 2014, @11:48AM (#77554)

                The hieroglyphs of the disc of Phaistos [wikipedia.org] have been encoded [wikipedia.org] in Unicode since 2008. As the inscription on the disc has not been deciphered, this would indicate that not only deciphered scripts are considered. I would think that a script is added to Unicode if people need to use it, whether to write in it, study it, or attempt to decipher it. People may be unable to write in Rongorongo or the script of the disc of Phaistos, but they do study these scripts.
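One way to check a claim like this is to ask Python's unicodedata module, which ships with the Unicode Character Database for the interpreter's Unicode version. The Phaistos Disc block starts at U+101D0. The sketch below uses a hypothetical helper, describe, to print the names of the first few code points. The exact names come from whatever database your Python ships with, so verify the output yourself rather than treating it as a quotation.

```python
import unicodedata

PHAISTOS_START = 0x101D0  # first code point of the Phaistos Disc block


def describe(cp: int) -> str:
    """Return 'U+XXXX NAME', or a marker if the code point is unassigned."""
    return f"U+{cp:04X} {unicodedata.name(chr(cp), '<unassigned>')}"


print("Unicode database version:", unicodedata.unidata_version)
for cp in range(PHAISTOS_START, PHAISTOS_START + 3):
    print(describe(cp))
```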

  • (Score: 5, Informative) by forsythe on Tuesday August 05 2014, @01:51AM

    by forsythe (831) on Tuesday August 05 2014, @01:51AM (#77435)

    So, like GNU Unifont [unifoundry.com] then?

  • (Score: 0) by Anonymous Coward on Tuesday August 05 2014, @02:11AM

    by Anonymous Coward on Tuesday August 05 2014, @02:11AM (#77437)

    and then tell your mom, girlfriend, police, or whomever pays a penny

    !(*$

  • (Score: 1, Interesting) by Anonymous Coward on Tuesday August 05 2014, @03:02AM

    by Anonymous Coward on Tuesday August 05 2014, @03:02AM (#77449)

    Can't they use multiple existing typefaces? Stylistic reasons? Seems like they'd end up messing things up stylistically anyway.

    I doubt many people would notice or care if the typefaces of Arabic, English, Old Mongolian, Tamil, Hindi, Thai, Khmer, Japanese, Sumerian cuneiform, Egyptian hieroglyphs, Nushu, Naxi, Voynich, etc. are different stylistically, as long as they're not too ugly and not too different in "character" size (where "character size" makes sense ;) ).
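Using several typefaces is essentially how font fallback works. For each character, the renderer tries fonts in a preferred order and uses the first one that has a glyph. Below is a minimal Python sketch of that idea. The font names and coverage ranges are invented for illustration, and real systems and real fonts are far more involved. A font of None marks characters that would come out as tofu.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical fonts and the code points they cover (illustrative ranges only).
COVERAGE: Dict[str, Set[int]] = {
    "ExampleLatin": set(range(0x0020, 0x007F)),   # Basic Latin
    "ExampleArabic": set(range(0x0600, 0x0700)),  # Arabic block
    "ExampleThai": set(range(0x0E00, 0x0E80)),    # Thai block
}
FALLBACK_ORDER = ["ExampleLatin", "ExampleArabic", "ExampleThai"]


def font_for(ch: str) -> Optional[str]:
    """First font in the fallback order that covers ch, or None (tofu)."""
    return next((f for f in FALLBACK_ORDER if ord(ch) in COVERAGE[f]), None)


def split_into_runs(text: str) -> List[Tuple[str, Optional[str]]]:
    """Group consecutive characters that resolve to the same font."""
    runs: List[Tuple[str, Optional[str]]] = []
    for ch in text:
        font = font_for(ch)
        if runs and runs[-1][1] == font:
            runs[-1] = (runs[-1][0] + ch, font)
        else:
            runs.append((ch, font))
    return runs


for run, font in split_into_runs("Thai: \u0e2a\u0e27\u0e31\u0e2a\u0e14\u0e35 \u4e00"):
    print(repr(run), "->", font or "TOFU")
```

A single font that covers everything, such as Noto aims to be, turns the last run into a real glyph and also keeps stroke weight and size consistent across the runs.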

  • (Score: 1) by richtopia on Tuesday August 05 2014, @04:33AM

    by richtopia (3160) on Tuesday August 05 2014, @04:33AM (#77473) Homepage Journal

    Webdings. Nuf said.

  • (Score: 0) by Anonymous Coward on Tuesday August 05 2014, @05:16AM

    by Anonymous Coward on Tuesday August 05 2014, @05:16AM (#77479)

    They better support our language, or we will raid them together. Qapla'! (Success!)

    • (Score: 2) by mendax on Tuesday August 05 2014, @03:33PM

      by mendax (2840) on Tuesday August 05 2014, @03:33PM (#77627)

      HIja'! qatlh vaj jang maH ghaH ghItlh tera'ngan petaq? (Indeed! Why should we reply upon the writing of the human petaQ?)

      Then Google can finally one-up Bing's Klingon translator [bing.com] and properly render human and Klingon text uniformly.

      --
      It's really quite a simple choice: Life, Death, or Los Angeles.
  • (Score: 5, Interesting) by mojo chan on Tuesday August 05 2014, @07:28AM

    by mojo chan (266) on Tuesday August 05 2014, @07:28AM (#77506)

    You can't have a single font that does all human languages, thanks to flaws in Unicode. For example, they tried to merge Chinese, Japanese and Korean (CJK) characters that are written slightly differently in each language. They are essentially the same character but need to be rendered differently depending on the language, and can sometimes have slightly different meanings. The worst part is that there is no way to automatically determine which language a piece of text is in, and sometimes you can only tell from the wider context. Documents with mixed languages, like a Japanese textbook for Chinese speakers, are impossible to store as Unicode.

    Thus it is also practically impossible to create a font for all languages. All the major font formats and operating systems use the broken Unicode architecture with hacks to decide which language's font to render CJK in. If Windows is set to Japanese and the user opens a Chinese Unicode text file, it gets rendered in the wrong font.

    At this point it is going to be extremely hard to fix this problem, or to move away from Unicode.

    --
    const int one = 65536; (Silvermoon, Texture.cs)
    • (Score: 2, Interesting) by Anonymous Coward on Tuesday August 05 2014, @08:26AM

      by Anonymous Coward on Tuesday August 05 2014, @08:26AM (#77516)

      Documents with mixed languages, like a Japanese textbook for Chinese speakers, are impossible to store as Unicode.

      You may have heard of the concept of using several fonts in the same document. So just typeset the Japanese parts of the book in a Japanese font, and the Chinese parts in a Chinese font.

      There's absolutely no reason all the information must be contained in the character codes.

      And even if you insist on having all the information in the character codes, you can just put modifiers in the private use space. Just like you can have a standard modifier for "switch to right-to-left mode", you also can have a modifier for "switch to Chinese character mode".

      However one might consider adding standard "language switch" codes to Unicode, since the language also affects things like proper capitalization (the upper case of "i" is "I" in English, but "İ" in Turkish, where the lower case of "I" is "ı").
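The Turkish case is easy to demonstrate. Python's built-in str.upper() and str.lower() apply Unicode's default, language-independent mappings, so language-specific rules have to come from elsewhere, for example a locale-aware library such as ICU. The turkish_upper and turkish_lower helpers below are a hand-rolled sketch that covers only dotted and dotless i. They are not a complete implementation.

```python
def turkish_upper(s: str) -> str:
    """Uppercase with Turkish i rules: i -> İ (U+0130), ı (U+0131) -> I."""
    # ı already uppercases to I by default; only plain i needs special handling.
    return s.replace("i", "\u0130").upper()


def turkish_lower(s: str) -> str:
    """Lowercase with Turkish i rules: I -> ı (U+0131), İ (U+0130) -> i."""
    return s.replace("I", "\u0131").replace("\u0130", "i").lower()


word = "istanbul"
print(word.upper())              # ISTANBUL  (default mapping)
print(turkish_upper(word))       # İSTANBUL  (Turkish mapping)
print(turkish_lower("ISPARTA"))  # ısparta
```

The same string gives different results depending on the language, which is why the language has to be known from somewhere outside the character codes.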

      • (Score: 2) by mojo chan on Wednesday August 06 2014, @07:22AM

        by mojo chan (266) on Wednesday August 06 2014, @07:22AM (#77928)

        The problem with multiple fonts is that plain Unicode text has no support for them. You need a document format with metadata.

        Take as an example the tags on an MP3 file. There are some Japanese artists who sing in Japanese, Chinese and Korean on the same album. Given that the tags don't support fonts, and that few MP3 players would support them even if they did, you can see how it is actually impossible to accurately describe their work in the file's metadata. Same with the file name.

        I agree that modifiers would help, but unfortunately the Unicode standards body refuses to allow them. It would be a hack anyway; the best thing would be to abandon the current characters and assign a whole set of new ones for each language. It would be a huge pain for all involved but I can't see any other way of fixing Unicode at this point.

        --
        const int one = 65536; (Silvermoon, Texture.cs)
    • (Score: 2) by WillAdams on Tuesday August 05 2014, @12:52PM

      by WillAdams (1424) on Tuesday August 05 2014, @12:52PM (#77571)

      One can work around this by using stylistic alternates and stating that the Chinese versions are SALT01, Japanese, SALT02 and Korean, SALT03, so one can do this in a font, and in a document, but one has to have a layer of markup above Unicode.
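The "layer of markup above Unicode" idea can be sketched in a few lines. Under Han unification, the character 直 has the single code point U+76F4 whatever language it appears in, so the language has to be carried separately. HTML's lang attribute is one existing way to do that, and a browser with suitable fonts may use it to pick regional glyphs. The span list and the to_html helper below are hypothetical.

```python
import unicodedata
from html import escape
from typing import List, Tuple

# (BCP 47 language tag, text) pairs: the same character in three languages.
spans: List[Tuple[str, str]] = [("ja", "\u76f4"), ("zh-Hans", "\u76f4"), ("ko", "\u76f4")]


def to_html(parts: List[Tuple[str, str]]) -> str:
    """Wrap each run in a <span lang=...> so the renderer knows its language."""
    body = "".join(f'<span lang="{escape(lang)}">{escape(text)}</span>'
                   for lang, text in parts)
    return f"<p>{body}</p>"


# The plain text carries no language information: all three runs are identical.
print({text for _, text in spans})  # {'直'}
print(unicodedata.name("\u76f4"))   # CJK UNIFIED IDEOGRAPH-76F4
print(to_html(spans))
```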

    • (Score: 2) by darkfeline on Tuesday August 05 2014, @09:18PM

      by darkfeline (1030) on Tuesday August 05 2014, @09:18PM (#77755) Homepage

      Part of the problem is that those languages aren't standardized. Many characters have multiple written forms, even within the same language, due to historical reasons. For example, Simplified and Traditional Chinese. You can't possibly assign a code point to each and every variation.

      --
      Join the SDF Public Access UNIX System today!
      • (Score: 2) by mojo chan on Wednesday August 06 2014, @04:48PM

        by mojo chan (266) on Wednesday August 06 2014, @04:48PM (#78095)

        Unicode already includes many code points for character variations.

        --
        const int one = 65536; (Silvermoon, Texture.cs)
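Both kinds of variation discussed in this subthread can be seen from Python. Unicode treats simplified and traditional forms such as 汉 (U+6C49) and 漢 (U+6F22) as different characters, so they get separate code points. Finer glyph variants can be requested with an ideographic variation sequence: a base ideograph followed by a variation selector in the range U+E0100 to U+E01EF. The sequence below, 葛 (U+845B) plus VARIATION SELECTOR-17, is only an illustration. Whether a given sequence is registered is something to check in the Ideographic Variation Database, and it only renders differently if the font supports it.

```python
import unicodedata

simplified, traditional = "\u6c49", "\u6f22"  # 汉 and 漢: distinct code points
ivs = "\u845b\U000E0100"                      # 葛 + VARIATION SELECTOR-17

for ch in (simplified, traditional):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))

# An IVS is two code points; the selector itself is invisible.
print([f"U+{ord(c):04X}" for c in ivs])          # ['U+845B', 'U+E0100']
print(unicodedata.name(ivs[1]))                  # VARIATION SELECTOR-17
# Normalization leaves the sequence alone, so the variant request survives NFC.
print(unicodedata.normalize("NFC", ivs) == ivs)  # True
```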
  • (Score: 3, Informative) by WillAdams on Tuesday August 05 2014, @01:01PM

    by WillAdams (1424) on Tuesday August 05 2014, @01:01PM (#77574)

    For those who've forgotten Bitstream's seminal role in early computer font history:

    http://en.wikipedia.org/wiki/Bitstream_Cyberbit [wikipedia.org]