
SoylentNews is people

posted by cmn32480 on Monday June 06 2016, @04:14AM
from the bacon++ dept.

The Register is reporting the upcoming release of Unicode 9.0:

On 21 June, the world will become a slightly more agreeable place with the release of Unicode 9.0 - not because the standard will offer "Arabic characters to support Bravanese and Warsh, which are used in North and West Africa, along with Pakistani Quranic marks" and "significant updates to segmentation algorithms", but rather for the inclusion of a bacon emoji.

For the curious, Emojipedia has offered its renditions of these for your perusal.

Thanks to the work of TheMightyBuzzard, SoylentNews supports UTF-8 character encoding. This would be a good time to consider finding an updated Unicode font to embrace these additions.

Though it may seem like we are falling back to an age of Egyptian hieroglyphics, there are some more prosaic changes as well. The Unicode 9.0 Summary follows.


A. Summary

Unicode 9.0 adds exactly 7,500 characters, for a total of 128,172 characters. These additions include six new scripts and 72 new emoji characters.

Notable character additions include the following:

  • Osage script to support the Native American language, Osage
  • Adlam script to support Fulani and other African languages
  • Newa script to support the Nepal Bhasa language of Nepal
  • Tangut script, a major historic script of China
  • Arabic characters to support Bravanese and Warsh, which are used in North and West Africa, along with Pakistani Quranic marks
  • Emoji characters, including 22 new smilies and people, 14 for animals and nature, and 18 for food and drink
  • Symbols to support the new 4K TV standard

Other important updates in Unicode Version 9.0 include:

  • Significant updates to segmentation algorithms
  • Improvements in the charts for the Mongolian script

Synchronization

Two other important Unicode specifications are maintained in synchrony with the Unicode Standard, and include updates for the repertoire additions made in Version 9.0, as well as other modifications:

This version of the Unicode Standard is synchronized with ISO/IEC 10646:2014, fourth edition, plus Amd. 1 and Amd. 2, and 273 characters from the forthcoming fifth edition of ISO/IEC 10646.

See Sections D through H below for additional details regarding the changes in this version of the Unicode Standard, its associated annexes, and the other synchronized Unicode specifications.



Related Stories

Google CEO Drops Everything to Fix Cheeseburger Emoji 47 comments

The cheese on Google's version of the cheeseburger emoji is in the WRONG PLACE and that is problematic:

Responding to criticism about the placement of cheese on Google's version of the cheeseburger emoji, Google CEO Sundar Pichai said that he would take a look at the issue immediately. "Will drop everything else we are doing and address on Monday :) if folks can agree on the correct way to do this!" Pichai tweeted.

Pichai was responding to author Thomas Baekdal, who pointed out the difference in cheese placement between Apple's and Google's cheeseburger emojis. "I think we need to have a discussion about how Google's burger emoji is placing the cheese underneath the burger, which Apple puts on top," Baekdal tweeted.

The tweet ignited a debate about where the different ingredients of a cheeseburger belong. Among all the different cheeseburger emoji variants offered by various tech companies, Google's is the only version to place the cheese below the meat, according to images of cheeseburger emojis from Apple, Google, Samsung, Facebook and others, as seen on Emojipedia. It's generally accepted that cheeseburger cheese should be placed directly on the meat patty for optimal melting.

🍔🍕🍖🍗🍟🍩 🏃💨 🇺🇸 💩🚽

Unicode 11 emoji candidates, scheduled for June 2018.

Also at Brisbane Times and New Zealand Herald.

Previously: Tweet Emoji 4 Pizza: #Epitome of #Convenience
38 New Emojis to be Introduced in 2016
Unicode Considering 67 New Emoji for 2016
Unicode 9.0 Serves up Bacon Emoji, 71 others, and Six New Scripts
Apple Urged to Rethink Gun Emoji Change
Unicode 10.0's New Emojis
Apple's New iPhone X will let You Control the Poo Emoji with Your Face



Unicode Consortium Adding 230 New Emojis in Emoji 12.0 64 comments

Emoji 12.0 brings us waffles, more diversity, suggestive "finger pinch" glyph

There's a push for more diversity with this new emoji release. We have emojis for deaf people in three genders (male, female, and genderless) and five skin tones, an ear with a hearing aid, people in motorized and unmotorized wheelchairs, prosthetic arms and legs, a guide dog and a service dog, and people with a probing cane. There are actually only 59 distinct new emoji types in this release, but everything that depicts a human comes in five skin tones and three genders, which pumps up the numbers. You can really see this with the "People holding hands" emoji, which is completely configurable for a total of 70 possible combinations.

The emoji that's causing the most buzz is "pinching hand." Emojipedia's example shows a thumb and pointer finger with a small distance between them, which could also be interpreted as a hand signal for "small." People are already coming up with, uh, "suggestive" uses for such a glyph, and if the actual implementations follow Emojipedia's design, the glyph could end up on the naughty list next to peach and eggplant.

Thank you, Emojesus. ✝

By the way, what happened to calling it Unicode 12.0? Maybe they'll call it that in June.

Unicode Consortium blog post. Also at Emojipedia and 9to5Mac.

Previously: 38 New Emojis to be Introduced in 2016
Unicode Considering 67 New Emoji for 2016
Unicode 9.0 Serves up Bacon Emoji, 71 others, and Six New Scripts
Unicode 10.0's New Emojis
Stink Over Frowning Poo Emoji at the Unicode Consortium

Related: Apple's New iPhone X will let You Control the Poo Emoji with Your Face
Google CEO Drops Everything to Fix Cheeseburger Emoji
Microsoft Briefly Left Holding the Gun Emoji
Battle of the Bagel Emoji



Google Unveils 53 Gender Fluid Emoji 195 comments

Exclusive: Google releases 53 gender fluid emoji

[As emojis] become more inclusive, each becomes less universal. Jennifer Daniel, designer at Google, thinks about this deep irony at the heart of visual language all the time. She traces it back to the age-old problem with the male bathroom symbol. "That person could be man, woman, anyone," she says. "But they had to add a little detail, that dress, and suddenly that person symbol doesn't mean person anymore; it means man. And that culture means a man-centered culture."

While Daniel can't fix our bathroom signage, as the director of Android emojis, she can fix another problem: The lack of gender-neutral symbols in texting. She can give us the zombies, merpeople, children, weightlifters that are neither male nor female. "We're not calling this the non-binary character, the third gender, or an asexual emoji–and not gender neutral. Gender neutral is what you call pants," says Daniel. "But you can create something that feels more inclusive."

Google is launching 53 updated, gender ambiguous emoji as part of a beta release for Pixel smartphones this week (they'll come to all Android Q phones later this year). Whether Google calls them "non-binary" or not, they have been designed to live between the existing male and female emoji and recognize gender as a spectrum. Given that Google collaborates with many of its rivals on emoji, it's likely that Apple and others will release their takes on genderless emoji later this year.

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 1, Interesting) by Anonymous Coward on Monday June 06 2016, @04:35AM

    by Anonymous Coward on Monday June 06 2016, @04:35AM (#355731)

    I'm a unicode dummy. Could someone explain what is meant by "Symbols to support the new 4K TV standard"? Is this for a bitmap-based unicode reference font? I thought that the glyphs would be rendered as vector graphics and would scale to any resolution, including 4K.

    • (Score: 3, Informative) by takyon on Monday June 06 2016, @04:42AM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday June 06 2016, @04:42AM (#355736) Journal

      See the linked PDF. It's a symbol that is like a label that describes the content. I'm guessing you have seen at least the symbols with "4K" and "5.1" in a rectangle. Perhaps printed on a 4K TV box or Blu-ray package, on a "Viewer Discretion Advised" splash screen before a TV show, or in the UI of an online video player.

      In the PDFs [unicode.org], the new symbols are highlighted in yellow.

      This page links to all the new characters: http://www.unicode.org/charts/PDF/Unicode-9.0/ [unicode.org]

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 0) by Anonymous Coward on Monday June 06 2016, @06:16AM

        by Anonymous Coward on Monday June 06 2016, @06:16AM (#355757)

        Thanks for the link. That makes sense now. I can't say I've ever seen those symbols. Maybe I have, but they don't seem like something that would stand out. Then again, I don't get much exposure to that stuff. I didn't even know that 4K TVs existed until I saw mention of them in a discussion on the other site last week.

  • (Score: 2) by takyon on Monday June 06 2016, @04:36AM

    by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday June 06 2016, @04:36AM (#355732) Journal

    Symbols to support the new 4K TV standard

    I'm guessing this is... italic "4K" or "2160p" in a rectangle? Close: http://www.unicode.org/charts/PDF/Unicode-9.0/U90-1F100.pdf [unicode.org]

    They also have "8K", "Lossless", "5.1" and others. They could be somewhat useful in a video UI context. "60P" and "120P" were added to represent frames per second "progressive" [frankschrader.us].

    It appears they also added an on/off symbol: ⏻
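    To browse these labels without the PDF, here is a minimal Python sketch that lists every character in the Enclosed Alphanumeric Supplement block whose name contains "SQUARED". It assumes Python 3.9 or later, whose bundled Unicode database is newer than 9.0; an older database would simply list fewer characters. `find_in_range` is a hypothetical helper written for illustration, not a library function, and the on/off glyph above is assumed to be U+23FB.

```python
import unicodedata


def find_in_range(start: int, end: int, keyword: str) -> list[tuple[str, str]]:
    """Return (code point, name) pairs for assigned characters in [start, end]
    whose Unicode name contains `keyword`."""
    hits = []
    for cp in range(start, end + 1):
        name = unicodedata.name(chr(cp), "")  # "" = not assigned in this database
        if keyword in name:
            hits.append((f"U+{cp:04X}", name))
    return hits


print(unicodedata.unidata_version)  # Unicode version bundled with this Python

# Enclosed Alphanumeric Supplement block, U+1F100..U+1F1FF (the chart linked above)
squared = find_in_range(0x1F100, 0x1F1FF, "SQUARED")
for cp, name in squared:
    print(cp, name)

print(unicodedata.name("\u23FB"))  # POWER SYMBOL
```

    The keyword also matches older characters such as the NEGATIVE SQUARED LATIN CAPITAL LETTER series, so the listing mixes pre-9.0 characters with the new video labels.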

    --
    [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 1) by milsorgen on Monday June 06 2016, @04:41AM

      by milsorgen (6225) on Monday June 06 2016, @04:41AM (#355735)

      Aren't a lot of those symbols put out by industry groups or consortiums? Not sure how I like private companies getting their branding pushed into our unicode.

      --
      On the Oregon Coast, born and raised, On the beach is where I spent most of my days...
      • (Score: 2) by takyon on Monday June 06 2016, @04:47AM

        by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday June 06 2016, @04:47AM (#355738) Journal

        That's kind of what happened with the emojis. Companies like Apple and Google started using their own emojis in messaging applications, and then the Unicode Consortium pieced 4-5 competing sets into Unicode emoji dumps over the last few years.

        A "4K" or "UHD" glyph from a consortium is significantly less egregious than, say, a company logo getting officially added (although in some cases, like Nike, there are close approximations). After all, you can use FFmpeg to spit out 4K video, and then slap the handy glyph on your webpage. Or... just type '4' and 'K' or 2160p. Whatever.

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
    • (Score: 2) by Scruffy Beard 2 on Monday June 06 2016, @05:30AM

      by Scruffy Beard 2 (6030) on Monday June 06 2016, @05:30AM (#355748)

      Not sure what 22.2 is about. 22.2 surround sound [wikipedia.org] might be it.

      Refers to the number of channels apparently (with two sub-woofers?)

      • (Score: 1, Informative) by Anonymous Coward on Monday June 06 2016, @03:36PM

        by Anonymous Coward on Monday June 06 2016, @03:36PM (#355939)

        The .1 in 5.1 (and .2 in "22.2") does not refer specifically to sub-woofers, but "LFE" (low-frequency effects) which is not /quite/ the same thing.

        The most important technical distinction in a 5.1 setup is the LFE channel has an extra +10dB gain applied to it at the final amplification, so it is actually much louder than putting the signal on one of the other channels (~3 times the amplitude of the regular channels). The channel is filtered to <150Hz or so, which reduces the noise problems that would normally be caused by the extra gain.
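        The "~3 times" figure follows from the standard decibel definitions: +10 dB is a tenfold increase in power, which corresponds to a factor of sqrt(10), about 3.16, in amplitude. A small Python sketch of the conversion (the function names are illustrative only):

```python
import math


def db_to_amplitude_ratio(db: float) -> float:
    """Amplitude (voltage or sound-pressure) ratio: dB = 20 * log10(ratio)."""
    return 10 ** (db / 20)


def db_to_power_ratio(db: float) -> float:
    """Power ratio: dB = 10 * log10(ratio)."""
    return 10 ** (db / 10)


print(round(db_to_amplitude_ratio(10), 2))  # 3.16, i.e. roughly "3 times the amplitude"
print(db_to_power_ratio(10))                # 10.0 times the power
print(round(20 * math.log10(3.16), 1))      # 10.0 dB, going the other way
```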

    • (Score: 2) by gidds on Tuesday June 07 2016, @12:58PM

      by gidds (589) on Tuesday June 07 2016, @12:58PM (#356373)

      Thank goodness for that!  At last, we have a way to indicate in script that a video presentation is in 3D, or 4K, or HDR, or 5.1!

      Now, if only we had some sort of general-purpose, flexible writing system we could use in the meantime...

      --
      [sig redacted]
  • (Score: 3, Informative) by gman003 on Monday June 06 2016, @04:40AM

    by gman003 (4155) on Monday June 06 2016, @04:40AM (#355734)

    "Though it may seem like we are falling back to an age of Egyptian hieroglyphics, there are some more prosaic changes as well."

    For the record, Egyptian Hieroglyphics were added in 5.2, and range from U+13000 (𓀀) to U+1342E (𓐮) [unicode.org]. I notice several overlap with modern emoji, but there are many wholly unique characters.
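    The quoted range is easy to check against Python's bundled Unicode database (any Python 3 release ships a database newer than 5.2). A minimal sketch; the variable names are illustrative:

```python
import unicodedata

FIRST, LAST = 0x13000, 0x1342E  # range cited in the comment above

names = [unicodedata.name(chr(cp), "") for cp in range(FIRST, LAST + 1)]
print(len(names))  # 1071 code points in the range
print(all(names))  # True: every one of them is assigned
print(names[0])    # EGYPTIAN HIEROGLYPH A001
print(chr(FIRST), chr(LAST))  # renders only with a font that covers the block
```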

    • (Score: 2) by Jeremiah Cornelius on Monday June 06 2016, @03:48PM

      by Jeremiah Cornelius (2785) on Monday June 06 2016, @03:48PM (#355945) Journal

      The expansion of the emoji set finally does some justice for my ability to express more accurate Twitter communication.

      I was sorely lacking the Tony Manero Disco Dancing emoji. I hope in the final, they retain the blue suit - which more accurately represents my dress than the traditional Angel's Flight suit.

      The omission of a Baguette has also been finally addressed. The honor of La Belle France is restored, and I have a suitable replacement for the wholly inadequate Eggplant emoji.

      --
      You're betting on the pantomime horse...
  • (Score: 2, Insightful) by Anonymous Coward on Monday June 06 2016, @04:54AM

    by Anonymous Coward on Monday June 06 2016, @04:54AM (#355742)

    Now everyone gets to upgrade to a new version of the ICU libraries so the selfie generation can get their emojis properly rendered.

    https://www.youtube.com/watch?v=t4ZGKI8vpcg [youtube.com]

    • (Score: 0) by Anonymous Coward on Monday June 06 2016, @05:16AM

      by Anonymous Coward on Monday June 06 2016, @05:16AM (#355746)

      Whine now, die later. Khmer Rouge 3.1 → it's for you ☠

  • (Score: 3, Insightful) by SpockLogic on Monday June 06 2016, @12:59PM

    by SpockLogic (2762) on Monday June 06 2016, @12:59PM (#355820)

    10 posts and not a comment on BACON. You guys have got your priorities way wrong. ;-)

    --
    Overreacting is one thing, sticking your head up your ass hoping the problem goes away is another - edIII
    • (Score: 1) by kurenai.tsubasa on Monday June 06 2016, @03:07PM

      by kurenai.tsubasa (5227) on Monday June 06 2016, @03:07PM (#355921) Journal

      Very well. I for one, as a kale-eating hipster, demand separate emojis for regular bacon, uncured bacon, thick cut bacon, and center cut! So:

      • Regular bacon
      • Uncured bacon
      • Thick cut bacon
      • Center-cut bacon
      • Uncured center-cut bacon
      • Thick center-cut bacon
      • Also turkey bacon
      • Almost forgot Canadian bacon!
      • And of course bacon jerky

      We'll also need emojis to describe the curing and smoking process. Ideally there should be emojis for both applewood and hickory smoked, and then we need emojis for using curing salts vs. organic celery juice.

      You too can make your own bacon! Here are some instructions for smoking and curing. [amazingribs.com] If you're a kale-eating hipster like me, you may want to use sea salt and celery [takepart.com] during the curing process. (This is what “uncured” means, come to find out. Not cured with nitrite salts, just with the er… nitrites found in celery, but hey!)

      And don't forget! The Saturday before Labor Day is Bacon Day [wikipedia.org].

  • (Score: 0) by Anonymous Coward on Monday June 06 2016, @01:14PM

    by Anonymous Coward on Monday June 06 2016, @01:14PM (#355828)

    I carry more computing power in my coat pocket than existed in the entire world in 1985.

    Do I use it for groundbreaking research in medicine?
    Do I use it for guiding interstellar data collection satellites?
    Do I use it to compute optimum strategies for solving world hunger?
    Do I use it to design affordable, energy efficient homes for the poor?

    No! Screw that shit! I use it to play candy crush and pepper my primitive grunting with bright shiny objects! Now *that's progress!*

    Ain't technology grand!?

    • (Score: 2) by jdavidb on Monday June 06 2016, @01:38PM

      by jdavidb (5690) on Monday June 06 2016, @01:38PM (#355850) Homepage Journal

      Do I use it to compute optimum strategies for solving world hunger? Do I use it to design affordable, energy efficient homes for the poor?

      If you put that computing power toward those uses, would it actually accomplish those goals?

      --
      ⓋⒶ☮✝🕊 Secession is the right of all sentient beings
      • (Score: 0) by Anonymous Coward on Monday June 06 2016, @02:08PM

        by Anonymous Coward on Monday June 06 2016, @02:08PM (#355881)

        If you put that computing power toward those uses, would it actually accomplish those goals?

        Always a good question. I'm not saying it actually would, but I think it's a sad commentary that, given the opportunity to attempt ambitious things, we've chosen otherwise.

      • (Score: 0) by Anonymous Coward on Monday June 06 2016, @03:22PM

        by Anonymous Coward on Monday June 06 2016, @03:22PM (#355930)

        I, on the other hand, think that people spending hours playing computer games has done more to reduce violence than many other schemes.

        Think about it, the more time young people spend shooting each other in CoD or Counterstrike, or raiding in WoW or whatever, the less time they spend actually shooting each other in real life. Sure occasionally some of it spills into real-life, but from what I recall it wasn't uncommon some decades ago for young humans (esp males) to roam about in packs attacking and often killing each other in real life for bullshit reasons. It still happens today but the stats show that it has been dropping. Some claim it's due to the switch to unleaded fuel but go compare what that demographic spend their hours on today vs in the past.

        This is the important circuses part of the bread and circuses thing.
        • (Score: 2) by ticho on Tuesday June 07 2016, @08:05AM

          by ticho (89) on Tuesday June 07 2016, @08:05AM (#356321) Homepage Journal

          In related news, SimCity series of games are reported to have caused reduction in appearance of new towns and villages worldwide, and we have EVE Online to blame for lack of progress in colonizing space.

    • (Score: 0) by Anonymous Coward on Monday June 06 2016, @01:45PM

      by Anonymous Coward on Monday June 06 2016, @01:45PM (#355859)

      This man [buzzaldrin.com] won't be happy.

      • (Score: 0) by Anonymous Coward on Monday June 06 2016, @02:02PM

        by Anonymous Coward on Monday June 06 2016, @02:02PM (#355877)

        This man won't be happy.

        That's EXACTLY what I'm talking about. We've peaked.

    • (Score: 2) by Tork on Monday June 06 2016, @04:13PM

      by Tork (3914) Subscriber Badge on Monday June 06 2016, @04:13PM (#355958)
      Yeah, it's a good thing you've reserved your high-powered computing device for the betterment-of-mankind tasks, like broadcasting your disapproval of gaming to the world!
      --
      🏳️‍🌈 Proud Ally 🏳️‍🌈
    • (Score: 1) by kurenai.tsubasa on Monday June 06 2016, @08:30PM

      by kurenai.tsubasa (5227) on Monday June 06 2016, @08:30PM (#356093) Journal

      Do I use it for groundbreaking research in medicine?

      Would now be a good time to mention the Soylent News Folding@Home Team [soylentnews.org]?

      • (Score: 0) by Anonymous Coward on Monday June 06 2016, @09:01PM

        by Anonymous Coward on Monday June 06 2016, @09:01PM (#356103)

        Do I use it for groundbreaking research in medicine?

        Would now be a good time to mention the Soylent News Folding@Home Team [soylentnews.org]?

        Yes! Things like this are a great use of extra/spare/gratuitous computing power. Thanks for bringing it up!

  • (Score: 2, Insightful) by Anonymous Coward on Monday June 06 2016, @01:55PM

    by Anonymous Coward on Monday June 06 2016, @01:55PM (#355869)

    I think Unicode lost its way. Maybe it's time to make a new standard that builds on the good things Unicode did early on, corrects the very few mistakes it made back then, and removes the late additions of all sorts of crap. As far as I can see, there were only two mistakes: combining characters follow the base character instead of preceding it, which makes a lookahead necessary (something the UTF encodings carefully avoided for single code points); and the UTF-16 encoding, whose oddness stems from the original belief that 16 bits would be enough for everybody.
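    Both complaints are easy to see in code. A minimal Python 3.9+ sketch: an "é" can arrive as a plain "e" followed by a combining accent, and U+1F953 BACON, one of the Unicode 9.0 additions, needs two 16-bit code units in UTF-16. The helper `utf16_surrogates` is hypothetical, written for illustration; its arithmetic follows the standard UTF-16 surrogate scheme.

```python
import unicodedata

# 1) Combining marks follow their base character, so a decoder cannot know a
#    user-perceived character is complete until it has seen the next code point.
decomposed = "e\u0301"  # LATIN SMALL LETTER E + COMBINING ACUTE ACCENT
print(len(decomposed))  # 2 code points
print(unicodedata.normalize("NFC", decomposed) == "\u00e9")  # True: composes to a single é


# 2) UTF-16 needs a surrogate pair for anything above U+FFFF,
#    a leftover of the original 16-bit design.
def utf16_surrogates(cp: int) -> tuple[int, int]:
    """Split a supplementary-plane code point into its UTF-16 high/low surrogates."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only U+10000..U+10FFFF need surrogates")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


high, low = utf16_surrogates(0x1F953)  # U+1F953 BACON, new in Unicode 9.0
print(f"{high:04X} {low:04X}")  # D83E DD53
print("\U0001F953".encode("utf-16-be").hex())  # d83edd53
```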

    • (Score: 2) by takyon on Monday June 06 2016, @04:12PM

      by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday June 06 2016, @04:12PM (#355957) Journal

      and remove the late additions of all sorts of crap.

      Why? There's no reason to. Use ASCII or just don't use the glyphs you don't like.

      Most of the additions aren't emoji at all, but actual scripts for other languages, as dead or small as they may be.

      --
      [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
      • (Score: 0) by Anonymous Coward on Monday June 06 2016, @04:23PM

        by Anonymous Coward on Monday June 06 2016, @04:23PM (#355963)

        Different AC here, but I do wonder how different the upcoming release would have been, if there hadn't been the stories a few months ago on Unicode approving emojis with little debate or argument vs languages that people still use today being kept out after years of effort.

    • (Score: 2) by DannyB on Monday June 06 2016, @04:54PM

      by DannyB (5839) Subscriber Badge on Monday June 06 2016, @04:54PM (#355982) Journal

      Remember when Unicode used to fit into 16 bits?

      Characters were no longer 8 bits. Using an 8-bit representation meant that a character might take one or more bytes. So libraries, languages and systems had to deal with this complexity.

      The new way was to make 'wide' characters that were 16 bits. Some languages, for example Java, made their characters 16 bits from the start, avoiding all this complexity. Other systems had 'wide' strings, and massive library support for wide strings, and thus got similar benefits.

      Then Unicode needed 20 bits. And everyone was skreeewwwed again.

      Why not just expand Unicode so that a single character is 32 bits? This will make the Unicode space 4096 times as large as it is today. (Currently about a million characters?) Won't that be enough for the emoji folks? And HD TV folks? And getting Coke™ and Pepsi™ and McDonalds™ symbols into our standard character sets?
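      For the record, the current code space runs from U+0000 to U+10FFFF (17 planes of 65,536 code points), which takes 21 bits rather than 20, so a full 32-bit space would be roughly 3,855 times larger rather than 4,096. The arithmetic, as a quick Python check:

```python
# Unicode's code space: 17 planes of 65,536 code points, U+0000..U+10FFFF
CODESPACE = 17 * 0x10000

print(CODESPACE)                # 1114112 code points (about 1.1 million)
print((0x10FFFF).bit_length())  # 21 bits needed for the highest code point
print(2**32 // CODESPACE)       # 3855: how many times larger a full 32-bit space would be
print(2**32 // 2**20)           # 4096: the figure above, which assumes a 20-bit space
```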

      And what about alien languages?

      Ah, screw it. Maybe we should just define a character to be 64 bits.

      --
      The lower I set my standards the more accomplishments I have.
      • (Score: 2) by butthurt on Monday June 06 2016, @08:04PM

        by butthurt (6141) on Monday June 06 2016, @08:04PM (#356084) Journal

        Only 64 bits? Clearly you're not on the IPv6 working group.

      • (Score: 2) by HiThere on Monday June 06 2016, @08:23PM

        by HiThere (866) Subscriber Badge on Monday June 06 2016, @08:23PM (#356090) Journal

        Unicode NEVER fit into 16 bits. Not in any way other than the way it could fit into 8 bits. (We could do it as graphic representations, with 8 bits being a 2X2 square with on and off bits, detail, and transparent bits, and an all-transparent pattern having the special meaning of don't advance position, etc. So each possible character could be done as a composition to any desired depth.)

        IIRC, Japanese was the first language to demonstrate that unicode couldn't fit into 16 bits, and there was never any intention to not include Japanese. So what we had was a PARTIAL implementation of unicode that fit into 16 bits. The problem is that now people are saying "32 bits! We'll NEVER run out of space for characters." And that's just foolishness. If they want to do emoji they should hard fence off a part of the space for graphic representations, and stuff all the emoji into there. The number of images that people can come up with wouldn't fit into a 64-bit code set. Take 1/4 of the 32 bits and reserve that for a combination of bit maps and svg rules. Or just say "There *is* no limit, but the only representation guaranteed to work for everything is utf-8." (Even that has limits, but unless we expand into a galactic empire we'll probably never hit them.)

        --
        Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 2) by maxwell demon on Monday June 06 2016, @09:24PM

          by maxwell demon (1608) on Monday June 06 2016, @09:24PM (#356117) Journal

          UTF-8 can only encode up to 31 bits:

          Code points with up to 7 bits: 0xxxxxxx
          Code points with up to 11 bits: 110xxxxx 10xxxxxx
          Code points with up to 16 bits: 1110xxxx 10xxxxxx 10xxxxxx
          Code points with up to 21 bits: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
          Code points with up to 26 bits: 111110xx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx
          Code points with up to 31 bits: 1111110x 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

          The two byte values 1111111x (0xFE and 0xFF) are not allowed in UTF-8 (they are needed for the BOM).

          An obvious extension would be to only reserve 0xFF (which would still allow the BOM to be reliably detected) and add the extra encoding

          11111110 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx 10xxxxxx

          However, note that this only extends the space to 36 bits.
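          The table above is the original 1-6 byte UTF-8 layout; RFC 3629 (2003) later capped UTF-8 at four bytes and U+10FFFF, which is why modern codecs reject the longer forms. A minimal Python 3.6+ sketch of the original scheme follows. `encode_original_utf8` is a hypothetical helper, not a library function, and it does no validation beyond the 31-bit range check. For code points up to U+10FFFF (other than surrogates) it should produce the same bytes as Python's own `str.encode("utf-8")`, and it never emits 0xFE or 0xFF.

```python
def encode_original_utf8(cp: int) -> bytes:
    """Encode a code point with the original (pre-RFC 3629) 1-6 byte UTF-8 layout.

    Handles values up to 0x7FFFFFFF (31 bits). Modern UTF-8 stops at U+10FFFF (4 bytes).
    """
    if not 0 <= cp <= 0x7FFFFFFF:
        raise ValueError("original UTF-8 only covers 0..0x7FFFFFFF")
    if cp < 0x80:
        return bytes([cp])  # 0xxxxxxx
    # (sequence length, largest code point that fits, lead-byte prefix bits)
    for length, limit, prefix in (
        (2, 0x7FF, 0xC0),       # 110xxxxx
        (3, 0xFFFF, 0xE0),      # 1110xxxx
        (4, 0x1FFFFF, 0xF0),    # 11110xxx
        (5, 0x3FFFFFF, 0xF8),   # 111110xx
        (6, 0x7FFFFFFF, 0xFC),  # 1111110x
    ):
        if cp <= limit:
            tail = []
            for _ in range(length - 1):
                tail.append(0x80 | (cp & 0x3F))  # 10xxxxxx carries the low 6 bits
                cp >>= 6
            return bytes([prefix | cp] + tail[::-1])
    raise AssertionError("unreachable")


print(encode_original_utf8(0x1F953).hex())     # f09fa593, same as modern UTF-8 for BACON
print(encode_original_utf8(0x7FFFFFFF).hex())  # fdbfbfbfbfbf
```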

          --
          The Tao of math: The numbers you can count are not the real numbers.
          • (Score: 2) by HiThere on Tuesday June 07 2016, @01:40AM

            by HiThere (866) Subscriber Badge on Tuesday June 07 2016, @01:40AM (#356205) Journal

            Yeah, but even if you do it that way, 36 bits are a huge amount more than 32 bits. 16 times as many. And if you see that coming, there are non-obvious extensions that can extend it in the way that "Big Int"s extend to an essentially unlimited number of places. But that you would need more than 64 bits is probably unimaginable.

            Still, if you see that coming at the time of the redesign, you just reserve a few of the patterns to mean "switch to code plane n". It's a bit cumbersome, but if you assume that letters in the same "code plane" will tend to occur near each other, you could have several thousand code planes, each the size of a full utf-8-extended sequence, and each with an arbitrary 127-item character set that's as space-efficient as ASCII. This is sort of like the way I understand utf-16 as being implemented, only with a MUCH larger base plane. The defect is that mixing code planes is quite inefficient.

            The comment about a "galactic empire" was sort of a joke, but to take it a bit seriously: 2^36 is probably large enough not to worry about. 68,719,476,736 seems like a character-set size that you won't run out of without establishing a galactic empire. Even there, the problem is likely to be allocation of rights to assign characters in a space, and font design. The estimate is that there are about 400 billion stars in the Milky Way, i.e. 400,000,000,000; that's about 0.2 characters per star, but most of the stars probably don't have habitable planets. And of the ones that are habitable, it's probable that only a few will have life forms that have a written language. I think that having 1 out of 10,000,000 stars have a civilized planet within your empire is probably an excessively optimistic estimate, and 2^36 is large enough to handle that fairly well.
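            Putting the comment's own numbers into a quick Python check (the star count and the one-in-ten-million fraction are the commenter's estimates, not established figures):

```python
CODE_POINTS = 2**36         # the 36-bit extension discussed above
STARS = 400_000_000_000     # the comment's estimate for the Milky Way
ONE_IN = 10_000_000         # assumed: one civilized planet per ten million stars

print(CODE_POINTS)                    # 68719476736
print(round(CODE_POINTS / STARS, 2))  # 0.17 code points per star
civilizations = STARS // ONE_IN
print(civilizations)                  # 40000
print(CODE_POINTS // civilizations)   # 1717986 code points per civilization
```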

            More seriously, a really large character set has intrinsic problems. Some fonts are disadvantaged. 6 bytes for a full stop is really a bit verbose. If you see that coming, switch to a base-64 system. And even there you have a real problem if you want to store symbols for each character. You'll be spending most of your time looking up how to represent the next character, and most of your disk space storing the representations of characters you'll never use. I think utf-32 (a 21-bit code space) is already pushing hard against the reasonable limits. Most of my fonts only handle ASCII and a few European additions. But I said a few. They don't handle Greek, which *is* European. They also don't handle Etruscan, Ogham, or various other European character sets that are rarely used on my system for one reason or another. It's reasonable that Chinese and Japanese should need to put up with many bytes per character because they have so many different glyphs within just their own language that there's no way they could fit into a single-byte system, but when you get beyond 4 bytes per glyph you are being unreasonable even for the large character-set languages. This is (part of) why utf-16 divided up everything into code planes, but that has caused continual problems, and anyway 2^16 isn't large enough. If we had 9-bit bytes, though, the approach would have worked a lot better.

            Now if we ever DO have a "galactic empire" scale problem, my recommendation would be to base it around "utf-64", but to use that system only for translation from one sub-set to another, or for other special purposes. A "universal character set" of that size is too large to handle reasonably.

            --
            Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
  • (Score: 2) by Gravis on Monday June 06 2016, @07:54PM

    by Gravis (4596) on Monday June 06 2016, @07:54PM (#356077)

    Though it may seem like we are falling back to an age of Egyptian hieroglyphics, there are some more prosaic changes as well.

    yeah, this totally makes me 𓁄 [graphemica.com]

  • (Score: 0) by Anonymous Coward on Tuesday June 07 2016, @05:27AM

    by Anonymous Coward on Tuesday June 07 2016, @05:27AM (#356284)

    Kids will be named "😷" in the hope they'll grow up to be doctors. By the time they do grow up and become doctors, the word "doctor" will be replaced by "😷" in medical documents, so prescriptions will be written by "😷😷."

  • (Score: 0) by Anonymous Coward on Tuesday June 07 2016, @09:44AM

    by Anonymous Coward on Tuesday June 07 2016, @09:44AM (#356339)

    As a developer who creates my own full software stacks, I ended support for Unicode versions beyond the one where they introduced the emojis. U+10FFFF is the highest code point.

    I'm sorry, it just became too much bullshit to support and ensure the security of that unessential crap. Now, I could just grab someone else's implementation, but I have found them all to be full of security holes, so no thanks.

    My next project will use only the classic CP437 (OEM Charset), be in English and have zero support for translations. Language evolves. At some point it's better to standardize humans than require endless modifications for the glyph pack of the week. English is the language of Earth now. Sorry, it is. If you want to write low level code you have to learn English because all assemblers have English directives and token names (even the ones for those crazy Chinese and Russian MIPS). To use C you must use English. Fuck it. I'm done. Everyone is already being taught English as a secondary language in their schools for a reason. Unicode is on the wrong side of history.

  • (Score: 0) by Anonymous Coward on Tuesday June 07 2016, @12:43PM

    by Anonymous Coward on Tuesday June 07 2016, @12:43PM (#356369)

    Themightybutthole actually does something useful for SN besides lame troll posts? Maybe I should stop wishing his worthless kids watch him slowly die of brain cancer.