SoylentNews is people

posted by martyb on Tuesday January 05 2016, @12:55PM
from the emoji-are-the-modern-world's-hieroglyphics dept.

Unicode version 9.0 is scheduled for release in June 2016. The final repertoire is not yet fixed, but currently 7,227 characters are scheduled for addition to Unicode 9.0, which will bring the total number of graphic and format characters in the Unicode Standard to 127,899 characters (in case you are concerned that Unicode is running out of space, that still leaves room for another 846,566 characters to be encoded). In summary, Unicode 9.0 will include 9 new blocks (named ranges of characters) and cover 4 new scripts (Osage, Bhaiksuki, Marchen and Tangut), making a total of 268 blocks and 133 scripts.
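
The "room for another 846,566 characters" figure can be reproduced with a little arithmetic. The breakdown below is a reconstruction, not something the story spells out: it assumes the figure excludes surrogates, noncharacters, private-use code points and the 65 control codes from the 17-plane code space.

```python
# Reconstruction of the "room left" figure quoted in the story.
# The category breakdown is an assumption; only ASSIGNED_9_0 and the
# expected result come from the story itself.
TOTAL_CODE_POINTS = 0x110000              # 17 planes of 65,536 code points
SURROGATES = 0xE000 - 0xD800              # U+D800..U+DFFF
NONCHARACTERS = 32 + 2 * 17               # U+FDD0..U+FDEF plus the last two of each plane
PRIVATE_USE = (0xF900 - 0xE000) + 2 * (0x10000 - 2)  # BMP area plus planes 15 and 16
CONTROLS = 32 + 1 + 32                    # C0 controls, DEL, C1 controls
ASSIGNED_9_0 = 127_899                    # graphic and format characters, per the story

remaining = (TOTAL_CODE_POINTS - SURROGATES - NONCHARACTERS
             - PRIVATE_USE - CONTROLS - ASSIGNED_9_0)
print(f"{remaining:,}")
```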


 
This discussion has been archived. No new comments can be posted.
  • (Score: 2) by The Mighty Buzzard on Tuesday January 05 2016, @01:16PM

    by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Tuesday January 05 2016, @01:16PM (#285099) Homepage Journal

    Yes, our current setup can already handle Unicode 9.0, in case you were wondering, and somewhat to my dismay. I really wish we could scrap the emoji and just support non-ASCII language characters, but that's strictly a personal bias and will not impact anyone having 💩 in their sig.

    --
    My rights don't end where your fear begins.
    • (Score: 0) by Anonymous Coward on Tuesday January 05 2016, @01:52PM

      by Anonymous Coward on Tuesday January 05 2016, @01:52PM (#285113)

      Finally I have a place where it's relevant to yell this at you: the API returns doubly encoded UTF-8. Boo! Yell!

      (github sucks because they don't allow anonymous bug reports)
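
For anyone wondering what "doubly encoded UTF-8" usually looks like: the bytes get decoded with the wrong single-byte charset somewhere along the way and are then encoded as UTF-8 a second time. A minimal sketch, assuming the wrong charset is Latin-1 (in practice it is often Windows-1252, and whether that matches the API bug reported above is a guess); both function names are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then encoded again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Reverse the mistake, assuming Latin-1 was the wrongly applied charset."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


original = "na\u00efve \U0001F4A9"
broken = double_encode(original)
print(broken.decode("utf-8"))        # mojibake
print(undo_double_encoding(broken))  # back to the original text
```

The repair only works if every intermediate character fits in Latin-1, which holds for text damaged this exact way but not for arbitrary input.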

    • (Score: 1) by RamiK on Tuesday January 05 2016, @02:20PM

      by RamiK (1813) on Tuesday January 05 2016, @02:20PM (#285128)

      Oh Mighty Buzzard, pray tell, how does one use strike-through?

      --
      compiling...
      • (Score: 2) by The Mighty Buzzard on Tuesday January 05 2016, @04:13PM

        by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Tuesday January 05 2016, @04:13PM (#285186) Homepage Journal

        One doesn't currently unless one has uber edity powers and then only in the stories. Easy enough to add to comments though as it's just a db setting. May add <strike> tag (or whatever the html5 proper way to do it is) support this next update.

        --
        My rights don't end where your fear begins.
        • (Score: 2) by Pino P on Tuesday January 05 2016, @06:25PM

          by Pino P (4721) on Tuesday January 05 2016, @06:25PM (#285242) Journal

          HTML5 removed the <strike> element in favor of the otherwise synonymous <s> element [mozilla.org], an element whose phrasing contents "represent things that are no longer relevant or no longer accurate". Contrast with the <del> and <ins> elements, which are intended for documents that include inline diffs.

          Until Rehash on SoylentNews is configured to allow the <s> element, you can add ^W times the number of words to delete after the text: "I know why legislators let this piece of dung law through: brib^W campaign contr^W^W contributions to the super PACs supporting them."
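
For readers who never met the convention: ^H stands for backspace (delete one character) and ^W for delete-previous-word, as in a terminal line editor. A rough sketch of what a terminal would do with such text, assuming readline-like behaviour where ^W first eats trailing whitespace and then the word before it; the function name is hypothetical.

```python
def apply_control_codes(text: str) -> str:
    """Interpret literal ^H and ^W markers the way a line editor would."""
    out = []
    i = 0
    while i < len(text):
        if text.startswith("^H", i):
            if out:
                out.pop()                      # backspace: drop one character
            i += 2
        elif text.startswith("^W", i):
            while out and out[-1].isspace():
                out.pop()                      # skip whitespace before the word
            while out and not out[-1].isspace():
                out.pop()                      # drop the word itself
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)


print(apply_control_codes("brib^W campaign contr^W^W contributions"))
```

Part of the joke is that posters rarely count correctly: three ^H after "Big Brother" only removes "her".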

          • (Score: 0) by Anonymous Coward on Tuesday January 05 2016, @11:55PM

            by Anonymous Coward on Tuesday January 05 2016, @11:55PM (#285392)

            Yeah, the ^W (or ^H) thing was mildly amusing 20 years ago.

            Not so much today.

            Pretty lame, actually.

          • (Score: 2) by darkfeline on Wednesday January 06 2016, @12:18AM

            by darkfeline (1030) on Wednesday January 06 2016, @12:18AM (#285406) Homepage

            Semantically, it seems like most usages of strikethrough in posts would use the <del> tag and not the <s> tag.

            For example, "Praise be to Big Brother^H^H^HGoogle".

            Here, "Big Brother" isn't "no longer relevant", but rather text deleted in the course of censorship^H^H^Hediting. So <del> fits semantically.

            Since SoylentNews doesn't allow editing posts, <s> shouldn't ever be needed; if it's no longer accurate at the time of writing the post, don't include it at all.

            --
            Join the SDF Public Access UNIX System today!
        • (Score: 2) by RamiK on Tuesday January 05 2016, @06:27PM

          by RamiK (1813) on Tuesday January 05 2016, @06:27PM (#285244)
          I see <s> but not <strike> in the html5.1 October 2015 draft: http://www.w3.org/html/wg/drafts/html/master/single-page.html .
          <s> is described as "Inaccurate text".

          There's also <del> and <ins> (still tagged "non-normative") which would actually be more appropriate for all the <del>stupid jokes</del><ins>whimsical usages</ins> I had in mind... but I'm not sure it's worth it.
          --
          compiling...
          • (Score: 0) by Anonymous Coward on Tuesday January 05 2016, @08:48PM

            by Anonymous Coward on Tuesday January 05 2016, @08:48PM (#285312)

            What are you talking about? The <del> and <ins> elements are standard. The only non-normative parts are the sections describing their behavior in lists and across paragraphs.

            • (Score: 2) by RamiK on Wednesday January 06 2016, @04:04AM

              by RamiK (1813) on Wednesday January 06 2016, @04:04AM (#285488)
              Look under 4.7.4. It seems the implied paragraph boundaries aren't fully agreed on in some cases that don't necessarily concern tables and lists. I think the issue is merging and cutting paragraphs. In those instances, you'd want to <del> foo.</p><p>Bar...</del> but with the current rules you won't be able to.

              Well, I haven't really looked too hard at it so you might be right... But that's what I'm getting from it at a moment's glance.
              --
              compiling...
        • (Score: 1, Interesting) by Anonymous Coward on Tuesday January 05 2016, @09:00PM

          by Anonymous Coward on Tuesday January 05 2016, @09:00PM (#285321)

          Speaking of the posting form, why are the descriptions for the posting modes so weird?

          Plain Old Text: not plain old text, actually allows HTML tags
          HTML formatted: POT but without preserving line endings
          Extrans: What you'd expect Plain Old Text to be
          Code: Extrans but wrapped in a <tt>

          • (Score: 2) by The Mighty Buzzard on Wednesday January 06 2016, @12:51AM

            by The Mighty Buzzard (18) Subscriber Badge <themightybuzzard@proton.me> on Wednesday January 06 2016, @12:51AM (#285432) Homepage Journal

            I have no idea. They were that weird when we got them. At this point I think it would confuse more people than it would make happy to rework them into something that made sense.

            --
            My rights don't end where your fear begins.
            • (Score: 0) by Anonymous Coward on Wednesday January 06 2016, @01:16AM

              by Anonymous Coward on Wednesday January 06 2016, @01:16AM (#285438)

              Maybe an idea for the world-famous Soylent Poll Booth® [soylentnews.org]?

    • (Score: 3, Informative) by wisnoskij on Tuesday January 05 2016, @02:40PM

      by wisnoskij (5149) <jonathonwisnoskiNO@SPAMgmail.com> on Tuesday January 05 2016, @02:40PM (#285142)

      Why use pregenerated emoji, when you can roll your own? ٩(-̮̮̃-̃)۶ ٩(●̮̮̃•̃)۶ ٩(͡๏̯͡๏)۶ ٩(-̮̮̃•̃).

      What I really wonder is why Unicode allows anyone to type in 💩 like Example zalgo҉ text, lots of dirty c҉̫̞harac҉ters
      Which no website or document can render in a useful way.
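
Zalgo text works by stacking combining marks on a base character, which is the same mechanism legitimate scripts and accents rely on, so it cannot simply be forbidden. One rough way for a site to tame it is to cap how many marks may follow each base character. A minimal sketch using Python's standard unicodedata module; the function name and the default cap are assumptions for illustration, and a low cap would damage scripts that legitimately stack several marks.

```python
import unicodedata


def limit_marks(text: str, max_marks: int = 1) -> str:
    """Keep at most max_marks combining/enclosing marks per base character."""
    out = []
    run = 0
    for ch in unicodedata.normalize("NFD", text):
        # General categories Mn, Mc and Me all start with "M".
        if unicodedata.category(ch).startswith("M"):
            run += 1
            if run > max_marks:
                continue                       # drop the excess mark
        else:
            run = 0
        out.append(ch)
    return unicodedata.normalize("NFC", "".join(out))


print(limit_marks("c\u0489\u032b\u031eharac\u0489ters", max_marks=0))
```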

    • (Score: 2) by Tork on Tuesday January 05 2016, @08:48PM

      by Tork (3914) on Tuesday January 05 2016, @08:48PM (#285311)
      It was meeeeeeeeeeeeee!
      --
      🏳️‍🌈 Proud Ally 🏳️‍🌈
      • (Score: 0) by Anonymous Coward on Tuesday January 05 2016, @11:08PM

        by Anonymous Coward on Tuesday January 05 2016, @11:08PM (#285370)

        With the upcoming 9.0, you can finally make your sig

        Slashdolt Logic: "18 year old jokes about [U+1F988] and lasers are +5, Funny." 💩

  • (Score: 1, Insightful) by Anonymous Coward on Tuesday January 05 2016, @02:26PM

    by Anonymous Coward on Tuesday January 05 2016, @02:26PM (#285131)

    Laugh all you want, but some of us old geezers have most of the first 127 characters indelibly inscribed in our aging grey matter.

    Not sure why I need 30.000 unicode emoji, but there we are.

    • (Score: 3, Interesting) by pTamok on Tuesday January 05 2016, @02:33PM

      by pTamok (3042) on Tuesday January 05 2016, @02:33PM (#285136)

      Yup. And Unicode 8.0 has the control code pictures, used to represent the control codes in text: ␍␤

      http://www.unicode.org/charts/PDF/U2400.pdf [unicode.org] ␃␄
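
The Control Pictures block is laid out so that the picture for control code N sits at U+2400 + N, with DEL at U+2421. A small sketch that makes control characters visible that way; the function name is made up for illustration.

```python
def show_controls(text: str) -> str:
    """Replace C0 control codes and DEL with their Control Pictures glyphs."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 0x20:
            out.append(chr(0x2400 + cp))   # U+2400..U+241F mirror U+0000..U+001F
        elif cp == 0x7F:
            out.append("\u2421")           # SYMBOL FOR DELETE
        else:
            out.append(ch)
    return "".join(out)


print(show_controls("line one\r\nline two\x03\x04"))
```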

    • (Score: 3, Insightful) by RamiK on Tuesday January 05 2016, @02:40PM

      by RamiK (1813) on Tuesday January 05 2016, @02:40PM (#285143)

      My problem is that there are options between 7-bit and Unicode. There really shouldn't be any. You either support all languages, LTR and RTL, or you don't. It's the crap in-between that pisses me off.

      --
      compiling...
      • (Score: 2) by Pino P on Tuesday January 05 2016, @06:35PM

        by Pino P (4721) on Tuesday January 05 2016, @06:35PM (#285250) Journal

        You either support all languages, LTR and RTL, or you don't.

        For one thing, Unicode doesn't support Quenya and Sindarin because the tengwar script proposal has languished for well over a decade.

        Besides, which way is Mongolian written? (Not to mention sarati.)

    • (Score: 1) by DannyB on Tuesday January 05 2016, @02:52PM

      by DannyB (5839) Subscriber Badge on Tuesday January 05 2016, @02:52PM (#285147) Journal

      > Not sure why I need 30.000 unicode emoji, but there we are.

      Let me give you a reason.

      Reason to have 30,000 unicode emoji: so someone can design a new programming language whose source code is written entirely using these emoji.

      --
      Young people won't believe you if you say you used to get Netflix by US Postal Mail.
      • (Score: 3, Insightful) by fritsd on Tuesday January 05 2016, @03:03PM

        by fritsd (4586) on Tuesday January 05 2016, @03:03PM (#285154) Journal

        Reason to have 30,000 unicode emoji: so someone can design a new programming language whose source code is written entirely using these emoji.

        we already have the APL language [wikipedia.org]

        • (Score: 0) by Anonymous Coward on Tuesday January 05 2016, @06:03PM

          by Anonymous Coward on Tuesday January 05 2016, @06:03PM (#285236)

          APL even has symbols to show how you feel, no emoji necessary.

          somewhat shocked
          really shocked
          really happy
          a bit unsure

      • (Score: 2) by Thexalon on Tuesday January 05 2016, @03:15PM

        by Thexalon (636) Subscriber Badge on Tuesday January 05 2016, @03:15PM (#285159)

        Surely it would be trivial to add emoji support to Brainfuck [muppetlabs.com] as a replacement for the existing 8 tokens that make up one of the nuttiest Turing-complete languages ever devised.

        --
        The only thing that stops a bad guy with a compiler is a good guy with a compiler.
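
It really is close to trivial. A minimal sketch of a Brainfuck interpreter that accepts emoji in place of the eight tokens; the particular emoji-to-token mapping is invented here, not any existing dialect, and the code assumes brackets are balanced.

```python
# Hypothetical mapping; any eight distinct single-code-point emoji would do.
EMOJI_TO_BF = {
    "\U0001F449": ">", "\U0001F448": "<",   # pointing right / left
    "\U0001F446": "+", "\U0001F447": "-",   # pointing up / down
    "\U0001F4AC": ".", "\U0001F442": ",",   # speech balloon / ear
    "\U0001F501": "[", "\U0001F51A": "]",   # repeat / end
}


def run(source: str, input_text: str = "") -> str:
    """Run an emoji (or classic) Brainfuck program and return its output."""
    code = [EMOJI_TO_BF.get(ch, ch) for ch in source]
    code = [c for c in code if c in "><+-.,[]"]    # everything else is a comment

    # Pre-compute matching bracket positions.
    stack, jump = [], {}
    for i, c in enumerate(code):
        if c == "[":
            stack.append(i)
        elif c == "]":
            j = stack.pop()
            jump[i], jump[j] = j, i

    tape = [0] * 30000
    ptr = pc = 0
    inp = iter(input_text)
    out = []
    while pc < len(code):
        c = code[pc]
        if c == ">":
            ptr += 1
        elif c == "<":
            ptr -= 1
        elif c == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ".":
            out.append(chr(tape[ptr]))
        elif c == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256  # 0 on end of input
        elif c == "[" and tape[ptr] == 0:
            pc = jump[pc]                           # skip the loop body
        elif c == "]" and tape[ptr] != 0:
            pc = jump[pc]                           # jump back to loop start
        pc += 1
    return "".join(out)


BF_TO_EMOJI = {v: k for k, v in EMOJI_TO_BF.items()}
program = "".join(BF_TO_EMOJI[c] for c in "++++++++[>++++++++<-]>+.")
print(run(program))
```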
        • (Score: 2) by Pino P on Tuesday January 05 2016, @06:41PM

          by Pino P (4721) on Tuesday January 05 2016, @06:41PM (#285252) Journal

          Nuttiest, but first. Brainfuck is P'' (P prime prime) [wikipedia.org], the first language using while instead of goto to be proven Turing-complete, plus two I/O instructions.

    • (Score: 1) by WillR on Tuesday January 05 2016, @06:34PM

      by WillR (2012) on Tuesday January 05 2016, @06:34PM (#285249)
      Because putting them into Unicode solved a problem with phone carriers having different Shift-JIS encodings for the emoji glyphs their users were trying to send out. That's what a single unified text encoding is supposed to do, right?

      (And yes, we get it, you're better than those kids today who can't stop instabooktweeting in emoji.)
      • (Score: 0) by Anonymous Coward on Wednesday January 06 2016, @12:12AM

        by Anonymous Coward on Wednesday January 06 2016, @12:12AM (#285404)

        Granted, the Japanese carriers' emoji were nicely unified with Unicode. But what's the reason for adding more and more new emoji with little rationalization other than 'we have a fish, why not shark too?'

        • (Score: 1) by driverless on Wednesday January 06 2016, @03:30AM

          by driverless (4770) on Wednesday January 06 2016, @03:30AM (#285472)

          'we have a fish, why not shark too?'

          If you can compose glyphs of a shark and someone jumping then you'd have the representative symbol for Unicode 9.

  • (Score: 2) by wisnoskij on Tuesday January 05 2016, @02:30PM

    by wisnoskij (5149) <jonathonwisnoskiNO@SPAMgmail.com> on Tuesday January 05 2016, @02:30PM (#285135)

    What is the reason we need to update the Unicode standard all the time? Typically character sets stay relatively constant, so why has Unicode needed to be updated 9 times in under 30 years?

    Who are the people eagerly waiting for this update so that they can more easily do their jobs/hobbies?

    • (Score: 3, Insightful) by FatPhil on Tuesday January 05 2016, @02:36PM

      by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Tuesday January 05 2016, @02:36PM (#285138) Homepage
      Just what I was going to say. I'm working on code that implements the 4th version of a protocol which only ever became popular in its 3rd version, and that was 2 decades back. I implement that in a language which is in only its 4th standardisation despite the fact that it's now over 40 years old.

      However, we're both overlooking the fact that unicode is no longer about textual communication, it's basically a giant clip-art library now.
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 2) by SanityCheck on Tuesday January 05 2016, @02:36PM

      by SanityCheck (5190) on Tuesday January 05 2016, @02:36PM (#285139)

      Young people use technology differently. Sure we can debate whether encoding emojis as single characters is the best thing to do as opposed to some other solution (maybe even encode it as bitmaps?), but suffice to say that the young are looking forward to these so the need for such emojis should not be critiqued.

      • (Score: 2) by tangomargarine on Tuesday January 05 2016, @02:42PM

        by tangomargarine (667) on Tuesday January 05 2016, @02:42PM (#285144)

        For the next 3 years or so, after which it will probably be passe.

        Good thing we got it into Unicode, though!

        --
        "Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
    • (Score: 3, Insightful) by snick on Tuesday January 05 2016, @02:58PM

      by snick (1408) on Tuesday January 05 2016, @02:58PM (#285150)

      If there was just some way to discover that the update adds Osage [wikipedia.org], Bhaiksuki [wikipedia.org], Marchen [wikipedia.org] and Tangut [wikipedia.org] ...
      Not very useful for you and me, but probably pretty useful for academics studying/writing about dead languages.

    • (Score: 1) by pTamok on Tuesday January 05 2016, @03:56PM

      by pTamok (3042) on Tuesday January 05 2016, @03:56PM (#285178)

      Because people are finding character sets that are in use and which would benefit from standardisation.

      It's a pity that the Unicode committee saw fit to not approve a codeset for Klingon (https://en.wikipedia.org/wiki/Klingon_alphabets), but as far as I know Tolkien's written languages (Cirth and Tengwar) have not been rejected as yet.

  • (Score: 2) by jasassin on Tuesday January 05 2016, @03:15PM

    by jasassin (3566) <jasassin@gmail.com> on Tuesday January 05 2016, @03:15PM (#285160) Homepage Journal

    Now I can use the control codes to hack your ANSI.SYS loaded from your CONFIG.SYS to DDOS my HTTPS to get money from my ADS server! Awesome!

    --
    jasassin@gmail.com GPG Key ID: 0x663EB663D1E7F223
  • (Score: 3, Insightful) by bradley13 on Tuesday January 05 2016, @03:17PM

    by bradley13 (3053) Subscriber Badge on Tuesday January 05 2016, @03:17PM (#285161) Homepage Journal

    We are already to the point that - if you need anything special - you have to choose your font carefully. It is entirely unrealistic to expect font makers to keep up with the continual expansion of Unicode, which defeats one of the major reasons for having a unified character set in the first place.

    I can see adding fonts for languages, but only as long as those languages actually have a written form. Osage, for example, does not - it was a purely spoken dialect, and some academic retroactively invented an alphabet for it in 2006. So that's nonsense, and has no business in Unicode.

    Meanwhile, what is the point of adding ever more graphical characters? Why, for example, do we need a unicode character for two wrestlers (U+1F93C)? Anyone who needs that specific image can use...an image.

    --
    Everyone is somebody else's weirdo.
    • (Score: 2) by jasassin on Tuesday January 05 2016, @03:29PM

      by jasassin (3566) <jasassin@gmail.com> on Tuesday January 05 2016, @03:29PM (#285166) Homepage Journal

      Hear ye hear ye! Unicode is fucked!

      --
      jasassin@gmail.com GPG Key ID: 0x663EB663D1E7F223
    • (Score: 5, Insightful) by TheRaven on Tuesday January 05 2016, @03:41PM

      by TheRaven (270) on Tuesday January 05 2016, @03:41PM (#285169) Journal
      That's not an issue with the fonts, that's an issue with the font engine correctly handling fallback. For example, most of the fonts on my system don't have a full set of Chinese ideograms (or, indeed, any of them), but the font engine will happily substitute glyphs from another font when asked to do so. Most applications are therefore completely unaware of this substitution and as long as the system ships with at least one font for each glyph then it's fine.
      --
      sudo mod me up
    • (Score: 2) by Pino P on Tuesday January 05 2016, @06:48PM

      by Pino P (4721) on Tuesday January 05 2016, @06:48PM (#285257) Journal

      It's so that, say, Comic Neue can replace the wrestlers with xkcd-style stick figures, and WWE can replace the wrestlers in its corporate font with "wrestlers" at the same time it replaces the IPA letter ʬ (LATIN LETTER BILABIAL PERCUSSIVE) [wikipedia.org] with its logo.

  • (Score: 1) by jlv2 on Tuesday January 05 2016, @05:40PM

    by jlv2 (5299) on Tuesday January 05 2016, @05:40PM (#285224)

    Jump the U+1F988
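
For those reading on a system whose fonts do not yet cover the new characters, the U+XXXX notation used in this thread can be turned back into real characters mechanically. A small sketch; the function name is made up, and whether the result renders as anything other than a box depends entirely on the installed fonts.

```python
import re


def expand_codepoints(text: str) -> str:
    """Replace U+XXXX notation with the character it names."""
    def repl(match):
        cp = int(match.group(1), 16)
        if cp > 0x10FFFF:
            return match.group(0)          # not a valid code point, leave as-is
        return chr(cp)
    return re.sub(r"U\+([0-9A-Fa-f]{4,6})", repl, text)


print(expand_codepoints("Jump the U+1F988"))
```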

    • (Score: 0) by Anonymous Coward on Tuesday January 05 2016, @05:53PM

      by Anonymous Coward on Tuesday January 05 2016, @05:53PM (#285232)

      Dear jlv2,

      How do you type without U+1F94A on?

      Crapfully yours,
      Anonymous Coward

  • (Score: 3, Informative) by unzombied on Tuesday January 05 2016, @10:03PM

    by unzombied (4572) on Tuesday January 05 2016, @10:03PM (#285345)
    Which typeface has 128K characters? None [wikipedia.org]. However:
    • 64K - GNU/Unifont [wikipedia.org] - GPL, functional yet dot matrix printer ugly
    • 53K - Code 2000 [wikipedia.org] - (unrestricted) shareware (last download spotted at Wayback Machine)
    • 10K - FreeSerif [wikipedia.org] - GPL (decent serif)
    • 06K - FreeSans [wikipedia.org] - GPL (decent sans serif)
    • 02K - Gentium [sil.org] - SIL (quite open, Latin/Greek/Cyrillic, layperson's Garamond)
    • ??K - Liberation [wikipedia.org] - GPL (Latin/Greek/Cyrillic, serif and sans serif)