posted by martyb on Tuesday January 05 2016, @12:55PM
from the emoji-are-the-modern-world's-hieroglyphics dept.
Unicode version 9.0 is scheduled for release in June 2016. The final repertoire is not yet fixed, but currently 7,227 characters are scheduled for addition to Unicode 9.0, which will bring the total number of graphic and format characters in the Unicode Standard to 127,899 characters (in case you are concerned that Unicode is running out of space, that still leaves room for another 846,566 characters to be encoded). In summary, Unicode 9.0 will include 9 new blocks (named ranges of characters) and cover 4 new scripts (Osage, Bhaiksuki, Marchen and Tangut), making a total of 268 blocks and 133 scripts.
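The 846,566 figure can be reproduced from the size of the code space. A quick sketch, assuming the "room left" count excludes surrogates, noncharacters, private-use code points and the 65 control codes; that breakdown is an assumption here, since the summary does not spell it out:

```python
# Sketch: reproduce the "room for another 846,566 characters" figure.
# The list of excluded code point categories is an assumption, not from the summary.
CODE_SPACE = 17 * 0x10000                            # 17 planes of 65,536 = 1,114,112
SURROGATES = 0xDFFF - 0xD800 + 1                     # 2,048
NONCHARACTERS = 32 + 2 * 17                          # U+FDD0..U+FDEF plus last two per plane = 66
PRIVATE_USE = (0xF8FF - 0xE000 + 1) + 2 * 65_534     # BMP PUA plus planes 15 and 16 = 137,468
CONTROLS = 32 + 1 + 32                               # C0, DEL, C1 = 65
ASSIGNED_AFTER_9_0 = 127_899                         # graphic and format characters, per the summary

remaining = (CODE_SPACE - SURROGATES - NONCHARACTERS
             - PRIVATE_USE - CONTROLS - ASSIGNED_AFTER_9_0)
print(remaining)  # 846566
```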
(Score: 2) by The Mighty Buzzard on Tuesday January 05 2016, @01:16PM
Yes, our current setup can already handle Unicode 9.0, in case you were wondering and somewhat to my dismay. I really wish we could scrap the emoji and just support non-ASCII language characters but that's strictly a personal bias and will not impact anyone having 💩 in their sig.
My rights don't end where your fear begins.
(Score: 0) by Anonymous Coward on Tuesday January 05 2016, @01:52PM
Finally I have a place where it's relevant to yell this at you: the API returns doubly encoded UTF-8. Boo! Yell!
(github sucks because they don't allow anonymous bug reports)
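For anyone wondering what "doubly encoded UTF-8" looks like in practice, here is a minimal sketch. It assumes the usual failure mode, where UTF-8 bytes get misread as Latin-1 and are then encoded to UTF-8 a second time; what the API actually does internally is not known from this thread, and both function names are made up for illustration.

```python
def double_encode(text: str) -> bytes:
    """Simulate the bug: UTF-8 bytes misread as Latin-1, then encoded as UTF-8 again."""
    return text.encode("utf-8").decode("latin-1").encode("utf-8")


def undo_double_encoding(data: bytes) -> str:
    """Peel off one layer of mis-encoding (only valid under the assumption above)."""
    return data.decode("utf-8").encode("latin-1").decode("utf-8")


original = "💩"
broken = double_encode(original)
print(len(original.encode("utf-8")), len(broken))  # 4 8
print(undo_double_encoding(broken) == original)     # True
```

Pure ASCII survives the round trip unchanged, which is why this kind of bug tends to go unnoticed until someone posts an accent or an emoji.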
(Score: 2) by The Mighty Buzzard on Tuesday January 05 2016, @04:17PM
Does it? Well damn. I'll see if I can fix that tonight or tomorrow. Won't take effect until this month's (hopefully) update though.
My rights don't end where your fear begins.
(Score: 1) by RamiK on Tuesday January 05 2016, @02:20PM
Oh Mighty Buzzard, pray tell, how does one use strike-through?
compiling...
(Score: 2) by The Mighty Buzzard on Tuesday January 05 2016, @04:13PM
One doesn't currently unless one has uber edity powers and then only in the stories. Easy enough to add to comments though as it's just a db setting. May add <strike> tag (or whatever the html5 proper way to do it is) support this next update.
My rights don't end where your fear begins.
(Score: 2) by Pino P on Tuesday January 05 2016, @06:25PM
HTML5 removed the <strike> element in favor of the otherwise synonymous <s> element [mozilla.org], an element whose phrasing contents "represent things that are no longer relevant or no longer accurate". Contrast with the <del> and <ins> elements, which are intended for documents that include inline diffs.
Until Rehash on SoylentNews is configured to allow the <s> element, you can add ^W times the number of words to delete after the text: "I know why legislators let this piece of dung law through: brib^W campaign contr^W^W contributions to the super PACs supporting them."
(Score: 0) by Anonymous Coward on Tuesday January 05 2016, @11:55PM
Yeah, the ^W (or ^H) thing was mildly amusing 20 years ago.
Not so much today.
Pretty lame, actually.
(Score: 2) by darkfeline on Wednesday January 06 2016, @12:18AM
Semantically, it seems like most usages of strikethrough in posts would use the <del> tag and not the <s> tag.
For example, "Praise be to Big Brother^H^H^HGoogle".
Here, "Big Brother" isn't "no longer relevant", but rather text deleted in the course of censorship^H^H^Hediting. So <del> fits semantically.
Since SoylentNews doesn't allow editing posts, <s> shouldn't ever be needed; if it's no longer accurate at the time of writing the post, don't include it at all.
Join the SDF Public Access UNIX System today!
(Score: 2) by RamiK on Tuesday January 05 2016, @06:27PM
<s> is described as "Inaccurate text".
There's also <del> and <ins> (still tagged "non-normative") which would actually be more appropriate for all the <del>stupid jokes</del><ins>whimsical usages</ins> I had in mind... but I'm not sure it's worth it.
compiling...
(Score: 0) by Anonymous Coward on Tuesday January 05 2016, @08:48PM
What are you talking about? The <del> and <ins> elements are standard. The only non-normative parts are the sections describing their behavior in lists and across paragraphs.
(Score: 2) by RamiK on Wednesday January 06 2016, @04:04AM
Well, I haven't really looked too hard at it so you might be right... But that's what I'm getting from it at a moment's glance.
compiling...
(Score: 1, Interesting) by Anonymous Coward on Tuesday January 05 2016, @09:00PM
Speaking of the posting form, why are the descriptions for the posting modes so weird?
Plain Old Text: not plain old text, actually allows HTML tags
HTML formatted: POT but without preserving line endings
Extrans: What you'd expect Plain Old Text to be
Code: Extrans but wrapped in a <tt>
(Score: 2) by The Mighty Buzzard on Wednesday January 06 2016, @12:51AM
I have no idea. They were that weird when we got them. At this point I think it would confuse more people than it would make happy to rework them into something that made sense.
My rights don't end where your fear begins.
(Score: 0) by Anonymous Coward on Wednesday January 06 2016, @01:16AM
Maybe an idea for the world-famous Soylent Poll Booth® [soylentnews.org]?
(Score: 3, Informative) by wisnoskij on Tuesday January 05 2016, @02:40PM
Why use pregenerated emoji, when you can roll your own? ٩(-̮̮̃-̃)۶ ٩(●̮̮̃•̃)۶ ٩(͡๏̯͡๏)۶ ٩(-̮̮̃•̃).
What I really wonder is why Unicode allows anyone to type in 💩 like Example zalgo҉ text, lots of dirty c҉̫̞harac҉ters
Which no website or document can render in a useful way.
(Score: 2) by Tork on Tuesday January 05 2016, @08:48PM
🏳️🌈 Proud Ally 🏳️🌈
(Score: 0) by Anonymous Coward on Tuesday January 05 2016, @11:08PM
With the upcoming 9.0, you can finally make your sig
Slashdolt Logic: "18 year old jokes about [U+1F988] and lasers are +5, Funny." 💩
(Score: 1, Insightful) by Anonymous Coward on Tuesday January 05 2016, @02:26PM
Laugh all you want, but some of us old geezers have most of the first 127 characters indelibly inscribed in our aging grey matter.
Not sure why I need 30.000 unicode emoji, but there we are.
(Score: 3, Interesting) by pTamok on Tuesday January 05 2016, @02:33PM
Yup. And Unicode 8.0 has the control code pictures, used to represent the control codes in text: ␍

http://www.unicode.org/charts/PDF/U2400.pdf [unicode.org] ␃␄
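The control pictures sit at a fixed offset from the control codes they depict, so the mapping is a single addition. A small sketch (the helper name is made up; the offsets follow the U+2400 block layout in the chart linked above):

```python
def control_picture(ch: str) -> str:
    """Return the Control Pictures symbol for a C0 control code, space, or DEL."""
    code = ord(ch)
    if code <= 0x20:       # C0 controls and space map to U+2400..U+2420
        return chr(0x2400 + code)
    if code == 0x7F:       # DEL has its own picture at U+2421
        return "\u2421"
    return ch              # anything else is returned unchanged


print("".join(control_picture(c) for c in "\r\x03\x04"))  # ␍␃␄
```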
(Score: 3, Insightful) by RamiK on Tuesday January 05 2016, @02:40PM
My problem is that there are options between 7-bit and Unicode. There really shouldn't be any. You either support all languages, LTR and RTL, or you don't. It's the crap in-between that pisses me off.
compiling...
(Score: 2) by Pino P on Tuesday January 05 2016, @06:35PM
You either support all languages, LTR and RTL, or you don't.
For one thing, Unicode doesn't support Quenya and Sindarin because the tengwar script proposal has languished for well over a decade.
Besides, which way is Mongolian written? (Not to mention sarati.)
(Score: 1) by DannyB on Tuesday January 05 2016, @02:52PM
> Not sure why I need 30.000 unicode emoji, but there we are.
Let me give you a reason.
Reason to have 30,000 unicode emoji: so someone can design a new programming language whose source code is written entirely using these emoji.
Young people won't believe you if you say you used to get Netflix by US Postal Mail.
(Score: 3, Insightful) by fritsd on Tuesday January 05 2016, @03:03PM
we already have the APL language [wikipedia.org]
(Score: 0) by Anonymous Coward on Tuesday January 05 2016, @06:03PM
APL even has symbols to show how you feel, no emoji necessary.
⍤ somewhat shocked
⍥ really shocked
⍢ really happy
⍨ a bit unsure
(Score: 2) by Thexalon on Tuesday January 05 2016, @03:15PM
Surely it would be trivial to add emoji support to Brainfuck [muppetlabs.com] as a replacement for the existing 8 tokens that make up one of the nuttiest Turing-complete languages ever devised.
The only thing that stops a bad guy with a compiler is a good guy with a compiler.
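It really is close to trivial. Below is a rough sketch of an interpreter that swaps the eight Brainfuck tokens for emoji. The particular emoji are arbitrary picks for illustration, not any existing dialect, and the 30,000-cell tape with wrapping 8-bit cells is just the common convention.

```python
# Hypothetical emoji-to-Brainfuck token table; the emoji are arbitrary choices.
EMOJI_TOKENS = {
    "👉": ">", "👈": "<", "👍": "+", "👎": "-",
    "💬": ".", "👂": ",", "🔁": "[", "🔚": "]",
}


def run(source: str, stdin: str = "") -> str:
    """Interpret an emoji Brainfuck program and return its output as a string."""
    # Translate emoji to classic tokens; everything else is treated as a comment.
    code = [EMOJI_TOKENS[ch] for ch in source if ch in EMOJI_TOKENS]

    # Pre-match brackets so each loop jump is a dictionary lookup.
    jumps, stack = {}, []
    for i, op in enumerate(code):
        if op == "[":
            stack.append(i)
        elif op == "]":
            j = stack.pop()
            jumps[i], jumps[j] = j, i

    tape, ptr, pc = [0] * 30_000, 0, 0
    inp, out = iter(stdin), []
    while pc < len(code):
        op = code[pc]
        if op == ">":
            ptr += 1
        elif op == "<":
            ptr -= 1
        elif op == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif op == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif op == ".":
            out.append(chr(tape[ptr]))
        elif op == ",":
            tape[ptr] = ord(next(inp, "\0")) % 256   # end of input reads as 0
        elif op == "[" and tape[ptr] == 0:
            pc = jumps[pc]                           # skip the loop body
        elif op == "]" and tape[ptr] != 0:
            pc = jumps[pc]                           # go back to the loop start
        pc += 1
    return "".join(out)


# Classic "++++++++[>++++++++<-]>+." (8 * 8 + 1 = 65), spelled in emoji.
program = "👍" * 8 + "🔁👉" + "👍" * 8 + "👈👎🔚👉👍💬"
print(run(program))  # A
```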
(Score: 2) by Pino P on Tuesday January 05 2016, @06:41PM
Nuttiest, but first. Brainfuck is P'' (P prime prime) [wikipedia.org], the first language using while instead of goto to be proven Turing-complete, plus two I/O instructions.
(Score: 1) by WillR on Tuesday January 05 2016, @06:34PM
(And yes, we get it, you're better than those kids today who can't stop instabooktweeting in emoji.)
(Score: 0) by Anonymous Coward on Wednesday January 06 2016, @12:12AM
Granted, the Japanese carriers' emoji were nicely unified with Unicode. But what's the reason for adding more and more new emoji with little rationalization other than 'we have a fish, why not shark too?'
(Score: 1) by driverless on Wednesday January 06 2016, @03:30AM
'we have a fish, why not shark too?'
If you can compose glyphs of a shark and someone jumping then you'd have the representative symbol for Unicode 9.
(Score: 2) by wisnoskij on Tuesday January 05 2016, @02:30PM
What is the reason we need to update the Unicode standard all the time? Typically character sets stay relatively constant; why has Unicode needed to be updated 9 times in under 30 years?
Who are the people eagerly waiting for this update so that they can more easily do their jobs/hobbies?
(Score: 3, Insightful) by FatPhil on Tuesday January 05 2016, @02:36PM
However, we're both overlooking the fact that unicode is no longer about textual communication, it's basically a giant clip-art library now.
Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
(Score: 2) by SanityCheck on Tuesday January 05 2016, @02:36PM
Young people use technology differently. Sure we can debate whether encoding emojis as single characters is the best thing to do as opposed to some other solution (maybe even encode it as bitmaps?), but suffice to say that the young are looking forward to these so the need for such emojis should not be critiqued.
(Score: 2) by tangomargarine on Tuesday January 05 2016, @02:42PM
For the next 3 years or so, after which it will probably be passe.
Good thing we got it into Unicode, though!
"Is that really true?" "I just spent the last hour telling you to think for yourself! Didn't you hear anything I said?"
(Score: 3, Insightful) by snick on Tuesday January 05 2016, @02:58PM
If there was just some way to discover that the update adds Osage [wikipedia.org], Bhaiksuki [wikipedia.org], Marchen [wikipedia.org] and Tangut [wikipedia.org] ...
Not very useful for you and me, but probably pretty useful for academics studying/writing about dead languages.
(Score: 1) by pTamok on Tuesday January 05 2016, @03:56PM
Because people are finding character sets that are in use and which would benefit from standardisation.
It's a pity that the Unicode committee saw fit to not approve a codeset for Klingon (https://en.wikipedia.org/wiki/Klingon_alphabets), but as far as I know Tolkien's written languages (Cirth and Tengwar) have not been rejected as yet.
(Score: 2) by jasassin on Tuesday January 05 2016, @03:15PM
Now I can use the control codes to hack your ANSI.SYS loaded from your CONFIG.SYS to DDOS my HTTPS to get money from my ADS server! Awesome!
jasassin@gmail.com GPG Key ID: 0x663EB663D1E7F223
(Score: 3, Insightful) by bradley13 on Tuesday January 05 2016, @03:17PM
We are already at the point where, if you need anything special, you have to choose your font carefully. It is entirely unrealistic to expect font makers to keep up with the continual expansion of Unicode, which defeats one of the major reasons for having a unified character set in the first place.
I can see adding fonts for languages, but only as long as those languages actually have a written form. Osage, for example, does not - it was a purely spoken dialect, and some academic retroactively invented an alphabet for it in 2006. So that's nonsense, and has no business in Unicode.
Meanwhile, what is the point of adding ever more graphical characters? Why, for example, do we need a unicode character for two wrestlers (U+1F93C)? Anyone who needs that specific image can use...an image.
Everyone is somebody else's weirdo.
(Score: 2) by jasassin on Tuesday January 05 2016, @03:29PM
Hear ye hear ye! Unicode is fucked!
jasassin@gmail.com GPG Key ID: 0x663EB663D1E7F223
(Score: 5, Insightful) by TheRaven on Tuesday January 05 2016, @03:41PM
sudo mod me up
(Score: 2) by Pino P on Tuesday January 05 2016, @06:48PM
It's so that, say, Comic Neue can replace the wrestlers with xkcd-style stick figures, and WWE can replace the wrestlers in its corporate font with "wrestlers" at the same time it replaces the IPA letter ʬ (LATIN LETTER BILABIAL PERCUSSIVE) [wikipedia.org] with its logo.
(Score: 1) by jlv2 on Tuesday January 05 2016, @05:40PM
Jump the U+1F988
(Score: 0) by Anonymous Coward on Tuesday January 05 2016, @05:53PM
Dear jlv2,
How do you type without U+1F94A on?
Crapfully yours,
Anonymous Coward
(Score: 3, Informative) by unzombied on Tuesday January 05 2016, @10:03PM