NOTE: This is a work-in-progress; read at your own risk/confusion. It is an attempt to gather together bookmarks, tabs, and information pertaining to Unicode, UTF-8, HTML, and 'characters'.
It would seem to be a simple enough question to answer, but things are not always as they seem:
What characters should SoylentNews support?
Motivation: as many of you are aware, one of the early improvements that SoylentNews made to its base source code was to support Unicode characters. (Thanks to the heroic efforts of The Mighty Buzzard.) The underlying code only supported ASCII (American Standard Code for Information Interchange) characters, which was just fine as far as it went. It just didn't go far enough for us...
I took on the task of testing our implementation of UTF-8 support. Little did I know what I was getting into! It has been a fascinating journey, indeed!
What is Unicode?
This is taken from What is Unicode?:
Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.
These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.
Unicode is changing all that!
Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.
Here is an excerpt from Wikipedia's entry for Unicode:
Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard, which find wide usage in various countries of the world but remain largely incompatible with each other. Many traditional character encodings share a common problem in that they allow bilingual computer processing (usually using Latin characters and the local script), but not multilingual computer processing (computer processing of arbitrary scripts mixed with each other).
Unicode, in intent, encodes the underlying characters—graphemes and grapheme-like units—rather than the variant glyphs (renderings) for such characters. ...
In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font, or style) to other software, such as a web browser or word processor.
A little more background: There are certain code points in Unicode that are of questionable value in the context of a web page; further, there are code points which are defined to be invalid! And then, just to make things even more interesting, I found a list of invalid characters in an HTML document:
Illegal characters
HTML forbids[6] the use of the characters with Universal Character Set/Unicode code points (in decimal form, preceded by x in hexadecimal form)
- 0 to 31, except 9, 10, and 13 (C0 control characters)
- 127 (DEL character)
- 128 to 159 (x80 – x9F, C1 control characters)
- 55296 to 57343 (xD800 – xDFFF, the UTF-16 surrogate halves)
The Unicode standard also forbids:
- 65534 and 65535 (xFFFE – xFFFF), non-characters, related to xFEFF, the byte order mark.
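The list above can be turned into a simple check. What follows is only a minimal sketch in Python, not anything taken from the SoylentNews code base: the names `FORBIDDEN_RANGES`, `is_forbidden`, and `strip_forbidden` are made up for illustration, and the ranges are copied straight from the list quoted above.

```python
# Hypothetical helper names; the ranges are copied from the quoted list above.
# Each tuple is an inclusive (first, last) range of code points.
FORBIDDEN_RANGES = [
    (0, 8),          # C0 controls below TAB (9)
    (11, 12),        # C0 controls between LF (10) and CR (13)
    (14, 31),        # remaining C0 controls
    (127, 127),      # DEL
    (128, 159),      # C1 controls, x80 to x9F
    (55296, 57343),  # UTF-16 surrogate halves, xD800 to xDFFF
    (65534, 65535),  # non-characters xFFFE and xFFFF
]


def is_forbidden(code_point: int) -> bool:
    """Return True if the code point falls in one of the listed ranges."""
    return any(first <= code_point <= last for first, last in FORBIDDEN_RANGES)


def strip_forbidden(text: str) -> str:
    """Drop every character whose code point is in the forbidden list."""
    return "".join(ch for ch in text if not is_forbidden(ord(ch)))


if __name__ == "__main__":
    sample = "a\x00b\tc\x7f"
    print(repr(strip_forbidden(sample)))
```

Whether to silently drop such characters, replace them, or reject the whole submission is a separate design decision; the sketch drops them only because that is the shortest thing to show.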
UTF-8: Unicode Transformation Format, 8-bit
Though there are several means by which Unicode characters can be transmitted between contexts, one of the most popular is UTF-8, which is what was chosen for use in SoylentNews.
SoylentNews:
What you see from our site mostly comes via a browser (though we also support Gopher and NNTP; you can have stories e-mailed to you; and we also have RSS/Atom feeds... wow!)
Our site currently formats web pages as HTML 4.01; here's a representative DOCTYPE:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
"http://www.w3.org/TR/html4/strict.dtd">
At some point in the future we may want to directly support HTML5; ideally nothing should preclude or complicate that effort.
Obviously, we need not support the invalid code points. (Enumerate them here).
Unicode and UTF-8
So Unicode is a collection of mappings of code-points (numbers) to 'characters'; UTF-8 is a Unicode Transformation Format, 8-bit, used to transmit/encode Unicode code points.
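To make the distinction concrete, here is a small Python sketch showing a character, its code point (the number), and the bytes UTF-8 uses to carry it. The function name `code_point_and_utf8` is invented for this example; `ord`, `str.encode`, and `bytes.decode` are standard Python built-ins.

```python
def code_point_and_utf8(ch: str) -> tuple[int, bytes]:
    """Return the Unicode code point of a single character and its UTF-8 bytes."""
    return ord(ch), ch.encode("utf-8")


if __name__ == "__main__":
    for ch in ["A", "é", "€", "😀"]:
        code_point, encoded = code_point_and_utf8(ch)
        hex_bytes = " ".join(f"{b:02X}" for b in encoded)
        # One code point can take from one to four bytes in UTF-8.
        print(f"U+{code_point:04X} -> {hex_bytes} ({len(encoded)} bytes)")

    # Decoding the bytes gives back the same character.
    assert b"\xe2\x82\xac".decode("utf-8") == "€"
```

ASCII characters keep their single-byte values, so plain ASCII text is already valid UTF-8; everything above U+007F takes two to four bytes.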
To be continued...
Today I lapsed in my local front of the ‘War On Open Tabs (And Also Windows)’ (or ‘Woot(Aaw)’ for short) and from looking closer at ChipWhisperer-Lite detouring to EEVBlog and on to ESD (ElectroStatic Discharge) issues and Bill Beaty who tricked me into clicking onwards to ‘Vulture Central’ (The Register) where I ended up picking through the bones of the recent news of “Lost White City of the Monkey God Found After 500 Years” in which I came across a reference to a work of Lovecraft written 94 years ago which I didn't remember reading.
However it turned out I had read it under a different name, and on that same Wikipedia page once again yet another venerated authority was quoted slagging off HPL accusing him of being a “racist” or supremacist or anything else non-imaginative even though HPL had plainly explained what he was writing about or in response to (which was unimaginative shitty white village dramas).
At which point I wanted to congratulate E. F. Bleiler for picking through a dead man's brain in order to arouse himself and revel in the sticky glory of it. HPL would surely appreciate the combined or better yet unified necrophiliac morbidity and righteous “holy” bigoted megalomania of Bleiler's actions.
(And where else to offer my sarcastic approval than on my very own journal? Did I piss on his grave? I apologize but in my defense I can't be blamed for not noticing it on account of the latrine placed at the same spot by so many of his peers).
But yes for a while already E. F. Bleiler has been just as dead and if there's anything left after a few generations of macro and microfaunal procreation any maggots can continue his gruesome trade on his own brain :)
Now please excuse me as I make a few bookmarks for perusal in the eternity of time I do not have (and correct an index in an actual book) and close twenty or so tabs… the war must go on.
…and now I can't help but wonder what the world would be like if there was a Mr. and/or Mrs. Crowley-Lovecraft out there, and yes it has to be a double-barreled surname ;)
P.S. Happy Easter!
Title says it all really but I found it surprising that there wasn't much noticeable difference. It started out foggy but the fog had mostly risen to 100% low cloud cover at local eclipse maximum (88.8%) yet if I didn't know and someone told me there was an 88.8% eclipse at that moment I wouldn't have believed them at all. The level of light felt unexceptionally normal. It was more noticeable a while after the local maximum was over as it started to get a little bit brighter but for all purposes it was just like a normal variation caused by weather, no weird shadows, not even any streetlights turning themselves on.
It was so unnoticeable I double-checked the time of the event and my clock. I guess my eyes almost entirely compensated for the small and ever so gradual change. Right now it's not even supposed to be entirely over yet but meh :)
I slept through the last eclipse so maybe this is all completely normal, the lack of difference that is; me sleeping right through “events” is very normal :D
This editor thinks about things... usually does not reach a conclusion.
Every so often an important story happens, or there are no usable submissions, and an editor might elect to circumvent the normal process and set their own story for release. This goes against the normal submissions process and is not something that happens very often. Site news is the exception to this for obvious reasons.
On the occasions we have released a story as described above -- not waiting for a submission -- there have been no complaints that I am aware of.
Honestly, I do not rush to start releasing my own stories, or to make submissions. Organic and original submissions are far better and what I really want to see more of.
What are your opinions on editors finding and releasing stories this way more often? Especially when it comes to 'breaking news', but more generally also.
[This journal entry is just that, it is not an official SoylentNews RFC or endorsed by any of the staff.]
A CSX train carrying crude oil has derailed and is on fire. At least one house has been burned and it's been reported a railcar fell into the Kanawha River.
What one would think would be an embarrassing photo of Supreme Court Justice Ruth Bader Ginsburg nodding off at the last SOTU speech was laughed off by her and a colleague in a lighthearted moment before an audience at George Washington University in Washington yesterday:
"The audience – for the most part – is awake because they're bobbing up and down all the time and we sit there stone-faced, sober judges," Ginsburg said. "At least I wasn't 100 percent sober because before we went to the State of the Union we had dinner."
Ginsburg said that Justice Anthony Kennedy was the culprit, bringing wine to dinner.
At least.