If I can break something without even trying, surely I'll be able to do it when I do try
According to the preview, Chinese characters don't work either I think
I donâ€™t see a comment button other than â€œreply to this.â€ Hmmâ€¦ I wasnâ€™t willing to use unicode in my comment on another page, because the preview looked wrong, but maybe this comment will look fine after I submit it.
Hitting "reply" to the main story is equivalent to "post." But you're welcome to test out almost anything in this thread so we can work the bugs out.
Ah yes, there is the reply button. Thanks.
I wonder how someone else managed to post in braille, and I canâ€™t even get quote marks to work. Iâ€™ll try HTML.
⡌⠁⠧⠑ ⠼⠁⠒ ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌
⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠙⠑⠁⠙⠒ ⠞⠕ ⠃⠑⠛⠔ ⠺⠊⠹⠲ ⡹⠻⠑ ⠊⠎ ⠝⠕ ⠙⠳⠃⠞
⠱⠁⠞⠑⠧⠻ ⠁⠃⠳⠞ ⠹⠁⠞⠲ ⡹⠑ ⠗⠑⠛⠊⠌⠻ ⠕⠋ ⠙⠊⠎ ⠃⠥⠗⠊⠁⠇ ⠺⠁⠎
⠎⠊⠛⠝⠫ ⠃⠹ ⠹⠑ ⠊⠇⠻⠛⠹⠍⠁⠝⠂ ⠹⠑ ⠊⠇⠻⠅⠂ ⠹⠑ ⠥⠝⠙⠻⠞⠁⠅⠻⠂
⠁⠝⠙ ⠹⠑ ⠡⠊⠑⠋ ⠍⠳⠗⠝⠻⠲ ⡎⠊⠗⠕⠕⠛⠑ ⠎⠊⠛⠝⠫ ⠊⠞⠲ ⡁⠝⠙
⡎⠊⠗⠕⠕⠛⠑⠰⠎ ⠝⠁⠍⠑ ⠺⠁⠎ ⠛⠕⠕⠙ ⠥⠏⠕⠝ ⠰⡡⠁⠝⠛⠑⠂ ⠋⠕⠗ ⠁⠝⠹⠹⠔⠛ ⠙⠑
⠡⠕⠎⠑ ⠞⠕ ⠏⠥⠞ ⠙⠊⠎ ⠙⠁⠝⠙ ⠞⠕⠲
⡕⠇⠙ ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲
⡍⠔⠙⠖ ⡊ ⠙⠕⠝⠰⠞ ⠍⠑⠁⠝ ⠞⠕ ⠎⠁⠹ ⠹⠁⠞ ⡊ ⠅⠝⠪⠂ ⠕⠋ ⠍⠹
⠪⠝ ⠅⠝⠪⠇⠫⠛⠑⠂ ⠱⠁⠞ ⠹⠻⠑ ⠊⠎ ⠏⠜⠞⠊⠊⠥⠇⠜⠇⠹ ⠙⠑⠁⠙ ⠁⠃⠳⠞
⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲ ⡊ ⠍⠊⠣⠞ ⠙⠁⠧⠑ ⠃⠑⠲ ⠔⠊⠇⠔⠫⠂ ⠍⠹⠎⠑⠇⠋⠂ ⠞⠕
⠗⠑⠛⠜⠙ ⠁ ⠊⠕⠋⠋⠔⠤⠝⠁⠊⠇ ⠁⠎ ⠹⠑ ⠙⠑⠁⠙⠑⠌ ⠏⠊⠑⠊⠑ ⠕⠋ ⠊⠗⠕⠝⠍⠕⠝⠛⠻⠹
⠔ ⠹⠑ ⠞⠗⠁⠙⠑⠲ ⡃⠥⠞ ⠹⠑ ⠺⠊⠎⠙⠕⠍ ⠕⠋ ⠳⠗ ⠁⠝⠊⠑⠌⠕⠗⠎
⠊⠎ ⠔ ⠹⠑ ⠎⠊⠍⠊⠇⠑⠆ ⠁⠝⠙ ⠍⠹ ⠥⠝⠙⠁⠇⠇⠪⠫ ⠙⠁⠝⠙⠎
⠩⠁⠇⠇ ⠝⠕⠞ ⠙⠊⠌⠥⠗⠃ ⠊⠞⠂ ⠕⠗ ⠹⠑ ⡊⠳⠝⠞⠗⠹⠰⠎ ⠙⠕⠝⠑ ⠋⠕⠗⠲ ⡹⠳
⠺⠊⠇⠇ ⠹⠻⠑⠋⠕⠗⠑ ⠏⠻⠍⠊⠞ ⠍⠑ ⠞⠕ ⠗⠑⠏⠑⠁⠞⠂ ⠑⠍⠏⠙⠁⠞⠊⠊⠁⠇⠇⠹⠂ ⠹⠁⠞
⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲
(The first couple of paragraphs of "A Christmas Carol" by Dickens)
I can't see the naked lady.
She doesn't appear until chapter 2.
All I see is blonde, brunette, redhead ...
Actually, it's braille, so you feel the blonde, brunette and redhead.
Her mouth says "no" but her bumps say "⢀⣲⠢⡔".
Works like a charm!
Followup test as Landon says he can't post.
As Henry Ford would've said, you can post in any character set as long as it's USASCII.
oooooOOOOoooɹɹɹɐɐɐɐ --werewolf greeting in upside down. :-)
Alas, mirrored text (done by using ‮) doesn't seem to work, even if it does show up mirrored when pasted in the comment editbox.
A character beyond what UNICODE restricts itself: &#x20FFFF; - shows like
Something below the max limit, valid, but unassigned 𰀁 - shows like
Something that's an invalid UNICODE character -  - shows like
The REPLACEMENT CHARACTER i.e. � - shows like �
Now, what the above will do inside the storage, I don't know, I'm just trying to go forth and BREAK those minions!
(hmmm, the "preview" looks like they are totally kicked out, not replaced by the replacement character [wikipedia.org] - won't cry over the loss of it)
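For anyone wanting to reproduce the experiment above outside the comment box, here is a rough Python sketch of the same cases: a code point above the Unicode limit, the highest valid one, and a stray invalid byte that a lenient decoder swaps for U+FFFD. The helper names `describe` and `decode_lossy` are made up for illustration, and this says nothing about what the site's own code does with such input.

```python
# Unicode code points stop at U+10FFFF; anything above is not a character at all.

def describe(code_point: int) -> str:
    """Classify a code point the way a strict UTF-8 encoder would see it."""
    try:
        ch = chr(code_point)  # raises ValueError above 0x10FFFF
    except ValueError:
        return "out of range"
    if 0xD800 <= code_point <= 0xDFFF:
        # Surrogates exist as code points but cannot be encoded as UTF-8.
        return "surrogate (not encodable as UTF-8)"
    return f"valid, {len(ch.encode('utf-8'))} UTF-8 byte(s)"


def decode_lossy(raw: bytes) -> str:
    """Decode UTF-8, replacing each undecodable byte with U+FFFD."""
    return raw.decode("utf-8", errors="replace")


print(describe(0x20FFFF))        # beyond the Unicode range
print(describe(0x10FFFF))        # the last valid code point
print(repr(decode_lossy(b"abc\xff")))
```

A storage layer could equally reject or silently drop such input, which would match the "totally kicked out" preview described above.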
You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the nerves of the sentient whilst you observe, your psyche withering in the onslaught of horror. 
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of regex parsers for HTML will instantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection will devour your HTML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fight he com̡e̶s, ̕h̵is un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liquid pain, the song of re̸gular expression parsing will extinguish the voices of mortal man from the sphere I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful the final snuffing of the lies of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S ̨̥̫͎̭ͯ̿̔̀ͅ
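The stacked marks in the comment above are Unicode combining characters piled onto ordinary letters. If a site wanted to tame that without banning accents, one possible approach (purely a sketch, with a hypothetical helper `limit_marks`, not anything this site is known to run) is to cap how many combining marks may follow a base character:

```python
import unicodedata


def limit_marks(text: str, max_marks: int = 1) -> str:
    """Keep at most max_marks combining marks after each base character."""
    out = []
    run = 0  # combining marks seen since the last base character
    # NFC first, so common accented letters become single precomposed characters
    # and are not counted as "letter plus mark".
    for ch in unicodedata.normalize("NFC", text):
        if unicodedata.category(ch) in ("Mn", "Me"):
            run += 1
            if run <= max_marks:
                out.append(ch)
        else:
            run = 0
            out.append(ch)
    return "".join(out)


print(limit_marks("Rege\u033f\u0314\u0309x", max_marks=0))
print(limit_marks("caf\u00e9", max_marks=0))
```

A cap that is too low would damage scripts that legitimately stack several marks on one letter, so the limit is a judgment call.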
So Tony the Pony brings us UTF-8... I would have expected a unicorn. :-/
UTF = Unicorn Tony Form?䍶
Then we get to know why slashdot won't support UTF-8...
Wow. Your comment made Hal look lucid in his last moments :)
I know UTF-8 is one of those "features" that many people on slashdot have missed for a long time, but I thought most of that was for simple additions like the euro/pound/yen symbols and such.
Am I the only one that would prefer not to see non-english text mixed in with the comments?
Then again, I'd fully support an exception for Klingon. Maybe elvish too.
Yeah it may mean some unreadable posts, but as pointed out in the story text moderation should take care of that. There was even talk of a new mod category, something like "-1 Unintelligible" :P
I rather like that idea for a mod.
There will be stories about Finns with äs in their names, Swedes with Ås in their names, and maybe, just maybe, stories about Icelandic volcanoes with ðs in their names. Whilst I like the idea of everyone agreeing to use the English language, that doesn't mean every word they'll be typing will be English.
Note - this post isn't UTF-8, this is plain old ASCII, I used the &entity; syntax.
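For the curious, the entity trick can be automated. This is a small Python sketch, assuming nothing about the site itself: it turns every non-ASCII character into a numeric character reference, so the text that gets submitted is pure ASCII, and `html.unescape` shows what a browser would render. The function name `to_ascii_with_entities` is invented for the example.

```python
import html


def to_ascii_with_entities(text: str) -> str:
    """Replace each non-ASCII character with a numeric reference like &#228;."""
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


encoded = to_ascii_with_entities("äs, Ås and ðs")
print(encoded)                  # ASCII only, safe to paste
print(html.unescape(encoded))   # what the reader should end up seeing
```

Named entities such as `&auml;` decode the same way; the numeric form just has the advantage of covering every character.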
At present, there's nothing stopping someone from writing in German, French, Spanish, or many other languages using Latin letters (more or less). Since it doesn't happen, I don't think we need to worry.
(But if I post in a quote with an em-dash: â€” or some â€œproperâ€ â€˜quotes,â€™ maybe an arrow â†’ or temperature (0Â°C) it won't make a mess.)
(Nope, looks like it's going to make a mess.)
Well, why shouldn't we be able to post non-English comments?
And even if you don't, it would be nice to be able to spell some people's names correctly.
Lech WaÅ‚Ä™sa [wikipedia.org]FriÃ°rik ÃžÃ³r FriÃ°riksson [wikipedia.org]Jaroslav HaÅ¡ek [wikipedia.org]
How do you consider the use of â„ƒ or ãŽž units: are they proper English or part of the non-english text?
Oooopps. A copy/paste of these characters from the "Character map" (on Ubuntu/Firefox) straight into the reply text box results in them showing mangled in the preview (no matter if plain-old text or HTML).
To provide the context: I was enquiring about the use of these ℃ and ㎞ units.
The bytes are probably interpreted by slashcode as latin1 instead of utf8.
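That guess is easy to check offline. A minimal Python sketch, assuming the misreading is Windows-1252 (which is what browsers generally mean by "latin1"; strict ISO-8859-1 would yield invisible control characters instead of € and ™): encode the text as UTF-8, then decode the bytes with the wrong charset. The name `misread_as_cp1252` is just for illustration.

```python
def misread_as_cp1252(text: str) -> str:
    """Show what UTF-8 bytes look like when decoded as Windows-1252."""
    raw = text.encode("utf-8")
    # A few byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D) are undefined in
    # cp1252, so use errors="replace" rather than letting decode() raise.
    return raw.decode("cp1252", errors="replace")


for sample in ("℃", "’", "é"):
    print(sample, "->", misread_as_cp1252(sample))
```

The output for ℃ and for the curly apostrophe matches the garbage seen earlier in this thread, which supports the latin1 theory without proving where in the pipeline it happens.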
Hmm, not seeing a preview. Using Plain Old Text mode. Switching to HTML next...
OK, HTML Formatted does give a preview.
But the 4 horizontally stacked lines do not represent properly in preview.
So, it's a bug.
Here's a guide to getting FULL UTF8MB4 support in the DB:
OH, and now I see preview in Plain Old Text, having added more than the "foo_bar" that was originally present.
And, to be certain, I removed everything but the "foo_bar" and got ... no preview in Plain Old Text.
I like my cookies crispy with chocolate chips!
no can post.
ï¼‡The Rossâ€“Littlewood paradox[clarification needed] (also known as the balls and vase problem or the ping pong ball problem) is a hypothetical problem in abstract mathematics and logic designed to illustrate the seemingly paradoxical, or at least non-intuitive, nature of infinity. More specifically, like the Thomson's lamp paradox, the Rossâ€“Littlewood paradox tries to illustrate the conceptual difficulties with the notion of a supertask, in which an infinite number of tasks are completed sequentially. The problem was originally described by mathematician John E. Littlewood in his 1953 book Littlewood's Miscellany, and was later expanded upon by Sheldon Ross in his 1988 book A First Course in Probability.
"MÎ±gÄ‘Î±lÑÐ¸â€²s ÄÎ±ÑÎºÐ¸ÑÑs" is a bad bad string about a bad mans darkness.
Oh bleep. There goes the neighborhood (U+0CCB). à³‹
When writing letters and symbols outside of the normal keyboard mapping, what is the most often used method? Is it with AltGr key (like AltGr-M for µ), or with a Compose key, a keycombo to enter unicode char number (ctrl-shift-u ?), or simply cut&paste from a character table application?
A⃣ B⃣ C⃣
There will probably be use of lots of unicode in discussions about languages etc later on.
I might add that I think UTF-16 would be preferable.
Let me sing a little song to celebrate this event: ♩♫♬♯♪♩♫♬♯♩♪♫♬♯♪
I'm not sure if it is my web browser's fault or not, but it didn't work if I wrote the characters here directly, only if I entered them HTML-encoded.
( â™©â™«â™¬â™¯â™ªâ™©â™«ð…Ÿâ™¬â™¯â™ªð…¡ð…žâ™©â™«ð…Ÿâ™ ¬â™¯â™ªð…¡ð…ž)
I don't seem to be able to post anything with non-Latin characters.
Preview doesn't work... neither text nor HTML.
textÑ‚ÐµÐºÑÑ‚ÎºÎµÎ¯Î¼ÎµÎ½Î¿è¯¾æ–‡èª²æ–‡ãƒ†ã‚ã‚¹ãƒˆÕ¿Õ¥Ö„Õ½Õ¿mÉ™tnà¦¶à¦¿à¦°à§‹à¦¨à¦¾à¦®áƒ¢áƒ”áƒ¥áƒ¡áƒ¢áƒ˜àªªàª¾àª tÃ¨ksà¤ªà¤¾à¤ à²ªà² à³à²¯áž¢ážáŸ’ážáž”áž‘ì›ë³¸à»€àº™àº·à»‰àºà»ƒàº™à¤®à¤œà¤•à¥‚à¤°à®‰à®°à¯ˆà°Ÿà±†à°•à±à°¸à±à°Ÿà±à¸‚à¹‰à¸à¸„à¸§à¸²à¸¡vÄƒn báº£nÑ‚ÑÐºÑÑ‚Ù…ØªÙ†Ù†Øµ×˜×§×¡×˜
If I just type it in I get the traditional "utf8 shows as 8859-1" crap:
Â«Les accents ont une fonction en franÃ§ais, ne serait-ce que pour distinguer,
dans un hÃ´pital psychiatrique, entre les internes et les internÃ©s.Â»
- Annie Bourret
Maxwell's equations always sounded better in their original Klingon.
It's supposed to be pretty tame stuff, div B equals zero and all that. Net charge of magnetic field in space doesn't exist aka net flow in and out of a closed surface is zero as long as monopoles don't exist, that kind of thing.
Doesn't look like it does. Been trying to type Japanese text but it doesn't seem to work. Japanese text in the subject gets mangled into XML entities.
Seems to be still just as broken as it was on the old site. I get the same garbage when I type something like this: Qu'on me donne six lignes Ã©crites de la main du plus honnÃªte homme, j'y trouverai de quoi le faire pendre. That's supposed to be a quote attributed to Cardinal Richelieu, and is my main rebuttal to people who say they have nothing to hide. It works if I force encoding to ISO-8859-1 as I had to before (see my sig for how it should look).
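Garbage of this kind is usually reversible as long as no bytes were lost along the way. A hedged Python sketch, assuming the text was UTF-8 that got decoded as Windows-1252: re-encode with the wrong charset to recover the original bytes, then decode them properly. `repair_mojibake` is a made-up name, and the function simply gives up and returns its input when the round trip fails.

```python
def repair_mojibake(garbled: str) -> str:
    """Undo a UTF-8-read-as-cp1252 mix-up; return the input unchanged on failure."""
    try:
        return garbled.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text was never garbled this way, or bytes went missing.
        return garbled


print(repair_mojibake("six lignes Ã©crites de la main du plus honnÃªte homme"))
```

Text that was mangled twice, or that lost the bytes Windows-1252 leaves undefined, will not come back this way.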
Does this in any way suggest that we might have a SoylentNews.jp [soylentnews.jp] in the future or are we abandoning all hope for the Japanese /. crowd that might be interested in migrating or at least additionally visiting SN?
E = mcÂ²
F = Tâˆ‡St
I donâ€™t think itâ€™s working for me, but it is for other peopleâ€¦
UTF-8: CafÃ©, soupÃ§on
HTML entities: Café, soupçon
Id rather not HTML-encode everything I type, but at least Duck Duck Go gives me a handy table of HTML entities [duckduckgo.com].
My quotation marks are disappearing, when encoded as HTML.
Isn't setting up a UTF-8 capable front-end and database a pretty basic task these days; something to get done after following a few tutorials and articles?
You create your database with the right settings (e.g. utf8_general_ci collation in MySQL) and make sure that your page scripts don't garble the content entered via the form. Recent versions of PHP and Python can do that just fine, never used Perl though.
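One caveat on the MySQL side: its `utf8` character set (the one behind utf8_general_ci) stores at most three bytes per character, so anything outside the Basic Multilingual Plane needs the utf8mb4 character set mentioned elsewhere in this thread. A quick Python sketch for spotting such text before it reaches the database; `needs_utf8mb4` is a hypothetical helper, not part of any driver.

```python
def needs_utf8mb4(text: str) -> bool:
    """True if any character lies above U+FFFF, i.e. takes four bytes in UTF-8."""
    return any(ord(ch) > 0xFFFF for ch in text)


print(needs_utf8mb4("♩♫♬"))         # BMP music symbols, three bytes each
print(needs_utf8mb4("\U0001D15F"))  # musical symbol quarter note, four bytes
```

How the three-byte charset reacts to a four-byte character (error or truncation) depends on the server's SQL mode, so testing against the actual setup is the safer route.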
Nothing to do with comments, but rather with contents of the article box:
The links [ /dev/random ] [ The Main Page ] on THIS page work.
However, the links [ Soylent ] [ The Main Page ] on other pages do not work for me.
If I turn off CSS, then these links work. So it's the CSS, not the links themselves.
[SeaMonkey 2.5 with JS turned off, here.]
Cripes, you'd think I could come up with a better bug than that. :D
The text below is in Arabic, entered from Firefox on Linux.
It does not display correctly for some unknown reason: