SoylentNews

SoylentNews is people

Announcing UTF-8 Support on SoylentNews

posted by NCommander on Sunday February 16 2014, @10:13PM

from the ¡sᴉɥʇ-sǝlpuɐɥ-ʍou-ǝʇᴉs-ǝɥʇ dept.

So, after a bit of monkeying with the database, I'm pleased to announce that Soylent should (in theory) have support for UTF-8 starting immediately. Obviously this isn't well tested, so this is your chance to break the site in two; consider the comments below to be "open season", so to speak. I know the comment preview has some issues with UTF-8 (and it only works at all in Plain Text or HTML modes).

For purposes of breakage, anything that breaks the site layout/Reply To/Parent/Moderate buttons, or breaks any comments beyond itself is considered bad. We need to stop those. If you can break it (which shouldn't be hard), you earn a cookie, and I'll get you in the CREDITS file as something awesome.

For comments that are just plain unreadable, moderation will take care of them, and that isn't considered a bug. So go forth and BREAK my minions! ()}:o)↺

Well, we've survived our first week as a functional website, and have yet to go belly up because of it. The speed and growth of our community is staggering to say the least, and we are working hard to get this site fully operational. I'm pleased to announce that a development VM is now available for public consumption, and if you're interested in site development, you should join us in #dev on irc.soylentnews.org. Beyond that though, I've got a few points to address and updated statistics to share ...

End of Day 1: Systems Update 149 comments

So, as I write this, day one has officially come to an end. I'm still somewhat in shock over it. Last night when I was editing the database to change over hostnames and such, I was thinking, man, it would be great if we got 100 regular users by tomorrow. Turns out I was wrong. By a factor of ten. Holy cow, people. I'm still in a state of disbelief, partially due to the epic turnout, but also because our very modest server hardware hasn't soiled itself from the influx (the numbers are, well, "impressive" is a way to put it). Anyway, I wanted to do a bit of a writeup of where we stand now, what works, and what doesn't. Check it out (and some raw numbers) after the break! Warning, it is a bit lengthy.

Always Use UTF-8 and Always Label Your HTML to Say So 25 comments

canopic jug submitted a story which was the inspiration for:

Helsinki-based software developer, Henri Sivonen, has written a pair of blog posts about UTF-8; why it should be used and how to inform the user agent when it is used.

The first blog post explains problems that can arise when UTF-8 is used without explicitly stating so. Here is a short selection from Why Supporting Unlabeled UTF-8 in HTML on the Web Would Be Problematic:

UTF-8 has won. Yet, Web authors have to opt in to having browsers treat HTML as UTF-8 instead of the browsers Just Doing the Right Thing by default. Why?
I'm writing this down in comprehensive form, because otherwise I will keep rewriting unsatisfactory partial explanations repeatedly as bug comments again and again. For more on how to label, see another writeup.
Legacy Content Won't Be Opting Out
First of all, there is the "Support Existing Content" design principle. Browsers can't just default to UTF-8 and have HTML documents encoded in legacy encodings opt out of UTF-8, because there is unlabeled legacy content, and we can't realistically expect the legacy content to be actively maintained to add opt-outs now. If we are to keep supporting such legacy content, the assumption we have to start with is that unlabeled content could be in a legacy encoding.
In this regard, <meta charset=utf-8> is just like <!DOCTYPE html> and <meta name="viewport" content="width=device-width, initial-scale=1">. Everyone wants newly-authored content to use UTF-8, the No-Quirks Mode (better known as the Standards Mode), and to work well on small screens. Yet, every single newly-authored HTML document has to explicitly opt in to all three, since it isn't realistic to get all legacy pages to opt out.
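The three opt-ins named in the excerpt can be sketched in a few lines of Python. This is only an illustration: `build_page` and `response_headers` are hypothetical helpers, not part of any site's code, and the 1024-byte check reflects the HTML rule that the encoding declaration must sit near the start of the document.

```python
import html


def build_page(title: str, body_html: str) -> bytes:
    """Hypothetical helper: wrap a body fragment in a page that opts in to
    No-Quirks Mode, UTF-8, and a mobile-friendly viewport."""
    document = (
        "<!DOCTYPE html>\n"                      # No-Quirks (Standards) Mode
        '<html lang="en">\n'
        "<head>\n"
        "<meta charset=utf-8>\n"                 # in-document encoding label
        '<meta name="viewport" content="width=device-width, initial-scale=1">\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        f"<body>\n{body_html}\n</body>\n"
        "</html>\n"
    )
    # The bytes on the wire must really be UTF-8, or the label is a lie.
    return document.encode("utf-8")


def response_headers() -> dict:
    """Hypothetical helper: label the encoding at the HTTP layer as well."""
    return {"Content-Type": "text/html; charset=utf-8"}


page = build_page("UTF-8 test", "<p>水 πολλῶν ℃</p>")
# The meta declaration should fall within the first 1024 bytes.
print(page.index(b"<meta charset=utf-8>") < 1024)
print(response_headers()["Content-Type"])
```

Putting the `meta` element first in `head` keeps it well inside the prescan window even when the title is long.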

The second blog post explains how one explicitly communicates to the user agent that UTF-8 is employed in the current document. Always Use UTF-8 & Always Label Your HTML Saying So:

This discussion has been archived. No new comments can be posted.

Announcing UTF-8 Support on SoylentNews | Log In/Create an Account | Top | 57 comments | Search Discussion

The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.

Try to break it? (Score: 4, Informative) by mattie_p on Sunday February 16 2014, @10:24PM

by mattie_p (13) on Sunday February 16 2014, @10:24PM (#322) Journal

If I can break something without even trying, surely I'll be able to do it when I do try
- Re:Try to break it? (Score: 1) by StupendousMan on Monday February 17 2014, @02:03AM
  
  by StupendousMan (103) on Monday February 17 2014, @02:03AM (#368)
  
  Try to break it? Use funny characters? bã‚ã‘ã&# 8218;“ æ°´ OD.1.3 Ï€Î¿Î»Î» á¿¶Î½
  
  Parent
  - Re:Try to break it? (Score: 1) by StupendousMan on Monday February 17 2014, @02:05AM
    
    by StupendousMan (103) on Monday February 17 2014, @02:05AM (#369)
    
    My post below has several lines of Japanese and Greek letters, pasted into the "Comment" box.
    
    So, if I try using "Plain old text", the code below cannot be posted.
    
    If I try "HTML formatted", it can't be posted.
    
    If I try "code", it IS posted, but the results -- shown in post above this one -- are bad: one can't see the characters properly.
    
    bã‚ã‘ã‚“
    
    æ°´
    
    OD.1.3 Ï€Î¿Î»Î»& #225;¿¶Î½
    
    Parent
    - Re:Try to break it? (Score: 1) by StupendousMan on Monday February 17 2014, @02:09AM
      
      by StupendousMan (103) on Monday February 17 2014, @02:09AM (#371)
      
      And the post above, I tried "Extrans", but that didn't work, either.
      
      Rats. Can't get Japanese kana or Greek letters.
      
      Parent
      - Re:Try to break it?(Score: 1) by omoc on Monday February 17 2014, @06:28AM
        
        by omoc (39) on Monday February 17 2014, @06:28AM (#453)
        
        According to the preview, Chinese characters don't work either I think
        æˆ‘å¾ˆå¿«ä¹
        
        Parent
- Re:Try to break it? (Score: 1) by yellowantphil on Thursday February 20 2014, @02:53AM
  
  by yellowantphil (2125) on Thursday February 20 2014, @02:53AM (#3096) Homepage
  
  I donâ€™t see a comment button other than â€œreply to this.â€ Hmmâ€¦ I wasnâ€™t willing to use unicode in my comment on another page, because the preview looked wrong, but maybe this comment will look fine after I submit it.
  
  Parent
  - Re:Try to break it? (Score: 2) by mattie_p on Thursday February 20 2014, @03:33AM
    
    by mattie_p (13) on Thursday February 20 2014, @03:33AM (#3129) Journal
    
    Hitting "reply" to the main story is equivalent to "post." But you're welcome to test out almost anything in this thread so we can work the bugs out.
    
    Parent
    - Re:Try to break it? (Score: 1) by yellowantphil on Thursday February 20 2014, @04:19AM
      
      by yellowantphil (2125) on Thursday February 20 2014, @04:19AM (#3156) Homepage
      
      Ah yes, there is the reply button. Thanks.
      I wonder how someone else managed to post in braille, and I canâ€™t even get quote marks to work. Iâ€™ll try HTML.
      
      Parent
Braille (Score: 5, Interesting) by ticho on Sunday February 16 2014, @10:24PM

by ticho (89) on Sunday February 16 2014, @10:24PM (#323) Homepage Journal

Braille:
⡌⠁⠧⠑ ⠼⠁⠒ ⡍⠜⠇⠑⠹⠰⠎ ⡣⠕⠌
⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠙⠑⠁⠙⠒ ⠞⠕ ⠃⠑⠛⠔ ⠺⠊⠹⠲ ⡹⠻⠑ ⠊⠎ ⠝⠕ ⠙⠳⠃⠞
⠱⠁⠞⠑⠧⠻ ⠁⠃⠳⠞ ⠹⠁⠞⠲ ⡹⠑ ⠗⠑⠛⠊⠌⠻ ⠕⠋ ⠙⠊⠎ ⠃⠥⠗⠊⠁⠇ ⠺⠁⠎
⠎⠊⠛⠝⠫ ⠃⠹ ⠹⠑ ⠊⠇⠻⠛⠹⠍⠁⠝⠂ ⠹⠑ ⠊⠇⠻⠅⠂ ⠹⠑ ⠥⠝⠙⠻⠞⠁⠅⠻⠂
⠁⠝⠙ ⠹⠑ ⠡⠊⠑⠋ ⠍⠳⠗⠝⠻⠲ ⡎⠊⠗⠕⠕⠛⠑ ⠎⠊⠛⠝⠫ ⠊⠞⠲ ⡁⠝⠙
⡎⠊⠗⠕⠕⠛⠑⠰⠎ ⠝⠁⠍⠑ ⠺⠁⠎ ⠛⠕⠕⠙ ⠥⠏⠕⠝ ⠰⡡⠁⠝⠛⠑⠂ ⠋⠕⠗ ⠁⠝⠹⠹⠔⠛ ⠙⠑
⠡⠕⠎⠑ ⠞⠕ ⠏⠥⠞ ⠙⠊⠎ ⠙⠁⠝⠙ ⠞⠕⠲
⡕⠇⠙ ⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲
⡍⠔⠙⠖ ⡊ ⠙⠕⠝⠰⠞ ⠍⠑⠁⠝ ⠞⠕ ⠎⠁⠹ ⠹⠁⠞ ⡊ ⠅⠝⠪⠂ ⠕⠋ ⠍⠹
⠪⠝ ⠅⠝⠪⠇⠫⠛⠑⠂ ⠱⠁⠞ ⠹⠻⠑ ⠊⠎ ⠏⠜⠞⠊⠊⠥⠇⠜⠇⠹ ⠙⠑⠁⠙ ⠁⠃⠳⠞
⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲ ⡊ ⠍⠊⠣⠞ ⠙⠁⠧⠑ ⠃⠑⠲ ⠔⠊⠇⠔⠫⠂ ⠍⠹⠎⠑⠇⠋⠂ ⠞⠕
⠗⠑⠛⠜⠙ ⠁ ⠊⠕⠋⠋⠔⠤⠝⠁⠊⠇ ⠁⠎ ⠹⠑ ⠙⠑⠁⠙⠑⠌ ⠏⠊⠑⠊⠑ ⠕⠋ ⠊⠗⠕⠝⠍⠕⠝⠛⠻⠹
⠔ ⠹⠑ ⠞⠗⠁⠙⠑⠲ ⡃⠥⠞ ⠹⠑ ⠺⠊⠎⠙⠕⠍ ⠕⠋ ⠳⠗ ⠁⠝⠊⠑⠌⠕⠗⠎
⠊⠎ ⠔ ⠹⠑ ⠎⠊⠍⠊⠇⠑⠆ ⠁⠝⠙ ⠍⠹ ⠥⠝⠙⠁⠇⠇⠪⠫ ⠙⠁⠝⠙⠎
⠩⠁⠇⠇ ⠝⠕⠞ ⠙⠊⠌⠥⠗⠃ ⠊⠞⠂ ⠕⠗ ⠹⠑ ⡊⠳⠝⠞⠗⠹⠰⠎ ⠙⠕⠝⠑ ⠋⠕⠗⠲ ⡹⠳
⠺⠊⠇⠇ ⠹⠻⠑⠋⠕⠗⠑ ⠏⠻⠍⠊⠞ ⠍⠑ ⠞⠕ ⠗⠑⠏⠑⠁⠞⠂ ⠑⠍⠏⠙⠁⠞⠊⠊⠁⠇⠇⠹⠂ ⠹⠁⠞
⡍⠜⠇⠑⠹ ⠺⠁⠎ ⠁⠎ ⠙⠑⠁⠙ ⠁⠎ ⠁ ⠙⠕⠕⠗⠤⠝⠁⠊⠇⠲
(The first couple of paragraphs of "A Christmas Carol" by Dickens)
- Re:Braille (Score: 2, Funny) by Anonymous Coward on Sunday February 16 2014, @10:30PM
  
  by Anonymous Coward on Sunday February 16 2014, @10:30PM (#326)
  
  I can't see the naked lady.
  
  Parent
  - Re:Braille(Score: 1) by stderr on Sunday February 16 2014, @10:59PM
    
    by stderr (11) on Sunday February 16 2014, @10:59PM (#332) Journal
    
    She doesn't appear until chapter 2.
    
    --
    alias sudo="echo make it yourself #" # ... and get off my lawn!
    
    Parent
- Re:Braille (Score: 4, Insightful) by Nerdfest on Monday February 17 2014, @12:34AM
  
  by Nerdfest (80) on Monday February 17 2014, @12:34AM (#353)
  
  All I see is blonde, brunette, redhead ...
  
  Parent
  - Re:Braille (Score: 5, Funny) by chromas on Monday February 17 2014, @01:06AM
    
    by chromas (34) on Monday February 17 2014, @01:06AM (#364) Journal
    
    Actually, it's braille, so you feel the blonde, brunette and redhead.
    
    Parent
    - Re:Braille(Score: 0) by Anonymous Coward on Monday February 17 2014, @12:52PM
      
      by Anonymous Coward on Monday February 17 2014, @12:52PM (#637)
      
      Her mouth says "no" but her bumps say "⢀⣲⠢⡔".
      
      Parent
test (Score: 1) by Landon on Sunday February 16 2014, @10:28PM

by Landon (45) on Sunday February 16 2014, @10:28PM (#324) Journal

test
- Re:test (Score: 1) by mtrycz on Sunday February 16 2014, @10:30PM
  
  by mtrycz (60) on Sunday February 16 2014, @10:30PM (#327)
  
  Works like a charm!
  
  --
  In capitalist America, ads view YOU!
  
  Parent
  - Re:test(Score: 1) by NCommander on Sunday February 16 2014, @10:32PM
    
    by NCommander (2) <michael@casadevall.pro> on Sunday February 16 2014, @10:32PM (#328) Homepage Journal
    
    Followup test as Landon says he can't post.
    
    --
    Still always moving
    
    Parent
- Re:test(Score: 1) by regift_of_the_gods on Monday February 17 2014, @07:03PM
  
  by regift_of_the_gods (138) on Monday February 17 2014, @07:03PM (#946)
  
  As Henry Ford would've said, you can post in any character set as long as it's USASCII.
  
  Parent
oooooOOOOoooɹɹɹɐɐɐ&# (Score: 1) by Techwolf on Sunday February 16 2014, @10:29PM

by Techwolf (87) on Sunday February 16 2014, @10:29PM (#325)

oooooOOOOoooɹɹɹɐɐɐɐ --werewolf greeting in upside down. :-)
- Re:oooooOOOOoooɹɹɹɐɐɐ (Score: 1) by ticho on Sunday February 16 2014, @10:34PM
  
  by ticho (89) on Sunday February 16 2014, @10:34PM (#330) Homepage Journal
  
  Alas, mirrored text (done by using ‮) doesn't seem to work, even if it does show up mirrored when pasted in the comment editbox.
  
  Parent
- Let's try some invalid/undefined/malformed UNICODE(Score: 1) by c0lo on Wednesday February 19 2014, @02:58AM
  
  by c0lo (156) on Wednesday February 19 2014, @02:58AM (#2111) Journal
  
  A character beyond what UNICODE restricts itself: &amp#x20FFFF; - shows like
  Something below the max limit, valid, but unassigned 𰀁 - shows like
  Something that's an invalid UNICODE character -  - shows like
  The REPLACEMENT CHARACTER i.e. � - shows like �
  Now, what the above will do inside the storage, I don't know, I'm just trying to go forth and BREAK those minions!
  (hmmm, the "preview" looks like they are totally kicked out, not replaced by the replacement character [wikipedia.org] - won't cry over the loss of it)
  
  --
  https://www.youtube.com/@ProfSteveKeen https://soylentnews.org/~MichaelDavidCrawford
  
  Parent
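The cases tried above (a code point beyond the Unicode range, malformed bytes, and the replacement character) can be reproduced outside the site. The sketch below uses only Python's built-in codecs; it says nothing about what the site's own storage layer does, and `try_chr` is a hypothetical helper named for this example.

```python
def try_chr(code_point: int) -> str:
    """Return the character for code_point, or a note if Python rejects it."""
    try:
        return chr(code_point)
    except ValueError:
        # Unicode stops at U+10FFFF; anything larger is not a code point.
        return "rejected: outside U+0000..U+10FFFF"


print(try_chr(0x20FFFF))         # the out-of-range value from the comment
print(repr(try_chr(0x10FFFF)))   # the last code point in range

malformed = b"abc\xffdef"        # 0xFF can never appear in valid UTF-8
try:
    malformed.decode("utf-8")    # strict decoding refuses the input
except UnicodeDecodeError as exc:
    print("strict decode failed:", exc.reason)

# A lenient decoder substitutes U+FFFD REPLACEMENT CHARACTER instead.
replaced = malformed.decode("utf-8", errors="replace")
print(replaced == "abc\ufffddef")
```

Whether a site should reject, strip, or replace such input is a policy choice; the preview described above appears to strip.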
For historical reasons: (Score: 5, Funny) by mtrycz on Sunday February 16 2014, @10:32PM

by mtrycz (60) on Sunday February 16 2014, @10:32PM (#329)

You can't parse [X]HTML with regex. Because HTML can't be parsed by regex. Regex is not a tool that can be used to correctly parse HTML. As I have answered in HTML-and-regex questions here so many times before, the use of regex will not allow you to consume HTML. Regular expressions are a tool that is insufficiently sophisticated to understand the constructs employed by HTML. HTML is not a regular language and hence cannot be parsed by regular expressions. Regex queries are not equipped to break down HTML into its meaningful parts. so many times but it is not getting to me. Even enhanced irregular regular expressions as used by Perl are not up to the task of parsing HTML. You will never make me crack. HTML is a language of sufficient complexity that it cannot be parsed by regular expressions. Even Jon Skeet cannot parse HTML using regular expressions. Every time you attempt to parse HTML with regular expressions, the unholy child weeps the blood of virgins, and Russian hackers pwn your webapp. Parsing HTML with regex summons tainted souls into the realm of the living. HTML and regex go together like love, marriage, and ritual infanticide. The cannot hold it is too late. The force of regex and HTML together in the same conceptual space will destroy your mind like so much watery putty. If you parse HTML with regex you are giving in to Them and their blasphemous ways which doom us all to inhuman toil for the One whose Name cannot be expressed in the Basic Multilingual Plane, he comes. HTML-plus-regexp will liquify the nerves of the sentient whilst you observe, your psyche withering in the onslaught of horror. 
Rege̿̔̉x-based HTML parsers are the cancer that is killing StackOverflow it is too late it is too late we cannot be saved the trangession of a chi͡ld ensures regex will consume all living tissue (except for HTML which it cannot, as previously prophesied) dear lord help us how can anyone survive this scourge using regex to parse HTML has doomed humanity to an eternity of dread torture and security holes using regex as a tool to process HTML establishes a breach between this world and the dread realm of c͒ͪo͛ͫrrupt entities (like SGML entities, but more corrupt) a mere glimpse of the world of regex parsers for HTML will instantly transport a programmer's consciousness into a world of ceaseless screaming, he comes, the pestilent slithy regex-infection will devour your HTML parser, application and existence for all time like Visual Basic only worse he comes he comes do not fight he com̡e̶s, ̕h̵is un̨ho͞ly radiańcé destro҉ying all enli̍̈́̂̈́ghtenment, HTML tags lea͠ki̧n͘g fr̶ǫm ̡yo͟ur eye͢s̸ ̛l̕ik͏e liquid pain, the song of re̸gular expression parsing will extinguish the voices of mortal man from the sphere I can see it can you see ̲͚̖͔̙î̩́t̲͎̩̱͔́̋̀ it is beautiful the final snuffing of the lies of Man ALL IS LOŚ͖̩͇̗̪̏̈́T ALL IS LOST the pon̷y he comes he c̶̮omes he comes the ichor permeates all MY FACE MY FACE ᵒh god no NO NOO̼OO NΘ stop the an*̶͑̾̾̅ͫ͏̙̤g͇̫͛͆̾ͫ̑͆l͖͉̗̩̳̟̍ͫͥͨe̠̅s ͎a̧͈͖r̽̾̈́͒͑e not rè̑ͧ̌aͨl̘̝̙̃ͤ͂̾̆ ZA̡͊͠͝LGΌ ISͮ̂҉̯͈͕̹̘̱ TO͇̹̺ͅƝ̴ȳ̳ TH̘Ë͖́̉ ͠P̯͍̭O̚N̐Y̡ H̸̡̪̯ͨ͊̽̅̾̎Ȩ̬̩̾͛ͪ̈́̀́͘ ̶̧̨̱̹̭̯ͧ̾ͬC̷̙̲̝͖ͭ̏ͥͮ͟Oͮ͏̮̪̝͍M̲̖͊̒ͪͩͬ̚̚͜Ȇ̴̟̟͙̞ͩ͌͝S ̨̥̫͎̭ͯ̿̔̀ͅ
- Re:For historical reasons: (Score: 1) by Pav on Monday February 17 2014, @10:54AM
  
  by Pav (114) on Monday February 17 2014, @10:54AM (#561)
  
  So Tony the Pony brings us UTF-8... I would have expected a unicorn. :-/
  
  Parent
  - Re:For historical reasons:(Score: 0) by Anonymous Coward on Thursday February 20 2014, @09:10AM
    
    by Anonymous Coward on Thursday February 20 2014, @09:10AM (#3277)
    
    UTF = Unicorn Tony Form?
    䍶
    
    Parent
- Re:For historical reasons:(Score: 1) by slartibartfastatp on Monday February 17 2014, @03:37PM
  
  by slartibartfastatp (588) on Monday February 17 2014, @03:37PM (#779) Journal
  
  Then we get to know why slashdot won't support UTF-8...
  
  Parent
- Re:For historical reasons:(Score: 2, Funny) by edIII on Monday February 17 2014, @06:33PM
  
  by edIII (791) on Monday February 17 2014, @06:33PM (#927)
  
  Wow. Your comment made Hal look lucid in his last moments :)
  "Daisy...."
  
  --
  Technically, lunchtime is at any moment. It's just a wave function.
  
  Parent
Clean English (Score: 5, Insightful) by bryan on Sunday February 16 2014, @11:18PM

by bryan (29) <bryan@pipedot.org> on Sunday February 16 2014, @11:18PM (#337) Homepage Journal

I know UTF-8 is one of those "features" that many people on slashdot have missed for a long time, but I thought most of that was for simple additions like the euro/pound/yen symbols and such.
Am I the only one that would prefer not to see non-english text mixed in with the comments?
Then again, I'd fully support an exception for Klingon. Maybe elvish too.
- Re:Clean English (Score: 1) by clone141166 on Monday February 17 2014, @01:31AM
  
  by clone141166 (59) on Monday February 17 2014, @01:31AM (#366)
  
  Yeah it may mean some unreadable posts, but as pointed out in the story text moderation should take care of that. There was even talk of a new mod category, something like "-1 Unintelligible" :P
  
  Parent
  - Re:Clean English (Score: 1) by turtledawn on Monday February 17 2014, @06:16AM
    
    by turtledawn (136) <{turtledawn} {at} {gmail.com}> on Monday February 17 2014, @06:16AM (#447)
    
    I rather like that idea for a mod.
    
    Parent
    - Re:Clean English(Score: 1) by Spook brat on Wednesday February 19 2014, @03:57PM
      
      by Spook brat (775) on Wednesday February 19 2014, @03:57PM (#2544) Journal
      
      There's some room for abuse there; 'unintelligible' ranges from "I don't speak that language" to Time Cube to "poster doesn't make cogent argument". Since that last one borders closely on "I don't agree with this" I hope meta-moderation will keep that in check.
      
      --
      Travel the galaxy! Meet fascinating life forms... And kill them [schlockmercenary.com]
      
      Parent
- Re:Clean English(Score: 1) by FatPhil on Tuesday February 18 2014, @11:53AM
  
  by FatPhil (863) <reversethis-{if.fdsa} {ta} {tnelyos-cp}> on Tuesday February 18 2014, @11:53AM (#1546) Homepage
  
  There will be stories about Finns with äs in their names, Swedes with Ås in their names, and maybe, just maybe, stories about Icelandic volcanoes with ðs in their name. Whilst I like the idea of everyone agreeing to use the English language, that doesn't mean every word they'll be typing will be English.
  Note - this post isn't UTF-8, this is plain old ASCII, I used the &entity; syntax.
  
  --
  Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
  
  Parent
- Re:Clean English(Score: 1) by xaxa on Tuesday February 18 2014, @07:49PM
  
  by xaxa (1489) on Tuesday February 18 2014, @07:49PM (#1848)
  
  At present, there's nothing stopping someone from writing in German, French, Spanish, or many other languages using Latin letters (more or less). Since it doesn't happen, I don't think we need to worry.
  (But if I post in a quote with an em-dash: â€” or some â€œproperâ€ â€˜quotes,â€™ maybe an arrow â†’ or temperature (0Â°C) it won't make a mess.)
  (Nope, looks like it's going to make a mess.)
  
  Parent
- Re:Clean English(Score: 1) by M. Baranczak on Wednesday February 19 2014, @02:38AM
  
  by M. Baranczak (1673) on Wednesday February 19 2014, @02:38AM (#2101)
  
  Well, why shouldn't we be able to post non-English comments?
  And even if you don't, it would be nice to be able to spell some people's names correctly.
  Lech WaÅ‚Ä™sa [wikipedia.org]
  FriÃ°rik ÃžÃ³r FriÃ°riksson [wikipedia.org]
  Jaroslav HaÅ¡ek [wikipedia.org]
  
  Parent
- Re:Clean English (Score: 1) by c0lo on Wednesday February 19 2014, @03:12AM
  
  by c0lo (156) on Wednesday February 19 2014, @03:12AM (#2115) Journal
  
  Am I the only one that would prefer not to see non-english text mixed in with the comments?
  How do you consider the use of â„ƒ or ãŽž units: are they proper English or part of the non-english text?
  Oooopps. A copy/paste of these characters from the "Character map" (on Ubuntu/Firefox) straight into the reply text box results in them showing mangled in the preview (no matter if plain-old text or HTML).
  To provide the context: I was enquiring about the use of these ℃ and ㎞ units.
  
  --
  https://www.youtube.com/@ProfSteveKeen https://soylentnews.org/~MichaelDavidCrawford
  
  Parent
  - Re:Clean English(Score: 1) by maxwell demon on Thursday February 20 2014, @08:31AM
    
    by maxwell demon (1608) on Thursday February 20 2014, @08:31AM (#3264) Journal
    
    The bytes are probably interpreted by slashcode as latin1 instead of utf8.
    
    --
    The Tao of math: The numbers you can count are not the real numbers.
    
    Parent
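The diagnosis above (UTF-8 bytes read back as a single-byte encoding) is easy to reproduce. This is a minimal sketch, assuming the misreading happens as a plain decode step; `misread` is a hypothetical helper, and whether the site's path uses Latin-1 or Windows-1252 is a guess. The mangled unit symbol quoted in the thread matches the Windows-1252 reading.

```python
def misread(text: str, wrong_codec: str = "latin-1") -> str:
    """Encode text as UTF-8, then decode the bytes with the wrong codec,
    which is how mojibake of this kind is produced."""
    return text.encode("utf-8").decode(wrong_codec)


# Two-byte UTF-8 sequences turn into two Latin-1 characters each.
print(misread("0°C"))            # -> 0Â°C
print(misread("é"))              # -> Ã©

# Three-byte sequences turn into three characters; with Windows-1252 the
# result for the degree-Celsius sign matches the garbage quoted earlier.
print(misread("℃", "cp1252"))

# The raw bytes show why: one character, three bytes.
print("℃".encode("utf-8"))       # -> b'\xe2\x84\x83'
```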
Testing UTF8 vs UTF8MB4 support (Score: 3, Interesting) by Maow on Monday February 17 2014, @02:35AM

by Maow (8) on Monday February 17 2014, @02:35AM (#379) Homepage

fooðŒ†bar
Hmm, not seeing a preview. Using Plain Old Text mode. Switching to HTML next...
OK, HTML Formatted does give a preview.
But the 4 horizontally stacked lines do not represent properly in preview.
So, it's a bug.
Here's a guide to getting FULL UTF8MB4 support in the DB:
http://mathiasbynens.be/notes/mysql-utf8mb4 [mathiasbynens.be]
OH, and now I see preview in Plain Old Text, having added more than the "foo_bar" that was originally present.
And, to be certain, I removed everything but the "foo_bar" and got ... no preview in Plain Old Text.
I like my cookies crispy with chocolate chips!
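For context on the utf8 versus utf8mb4 distinction: MySQL's legacy `utf8` character set stores at most three bytes per character, which covers only the Basic Multilingual Plane, while `utf8mb4` also accepts the four-byte sequences used for code points above U+FFFF. The sketch below checks text for such characters. It assumes the test character in the comment was U+1D306 (the mangled bytes shown are consistent with that, but it is an inference), and `needs_utf8mb4` is a hypothetical helper.

```python
def utf8_length(ch: str) -> int:
    """Number of bytes a single character occupies in UTF-8."""
    return len(ch.encode("utf-8"))


def needs_utf8mb4(text: str) -> bool:
    """True if any character lies outside the Basic Multilingual Plane,
    i.e. needs four bytes in UTF-8 and will not fit a 3-byte charset."""
    return any(ord(ch) > 0xFFFF for ch in text)


sample = "foo\U0001D306bar"          # assumed test string, see lead-in

for ch in ["a", "é", "水", "\U0001D306"]:
    print(f"U+{ord(ch):04X} takes {utf8_length(ch)} byte(s)")

print(needs_utf8mb4(sample))         # -> True
print(needs_utf8mb4("水 ℃ é"))       # -> False
```

A check like this can run before an insert to flag content that a three-byte column would truncate or reject.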
- Re:Testing UTF8 vs UTF8MB4 support (Score: 1) by Popsikle on Monday February 17 2014, @03:10AM
  
  by Popsikle (77) on Monday February 17 2014, @03:10AM (#386) Homepage
  
  no can post.
  
  Parent
  - Re:Testing UTF8 vs UTF8MB4 support (Score: 1) by Popsikle on Monday February 17 2014, @03:13AM
    
    by Popsikle (77) on Monday February 17 2014, @03:13AM (#390) Homepage
    
    ï¼‡The Rossâ€“Littlewood paradox[clarification needed] (also known as the balls and vase problem or the ping pong ball problem) is a hypothetical problem in abstract mathematics and logic designed to illustrate the seemingly paradoxical, or at least non-intuitive, nature of infinity. More specifically, like the Thomson's lamp paradox, the Rossâ€“Littlewood paradox tries to illustrate the conceptual difficulties with the notion of a supertask, in which an infinite number of tasks are completed sequentially.[1] The problem was originally described by mathematician John E. Littlewood in his 1953 book Littlewood's Miscellany, and was later expanded upon by Sheldon Ross in his 1988 book A First Course in Probability.
    
    Parent
    - Re:Testing UTF8 vs UTF8MB4 support(Score: 1) by Popsikle on Monday February 17 2014, @03:16AM
      
      by Popsikle (77) on Monday February 17 2014, @03:16AM (#392) Homepage
      
      "MÎ±gÄ‘Î±lÑÐ¸â€²s ÄÎ±ÑÎºÐ¸ÑÑs" is a bad bad string about a bad mans darkness.
      
      Parent
Oh Kannada...(Score: 1) by weilawei on Monday February 17 2014, @05:39AM

by weilawei (109) on Monday February 17 2014, @05:39AM (#413)

Oh bleep. There goes the neighborhood (U+0CCB). à³‹

▷ nice! ◁(Score: -1, Redundant) by Anonymous Coward on Monday February 17 2014, @05:46AM

by Anonymous Coward on Monday February 17 2014, @05:46AM (#420)

Although I can't think of an example right now, there were moments earlier when the lack of Unicode was problematic for some discussions.
When writing letters and symbols outside of the normal keyboard mapping, what is the most often used method? Is it with AltGr key (like AltGr-M for µ), or with a Compose key, a keycombo to enter unicode char number (ctrl-shift-u ?), or simply cut&paste from a character table application?
A⃣ B⃣ C⃣
There will probably be use of lots of unicode in discussions about languages etc later on. I might add that I think UTF-16 would be preferable.
Let me sing a little song to celebrate this event: ♩♫♬♯♪♩♫♬♯♩♪♫♬♯♪
I'm not sure it is my web browsers fault or not, but it didn't work if I wrote the characters here directly, only if I entered them html encoded ( â™©â™«â™¬â™¯â™ªâ™©â™«ð…Ÿâ™¬â™¯â™ªð…¡ð…žâ™©â™«ð…Ÿâ™ ¬â™¯â™ªð…¡ð…ž) ��
Darn(Score: 1) by iNaya on Monday February 17 2014, @07:11AM

by iNaya (176) on Monday February 17 2014, @07:11AM (#469)

I don't seem to be able to post anything with non-Latin characters.
Test (Score: 1) by k8n on Monday February 17 2014, @09:15AM

by k8n (295) on Monday February 17 2014, @09:15AM (#519)

Preview doesn't work... neither text nor html
text
Ñ‚ÐµÐºÑÑ‚
ÎºÎµÎ¯Î¼ÎµÎ½Î¿
è¯¾æ–‡
èª²æ–‡
ãƒ†ã‚ã‚¹ãƒˆ
Õ¿Õ¥Ö„Õ½Õ¿
mÉ™tn
à¦¶à¦¿à¦°à§‹à¦¨à¦¾à¦®
áƒ¢áƒ”áƒ¥áƒ¡áƒ¢áƒ˜
àªªàª¾àª
tÃ¨ks
à¤ªà¤¾à¤
à²ªà² à³à²¯
áž¢ážáŸ’ážáž”áž‘
ì›ë³¸
à»€àº™àº·à»‰àºà»ƒàº™
à¤®à¤œà¤•à¥‚à¤°
à®‰à®°à¯ˆ
à°Ÿà±†à°•à±à°¸à±à°Ÿà±
à¸‚à¹‰à¸à¸„à¸§à¸²à¸¡
vÄƒn báº£n
Ñ‚ÑÐºÑÑ‚
Ù…ØªÙ†
Ù†Øµ
×˜×§×¡×˜
- Re:Test(Score: 1) by regift_of_the_gods on Monday February 17 2014, @06:38PM
  
  by regift_of_the_gods (138) on Monday February 17 2014, @06:38PM (#930)
  
  OÃ¹ trouver un trÃ¨s bon restaurant Ã Paris qui sert des Å“ufs pour le petit dÃ©jeuner ?
  
  Parent
So, how does one enter utf-8(Score: 1) by Eunuchswear on Monday February 17 2014, @01:48PM

by Eunuchswear (525) on Monday February 17 2014, @01:48PM (#687) Journal

If I just type it in I get the traditional "utf8 shows as 8859-1" crap:
Â«Les accents ont une fonction en franÃ§ais, ne serait-ce que pour distinguer,
dans un hÃ´pital psychiatrique, entre les internes et les internÃ©s.Â»
- Annie Bourret

--
Watch this Heartland Institute video [youtube.com]
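The "utf8 shows as 8859-1" effect is usually reversible as long as no bytes were lost along the way: re-encode the garbled text with the wrong codec, then decode the bytes as UTF-8. A minimal sketch, assuming the text was misread exactly once as ISO-8859-1; `repair_mojibake` is a hypothetical helper, not something the site provides.

```python
def repair_mojibake(garbled: str, wrong_codec: str = "latin-1") -> str:
    """Undo one round of 'UTF-8 bytes decoded with the wrong codec'.
    Returns the input unchanged when the round trip is not possible."""
    try:
        return garbled.encode(wrong_codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Either the text has characters the wrong codec cannot produce,
        # or the resulting bytes are not valid UTF-8: leave it alone.
        return garbled


print(repair_mojibake("franÃ§ais"))   # -> français
print(repair_mojibake("hÃ´pital"))    # -> hôpital
print(repair_mojibake("français"))    # already correct, returned as-is
```

The fallback matters: applying the repair blindly to text that was never garbled would otherwise raise or corrupt it.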
Could be handy for physics stories (Score: 1) by VLM on Monday February 17 2014, @02:19PM

by VLM (445) on Monday February 17 2014, @02:19PM (#711)

âˆ‡.D=p
âˆ‡.B=0
âˆ‡xE=-âˆ‚B/âˆ‚t
âˆ‡xH= âˆ‚D/âˆ‚t+j
- Something doesn't look right.(Score: 2, Funny) by VLM on Monday February 17 2014, @02:25PM
  
  by VLM (445) on Monday February 17 2014, @02:25PM (#720)
  
  Maxwell's equations always sounded better in their original Klingon.
  It's supposed to be pretty tame stuff, div B equals zero and all that. Net charge of magnetic field in space doesn't exist, aka net flow in and out of a closed surface is zero as long as monopoles don't exist, that kind of thing.
  
  Parent
Well, let's see if this really works (Score: 2, Informative) by stormwyrm on Monday February 17 2014, @04:20PM

by stormwyrm (717) on Monday February 17 2014, @04:20PM (#814) Journal

Doesn't look like it does. Been trying to type Japanese text but it doesn't seem to work. Japanese text in the subject gets mangled into XML entities.

--
Numquam ponenda est pluralitas sine necessitate.
- Re:Well, let's see if this really works(Score: 1) by stormwyrm on Monday February 17 2014, @11:32PM
  
  by stormwyrm (717) on Monday February 17 2014, @11:32PM (#1176) Journal
  
  Seems to be still just as broken as it was on the old site. I get the same garbage when I type something like this: Qu'on me donne six lignes Ã©crites de la main du plus honnÃªte homme, j'y trouverai de quoi le faire pendre. That's supposed to be a quote attributed to Cardinal Richelieu, and is my main rebuttal to people who say they have nothing to hide. It works if I force encoding to ISO-8859-1 as I had to before (see my sig for how it should look).
  
  --
  Numquam ponenda est pluralitas sine necessitate.
  
  Parent
- Re:Well, let's see if this really works(Score: 0) by Anonymous Coward on Thursday February 20 2014, @01:07AM
  
  by Anonymous Coward on Thursday February 20 2014, @01:07AM (#3027)
  
  <p>æ—¥æœ¬è&#17 0;žã®æ–‡å&#17 3;—ã‚’ä½¿&#2 27;£ã¦è¦‹ã&#19 0;ã—ã‚‡</p>
  
  Parent
SoylentNews.jp(Score: 2, Interesting) by DarkMorph on Monday February 17 2014, @07:07PM

by DarkMorph (674) on Monday February 17 2014, @07:07PM (#948)

Does this in any way suggest that we might have a SoylentNews.jp [soylentnews.jp] in the future or are we abandoning all hope for the Japanese /. crowd that might be interested in migrating or at least additionally visiting SN?
Trying Unicode again (Score: 1) by yellowantphil on Sunday February 23 2014, @01:38AM

by yellowantphil (2125) on Sunday February 23 2014, @01:38AM (#5034) Homepage

E = mcÂ²
F = Tâˆ‡St
I donâ€™t think itâ€™s working for me, but it is for other peopleâ€¦
- Re:Trying Unicode again(Score: 1) by yellowantphil on Sunday February 23 2014, @01:52AM
  
  by yellowantphil (2125) on Sunday February 23 2014, @01:52AM (#5037) Homepage
  
  UTF-8: CafÃ©, soupÃ§on
  HTML entities: Café, soupçon
  Id rather not HTML-encode everything I type, but at least Duck Duck Go gives me a handy table of HTML entities [duckduckgo.com].
  My quotation marks are disappearing, when encoded as HTML.
  
  Parent
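The workaround described above, typing HTML entities so the comment stays pure ASCII, can be automated rather than looked up in a table. A minimal sketch using Python's standard library; `to_ascii_entities` is a hypothetical helper, and it emits numeric references (`&#233;`) rather than named ones (`&eacute;`), which browsers treat the same way.

```python
import html


def to_ascii_entities(text: str) -> str:
    """Replace every non-ASCII character with a numeric character
    reference, leaving plain ASCII untouched."""
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


encoded = to_ascii_entities("Café, soupçon")
print(encoded)                        # -> Caf&#233;, soup&#231;on

# html.unescape reverses both numeric and named references.
print(html.unescape(encoded))         # -> Café, soupçon
print(html.unescape("Caf&eacute;"))   # -> Café
```

Note that this does not escape `<`, `>`, or `&`; it only sidesteps the encoding problem for text that is otherwise already safe to post.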
Sadly, It does not appear to work.(Score: 1) by PrinceVince on Monday February 24 2014, @01:27AM

by PrinceVince (2801) on Monday February 24 2014, @01:27AM (#5429)

Isn't setting up a UTF-8 capable front-end and database a pretty basic task these days; something to get done after following a few tutorials and articles?
You create your database with the right settings (e.g. utf8_general_ci collation in MySQL) and make sure that your page scripts don't garble the content entered via the form. Recent versions of PHP and Python can do that just fine, never used Perl though.
A very small bug(Score: 1) by Reziac on Friday March 07 2014, @02:25AM

by Reziac (2489) on Friday March 07 2014, @02:25AM (#12407) Homepage

Nothing to do with comments, but rather with contents of the article box:
The links [ /dev/random ] [ The Main Page ]
on THIS page work.
However, the links [ Soylent ] [ The Main Page ]
on other pages do not work for me.
If I turn off CSS, then these links work. So it's the CSS, not the links themselves.
[SeaMonkey 2.5 with JS turned off, here.]
Cripes, you'd think I could come up with a better bug than that. :D

--
And there is no Alkibiades to come back and save us from ourselves.
Does not work for Arabic(Score: 1) by kbahey on Wednesday March 12 2014, @12:55AM

by kbahey (1147) on Wednesday March 12 2014, @12:55AM (#14960) Homepage

The text below is in Arabic, entered from Firefox on Linux.
It does not display correctly for some unknown reason:
Ø§Ù„Ø¹Ø±Ø¨ÙŠØ©

--
2bits.com, Inc: Drupal, WordPress, and LAMP performance tuning [2bits.com].


Re:Clean EnglishRe:Clean English (Score: 1) by turtledawn on Monday February 17 2014, @06:16AM

Re:Clean English(Score: 1) by Spook brat on Wednesday February 19 2014, @03:57PM

Re:Clean English(Score: 1) by FatPhil on Tuesday February 18 2014, @11:53AM

Re:Clean English(Score: 1) by xaxa on Tuesday February 18 2014, @07:49PM

Re:Clean English(Score: 1) by M. Baranczak on Wednesday February 19 2014, @02:38AM

Re:Clean EnglishRe:Clean English (Score: 1) by c0lo on Wednesday February 19 2014, @03:12AM

Re:Clean English(Score: 1) by maxwell demon on Thursday February 20 2014, @08:31AM

Testing UTF8 vs UTF8MB4 supportTesting UTF8 vs UTF8MB4 support (Score: 3, Interesting) by Maow on Monday February 17 2014, @02:35AM

Re:Testing UTF8 vs UTF8MB4 supportRe:Testing UTF8 vs UTF8MB4 support (Score: 1) by Popsikle on Monday February 17 2014, @03:10AM

Re:Testing UTF8 vs UTF8MB4 supportRe:Testing UTF8 vs UTF8MB4 support (Score: 1) by Popsikle on Monday February 17 2014, @03:13AM

Re:Testing UTF8 vs UTF8MB4 support(Score: 1) by Popsikle on Monday February 17 2014, @03:16AM

Oh Kannada...(Score: 1) by weilawei on Monday February 17 2014, @05:39AM

Comment Below Threshold

▷ nice! ◁(Score: -1, Redundant) by Anonymous Coward on Monday February 17 2014, @05:46AM

Darn(Score: 1) by iNaya on Monday February 17 2014, @07:11AM

TestTest (Score: 1) by k8n on Monday February 17 2014, @09:15AM

Re:Test(Score: 1) by regift_of_the_gods on Monday February 17 2014, @06:38PM

So, how does one enter utf-8(Score: 1) by Eunuchswear on Monday February 17 2014, @01:48PM

Could be handy for physics storiesCould be handy for physics stories (Score: 1) by VLM on Monday February 17 2014, @02:19PM

Something doesn't look right.(Score: 2, Funny) by VLM on Monday February 17 2014, @02:25PM

Well, let's see if this really worksWell, let's see if this really works (Score: 2, Informative) by stormwyrm on Monday February 17 2014, @04:20PM

Re:Well, let's see if this really works(Score: 1) by stormwyrm on Monday February 17 2014, @11:32PM

Re:Well, let's see if this really works(Score: 0) by Anonymous Coward on Thursday February 20 2014, @01:07AM

SoylentNews.jp(Score: 2, Interesting) by DarkMorph on Monday February 17 2014, @07:07PM

Trying Unicode againTrying Unicode again (Score: 1) by yellowantphil on Sunday February 23 2014, @01:38AM

Re:Trying Unicode again(Score: 1) by yellowantphil on Sunday February 23 2014, @01:52AM

Sadly, It does not appear to work.(Score: 1) by PrinceVince on Monday February 24 2014, @01:27AM

A very small bug(Score: 1) by Reziac on Friday March 07 2014, @02:25AM

Does not work for Arabic(Score: 1) by kbahey on Wednesday March 12 2014, @12:55AM