SoylentNews is people

VT100 CSS Fix

Posted by takyon on Monday May 04 2015, @11:50PM (#1196)
0 Comments
Code

blockquote {border-left:3px solid #0F0 !important; padding-left:1em !important;}

/* Submissions */

.data .status0 {background:#FFF !important; color:#080 !important;}
.data .status0 a {color:#080 !important;}
.data .status0 a:visited {color:#0A0 !important;}
.data .status0 a:hover {color:#0C0 !important;}
.data .status1 {background:#800 !important;}
.data .status2 {background:#256625 !important;}

SoylentNews, Unicode, UTF-8, and HTML

Posted by martyb on Friday April 24 2015, @12:08AM (#1176)
0 Comments
Code

NOTE: This is a work-in-progress; read at your own risk/confusion. It is an attempt to gather together bookmarks, tabs, and information pertaining to Unicode, UTF-8, HTML, and 'characters'.

It would seem to be a simple enough question to answer, but things are not always as they seem:

What characters should SoylentNews support?

Motivation: as many of you are aware, one of the early improvements that SoylentNews made to its base source code was to support Unicode characters. (Thanks to the heroic efforts of The Mighty Buzzard.) The underlying code only supported ASCII (American Standard Code for Information Interchange) characters. Which was just fine as far as it went. It just didn't go far enough for us...

I took on the task of testing our implementation of UTF-8 support. Little did I know what I was getting into! It has been a fascinating journey, indeed!

What is Unicode?

This is taken from What is Unicode?:

Fundamentally, computers just deal with numbers. They store letters and other characters by assigning a number for each one. Before Unicode was invented, there were hundreds of different encoding systems for assigning these numbers. No single encoding could contain enough characters: for example, the European Union alone requires several different encodings to cover all its languages. Even for a single language like English no single encoding was adequate for all the letters, punctuation, and technical symbols in common use.

These encoding systems also conflict with one another. That is, two encodings can use the same number for two different characters, or use different numbers for the same character. Any given computer (especially servers) needs to support many different encodings; yet whenever data is passed between different encodings or platforms, that data always runs the risk of corruption.

Unicode is changing all that!

Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language. The Unicode Standard has been adopted by such industry leaders as Apple, HP, IBM, JustSystems, Microsoft, Oracle, SAP, Sun, Sybase, Unisys and many others. Unicode is required by modern standards such as XML, Java, ECMAScript (JavaScript), LDAP, CORBA 3.0, WML, etc., and is the official way to implement ISO/IEC 10646. It is supported in many operating systems, all modern browsers, and many other products. The emergence of the Unicode Standard, and the availability of tools supporting it, are among the most significant recent global software technology trends.

Here is an excerpt from Wikipedia's entry for Unicode:

Unicode has the explicit aim of transcending the limitations of traditional character encodings, such as those defined by the ISO 8859 standard, which find wide usage in various countries of the world but remain largely incompatible with each other. Many traditional character encodings share a common problem in that they allow bilingual computer processing (usually using Latin characters and the local script), but not multilingual computer processing (computer processing of arbitrary scripts mixed with each other).

Unicode, in intent, encodes the underlying characters—graphemes and grapheme-like units—rather than the variant glyphs (renderings) for such characters. ...

In text processing, Unicode takes the role of providing a unique code point—a number, not a glyph—for each character. In other words, Unicode represents a character in an abstract way and leaves the visual rendering (size, shape, font, or style) to other software, such as a web browser or word processor.

A little more background: There are certain code points in Unicode that are of questionable value in the context of a web page; further, there are code points which are defined to be invalid! And then, just to make things even more interesting, I found a list of invalid characters in an HTML document:

Illegal characters

HTML forbids[6] the use of the characters with Universal Character Set/Unicode code points (in decimal form, preceded by x in hexadecimal form)

  • 0 to 31, except 9, 10, and 13 (C0 control characters)
  • 127 (DEL character)
  • 128 to 159 (x80 – x9F, C1 control characters)
  • 55296 to 57343 (xD800 – xDFFF, the UTF-16 surrogate halves)

The Unicode standard also forbids:

  • 65534 and 65535 (xFFFE – xFFFF), non-characters, related to xFEFF, the byte order mark.
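As a rough illustration of the ranges listed above, here is a minimal Python sketch that flags those code points. The function names `is_forbidden_in_html` and `strip_forbidden` are made up for this example and are not part of the SoylentNews code base; the ranges are copied straight from the list and nothing more is claimed about what the site actually filters.

```python
def is_forbidden_in_html(code_point: int) -> bool:
    """Return True if the code point is in one of the ranges listed above."""
    # Tab (9), line feed (10) and carriage return (13) are the allowed C0 controls.
    if code_point in (9, 10, 13):
        return False
    if 0 <= code_point <= 31:            # remaining C0 control characters
        return True
    if code_point == 127:                # DEL
        return True
    if 128 <= code_point <= 159:         # C1 control characters (x80 - x9F)
        return True
    if 0xD800 <= code_point <= 0xDFFF:   # UTF-16 surrogate halves
        return True
    if code_point in (0xFFFE, 0xFFFF):   # non-characters
        return True
    return False


def strip_forbidden(text: str) -> str:
    """Drop every character whose code point is in the forbidden ranges."""
    return "".join(ch for ch in text if not is_forbidden_in_html(ord(ch)))


if __name__ == "__main__":
    sample = "a\x00b\tc\x7f"
    print(repr(strip_forbidden(sample)))
```

Whether to silently drop such characters, replace them, or reject the whole submission is a separate design decision; the sketch only shows the range test.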

UTF-8: Unicode Transformation Format, 8-bit

Though there are several means by which Unicode characters can be transmitted between contexts, one of the most popular is UTF-8, which is what was chosen for use in SoylentNews.

SoylentNews:

What you see from our site mostly comes via a browser (though we also support Gopher and NNTP; you can have stories e-mailed to you; and we also have RSS/Atom feeds... wow!)

Our site currently formats web pages as HTML 4.01; here's a representative DOCTYPE:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
            "http://www.w3.org/TR/html4/strict.dtd">

At some point in the future we may want to directly support HTML5; ideally nothing should preclude or complicate that effort.

See also:

Obviously, we need not support the invalid code points. (Enumerate them here).

Unicode and UTF-8

So Unicode is a collection of mappings of code-points (numbers) to 'characters'; UTF-8 is a Unicode Transformation Format, 8-bit, used to transmit/encode Unicode code points.
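To make the distinction concrete, here is a small Python sketch (the helper name `utf8_hex` is invented for this example) that shows a character's code point next to the bytes UTF-8 uses to encode it. The characters are written as escapes so the snippet survives being pasted through non-UTF-8 channels.

```python
def utf8_hex(ch: str) -> str:
    """Return the UTF-8 encoding of a single character as space-separated hex bytes."""
    return " ".join("{:02x}".format(b) for b in ch.encode("utf-8"))


if __name__ == "__main__":
    samples = [
        "A",           # U+0041 LATIN CAPITAL LETTER A
        "\u00e9",      # U+00E9 LATIN SMALL LETTER E WITH ACUTE
        "\u20ac",      # U+20AC EURO SIGN
        "\U0001F600",  # U+1F600 GRINNING FACE
    ]
    for ch in samples:
        # ord() gives the Unicode code point; encode() gives the UTF-8 bytes.
        print("U+{:04X} -> {}".format(ord(ch), utf8_hex(ch)))
```

The code point is the same no matter how the text is stored; only the byte sequence depends on the chosen encoding, and in UTF-8 it grows from one byte for ASCII up to four bytes for characters outside the Basic Multilingual Plane.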

To be continued...

Simple brute force Mandelbrot from 1401 to Basic to Python

Posted by Yog-Yogguth on Friday April 17 2015, @02:07PM (#1163)
3 Comments
Code

I read about http://dpeckett.com/turning-the-arduino-uno-into-an-apple who in
turn had been inspired by
http://www.righto.com/2015/03/12-minute-mandelbrot-fractals-on-50.html which
was discussed on Soylent
https://soylentnews.org/article.pl?sid=15/03/26/1555201 and out of pure
curiosity I wanted to see what the simple Basic program would look like as
Python.

The Apple II result of the program is very limited by the screen/terminal, as is
the direct translation into Python, but it is then slightly improved to give a
result fairly close to the original 1978 first ever plot of a Mandelbrot. The
recent 1401 version has about twice as much detail as that.

N.b.:    The Python is written and tested in Python 3.2.3; it does not and will
        not work in Python 2.7.3 (having both in Linux is usually easy as
        they're treated entirely separately as python and python3).

        The Basic isn't tested or anything by me as I didn't write it; it is
        some version that runs on an Apple II emulation.

Integer Basic program to calculate the Mandelbrot fractal:

1 DIM LINE$(31)
2 FOR PY=1 TO 15
3 FOR PX=1 TO 31
4 X=0
5 XT=0
6 Y=0
7 FOR I=0 to 11
8 XT = (X*X)/10 - (Y*Y)/10 + (PX-23)
9 Y = (X*Y)/5 + (10*PY - 75)/8
10 X = XT
11 IF (X/10)*X + (Y/10)*Y >= 400 THEN GOTO 15
12 NEXT I
13 LINE$(PX)="*"
14 GOTO 16
15 LINE$(PX)=" "
16 NEXT PX
17 PRINT LINE$
18 NEXT PY
19 END

Python translation and result:

N.b.:    Unlike in Basic the Python ranges do not include the end points.
        Basic goto structure is rearranged using a break and else statement.
        "px" becomes "px-1" where applicable due to the difference between
        Basic arrays and Python lists (or maybe I'm wrong about that).
        Python printing is a bit more complex for the use case.

        The Python still manages to be fewer lines and is much easier to read
        in my opinion (I don't really know Basic but I don't know all that much
        Python either). I can see why goto is vilified :3

>>> line = [None] * 31
>>> for py in range(1, 16):
...     for px in range(1, 32):
...             x = 0
...             xt = 0
...             y = 0
...             for i in range(0, 12):
...                     xt = (x * x) / 10 - (y * y) / 10 + (px - 23)
...                     y = (x * y) / 5 + (10 * py - 75) / 8
...                     x = xt
...                     if (x / 10) * x + (y / 10) * y >= 400:
...                             line[px-1] = " "
...                             break
...             else:
...                     line[px-1] = "*"
...     for j in range(len(line)):
...             print(line[j], end = "")
...     print("")
...
                    ***
                *   ***
                 **********
               ***********
          * *  ************
          *****************
        ******************
        ******************
          *****************
          * *  ************
               ***********
                 **********
                *   ***
                    ***
                     *
>>>

N.b.:    The output isn't identical most likely because Python automatically
        creates floating point numbers out of divided integers.
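To check that guess, here is a sketch of the same loop using integer-only division. It assumes that Integer Basic truncates division results toward zero; that is an assumption on my part rather than something tested on an Apple II, and `trunc_div` and `mandelbrot_rows` are names made up for this example. Python's own `//` rounds toward negative infinity instead, so it would not be a drop-in replacement for negative values.

```python
def trunc_div(a: int, b: int) -> int:
    """Integer division truncating toward zero (assumed Integer Basic behaviour)."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def mandelbrot_rows() -> list:
    """Return the 15 rows of the plot as strings, using integer arithmetic only."""
    rows = []
    for py in range(1, 16):
        line = [" "] * 31
        for px in range(1, 32):
            x = 0
            y = 0
            for i in range(0, 12):
                # Same formulas as the Basic listing, with every "/" truncating.
                xt = trunc_div(x * x, 10) - trunc_div(y * y, 10) + (px - 23)
                y = trunc_div(x * y, 5) + trunc_div(10 * py - 75, 8)
                x = xt
                if trunc_div(x, 10) * x + trunc_div(y, 10) * y >= 400:
                    break  # point escaped, leave it blank
            else:
                line[px - 1] = "*"  # never escaped within 12 iterations
        rows.append("".join(line))
    return rows


if __name__ == "__main__":
    for row in mandelbrot_rows():
        print(row)
```

If the assumption about truncation holds, comparing this output with the emulator screenshot should show whether the floating point division really is the cause of the difference.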

Expanded Python example and result:

Increasing the fidelity of the plot isn't as straightforward as one could
assume in Python: ranges can be stepped but don't take floating point values so
one has to do it "manually" with a few more variables. We change things to
start from zero as well, no reason not to.

The smallest preset Linux terminals are 80x24 so we can at least try to make
use of up to 79x24 for display, or more if we imagine printing it as was done
in 1978 and resulted in a picture consisting of at least 31 rows and 68
columns (and if less than half the height scrolls off the screen that's not
much of a problem).

Our plot is distorted because our typeface is taller than it's wide but we can
use that to cram higher detail into the horizontal axis (and this will help
make it look less distorted as well).

Out of laziness I left the px-1 stuff in to save/discard an empty column,
shifting the plot 1 character to the left. One could increase that to something
like px-7 or 8 and lower stepx appropriately for a little bit more detail. If
one wanted to do it properly and avoid the wasted work one could instead shift
the 'for px...' range to (7, 86) or similar.

I think I've goldplated this enough as is, I blame procrastination :3

(I'm posting this in my journal to force myself to stop).

line = [None] * 79
stepx = 0.35
stepy = 0.5
for py in range(0, 30):
    for px in range(0, 79):
        sx = px * stepx
        sy = py * stepy
        x = 0
        xt = 0
        y = 0
        for i in range(0, 25):
            xt = (x * x) / 10 - (y * y) / 10 + (sx - 23)
            y = (x * y) / 5 + (10 * sy - 75) / 8
            x = xt
            if (x / 10) * x + (y / 10) * y >= 400:
                line[px-1] = " "
                break
        else:
            line[px-1] = "*"
    for j in range(len(line)):
        print(line[j], end = "")
    print("")

                                                             **
                                                          ******
                                                          *******
                                                  *        *****
                                                 ***  ***************      *
                                                 ********************** ***
                                                 *************************
                                              ****************************
                                             ****************************** *
                                    * *       *******************************
                               **********   ********************************
                               ***********  ********************************
                              ************* ********************************
                          *************************************************
        *****************************************************************
                          *************************************************
                              ************* ********************************
                               ***********  ********************************
                               **********   ********************************
                                    * *       *******************************
                                             ****************************** *
                                              ****************************
                                                 *************************
                                                 ********************** ***
                                                 ***  ***************      *
                                                  *        *****
                                                          *******
                                                          ******
                                                             **

Not bad for 22 lines of code, easily recognizable :)

April 1st, 2015

Posted by takyon on Thursday April 02 2015, @03:03AM (#1124)
0 Comments
/dev/random

A day that will live in e-fame-y.

Brief inconsequential rant on behalf of H.P. Lovecraft

Posted by Yog-Yogguth on Monday March 30 2015, @04:54PM (#1119)
2 Comments
/dev/random

Today I lapsed in my local front of the ‘War On Open Tabs (And Also Windows)’ (or ‘Woot(Aaw)’ for short) and from looking closer at ChipWhisperer-Lite detouring to EEVBlog and on to ESD (ElectroStatic Discharge) issues and Bill Beaty who tricked me into clicking onwards to ‘Vulture Central’ (The Register) where I ended up picking through the bones of the recent news of “Lost White City of the Monkey God Found After 500 Years” in which I came across a reference to a work of Lovecraft written 94 years ago which I didn't remember reading.

However it turned out I had read it under a different name, and on that same Wikipedia page once again yet another venerated authority was quoted slagging off HPL accusing him of being a “racist” or supremacist or anything else non-imaginative even though HPL had plainly explained what he was writing about or in response to (which was unimaginative shitty white village dramas).

At which point I wanted to congratulate E. F. Bleiler for picking through a dead man's brain in order to arouse himself and revel in the sticky glory of it. HPL would surely appreciate the combined or better yet unified necrophiliac morbidity and righteous “holy” bigoted megalomania of Bleiler's actions.

(And where else to offer my sarcastic approval than on my very own journal? Did I piss on his grave? I apologize but in my defense I can't be blamed for not noticing it on account of the latrine placed at the same spot by so many of his peers).

But yes for a while already E. F. Bleiler has been just as dead and if there's anything left after a few generations of macro and microfaunal procreation any maggots can continue his gruesome trade on his own brain :)

Now please excuse me as I make a few bookmarks for perusal in the eternity of time I do not have (and correct an index in an actual book) and close twenty or so tabs… the war must go on.

…and now I can't help but wonder what the world would be like if there was a Mr. and/or Mrs. Crowley-Lovecraft out there, and yes it has to be a double-barreled surname ;)

P.S. Happy Easter!

UTF-8 Regression Testing

Posted by martyb on Sunday March 29 2015, @05:37PM (#1115)
6 Comments
Code

This is just a place to hang some UTF-8 character regression tests.

Foggy/overcast 88.8% eclipse: not much (or any) difference

Posted by Yog-Yogguth on Friday March 20 2015, @10:30AM (#1095)
10 Comments
/dev/random

Title says it all really but I found it surprising that there wasn't much noticeable difference. It started out foggy but the fog had mostly risen to 100% low cloud cover at local eclipse maximum (88.8%) yet if I didn't know and someone told me there was an 88.8% eclipse at that moment I wouldn't have believed them at all. The level of light felt unexceptionally normal. It was more noticeable a while after the local maximum was over as it started to get a little bit brighter but for all purposes it was just like a normal variation caused by weather, no weird shadows, not even any streetlights turning themselves on.

It was so unnoticeable I double-checked the time of the event and my clock. I guess my eyes almost entirely compensated for the small and ever so gradual change. Right now it's not even supposed to be entirely over yet but meh :)

I slept through the last eclipse so maybe this is all completely normal, the lack of difference that is; me sleeping right through “events” is very normal :D

Your Thoughts on the Editorial Process: Editor's Submissions

Posted by n1 on Saturday March 07 2015, @02:03AM (#1060)
11 Comments
Soylent

This editor thinks about things... usually does not reach a conclusion.

Every so often an important story happens, or there are no usable submissions and an editor might elect to circumvent the normal process and set their own story for release. This goes against the normal submissions process; it is not something that happens very often. Site news is the exception to this for obvious reasons.

On the occasions we have released a story as described above -- not waiting for a submission -- there have been no complaints that I am aware of.

Honestly, I do not rush to start releasing my own stories, or to make submissions. Organic and original submissions are far better and what I really want to see more of.

What are your opinions on editors finding and releasing stories this way more often? Especially when it comes to 'breaking news', but more generally also.

[This journal entry is just that, it is not an official SoylentNews RFC or endorsed by any of the staff.]

train carrying crude derails, on fire near river

Posted by fliptop on Monday February 16 2015, @11:11PM (#1022)
0 Comments
News

A CSX train carrying crude oil has derailed and is on fire. At least one house has been burned and it's been reported a railcar fell into the Kanawha River.

Ginsburg was drunk at SOTU

Posted by fliptop on Friday February 13 2015, @06:22PM (#1011)
0 Comments
/dev/random

What one would think would be an embarrassing photo of Supreme Court Justice Ruth Bader Ginsburg nodding off at the last SOTU speech was laughed off by her and a colleague in a lighthearted moment before an audience at George Washington University in Washington yesterday:

"The audience – for the most part – is awake because they're bobbing up and down all the time and we sit there stone-faced, sober judges," Ginsburg said. "At least I wasn't 100 percent sober because before we went to the State of the Union we had dinner."

Ginsburg said that Justice Anthony Kennedy was the culprit, bringing wine to dinner.

At least.