
SoylentNews is people

posted by janrinok on Wednesday April 15 2020, @10:11PM
from the rushin'-hacks dept.

[20200416_143747 UTC: Update 2: Added an Example to make clear what the problem was, and added 2 headings subsequent to the example to better organize the information. --martyb]
[20200416_005831 UTC: Update 1: Updated title and corrected spelling of balanceTags(). --martyb]

Ooops! Things should be working correctly, now.

tl;dr: Back on March 20th, someone tripped over a bug that appears to be in the balanceTags() routine in our Perl code. I found a way to make a quick fix to prevent it from happening again, but the fix was missing a couple of steps. I caught and fixed one of them, but only just now handled the other.

Workaround: When writing a comment, writing or editing a journal entry, or when submitting a story, use "DEL" instead of "STRIKE" to make text look like this.

This story is the result of something I learned in the process: properly notify the community of any changes to the site!

Symptom: It all started with a tag (i.e. HTML element) error in this comment in a user's journal where the user coded a <strike> tag, but neglected to provide a matching </strike> tag.

Example: Let's look back to the original comment which manifested this bug. Here's the latter part of it, after being corrected:

Looks who's projecting. Consider your phrase "steal jobs and send them overseas for cheap/free labor" (no such thing as free labor). That helps billions of poor people improve their lives. Yet here you are, selfishly obsessing over your developed world pricing power (with some very unempathic label spewing) rather than display the alleged empathy or morality that you claim to be concerned about.

Your empathy is nonexistent and your morals are bankrupt - definitely not the sort of person I should be taking advice from!

The problem is that there was no closing </strike> tag after the word "cheap", so it looked like this:

Looks who's projecting. Consider your phrase "steal jobs and send them overseas for cheap/free labor" (no such thing as free labor). That helps billions of poor people improve their lives. Yet here you are, selfishly obsessing over your developed world pricing power (with some very unempathic label spewing) rather than display the alleged empathy or morality that you claim to be concerned about.

Your empathy is nonexistent and your morals are bankrupt - definitely not the sort of person I should be taking advice from!

If that had been all that happened, it would have been ugly but tolerable. Unfortunately, every single character following it on the page was struck through, too. Not Good™.

Immediate Fix: To my knowledge there was only one way to rectify the immediate issue: manually go into the DB and insert the missing tag. This I was able to do quite quickly, but I still saw a problem.

More to Come: Anyone who saw this comment discussion, either at this moment, or who happened upon it later, would see an opportunity to intentionally leave a hanging tag and thus disfigure the site. Trolls gotta troll. So, I made the fix and noted same in this comment reply.

So, an instance of the problem was fixed, but now what? There's a "proper" way to do it, and there is another way to get the same effect that can be quickly implemented. I chose the latter.

Perl Code: Normally, such HTML errors in a user's comment or journal entry (or an editor's edit of a story!) are caught and handled by a routine in our Perl code: balanceTags(). The code looks through all the tags, with whatever nesting is present, detects where a tag is missing its required closing tag, and silently inserts that closing tag into the text that gets saved to the DB. It's rather hairy code because it also needs to handle: extra closing tags, mismatched closing tags (e.g.: <b> bold <i> bold and italic </b> </i>), mistyped or otherwise nonexistent tags, restrictions on which tags are supported, and custom-created site tags! Whew!
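
To make the idea concrete, here is a minimal sketch of what a tag balancer does, written in Python rather than our Perl. It is not balanceTags() itself: the tag tables (CLOSABLE, VOID), the regex, and the name balance_tags are hypothetical, and it only covers the unclosed-tag, stray-closer, and mis-nesting cases described above. Tags missing from its tables are passed through untouched.

```python
import re

# Hypothetical tables; the real balanceTags() keeps its own, much larger, lists.
CLOSABLE = {"b", "i", "del", "p", "ul", "ol", "li", "blockquote", "sup", "sub"}
VOID = {"br", "hr"}  # tags that never take a closing tag

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def balance_tags(html: str) -> str:
    """Close unclosed tags, drop stray closing tags, and repair mis-nesting."""
    out = []
    stack = []  # open closable tags, innermost last
    pos = 0
    for m in TAG_RE.finditer(html):
        out.append(html[pos:m.start()])  # copy the text before this tag
        pos = m.end()
        closing, name = m.group(1) == "/", m.group(2).lower()
        if name in VOID:
            if not closing:
                out.append(m.group(0))  # keep <br>, drop a bogus </br>
            continue
        if name not in CLOSABLE:
            out.append(m.group(0))  # unknown to this sketch: left as-is
            continue
        if not closing:
            stack.append(name)
            out.append(m.group(0))
        elif name in stack:
            # Close any inner tags still open, then the matching tag itself.
            while stack:
                top = stack.pop()
                out.append(f"</{top}>")
                if top == name:
                    break
        # A closing tag with no matching opener is silently dropped.
    out.append(html[pos:])
    # Close whatever is still open at the end, innermost first.
    out.extend(f"</{name}>" for name in reversed(stack))
    return "".join(out)
```

For example, balance_tags("a <del>cheap") returns "a <del>cheap</del>", and a closing tag with no opener simply disappears.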

Further, fixing it in the Perl code means going through the whole process: checking the code out from GitHub, understanding the code, making the change, compiling the change, rolling it out to our dev server and testing it there, and then, if all looks good, rolling the change out to our production servers. And, of course, nobody was around at the moment who could support such activities even if it were an easy coding change (and it is not!).

Expediency: I realized there was another approach that would mitigate the problem without requiring any Perl coding changes, and still prevent its recurrence: changing the value of a "Site Variable" (aka "site var").

Rehash Primer: Now I need to step back for a moment and explain a couple of things. The code for SoylentNews.org is a fork of ancient Slashcode that was put up on GitHub. Slashcode was implemented using a Model-View-Controller design. There is a clear demarcation between what is done where, and at what level.

Templates: As part of its implementation, Slashcode implemented "Templates" which generate the HTML pages that get sent to the browser and act as an interface between the code and the user. As far as I know, every page you see on the site comes by way of a template. Each template may, in turn, make use of other templates. Templates can make calls to underlying Perl code; this is where the site does the heavy lifting of talking to the database (DB), creating e-mails, and other closer-to-the-metal activities. The template language (judging from personal inspection; I have yet to find an official document describing its syntax and semantics) appears to be a simple, macro-capable language. The templates are stored in the DB and loaded into memory when the site is started. An advantage of this is that changes to templates can be made "on the fly" using a template editor (which is, itself, a template!). There is one caveat: for the changes to take effect, processes on the front-end servers need to be "bounced", i.e. restarted, so the changes are loaded into memory from the (updated) DB.

Site Variables: There are some parameters whose values affect the site's operations: the name of the site, the domain name of the site, the name of the Anonymous User account, ... it goes on and on and on. There are no fewer than 750 site variables! And, as with many things that have grown beyond their initial construction, there is no simple way to look for which site vars might be relevant to any given situation. One is just expected to know what they are, what they do, and how they do it. A simple enough approach when they first started, I guess. A search capability would be very nice to have, but it will take some coding to make that happen, so it has become just another of the several changes that would be nice to make to the site.
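
A search feature could start out as small as a case-insensitive substring match over variable names and descriptions. The sketch below assumes each site var has a name, a value, and a description; the SiteVar type, search_site_vars(), and the sample values and descriptions are all hypothetical, and only the names "approvedtags" and "approvedtags_visible" come from this story.

```python
from dataclasses import dataclass


@dataclass
class SiteVar:
    name: str
    value: str
    description: str


def search_site_vars(site_vars: list[SiteVar], term: str) -> list[SiteVar]:
    """Return the vars whose name or description contains term, ignoring case."""
    needle = term.lower()
    return [
        v for v in site_vars
        if needle in v.name.lower() or needle in v.description.lower()
    ]


# Hypothetical sample rows; real values and descriptions will differ.
SAMPLE = [
    SiteVar("approvedtags", "B|I|P|A|DEL", "Tags permitted in user-submitted text"),
    SiteVar("approvedtags_visible", "B|I|P|A|DEL", "Tags listed to users as available"),
    SiteVar("sitename", "SoylentNews", "Name of the site"),
]

for var in search_site_vars(SAMPLE, "tags"):
    print(var.name)
```

Searching for "tags" turns up both approvedtags and approvedtags_visible, which is exactly the pair that had to be edited in this story.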

So, back to the matter at hand, I knew about the "approvedtags" site var which lists all tags which are permitted to be used on the site. Sure enough, "STRIKE" was in there! And, I saw that "DEL" was in there, too. Does "DEL" have the same problem? I tried a quick test comment on our development server and it revealed that balanceTags() properly handled a hanging <DEL> without a matching </DEL>. Yay! I removed "STRIKE" from the "approvedtags" list, saved the change, bounced the front-end servers, and breathed a sigh of relief.

All was good, until someone asked, in a footnote to a comment, why we still list STRIKE as a permitted tag for comments. What? I double-checked and verified that "STRIKE" was no longer listed in "approvedtags". What is going on? So, I commenced searching and finally discovered another site var: "approvedtags_visible", which contains the list of tags that is presented to the user as being available. And, sure enough, "STRIKE" was in that list. Grrr! I removed "STRIKE" from "approvedtags_visible", saved the changes, and saw no further issues mentioned there. Finally!

Or so I thought. Did you see what was missed? The site vars were now correct and up-to-date. The changes were saved to the DB. But... those changes existed only in the DB. Still needed to 'bounce' the front end servers for the changes to take effect. So, that entailed a quick SSH to our servers, running the bounce scripts, and verifying that "STRIKE" was truly and properly removed from the tags presented to the user as being available for use, and that anyone trying to use <STRIKE>, anyway, would discover it did not work.
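
The trap is that each front-end process keeps its own in-memory copy of the site vars. A toy model, with hypothetical class names and a process restart modeled as a reload from the DB, shows why saving the change to the DB alone changes nothing the user can see:

```python
class SiteVarTable:
    """Stand-in for the DB table holding site vars (hypothetical)."""

    def __init__(self, rows: dict[str, set[str]]):
        self.rows = {name: set(value) for name, value in rows.items()}


class FrontEnd:
    """A front-end process that reads the site vars once, when it starts."""

    def __init__(self, db: SiteVarTable):
        self.db = db
        self.bounce()

    def bounce(self) -> None:
        # "Bouncing" (restarting) the process is modeled as re-reading the DB.
        self.vars = {name: set(value) for name, value in self.db.rows.items()}

    def visible_tags(self) -> set[str]:
        return self.vars["approvedtags_visible"]


db = SiteVarTable({"approvedtags_visible": {"B", "I", "DEL", "STRIKE"}})
web = FrontEnd(db)

db.rows["approvedtags_visible"].discard("STRIKE")  # edit saved to the DB only
print("STRIKE" in web.visible_tags())  # True: the process still has the old copy

web.bounce()
print("STRIKE" in web.visible_tags())  # False once the process reloads
```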

Conclusions: So, here are some lessons learned:

  1. When you want to use a <STRIKE> tag, use <DEL>, instead.
  2. There is no assurance that reporting a problem in the comments will be noticed.
  3. Please report site issues with an e-mail sent to admin (at) soylentnews (dot) org.
  4. In addition to sending an e-mail, mention it in the "#dev" channel of our IRC server.
  5. This particular issue should now be well and truly fixed. Please report any problems you may discover with it.
  6. When communicating changes made to the site, a mention in the comments does not suffice.

--martyb

[Janrinok says: TL;DR Martyb fixed it, OK, OK, I have read it....]


  • (Score: 2) by Muad'Dave on Thursday April 16 2020, @03:35PM (6 children)

    by Muad'Dave (1413) on Thursday April 16 2020, @03:35PM (#983630)

    I guess I still don't understand why you removed the tag instead of just fixing it. How was strike any different than del? I see that strike is deprecated [mozilla.org], but you have <del> but not <s> [mozilla.org]?

    Del does this, but s does nothing.

  • (Score: 2) by maxwell demon on Thursday April 16 2020, @04:40PM (1 child)

    by maxwell demon (1608) on Thursday April 16 2020, @04:40PM (#983664) Journal

    Because the code as is supports del, but not strike. And the tag names are hard-coded in that function.

    --
    The Tao of math: The numbers you can count are not the real numbers.
    • (Score: 2) by Muad'Dave on Friday April 17 2020, @11:40AM

      by Muad'Dave (1413) on Friday April 17 2020, @11:40AM (#984074)

      Thank you - that's the answer I was looking for.

  • (Score: 3, Informative) by martyb on Thursday April 16 2020, @07:09PM (3 children)

    by martyb (76) Subscriber Badge on Thursday April 16 2020, @07:09PM (#983738) Journal

    I guess I still don't understand why you removed the tag instead of just fixing it. How was strike any different than del? I see that strike is deprecated [mozilla.org], but you have <del> but not <s> [mozilla.org]?

    Del does this, but s does nothing.

    Over the years, new HTML Tags have been proposed and implemented. Any code that is designed to deal with HTML Tags needs to be able to handle whatever new Tags may come along. "It's difficult to predict things, especially the future!" So, rather than accept everything that comes along and then start blacklisting [wikipedia.org] new, unsupported Tags as they appear, the code instead uses a form of whitelisting [wikipedia.org] to specifically enumerate the Tags it can properly deal with.
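
    A minimal sketch of the difference, in Python with hypothetical tag lists: the whitelist filter escapes any tag it does not recognize, while the blacklist filter only catches the tags someone thought to list, so a tag nobody anticipated sails straight through. Whether the real code escapes or strips rejected tags is not something this sketch claims to know.

```python
import html
import re

TAG_RE = re.compile(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

ALLOWED = {"b", "i", "p", "a", "del", "br"}  # hypothetical whitelist
BLOCKED = {"script", "style"}                # hypothetical blacklist


def filter_whitelist(text: str) -> str:
    """Keep only listed tags; anything else is escaped so it shows as plain text."""
    return TAG_RE.sub(
        lambda m: m.group(0) if m.group(1).lower() in ALLOWED else html.escape(m.group(0)),
        text,
    )


def filter_blacklist(text: str) -> str:
    """Escape only listed tags; everything else is let through."""
    return TAG_RE.sub(
        lambda m: html.escape(m.group(0)) if m.group(1).lower() in BLOCKED else m.group(0),
        text,
    )


sample = "<b>ok</b> <blink>new?</blink>"
print(filter_whitelist(sample))  # <b>ok</b> &lt;blink&gt;new?&lt;/blink&gt;
print(filter_blacklist(sample))  # <b>ok</b> <blink>new?</blink>
```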

    It is important to note that there are two different kinds of "code" here.

    On the one hand, there is the Perl code which actually implements the guts of balanceTags(). Not all Tags are created equal! Some Tags work at a BLOCK level and some are IN-LINE. Some require closing Tags, some do not (e.g. <br>, <br />, and <hr>). Then there are Lists, which can contain certain elements that are not permitted standalone. For example, a <LI> is only permitted within a <UL> or <OL> block, and <DD> and <DT> are only permitted within a <DL>. Then there are limits that one might want to impose, such as how deeply one wants to permit, say, superscripts or subscripts to nest. E.g. 2<sup>3<sup>4<sup>5</sup></sup></sup> is rendered here as 2345; the final level of superscripting is suppressed. (Actually, that is yet another Site Var which limits the maximum supported nesting!)
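
    Here is a small sketch of checking the structural rules mentioned above: list items that need a particular parent, void tags that take no closing tag, and a cap on nested superscripts. REQUIRED_PARENT, MAX_SUP_DEPTH, and check_structure() are hypothetical names, and the depth of 2 is chosen only to match the 2345 example, not read from the actual Site Var.

```python
import re

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")

# Hypothetical rules modeled on the examples in the text.
REQUIRED_PARENT = {"li": {"ul", "ol"}, "dt": {"dl"}, "dd": {"dl"}}
VOID = {"br", "hr"}  # never take a closing tag
MAX_SUP_DEPTH = 2    # stand-in for the Site Var that caps nesting


def check_structure(html: str) -> list[str]:
    """Return a description of each structural rule the markup breaks."""
    problems = []
    stack = []  # open tags, innermost last
    for m in TAG_RE.finditer(html):
        closing, name = m.group(1) == "/", m.group(2).lower()
        if name in VOID:
            continue
        if closing:
            if name in stack:
                # Pop everything down to and including the matching opener.
                while stack.pop() != name:
                    pass
            continue
        parent = stack[-1] if stack else None
        if name in REQUIRED_PARENT and parent not in REQUIRED_PARENT[name]:
            allowed = " or ".join(sorted(REQUIRED_PARENT[name]))
            problems.append(f"<{name}> outside {allowed}")
        if name == "sup" and stack.count("sup") >= MAX_SUP_DEPTH:
            problems.append("<sup> nested too deeply")
        stack.append(name)
    return problems
```

    With these settings, the 2<sup>3<sup>4<sup>5</sup></sup></sup> example yields a single "<sup> nested too deeply" report, for the innermost level, and a bare <li> is reported as "<li> outside ol or ul".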

    In short, the Perl code needs to "know" these things, and what to do with them, and have the code that actually makes it all happen.

    On the other hand, the site presents Site Variables (Site Vars) which serve to turn the knobs, if you will, of what the Perl code is permitted to do. Imagine a speed limiter on a sports car or on a truck/lorry. So, the Perl code implements what it does, and has its whitelists of how to support various Tags. The Site Operator (previously: Slashdot; currently: SoylentNews) may want to present only a subset of those. This is where the Site Vars approvedtags and approvedtags_visible come into play. The Site Operator can "spin the knobs" and restrict which Tags are made available to their users. Just by changing the values of these Site Vars, they could, for example, reduce the apparent complexity of the site for users and limit the permitted tags to just <P>, <B>, <I>, and <A>: Paragraphs, Bold text, Italics, and Anchors (aka links). We are a tech-oriented site, so pretty much all the tags that the Perl code knows how to deal with are whitelisted.
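
    Because the Site Vars can only safely select from what the Perl code already handles, a cheap safeguard would be a consistency check run whenever the tag lists are edited. The sketch below assumes the Site Var values are pipe-separated lists such as "B|I|P|A" (an assumption about the storage format), and CODE_SUPPORTED, parse_tag_list(), and check_tag_settings() are hypothetical names.

```python
# Hypothetical: the tags the balancing code has explicit handling for.
CODE_SUPPORTED = {"P", "BR", "B", "I", "A", "DEL", "UL", "OL", "LI"}


def parse_tag_list(value: str) -> set[str]:
    """Parse a Site Var value, assuming a pipe-separated list like 'B|I|P'."""
    return {tag.strip().upper() for tag in value.split("|") if tag.strip()}


def check_tag_settings(approved: str, visible: str,
                       supported: set[str] = CODE_SUPPORTED) -> dict[str, list[str]]:
    """Flag tags the Site Vars enable or advertise that they should not."""
    approved_set = parse_tag_list(approved)
    visible_set = parse_tag_list(visible)
    return {
        # Allowed through, but the code will not repair them if left unclosed.
        "approved_but_unsupported": sorted(approved_set - supported),
        # Advertised to users, but actually rejected.
        "visible_but_not_approved": sorted(visible_set - approved_set),
    }


# Knobs turned down to a minimal set: nothing to report.
print(check_tag_settings("P|B|I|A", "P|B|I|A"))
# {'approved_but_unsupported': [], 'visible_but_not_approved': []}
```

    Fed the settings from this story, it reports STRIKE under approved_but_unsupported while STRIKE is in both lists, and, once STRIKE has been removed from approvedtags only, under visible_but_not_approved, which is the leftover that prompted the footnote question.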

    The problem arose, apparently, when someone added <STRIKE> to the approvedtags and approvedtags_visible Site Vars... but failed to ensure that <STRIKE> tag was "known" to the Perl code.

    So, here's the deal. The Perl had data structures that explicitly supported <DEL> as a block Tag. So, it "knew" that if it ever encountered a <DEL> without a matching </DEL>, it should internally provide one in the proper place. It DID NOT, however, explicitly list <STRIKE> as one it knew what to do with. So long as STRIKE was allowed through by the Site Vars AND the user always provided the closing </STRIKE> Tag, everything worked like a champ! But as soon as someone failed to provide their own </STRIKE> in their comment... Bad Things Happened.
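
    The failure mode can be reproduced in a few lines. This Python sketch (not the actual Perl) only repairs tags that appear in its table of known tags; KNOWN_BEFORE and KNOWN_AFTER are hypothetical stand-ins for the Perl data structures before and after STRIKE is added, and close_dangling() is an illustrative name.

```python
import re

TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)[^>]*>")


def close_dangling(html: str, known: set[str]) -> str:
    """Append closing tags for any *known* tags left open; ignore all others."""
    stack = []
    for m in TAG_RE.finditer(html):
        closing, name = m.group(1) == "/", m.group(2).lower()
        if name not in known:
            continue  # never tracked, so never repaired
        if not closing:
            stack.append(name)
        else:
            # Remove the most recent matching opener, if there is one.
            for i in range(len(stack) - 1, -1, -1):
                if stack[i] == name:
                    del stack[i]
                    break
    return html + "".join(f"</{name}>" for name in reversed(stack))


KNOWN_BEFORE = {"b", "i", "del"}         # DEL listed, STRIKE missing
KNOWN_AFTER = KNOWN_BEFORE | {"strike"}  # the planned Perl fix, in miniature

print(close_dangling("for <del>cheap", KNOWN_BEFORE))     # for <del>cheap</del>
print(close_dangling("for <strike>cheap", KNOWN_BEFORE))  # for <strike>cheap
print(close_dangling("for <strike>cheap", KNOWN_AFTER))   # for <strike>cheap</strike>
```

    The second line is the bug: the hanging <strike> survives untouched, so everything after it on the page renders struck through.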

    Be aware the preceding is all based on what I have been able to glean from looking over the site; I could well have something confused. But this is my working knowledge of how things work, and I can assure you the code has no shortage of oversights and gotchas. We have come a LONG way from back at the start of SoylentNews, when the site crashed, hard, several times every day! I continue to be in awe of what was successfully wrought from an out-of-date and unsupported code base to make this site what it is today.

    --
    Wit is intellect, dancing.
    • (Score: 2) by Muad'Dave on Friday April 17 2020, @11:38AM (2 children)

      by Muad'Dave (1413) on Friday April 17 2020, @11:38AM (#984073)

      I understand the technical challenges of implementing the tag logic (in perl???).

      I'm asking a much simpler, non-technical question - why was the tag named strike removed in favor of the less descriptive 'del' tag? What drove the decision to deprecate the 'strike' tag instead of using that name over 'del' for the desired functionality?

      • (Score: 2) by martyb on Friday April 17 2020, @01:52PM

        by martyb (76) Subscriber Badge on Friday April 17 2020, @01:52PM (#984104) Journal

        I'm asking a much simpler, non-technical question - why was the tag named strike removed in favor of the less descriptive 'del' tag? What drove the decision to deprecate the 'strike' tag instead of using that name over 'del' for the desired functionality?

        DEL is already in the Perl code.

        DEL works and remains available.

        If STRIKE is not in the Perl code, it will not work correctly.

        STRIKE has been made unavailable.

        --
        Wit is intellect, dancing.
      • (Score: 2) by martyb on Saturday April 18 2020, @04:07PM

        by martyb (76) Subscriber Badge on Saturday April 18 2020, @04:07PM (#984581) Journal

        I'm asking a much simpler, non-technical question - why was the tag named strike removed in favor of the less descriptive 'del' tag? What drove the decision to deprecate the 'strike' tag instead of using that name over 'del' for the desired functionality?

        I may have misunderstood the question, so let me try something else.

        Question: So, when they wrote the Perl code for balanceTags(), why did they explicitly support DEL and fail to explicitly support STRIKE?
        Answer: Good question! I am not sure why. Most likely? Just simple human error.
        What's next? When time permits, we will go back, review the situation, and likely add support for STRIKE into the Perl code, enable it in the Site Vars, and all will be rainbows and unicorns. =)

        What follows is my best guess as to why STRIKE was not explicitly included.

        Some History: What you are looking at now, in your browser, is the product of what is stored in our source code repository as "rehash". It was built upon a fork of the then most recent version of slashcode, which was purported to be what ran Slashdot. I suspect it was intended to be, at least. But it was not maintained, and by the time SoylentNews was getting started, slashcode was very much out of date. It had dependencies on versions of things like Apache, Perl, and MySQL that were no longer supported. Those who started SoylentNews worked like crazy to bring things up to the current day, fixed bugs in what they inherited, and finally started adding our own improvements to it. So that is what we have as our code base now: Rehash.

        Realize that Slashdot got started in October of 1997. Back then, HTML was at version 3.2 or maybe 4.0 or thereabouts. A bunch of new HTML tags have been added since then. Though it looks like they tried to follow good development practices, it sure looks to me as though some coding fixes were made less carefully. I started using Slashdot before they even instituted usernames and UIDs, so I witnessed some "situations" where "Trolls" tried to break the site. For example, I remember when page-widening trolls were a thing. If a single word in a comment was long enough (say, over 80 characters), it would result in the "comment box" being wider than the page it was on. The browser would do its best to display things, but it really made a mess of that entire page, forcing the user to scroll left and right to read each comment. There was not a lot of time to plan, design, and implement a fix. They figured something out and pushed it out to the site in a matter of a day or two. Things seemed to work after they made a couple more fixes. But I seriously doubt that they went back and tested all the potential ramifications of that change.

        When you consider they had a site with millions of users back then, there was a lot of pressure to fix things ASAP!

        That's just one example. Many more on-the-fly fixes were added to the code, judged to work, and called "good enough" until proven otherwise.

        So, way back when balanceTags() was written, they were trying to fix a problem that had arisen when someone wrote a comment and failed to close a block tag that they had started. IIRC, that was with one of the more common tags like <B> for BOLD text. All text on the page from that point onward would be in bold. Same problem with <I> for ITALIC text.

        Whatever the reason, they failed to include STRIKE. To add it now would require all the stuff I mentioned in an earlier comment. Suppressing STRIKE and allowing continued use of DEL, which was explicitly included in the Perl code, was possible with no Perl coding changes. Given the situation at hand, that path was taken for now. When time permits, we will go back, specifically add STRIKE to the Perl code, test it, and then, if all is good, finally push it out to the production server.

        Hope that helps!

        --
        Wit is intellect, dancing.