
SoylentNews is people

Meta
posted by NCommander on Monday June 01 2015, @07:17AM   Printer-friendly
from the that-sucked dept.

This was by far one of the most painful upgrades we've ever done to this site, and resulted in nearly three hours of downtime. Even as of writing, we're still not back to 100% due to unexpected breakage that did not show up in dev. As I need a break from trying to debug rehash, let me write up what's known, what's new, and what went pear-shaped.

Rehash 15.05 - What's New

  • Rewrote large amounts of the site to migrate to Apache 2, mod_perl 2, and perl 5.20.
    • This was a massive undertaking. I did a large part of the initial work, but paulej72 and TheMightyBuzzard did lots to help fix the lingering issues. Major props to Bytram for catching many of the bugs pre-release.
  • Nexus Support (finally).
    • Currently we have the Meta and Breaking News nexii, with the possibility of adding more in the future, such as a Freshmeat replacement.
    • Nexii can be filtered in the user control panel under the Homepage tab. At the moment, this functionality is hosed due to unexpected breakage, but should be functional within the next 24-48 hours.
  • IPv6 support - the AAAA record is live as we speak
  • Themes can be attached to a nexus independent of the "primary theme" setting; user choice overrides this
  • Squashed More UTF-8 Bugs
  • Migration to MySQL Cluster (more on this below)
  • Rewrote site search engine to use sphinx search and (in general) be more useful
  • Long comments properly collapse now
  • Support for SSL by default (not live yet)
  • Fault tolerance; the site no longer explodes into confetti if a database or webfrontend goes down unexpectedly; allows for much easier system maintenance as we can offline things without manual migration of services
  • Improved editor functionality, including per-article note block
  • Lots of small fixes everywhere, due to the extended development cycle

I want to re-state that this upgrade is by far the most invasive one we've ever done. Nearly every file and function in rehash had to be modified due to changes in the mod_perl infrastructure, and more than a few ugly hacks had to be written to emulate the original API in places. We knew going into this upgrade it was going to be painful, but we had a load of unexpected hiccups and headaches. Even as I write this, the site is still limping due to some of that breakage. Read more past the break for a full understanding of what has been going on.

Understanding The Rewrite (what makes rehash tick)

Way back at golive, we identified quite a few goals that we needed to reach if we wanted the site to be maintainable in the long run. One of these was getting to a modern version of Apache, and perl; slashcode (and rehash) are tightly tied to the Apache API for performance reasons, and historically only ran against Apache 1.3, and mod_perl 1. This put us in the unfortunate position of having to run on a codebase that had long been EOLed when we launched in 2014. We took precautions to protect the site such as running everything through apparmor, and trying to adhere to the smallest set of permissions possible, but no matter how you looked at it, we were stuck on a dead platform. As such, this was something that *had* to get done for the sake of maintainability, security and support.

This was further complicated by a massive API break between mod_perl 1 -> 2, with many (IMHO) unnecessary changes made to data structures and such, which meant such an upgrade was an all-or-nothing affair. There was no way we could piecemeal upgrade the site to the new API. We had a few previous attempts at this port, all of them going nowhere, but over a long weekend in March, I sat down with rehash and our dev server, lithium, and got to the point the main index could be loaded under mod_perl 2. From there, we tried to hammer down whatever bugs we could, but we were effectively maintaining both the legacy slashcode codebase and the newer rehash codebase. Due to limited development time, most of the bug fixes and such were placed on rehash once it reached a state of functionality, and these would be shoehorned in with the stack of bugs we were fixing. I took the opportunity to try and clear out as many of the long-standing wishlist bugs as possible, such as IPv6 support.

In our year and a half of dealing with slashcode, we had also identified several pain points; for example, if the database went down even for a second, the site would lock up, and httpd would hang to the point that it was necessary to kill -9 the process. Although slashcode has support for the native master-slave replication built into MySQL, it had no support for failover. Furthermore, MySQL's native replication is extremely lacking in the area of reliability. Until very recently, there was no support for dynamically changing the master database in case of failure, and the manual process is exceedingly slow and error prone. While MySQL 5.6 has improved the situation with global transaction IDs (GTID), it still required code support in the application to handle failover, and a specific monitoring daemon to manage the process, in effect creating a new single point of failure. It also continues to lack any functionality to heal or otherwise recover from replication failures. In my research, I found that there were simply bad and worse options with vanilla MySQL in handling replication and failover. As such, I started looking seriously into MySQL Cluster, which adds multi-master replication to MySQL at the cost of some backwards compatibility.
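To make the failover problem concrete, here is a minimal sketch of application-side failover across several SQL nodes. It is written in Python rather than rehash's Perl and is not rehash's actual code; `connect_with_failover`, `AllBackendsDown`, and the `connect` callable are hypothetical names, with `connect` standing in for whatever database driver is in use. The assumption is that, with multi-master replication, any SQL node can serve a request, so the client only has to move on to the next host when one is unreachable.

```python
import time
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


class AllBackendsDown(Exception):
    """Raised when no SQL node accepted the connection."""


def connect_with_failover(
    hosts: Sequence[str],
    connect: Callable[[str], T],
    retries_per_host: int = 1,
    delay: float = 0.0,
) -> T:
    """Try each SQL node in order and return the first live connection.

    `connect` is a hypothetical stand-in for a real driver call; it is
    assumed to raise ConnectionError when the host cannot be reached.
    """
    errors = []
    for host in hosts:
        for _ in range(retries_per_host):
            try:
                return connect(host)
            except ConnectionError as exc:
                # Remember why this node failed, then try again or move on.
                errors.append(f"{host}: {exc}")
                time.sleep(delay)
    # Every node refused: fail loudly instead of hanging the web worker.
    raise AllBackendsDown("; ".join(errors))
```

The point of the sketch is the last line: when every node is down the caller gets an exception it can turn into an error page, rather than a hung httpd process.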

I was hesitant to make such a large change to the system, but short of rewriting rehash to use a different RDBMS, there weren't a lot of options. After another weekend of hacking, dev.soylentnews.org was running on a two-system cluster, which provided the basis for further development. This required removing all the FULLTEXT indexes in the database, and rewriting the entire search engine to use Sphinx Search. Unfortunately, there's no trivial way to migrate from vanilla MySQL to cluster. To prevent a long story from getting even longer, to perform the migration, the site would have to be offlined, a modified schema would have to be loaded into the database, and then the data re-imported in two separate transactions. Furthermore, MySQL Cluster needs to know in advance how many attributes and such are being used in the cluster, adding another tuning step to the entire process. This quirk of cluster caused significant headache when it came time to import the production database.
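The "modified schema" step can be sketched roughly as follows. This is an illustration in Python, not the script used for the actual migration: it assumes a mysqldump-style schema file with one index definition per line, drops the FULLTEXT index lines, and switches every table to the NDBCLUSTER storage engine. The function name `convert_schema_for_cluster` is made up for this example, and a plain regex pass like this would also rewrite any `ENGINE=` text that happens to appear inside a comment or string.

```python
import re

# Matches an index line such as:  FULLTEXT KEY `title` (`title`),
_FULLTEXT_LINE = re.compile(r"^\s*FULLTEXT\s+(KEY|INDEX)\b", re.IGNORECASE)
# Matches the storage engine clause at the end of a CREATE TABLE.
_ENGINE = re.compile(r"ENGINE\s*=\s*\w+", re.IGNORECASE)


def convert_schema_for_cluster(schema_sql: str) -> str:
    """Return a copy of the schema without FULLTEXT indexes, on NDBCLUSTER."""
    kept = []
    for line in schema_sql.splitlines():
        if _FULLTEXT_LINE.match(line):
            continue  # search moves to an external engine, so drop the index
        kept.append(_ENGINE.sub("ENGINE=NDBCLUSTER", line))
    text = "\n".join(kept)
    # Dropping the last index line can leave a dangling comma before ")".
    return re.sub(r",(\s*\n\s*\))", r"\1", text)
```

Running it over a table definition that ends in a FULLTEXT key leaves the primary key as the final entry and changes only the engine clause.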

Understanding Our Upgrade Process

To understand why things went so pear shaped on this cluster**** of the upgrade, a little information is needed on how we do upgrades. Normally, after the code has baked for a while on dev, our QA team (Bytram) gives us an ACK when he feels it's ready. If the devs feel we're also up to scratch to deploy, one person, usually me or Paul, will push the update out to production. Normally, this is a quick process; git tag/pull and then deploy.

Unfortunately, due to the massive amount of infrastructure changes required by this upgrade, more work than normal would be required. In preparation, I prepared our old webfrontend, hydrogen, which had been down for an extended period following a system break, to take the new perl, Apache 2, etc, and loaded a copy of rehash. The upgrade would then just be a matter of moving the database over to cluster, changing the load balancer to point to hydrogen, and then upgrading the current webfrontend, fluorine.

At 20:00 EDT, I offlined the site to handle the database migration, dumping the schema and tables. Unfortunately, the MaxNoOfAttributes and other tuning variables were too low to handle two copies of the database, and thus the initial import failed. Due to difficulty with internal configuration changes, and other headaches (such as forgetting to exclude CREATE TABLE statements from the original database), it took nearly two hours to simply begin importing the 700 MiB SQL file, and another 30 or so minutes for the import to finish. I admit I nearly gave up the upgrade at this point, but was encouraged to soldier on. In hindsight, I could have better tested this procedure and gotten all the snags out of the way prior to the upgrade; the blame for the extended downtime lies solely with me. Once the database was updated, I quickly got the mysqld frontend on hydrogen up and running, as well as Apache 2, just to learn I had more problems as the site returned to the internet nearly three hours later.
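The MaxNoOfAttributes snag is the kind of thing a pre-flight check can catch. The sketch below (Python, hypothetical helper names, not part of rehash or the real migration) counts column definitions in a mysqldump-style schema and multiplies by the number of database copies the cluster has to hold, plus some headroom. It assumes one backtick-quoted column per line, and the 1.5 headroom factor is an arbitrary placeholder; how the cluster really accounts for attributes may differ, so treat the result as a lower bound to sanity-check the config against.

```python
import math
import re

# A column definition in a mysqldump schema starts with a backticked name.
_COLUMN_LINE = re.compile(r"^\s*`[^`]+`\s+\w+")


def count_columns(schema_sql: str) -> int:
    """Count column definitions inside CREATE TABLE blocks."""
    in_table = False
    columns = 0
    for line in schema_sql.splitlines():
        stripped = line.strip()
        if stripped.upper().startswith("CREATE TABLE"):
            in_table = True
        elif in_table and stripped.startswith(")"):
            in_table = False  # end of this table definition
        elif in_table and _COLUMN_LINE.match(line):
            columns += 1  # index lines (PRIMARY KEY, KEY ...) are skipped
    return columns


def attribute_budget(schema_sql: str, copies: int = 2, headroom: float = 1.5) -> int:
    """Rough lower bound for the attribute limit; headroom is a guess."""
    return math.ceil(count_columns(schema_sql) * copies * headroom)
```

Comparing this number with the configured limit before taking the site offline would have flagged the "two copies of the database" problem on dev rather than in production.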

What I didn't realize at the time was hydrogen's earlier failure had not been resolved as I thought, and it gave truly abysmal performance, with 10+ second page loads. As soon as this was realized, I quickly pressed fluorine, our 'normal' frontend server, into service, and site performance went from horrific to bad. A review of the logs showed that some of the internal caches used by rehash were throwing errors; this wasn't an issue we had seen on dev, and as such it was causing excessive amounts of traffic to go to the database, and causing Apache to hang as the system tried to keep up with the load. Two hours of debugging have yet to reveal the root cause of the failure, so I've taken a break to write this up before digging into it again.
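Why a failing cache translates directly into database load can be shown with a toy read-through cache. This is a generic Python sketch, not rehash's cache layer; `ReadThroughCache`, its `broken` flag, and the `load_from_db` callable are all invented for the illustration.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Serve from memory when possible, otherwise fall through to the DB."""

    def __init__(self, load_from_db: Callable[[str], str]) -> None:
        self._load_from_db = load_from_db
        self._store: Dict[str, str] = {}
        self.broken = False  # set True to simulate a cache that errors out
        self.db_hits = 0     # how many lookups reached the database

    def get(self, key: str) -> str:
        if not self.broken and key in self._store:
            return self._store[key]
        # Cache miss or cache failure: the database has to do the work.
        self.db_hits += 1
        value = self._load_from_db(key)
        if not self.broken:
            self._store[key] = value
        return value
```

With a healthy cache, repeated requests for the same key hit the database once. With `broken = True`, every single request reaches the database, which matches the kind of load pattern described above.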

The End Result

As I write this, site performance remains fairly poor, as the server is excessively smashing against the database. Several features which worked on dev went snap when the site was rolled out on production, and I find myself feeling that I'm responsible for hosing the site. I'm going to keep working for as long as I can stay awake to try and fix as many issues as I can, but it may be a day or two before we're back to business as usual. I truly apologize to the community; this entire site update has gone horribly pear shaped, and I don't like looking incompetent. All I can do now is try and pick up the pieces and get us back to where we were. I'll keep this post updated.

~ NCommander

This discussion has been archived. No new comments can be posted.
  • (Score: 5, Insightful) by janrinok on Monday June 01 2015, @07:29AM

    by janrinok (52) Subscriber Badge on Monday June 01 2015, @07:29AM (#190593) Journal
    Don't take it too hard NC. You and the rest of dev do a brilliant job and we were bound to have the occasional one not go according to plan. Get some rest.
    • (Score: 2, Informative) by Anonymous Coward on Monday June 01 2015, @07:45AM

      by Anonymous Coward on Monday June 01 2015, @07:45AM (#190600)

      Agreed. This site is meant to be fun, you devs are doing it in your free time, it's not the end of the world if the site doesn't work properly for a few days. (Also the login-system doesn't seem to work at the moment - i guess it has maybe something to do with server side caching? - , so in case this post appears as AC: I'm sudo rm -rf)

  • (Score: 3, Insightful) by wantkitteh on Monday June 01 2015, @07:35AM

    by wantkitteh (3362) on Monday June 01 2015, @07:35AM (#190594) Homepage Journal

    Given how sparse IPv6 support still is across the 'net, support for that alone puts this site head and shoulders above many of its peers in the technical stakes. As a news platform with zero commercial backing, it's a mighty impressive achievement!

    Let me know when you're done, I foresee a multiplayer Twitch streaming session in the near future to wind down ;)

    • (Score: 3, Interesting) by NCommander on Monday June 01 2015, @07:52AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday June 01 2015, @07:52AM (#190605) Homepage Journal

      I'm actually going on hiatus to go biking across New York State, and parts of New England for a month after this update stabilizes.

      --
      Still always moving
      • (Score: 3, Funny) by wantkitteh on Monday June 01 2015, @08:39AM

        by wantkitteh (3362) on Monday June 01 2015, @08:39AM (#190619) Homepage Journal

        Pro tip: mentioning biking and stabilizers in the same sentence isn't as macho as you might think ;)

        Enjoy your trip, I'm almost envious!

        • (Score: 4, Interesting) by Anonymous Coward on Monday June 01 2015, @01:32PM

          by Anonymous Coward on Monday June 01 2015, @01:32PM (#190692)

          Note: stabilizers (UK) = training wheels (USA)

          Training wheels are not (imo) a very good way to learn to bicycle -- better to:
            + Learn to pedal on a tricycle
            + Learn to balance on a scooter or a "balance bike" (or any bike with pedals removed)
            + Put two skills together (reinstall pedals on the paddle bike)

          • (Score: 2) by NCommander on Tuesday June 02 2015, @05:05AM

            by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday June 02 2015, @05:05AM (#191036) Homepage Journal

            And I learned something today. I do a lot of long distance biking, and when I lived in Anchorage, I used to do 20-30 miles a day without a huge sweat. The odometer on my bike (which I bought nine months ago) says it's just shy of 500 miles.

            --
            Still always moving
    • (Score: 2) by NCommander on Tuesday June 02 2015, @09:44PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday June 02 2015, @09:44PM (#191294) Homepage Journal

      As a second note, getting rehash to support IPv6 was an absolute pain; there was a fairly large delta involved to make sure it didn't suicide with IPv6 addresses. You can see a good chunk of the logic here: https://github.com/SoylentNews/rehash/blob/master/Slash/Utility/Environment/Environment.pm#L3169 [github.com]

      --
      Still always moving
  • (Score: -1, Troll) by Anonymous Coward on Monday June 01 2015, @07:36AM

    by Anonymous Coward on Monday June 01 2015, @07:36AM (#190595)

    That's the real question. How many black lesbian muslim transsexual women did it take to perform the site upgrade for you? Everyone knows white men can't do shit, you must be lying if you say you did anything at all.

    • (Score: 1, Informative) by Tork on Monday June 01 2015, @04:41PM

      by Tork (3914) Subscriber Badge on Monday June 01 2015, @04:41PM (#190773)
      Pro tip: The shotgun approach to trolling doesn't work. It's just desperate.
      --
      🏳️‍🌈 Proud Ally 🏳️‍🌈
  • (Score: 0) by Anonymous Coward on Monday June 01 2015, @07:42AM

    by Anonymous Coward on Monday June 01 2015, @07:42AM (#190598)

    As far as I'm concerned, the site is already operational. Apart from the missing logo (a really minor thing), I experience no apparent problem. So unless there is something important missing that only logged-in users experience, I'd say you don't have to fear looking incompetent; quite the opposite.

    • (Score: 4, Informative) by NCommander on Monday June 01 2015, @07:51AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday June 01 2015, @07:51AM (#190604) Homepage Journal

      ACs get considerably more caching than logged in users, which is why you're less affected by the slowdowns. In effect, an AC sees static snapshots of the site generated by the backend updating every five minutes. Logged in users bypass the cache of articles/indexes and get "real-time" comments and such.

      --
      Still always moving
      • (Score: 0) by Anonymous Coward on Monday June 01 2015, @07:55AM

        by Anonymous Coward on Monday June 01 2015, @07:55AM (#190607)

        I experienced exactly this:
        One second ago (logged-in) I modded your post interesting, upon site-reload I was logged out (Cookie-content: "nobody" instead of my Session-ID) and your Post above disappeared :)
        - sudo rm -rf

        • (Score: 2) by NCommander on Monday June 01 2015, @08:00AM

          by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday June 01 2015, @08:00AM (#190608) Homepage Journal

          This just happened to me too. I think one of the caches ejected its brain as I got booted. -_-;.

          I'm getting really tired. I think the site is stable enough I can sleep for a few hours then get back on it. The staff have my cell phone if it goes up/down in flames.

          --
          Still always moving
          • (Score: 2, Informative) by Anonymous Coward on Monday June 01 2015, @08:12AM

            by Anonymous Coward on Monday June 01 2015, @08:12AM (#190611)

            Take a break, and: DON'T PANIC!

          • (Score: 3, Funny) by isostatic on Monday June 01 2015, @10:53AM

            by isostatic (365) on Monday June 01 2015, @10:53AM (#190635) Journal

            You being paid an on call allowance? Or is it just time and a half?

            • (Score: 3, Informative) by mrcoolbp on Monday June 01 2015, @01:13PM

              by mrcoolbp (68) <mrcoolbp@soylentnews.org> on Monday June 01 2015, @01:13PM (#190679) Homepage

              1.5 * $0 = $0

              None of us are paid. We do this for the community and because we believe in this project.

              --
              (Score:1^½, Radical)
              • (Score: 2) by isostatic on Monday June 01 2015, @01:53PM

                by isostatic (365) on Monday June 01 2015, @01:53PM (#190699) Journal

                I really should use the <sarcasm>sarcasm</sarcasm> tag

                • (Score: 2) by isostatic on Monday June 01 2015, @01:54PM

                  by isostatic (365) on Monday June 01 2015, @01:54PM (#190700) Journal

                  Hmm,
                  Allowed HTML

                  What do these tags do?

                  b, i, p, br, a, ol, ul, li, dl, dt, dd, em, strong, tt, blockquote, div, ecode, quote, strike
                  <sarcasm>sarc</sarcasm><sarcasm>sarcasm</sarcasm>user

                  • (Score: 4, Funny) by danomac on Monday June 01 2015, @03:38PM

                    by danomac (979) on Monday June 01 2015, @03:38PM (#190743)
                    What, no <blink> tag???
                    • (Score: 2) by isostatic on Monday June 01 2015, @05:04PM

                      by isostatic (365) on Monday June 01 2015, @05:04PM (#190781) Journal
                      Hmm...

                      <marquee><blink><monkey era="1999"/></blink></marquee>

                      Obviously more donations are needed
                • (Score: 2) by mrcoolbp on Tuesday June 02 2015, @03:30AM

                  by mrcoolbp (68) <mrcoolbp@soylentnews.org> on Tuesday June 02 2015, @03:30AM (#191006) Homepage

                  No worries.

                  --
                  (Score:1^½, Radical)
        • (Score: 4, Informative) by paulej72 on Monday June 01 2015, @02:33PM

          by paulej72 (58) on Monday June 01 2015, @02:33PM (#190720) Journal

          Fixed I think. NCommander changed the logintokens to remove the location field. For some users if they had logintokens already stored with a location, the old tokens would interfere with the creation of new ones that did not have the location set. I think the code would match various tokens and log you out due to a mismatch. I truncated the user_logintokens table to remove all previous tokens. The unfortunate part is that everyone will need to log in again.

          --
          Team Leader for SN Development
          • (Score: 2) by sudo rm -rf on Monday June 01 2015, @07:53PM

            by sudo rm -rf (2357) on Monday June 01 2015, @07:53PM (#190849) Journal

            Works like a charm, great work, and thanks!

          • (Score: 2) by TLA on Monday June 01 2015, @08:47PM

            by TLA (5128) on Monday June 01 2015, @08:47PM (#190871) Journal

            ah, did wonder why I had to log again... apart from that, no apparent issues here whatsoever. If anything's fucked, I've not hit it yet. Well done guys!

            --
            Excuse me, I think I need to reboot my horse. - NCommander
      • (Score: 2) by FatPhil on Monday June 01 2015, @08:03AM

        by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Monday June 01 2015, @08:03AM (#190610) Homepage
        This is the hilarious thing about sites wanting you to create accounts - it increases their costs considerably to put your name in the corner of the page!
        --
        Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
        • (Score: 3, Interesting) by TheRaven on Monday June 01 2015, @11:50AM

          by TheRaven (270) on Monday June 01 2015, @11:50AM (#190653) Journal
          Depends on how it's implemented. Especially with sites that are happy to rely on JavaScript, you can serve a static page from your CDN and then do an AJAX request for the stuff to personalise it. This can just be a tiny fragment of JSON that needs fetching from memcached or similar, so a single machine can easily serve it to a large number of logged-in users. It's only when you start interacting with the site that it becomes more of an issue.
          --
          sudo mod me up
          • (Score: 0) by Anonymous Coward on Monday June 01 2015, @12:23PM

            by Anonymous Coward on Monday June 01 2015, @12:23PM (#190660)

            Ah, so that is the true reason why all that Ajax stuff is so popular.

      • (Score: 2) by stormwyrm on Monday June 01 2015, @08:31AM

        by stormwyrm (717) on Monday June 01 2015, @08:31AM (#190617) Journal
        Well, it's not that bad at the moment. The site is perfectly usable. A couple of hours ago I was getting 503 errors but after those stopped the site was behaving just fine.
        --
        Numquam ponenda est pluralitas sine necessitate.
    • (Score: 2, Insightful) by Anonymous Coward on Monday June 01 2015, @11:22AM

      by Anonymous Coward on Monday June 01 2015, @11:22AM (#190640)

      Thanks for dragging us to the 21st century! I believe you've earned some sleep! :)

      Love you guys. Thanks for rescuing me and the rest of the community from Dice.

  • (Score: 2) by FatPhil on Monday June 01 2015, @08:13AM

    by FatPhil (863) <{pc-soylent} {at} {asdf.fi}> on Monday June 01 2015, @08:13AM (#190612) Homepage
    "user control panel under the Homepage tab" ? I don't have a "Homepage" tab, you insensitive clod!
    --
    Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
    • (Score: 2) by Yog-Yogguth on Monday June 01 2015, @09:24PM

      by Yog-Yogguth (1862) Subscriber Badge on Monday June 01 2015, @09:24PM (#190883) Journal

      You'll find it under your ‘Preferences’ page; one of the tabs is named ‘Homepage’.

      --
      Bite harder Ouroboros, bite! tails.boum.org/ linux USB CD secure desktop IRC *crypt tor (not endorsements (XKeyScore))
  • (Score: 5, Interesting) by pTamok on Monday June 01 2015, @08:34AM

    by pTamok (3042) on Monday June 01 2015, @08:34AM (#190618)

    Just as pilots say that any landing you can walk away from is a good landing, any upgrade that leaves the site workable is a good upgrade.

    Don't sweat the small stuff: so long as the basic functionality is working, go away, get some sleep, then come back later and play Whack-a-Mole™ with what's left.

    In future, it might be an idea to have a banner header shown for a while before an upgrade saying "Upgrade due, breakage might happen"; and afterwards "Upgrade performed, bugs might be found; report bugs via bug-reporting mechanism"

  • (Score: 2) by TLA on Monday June 01 2015, @08:45AM

    by TLA (5128) on Monday June 01 2015, @08:45AM (#190620) Journal

    I can read the website, that works for me - considering what you have to work with, that's pretty fuckin' amazing magic right there, I can barely write a functional wiki.

    --
    Excuse me, I think I need to reboot my horse. - NCommander
  • (Score: 2) by kaszz on Monday June 01 2015, @09:45AM

    by kaszz (4211) on Monday June 01 2015, @09:45AM (#190624) Journal

    Whenever I try to open the "inbox" which contains replies, I get logged out. Perhaps something to look into?

    • (Score: 2) by kaszz on Monday June 01 2015, @09:47AM

      by kaszz (4211) on Monday June 01 2015, @09:47AM (#190625) Journal

      Replying to an article as above also gets you logged out.

      • (Score: 0) by Anonymous Coward on Monday June 01 2015, @10:56AM

        by Anonymous Coward on Monday June 01 2015, @10:56AM (#190636)

        Even clicking on "login" gets me immediately logged out again. My guess it has something to do with session storage server-side? Every 10th time or so I manage to get a SID (according to cookie), but as soon as I send a request, i.e. click on a link, I am "nobody" again, so my session cannot be picked up again.

        Firefox 38.0 on Ubuntu / Win7
        Chromium Version 41.0.2272.76 Ubuntu 14.04 (64-bit)
        No Add-Ons that meddle with session handling

        - sudo rm -rf

        • (Score: 0) by Anonymous Coward on Monday June 01 2015, @11:45AM

          by Anonymous Coward on Monday June 01 2015, @11:45AM (#190651)

          I and cmn32480 have the same issue.. we're chatting with paulej72 in irc about it...
          ...CoolHand

      • (Score: 0) by Anonymous Coward on Monday June 01 2015, @12:41PM

        by Anonymous Coward on Monday June 01 2015, @12:41PM (#190664)

        I had this happen to me multiple times when using the tor service even before the upgrade

      • (Score: 2) by CoolHand on Monday June 01 2015, @02:20PM

        by CoolHand (438) on Monday June 01 2015, @02:20PM (#190710) Journal
        hoping this may be fixed.. :)
        --
        Anyone who is capable of getting themselves made President should on no account be allowed to do the job-Douglas Adams
        • (Score: 2) by CoolHand on Monday June 01 2015, @02:21PM

          by CoolHand (438) on Monday June 01 2015, @02:21PM (#190713) Journal
          paulej72 is my hero! :)
          --
          Anyone who is capable of getting themselves made President should on no account be allowed to do the job-Douglas Adams
  • (Score: 4, Funny) by deimios on Monday June 01 2015, @09:57AM

    by deimios (201) Subscriber Badge on Monday June 01 2015, @09:57AM (#190627) Journal

    This is why I'm a soylentil and not a member of the green site that shall not be named. Besides this writeup was more interesting and emotionally engaging than most of the clickbait articles in recent memory...

    • (Score: 2) by isostatic on Monday June 01 2015, @10:52AM

      by isostatic (365) on Monday June 01 2015, @10:52AM (#190633) Journal

      Imagine a site without the clickbait, but with fewer, good articles?

      Good evening. Today is Good Friday. There is no news

      We don't need to have a dozen stories a day

      • (Score: 1, Interesting) by Anonymous Coward on Monday June 01 2015, @12:15PM

        by Anonymous Coward on Monday June 01 2015, @12:15PM (#190658)

        Just make the absence of news a summary which we can hook our discussion on.

  • (Score: 0) by Anonymous Coward on Monday June 01 2015, @10:45AM

    by Anonymous Coward on Monday June 01 2015, @10:45AM (#190632)

    What's a nexius and why are there so many of them?

    • (Score: 2) by sudo rm -rf on Monday June 01 2015, @11:00AM

      by sudo rm -rf (2357) on Monday June 01 2015, @11:00AM (#190637) Journal

      Good question, I was wondering the same. The plural of nexus is of course nexus. Not that I know what that's supposed to be.
      sudo rm -rf

      • (Score: 3, Informative) by pTamok on Monday June 01 2015, @11:39AM

        by pTamok (3042) on Monday June 01 2015, @11:39AM (#190646)

        In English, the plural of nexus is nexuses

        In Latin (the language it was appropriated from) it could be any one of

        Case/Gender   Masculine     Feminine     Neuter
        Nominative    nexī          nexae        nexa
        Genitive      nexōrum       nexārum      nexōrum
        Dative        nexīs         nexīs        nexīs
        Accusative    nexōs         nexās        nexa
        Ablative      nexīs         nexīs        nexīs
        Vocative      nexī          nexae        nexa

        As a speaker of English should not be expected to know the rules of plural formation in a foreign language in order to speak or write English, I think it is sensible that the approach of adding an 's' or 'es' on the end of a word in order to pluralize it is taken. Historically, educated/learned people would be expected to know Latin and Greek, so the use of correctly declined Latin or Greek terms scattered in English text is a kind of shibboleth to demonstrate the writer's (or speaker's) learning. Correct use of any foreign-language term is a similar shibboleth.

        • (Score: 4, Informative) by Anonymous Coward on Monday June 01 2015, @01:11PM

          by Anonymous Coward on Monday June 01 2015, @01:11PM (#190676)

          Sorry, I don't want to nitpick, but the table you quote is only right for the passive perfect participle of nectere (verb meaning 'to tie, to bind'). But we're talking about the noun nexus [wikipedia.org], which has the same root, but is inflected according to the fourth declension [wiktionary.org], i.e. for masculine, which nexus is, the correct Latin plural is nexus, though in English it is perfectly alright to add the -es, but not the -ii ;)

          (still don't know what it is.)

          sudo rm -rf

          • (Score: 2, Insightful) by pTamok on Tuesday June 02 2015, @12:40AM

            by pTamok (3042) on Tuesday June 02 2015, @12:40AM (#190972)

            You are quite right. I copied the incorrect table from the Wiktionary entry on 'nexus' : http://en.wiktionary.org/wiki/nexus#Latin [wiktionary.org]

            This is an object lesson in not trying to do too many things at once, in haste. And it also demonstrates that I've forgotten almost all of the Latin I once knew.

            Thank-you for pointing out the mistake. Even when the singular and plural form are spelled the same ( Singular:nexus , Plural:nexūs ), they are still pronounced differently, with the plural form having the long-u.

            • (Score: 2) by sudo rm -rf on Tuesday June 02 2015, @07:33AM

              by sudo rm -rf (2357) on Tuesday June 02 2015, @07:33AM (#191062) Journal

              Honestly, I had to look up, which declension it is and what it is called in English (I learnt it in German where it's called u-Deklination, IIRC). Latin courses have been some time ago for me, too. But even now, almost two decades later, I find it quite useful in understanding loan-words and it helps immensely in learning and understanding (at least written texts) romance languages.
              Latin is not dead, it just smells funny.

    • (Score: 0) by Anonymous Coward on Monday June 01 2015, @12:09PM

      by Anonymous Coward on Monday June 01 2015, @12:09PM (#190655)

      What's a nexius and why are there so many of them?

      A lexus is a poor man's BMW, basically a rebadged Toyota with the price marked up for conspicuous consumption marketing. Sells well to people who are very image oriented but don't have the real money required for a BMW. Every penny goes into the fit and finish and the mechanical parts and performance are not very impressive at all for the price.

    • (Score: 5, Informative) by paulej72 on Monday June 01 2015, @01:37PM

      by paulej72 (58) on Monday June 01 2015, @01:37PM (#190695) Journal

      A nexus is a way to organize stories by topics in a cleaner way than we currently do. A nexus has its own home page that will only display stories from that nexus on its page while at the same time showing on the main home page if you do not set the nexus to be blocked from showing.

      Basically it allows users to not show these Meta articles on the home page.

      --
      Team Leader for SN Development
      • (Score: 2) by maxwell demon on Monday June 01 2015, @05:29PM

        by maxwell demon (1608) on Monday June 01 2015, @05:29PM (#190789) Journal

        So either everything or nothing from a nexus shows on the main page?

        --
        The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 2) by paulej72 on Monday June 01 2015, @11:01PM

          by paulej72 (58) on Monday June 01 2015, @11:01PM (#190932) Journal

          Well topic filtering still works, so it is possible to hide stories based on topic no matter the nexus.

          --
          Team Leader for SN Development
    • (Score: 2) by maxwell demon on Monday June 01 2015, @05:27PM

      by maxwell demon (1608) on Monday June 01 2015, @05:27PM (#190787) Journal

      Read all about the Nexus here! [wikia.com] :-)

      --
      The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 2, Funny) by Anonymous Coward on Monday June 01 2015, @12:27PM

    by Anonymous Coward on Monday June 01 2015, @12:27PM (#190661)

    Wait, a story about computer related problems, 30 comments up to now, and no one mentioned SystemD yet?

    I'm sure all your problems are caused by SystemD! ;-)

  • (Score: 2) by kadal on Monday June 01 2015, @01:03PM

    by kadal (4731) on Monday June 01 2015, @01:03PM (#190673)

    There should be a dev beer fund we can donate to...

    • (Score: 2) by mrcoolbp on Monday June 01 2015, @01:18PM

      by mrcoolbp (68) <mrcoolbp@soylentnews.org> on Monday June 01 2015, @01:18PM (#190682) Homepage

      Though some of us enjoy beers, we would rather pay for hosting than beer, and you've already contributed to that, so our thanks!

      --
      (Score:1^½, Radical)
  • (Score: 1) by Xaemyl on Monday June 01 2015, @03:55PM

    by Xaemyl (1987) on Monday June 01 2015, @03:55PM (#190747)

    Everything goes sideways at some point or another. With this post (going into detail on what fucked up), I think y'all handled/are handling it quite well!

  • (Score: 3, Interesting) by bill_mcgonigle on Monday June 01 2015, @05:02PM

    by bill_mcgonigle (1105) on Monday June 01 2015, @05:02PM (#190779)

    As a casual reader, I just want to voice my opinion that I'd be happy to "suffer" more frequent, short outages so that the development team can put out smaller changesets with less chance of a rabbit hole of infinite depth. I recognize the volunteer nature of SN and at a minimum, at a humanitarian level, I don't expect you to self-sacrifice to minimize outages.

    As a project manager, it's worth pointing out that your three-hour outage wasn't bad, and it could have easily been nine. Also, that developers who are exhausted are less efficient (and it's unkind to volunteers), so that's part of the first point.

    As a DBA, you have my sympathies. MySQL is a devil in all its forms. You can do multi-master with vanilla MySQL, but you have to set your primary key constraints to increment serial numbers by +2 on each (even/odd, basically). It's still a royal pain, and all of my developers are happier on postgresql whenever I can migrate them. If your replication stops mysteriously, you sometimes cannot get unstuck without third-party hacks and once in a while you wind up having to dump/restore the whole damn thing because nothing can fix it. That's the stuff of DBA nightmares.
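    The even/odd scheme can be sketched roughly as below. In MySQL itself this is normally set with the `auto_increment_increment` and `auto_increment_offset` server variables; the Python is only an illustration of why two masters stepping by 2 never hand out the same id, and the function name `id_sequence` is made up for the example.

```python
from itertools import islice


def id_sequence(offset, increment):
    """Yield the ids one master would hand out: offset, offset+increment, ..."""
    current = offset
    while True:
        yield current
        current += increment


# Two masters, both stepping by 2: one takes the odd ids, the other the even ids.
master_a = list(islice(id_sequence(offset=1, increment=2), 5))
master_b = list(islice(id_sequence(offset=2, increment=2), 5))

# No id is issued by both masters, so rows inserted on either side
# cannot clash on the primary key when they replicate to the other.
overlap = set(master_a) & set(master_b)
```

    This only covers auto-generated keys; it does nothing for conflicting updates to the same row, which is where the pain described above comes from.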

    As a developer, I'm sure you know that Rehash's community size will be limited as long as people-who-won't-touch-mysql cannot play. Certainly moving to more modern perl will help in this regard for the future, so much kudos on the move to a mod_perl2 stack. Gosh, I haven't touched mod_perl1 in over a decade - it's great that you rolled up your pants and waded into that mess. Well, great for us - thanks for enduring the stress.

    Finally, as a security extremist, thank you for doing the work to enable always-on TLS. That's hugely important for the broader tech community.

    You guys are headed in the right direction technically, even if you encounter the occasional tarpit.

    • (Score: 3, Interesting) by NCommander on Tuesday June 02 2015, @05:13AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday June 02 2015, @05:13AM (#191038) Homepage Journal

      I was under the impression the only way to do multimaster with vanilla involved DRBD, and haproxy or something similar. Unless you're referring to two machines replicating each other in statement mode, which could work, but has the potential to go so horribly wrong. We've already had epic problems just using replication to move the database from point A to point B with a minimum of downtime. Trying to actually use it as designed (async readers) seems to be a good way to get inconsistent and unusual responses. I love that you can still execute write queries on slaves, which in turn can prevent further replication operations from succeeding. MySQL was designed to be good for quick and dirty websites. When you actually want to do something a database is supposed to be good at, it fails miserably.

      MySQL cluster was simply the least bad of the options that we could execute in the relative short term. I intend to gut the entire database layer and move us to postgreSQL, but I'm looking at close to 50k+ lines of code in the database layer alone, plus random SQL spewed throughout the site. What I'd *like* to do is get to the point that the only raw SQL executed by the frontend are EXECUTE statements, possibly gated through multiple user accounts/permissions so subroutine X in perl can only call stored procedure Y in postgresql. pgSQL's perl bindings look fairly solid, so it would be pretty trivial to have those procedures call back into Slash, and migrate a massive amount of code out of the frontend, and into the backend. But that's a project in the future.
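      A minimal sketch of the "subroutine X can only call stored procedure Y" gate, in Python purely for illustration (Rehash itself is Perl). Everything named here is hypothetical: the `ALLOWED` table, the caller and procedure names, and the `%s` placeholder style, which assumes a psycopg2-like driver. On the database side the real enforcement would presumably come from per-role `GRANT EXECUTE` permissions; this only shows the frontend half.

```python
# Hypothetical allowlist: which frontend subroutine may call which procedure.
ALLOWED = {
    "get_story": {"story_fetch"},
    "post_comment": {"comment_insert"},
}


def build_call(caller, procedure, args):
    """Return (sql, params) for a gated procedure call, or raise PermissionError."""
    if procedure not in ALLOWED.get(caller, set()):
        raise PermissionError(f"{caller} may not call {procedure}")
    # One placeholder per argument; values are passed separately, never inlined.
    placeholders = ", ".join(["%s"] * len(args))
    # The procedure name can only be one already listed in ALLOWED.
    return f"SELECT {procedure}({placeholders})", tuple(args)
```

      With this shape, `build_call("get_story", "story_fetch", [42])` yields a statement plus parameters to hand to the driver, while `build_call("get_story", "comment_insert", [42])` raises before anything reaches the database.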

      --
      Still always moving
  • (Score: 3, Insightful) by tynin on Monday June 01 2015, @08:41PM

    by tynin (2013) on Monday June 01 2015, @08:41PM (#190867) Journal

    You all are doing great. This isn't a comment on how I would have done things better, just a view of what I've found to be a very ideal setup that generally avoids these outages, or lessens the time to roll back to a couple of minutes. Of course, it costs more money than your current setup.

    3 environments, Prod, Cert, and Dev.

    Prod and Cert should have identical hardware, Dev can be a subset.

    Prod is user facing at all times. Cert takes changes that were promoted from Dev. Dev is your playground.

    Once you are happy with where Cert is, you flip your DNS to make Cert become Prod, and old Prod gets frozen in place in case you need to roll back. Once everything is confirmed working, old Prod becomes Cert and the process continues, or you fail back to old Prod and continue troubleshooting what went wrong.

    It takes SOOO much stress out of upgrades. Now if I could only get my company to agree to this and overlook the cost of needing 2 full Prod systems, with one of them sitting mostly idle other than patches and load tests... :)
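    The Prod/Cert flip described above can be sketched as a tiny state machine. This is only a model of the roles, not deployment tooling: the class and function names are invented for the example, and the actual DNS change is outside what it shows.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Environments:
    prod: str                     # stack currently facing users
    cert: Optional[str]           # stack carrying changes promoted from Dev
    frozen: Optional[str] = None  # previous Prod, kept untouched for rollback


def promote(env):
    """Flip: Cert becomes Prod, and the old Prod is frozen in place."""
    return Environments(prod=env.cert, cert=None, frozen=env.prod)


def confirm(env):
    """Everything works: the frozen old Prod becomes the new Cert."""
    return Environments(prod=env.prod, cert=env.frozen, frozen=None)


def rollback(env):
    """Fail back to the old Prod; the broken stack returns to Cert for debugging."""
    return Environments(prod=env.frozen, cert=env.prod, frozen=None)
```

    Starting from `Environments(prod="A", cert="B")`, `promote` puts B in front of users with A frozen; from there either `confirm` or `rollback` gets you back to a normal two-stack state.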

    • (Score: 3, Insightful) by NCommander on Tuesday June 02 2015, @05:02AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday June 02 2015, @05:02AM (#191035) Homepage Journal

      Under normal circumstances, I'd completely agree. There's two problems that would prevent us from doing that.

      The first one is cost. To fully replicate the production environment would require an additional four 4096 linodes, at a cost of $160 a month. That's a really hard cost to justify on our budget. We simply don't do site upgrades very often, which helps reduce the problem, at the cost that the site stability is a bit wonky in the few days post-upgrade (though we've only had two major upgrades I can think of that went pear-shaped).

      The second, unique to this upgrade, is we changed the underlying storage engine for the site. There's no *good* way to live migrate, aside from replicating from vanilla MySQL to cluster, and as cluster is not 100% compatible with vanilla, we were risking a lot of breakage on something we've already found historically to be buggy (mysql replication).

      --
      Still always moving
  • (Score: 0) by Anonymous Coward on Monday June 01 2015, @08:41PM

    by Anonymous Coward on Monday June 01 2015, @08:41PM (#190868)

    NC makes it sound like everyone who's not brand new to the IT industry has at least heard of Rehash.

    Here's what I get when I do a Google search. [google.com] I gather SN wasn't just refactored to use a new console-based hash calculator.

    • (Score: 3, Informative) by maxwell demon on Monday June 01 2015, @09:43PM

      by maxwell demon (1608) on Monday June 01 2015, @09:43PM (#190896) Journal

      Well, I would have assumed that anyone who is not brand new to this site has at least heard of Rehash. Obviously I'm wrong.

      Rehash is the code this site runs on. It is based on Slashcode.

      --
      The Tao of math: The numbers you can count are not the real numbers.
  • (Score: 2) by kurenai.tsubasa on Tuesday June 02 2015, @02:09AM

    by kurenai.tsubasa (5227) on Tuesday June 02 2015, @02:09AM (#190987) Journal

    Huge success!

    I'm sure you'll get the lingering issues figured out. I certainly appreciate all the effort that goes into running this site. I was grinding G-rank quests in Monster Hunter during the switchover and didn't even notice.

    The only problem I'm having I'm certain is the fault of Charter 6RD. It mostly works (and good on Charter for at least making the effort), but I've found certain websites I just can't establish an HTTPS connection to with IPv6 (ping6 works, but HTTPS times out, haven't done a more detailed packet capture). I can connect IPv6 from my server in the clouds with eLinks just fine (Linode Fremont datacenter), so I'm certain this is not a SoylentNews problem (nor should SoylentNews degrade to HTTP on IPv6—I'd rather degrade to IPv4 HTTPS). I'm thinking about turning my server in the clouds into an IPv6 VPN for the house, but that's a project for another day.
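    Short of a packet capture, a first step could be checking whether a plain TCP connection to port 443 succeeds over each address family. A rough sketch using only the standard `socket` module; the helper name `can_connect` is made up for the example, and it tests the TCP handshake only, not TLS, so a stall later in the HTTPS exchange would still look like success here.

```python
import socket


def can_connect(host, port, family, timeout=5.0):
    """Try a TCP connection to host:port using only the given address family."""
    try:
        infos = socket.getaddrinfo(host, port, family, socket.SOCK_STREAM)
    except socket.gaierror:
        return False  # the name has no address of this family
    for fam, socktype, proto, _canonname, addr in infos:
        try:
            with socket.socket(fam, socktype, proto) as sock:
                sock.settimeout(timeout)
                sock.connect(addr)
                return True
        except OSError:
            continue  # timed out or refused; try the next address, if any
    return False
```

    Comparing `can_connect(host, 443, socket.AF_INET6)` with `can_connect(host, 443, socket.AF_INET)` for an affected site would at least separate "port 443 unreachable over IPv6" from "connects, then stalls later".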

  • (Score: 2) by kadal on Tuesday June 02 2015, @02:26AM

    by kadal (4731) on Tuesday June 02 2015, @02:26AM (#190994)

    Clicking a subject link in the hot comments box when using the tor service takes you to soylentnews.org instead of the .onion address. I don't know if this worked before the upgrade.

    • (Score: 2) by Open4D on Tuesday June 02 2015, @11:41AM

      by Open4D (371) on Tuesday June 02 2015, @11:41AM (#191105) Journal

      Please can you check whether it's already reported here [github.com], and if not, raise it?