
SoylentNews is people

Meta
posted by martyb on Friday May 21 2021, @12:25AM

As many of you noticed, we had a site crash today (2021-05-20), from around 1300 until 2200 UTC.

A HUGE thank you goes to mechanicjay, who spent the whole time trying to get our NDB cluster working again. It's an uncommon configuration, which made recovery especially challenging; there's just not a lot of documentation about it on the web.

I reached out and got hold of The Mighty Buzzard (TMB) on the phone, then put him in touch with mechanicjay, who got us back up and running using backups.

Unfortunately, we had to go all the way back to April 14 to find a working backup. (I don't know all the details, but it appears something went sideways on neon.)

We're all wiped out right now. When we have rested and had a chance to discuss things, we'll post an update.

In the meantime, please join me in thanking mechanicjay and TMB for all they did to get us up and running again!

 
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: -1, Flamebait) by Anonymous Coward on Friday May 21 2021, @09:44AM (17 children)

    by Anonymous Coward on Friday May 21 2021, @09:44AM (#1137497)

    Keep making excuses. You don’t need to search for any “documentation”: the site crashes on a regular basis, and everyone knows it.

    And as a parent poster pointed out, nobody is going to volunteer to fix it if it means continually arguing with Mr “Proud I Don’t Need An Education” Buzztard.

    But stick with the current “plan”. Where you’ve built up so much technical debt that recovery is impossible. Because there’s no fool like an old fool.

    Starting Score: 0 points
    Moderation: -1 (Flamebait=2, Informative=1, Total=3)
    Extra 'Flamebait' Modifier: 0
    Total Score: -1
  • (Score: 4, Informative) by janrinok on Friday May 21 2021, @10:21AM (12 children)

    by janrinok (52) Subscriber Badge on Friday May 21 2021, @10:21AM (#1137502) Journal

    You've read something that I didn't say.

    We ARE looking at how to improve both the system configuration and the software that we use. We are not, however, simply throwing everything away to start from scratch again. The system can be simplified, which should result in a more robust site. As mechanicjay has stated elsewhere, some of the software that is installed to provide resilience is actually causing more problems than it is intended to solve. We can get rid of that straight away. A content management system should be able to work with any chosen database, and that includes MySQL, so there is nothing that I am aware of to suggest that MySQL cannot fulfil the role we are asking of it. People may have their own personal preferences, but changing the database will require changes to the Perl code, all of which will need writing and testing.

    There is a problem with documentation, but that is also linked to the lack of staff that we currently have, which goes back at least 3 years. The few staff we have are kept busy keeping the site going and, although we are aware of areas where work needs to be done, it can only be done by those who understand the system configuration. You can only write more documentation when you have people who understand what it is they are documenting. We need more sysadmins, because system failures do not occur when the only active sysadmin is sat at his computer with nothing better to do.

    We need more programmers who are prepared to volunteer to help support the site. It doesn't matter which language the site is written in; we will still need programmers to do that work. We always need more editors, although even with just the handful that we currently have, we are looking relatively well manned compared with every other part of the support team. QA is a one-man team: MartyB again, who actually fills several more roles in the team at the same time.

    You definitely have a bug about TMB, though. In case you missed it, he is no longer part of the support team, although he remains a member of our community and he kindly gave advice to mechanicjay during the last 24 hours. If you had your nose put out of joint during your earlier discussions with him, then that is a personal matter between the two of you.

    But stick with the current “plan”. Where you’ve built up so much technical debt that recovery is impossible. Because there’s no fool like an old fool.

    So you can see that, contrary to your claim, we are not sticking with the current plan. With the limited resources that we have, we will make progress as fast as is possible. When the site first went active there were 20-30 active participants who were all contributing to keeping the site going. I reckon that we have fewer than 10 available today. Rather than sitting back and criticising as an AC, wouldn't you prefer to join the team and help fix some of the problems?

    • (Score: -1, Redundant) by Anonymous Coward on Friday May 21 2021, @11:05AM (11 children)

      by Anonymous Coward on Friday May 21 2021, @11:05AM (#1137513)
      Geeklog works just fine with MySQL and MariaDB, among others. Slash, in its current state, obviously is way too far gone to bother fixing. There comes a time in many software projects where you have to throw everything out and do it right, using the lessons learned.

      But keep telling yourself that slash can be fixed. You need proper devs, something you haven’t had in years, who won’t put up with wishful thinking and will speak the hard truths born of the confidence of experience.

      There are other CMS that will also do a decent job. But NONE OF THEM ARE WRITTEN IN PERL. So the language issue is entirely relevant. Because software has to be maintained. And nobody wants to use Perl for large projects any more. Not when there are better alternatives.

      If your code is so great, why isn’t anyone else running it? Because it’s an unmaintained pile of patches over patches.

      The month of corrupted backups is just a symptom, another red flag. But keep making excuses - reality will continue to bite, and bite increasingly harder. There are hard decisions to be made, and you either make them now or events will make them for you. The future waits for nobody.

      • (Score: 2) by janrinok on Friday May 21 2021, @11:37AM (6 children)

        by janrinok (52) Subscriber Badge on Friday May 21 2021, @11:37AM (#1137520) Journal

        Rehash was not the cause of the latest crash (as far as we can ascertain) - it is not where the focus is at present. That is not to say it will never be replaced but, for the time being, it is still working as expected. Currently, we have not got the resources to replace Rehash with a different language or package. If it ain't broke, don't fix it.

        You are focussing on an area that is not causing us a problem at the moment. The system configuration is where we continue to encounter problems and that is where mechanicjay is currently concentrating his efforts.

        • (Score: 0) by Anonymous Coward on Friday May 21 2021, @01:58PM (5 children)

          by Anonymous Coward on Friday May 21 2021, @01:58PM (#1137534)

          If it ain't broke, don't fix it.

          More often than not, this actually means "the fix is too hard, so it cannot possibly be broken".

          • (Score: 2) by janrinok on Friday May 21 2021, @02:08PM (4 children)

            by janrinok (52) Subscriber Badge on Friday May 21 2021, @02:08PM (#1137536) Journal

            Rehash is working today as advertised; the system configuration isn't. With very limited resources, which one would you work on first?

            • (Score: 0) by Anonymous Coward on Friday May 21 2021, @02:34PM (3 children)

              by Anonymous Coward on Friday May 21 2021, @02:34PM (#1137544)
              It broke … again. And nobody else is using the code anyway. Why not? Because it’s fragile, and too much Perl causes brain damage. Someone mentioned pipedot.org as an example - a site that has been inactive since April 2017.

              Another post mentioned using a separate process to update story counts. Someone needs to learn to code better, and brush up on SQL as well. Just because the original devs didn’t know how to do it right is no excuse to preserve shit like that. This is 2021, not 1995.
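              For readers wondering what "doing it in SQL" might look like: a minimal Python/sqlite3 sketch that computes per-story comment counts on demand with one aggregate query. The `stories`/`comments` tables and the `comment_counts` function are hypothetical; the thread does not describe rehash's schema or why it updates counts in a separate process.

```python
import sqlite3

# Hypothetical schema for illustration only; rehash's real tables differ.
SCHEMA = """
CREATE TABLE stories  (id INTEGER PRIMARY KEY, title TEXT NOT NULL);
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       story_id INTEGER NOT NULL REFERENCES stories(id),
                       body TEXT NOT NULL);
CREATE INDEX comments_by_story ON comments(story_id);
"""

def comment_counts(conn):
    """Return {story title: comment count}, computed on demand in one query.

    LEFT JOIN keeps stories that have no comments; COUNT(c.id) skips the
    NULL rows those stories produce, so they are counted as 0.
    """
    rows = conn.execute(
        """
        SELECT s.title, COUNT(c.id)
        FROM stories AS s
        LEFT JOIN comments AS c ON c.story_id = s.id
        GROUP BY s.id, s.title
        ORDER BY s.id
        """
    )
    return dict(rows.fetchall())

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO stories (id, title) VALUES (?, ?)",
                     [(1, "Site crash"), (2, "Quiet day")])
    conn.executemany("INSERT INTO comments (story_id, body) VALUES (?, ?)",
                     [(1, "first"), (1, "second"), (1, "third")])
    print(comment_counts(conn))  # {'Site crash': 3, 'Quiet day': 0}
```

              Computing counts on demand means there is no stored counter that can drift out of step, at the cost of running the aggregate on every read; a separately maintained counter makes the opposite trade.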

              TMB fücked up by not using a LIMIT clause in SQL that would have avoided time-outs under load. Experienced devs will ALWAYS seek ways that guarantee the most efficient use of resources because they don’t want intermittent bugs. Rehash is a total hash. Either learn to code or get something that other people are maintaining because it’s widely used, in a language that is widely used for web development. But you won’t. You will continue to ignore the red flags.
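              A minimal sketch of what bounding a query with LIMIT looks like, again in Python with sqlite3 and a hypothetical `comments` table (`latest_comments` is an illustrative name, not rehash code). Whether a missing LIMIT actually caused the time-outs is the poster's claim; the thread does not confirm it.

```python
import sqlite3

# Hypothetical table for illustration; not rehash's actual schema.
SCHEMA = """
CREATE TABLE comments (id INTEGER PRIMARY KEY,
                       story_id INTEGER NOT NULL,
                       body TEXT NOT NULL);
CREATE INDEX comments_story_id ON comments(story_id, id);
"""

def latest_comments(conn, story_id, page_size=50, before_id=None):
    """Fetch at most page_size comments for a story, newest first.

    LIMIT caps the rows returned per request. Passing the smallest id from
    the previous page as before_id fetches the next page (keyset pagination),
    which avoids the growing cost of skipping rows with a large OFFSET.
    """
    if before_id is None:
        sql = ("SELECT id, body FROM comments WHERE story_id = ? "
               "ORDER BY id DESC LIMIT ?")
        params = (story_id, page_size)
    else:
        sql = ("SELECT id, body FROM comments WHERE story_id = ? AND id < ? "
               "ORDER BY id DESC LIMIT ?")
        params = (story_id, before_id, page_size)
    return conn.execute(sql, params).fetchall()

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO comments (story_id, body) VALUES (?, ?)",
                     [(1, f"comment {n}") for n in range(1, 8)])
    first = latest_comments(conn, story_id=1, page_size=3)
    print([row[0] for row in first])  # [7, 6, 5]
    nxt = latest_comments(conn, 1, page_size=3, before_id=first[-1][0])
    print([row[0] for row in nxt])    # [4, 3, 2]
```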

              Why the resistance to a clean-sheet rethink of the site? Articles, user comments, and user journals are the only essentials. The polls suck, but most CMS packages contain poll functionality, so keep polls if you must. But do you really want to waste part of your life dealing with stupid complaints about unfair moderation? What a time sink! Dump it. It’s far from essential, and keeping it didn’t preserve slashdot’s ability to generate the slashdot effect.

              If you think that user moderation is the killer feature that keeps people on the site, well, it ain’t working here, same as it didn’t on the green site. Is it SO hard to grab a copy of geeklog and skin it so it looks the way you want while still allowing the essentials - stories, comments, and journals? It’s a one-day job (with breaks).

              What do you have to lose at this point?

              • (Score: 3, Insightful) by janrinok on Friday May 21 2021, @04:39PM (1 child)

                by janrinok (52) Subscriber Badge on Friday May 21 2021, @04:39PM (#1137584) Journal

                We are, at this very moment, discussing options on a private channel. And ALL of our resources are currently working on recovering from yesterday, or keeping the site going today.

                The only thing that is causing a problem (repeatedly) is one element of the system configuration that is not providing us with any benefit whatsoever, so that is what we are currently working on removing. The rest of the site is working just as we want it to. Let me explain it with an auto analogy, which is the traditional way of doing things around here. What you are suggesting is that, because we currently have a flat tire, we should also paint the car, change the upholstery and fit a new engine too.

                If it can be done in a day, I will await your contribution by, shall we say, Sunday evening? Show me something working to convince me, rather than just making ridiculous suggestions that we haven't got the resources to complete anyway.

                • (Score: 0) by Anonymous Coward on Friday May 21 2021, @10:46PM

                  by Anonymous Coward on Friday May 21 2021, @10:46PM (#1137643)

                  The only thing that is causing a problem (repeatedly) is one element of the system configuration that is not providing us with any benefit whatsoever - so that is what we are currently working on removing.

                  Perhaps that is for the best since you all apparently don't know how to use it properly. Quite a number of people use it under higher loads with better uptimes, after all.

              • (Score: 0) by Anonymous Coward on Saturday May 22 2021, @05:03PM

                by Anonymous Coward on Saturday May 22 2021, @05:03PM (#1137762)

                ...and too much Perl causes brain damage.

                Ah yes, but only when it comes to inferior brains.

      • (Score: 5, Insightful) by martyb on Friday May 21 2021, @02:30PM (3 children)

        by martyb (76) Subscriber Badge on Friday May 21 2021, @02:30PM (#1137543) Journal

        One thing to keep in mind is the "heritage" of our code.

        I was with /. before it even had userids! I've witnessed all kinds of attacks on the site. Page-widening trolls. Actual SPAM comments. Mod bombs. Whatever creative nerds could come up with, they threw it at /. and changes were made to mitigate them. It stood up under heavy fire.

        Slashcode begat rehash, which is the open-source, freely available code that powers this site. Our foundation is solid.

        Also, there is MUCH MUCH more going on behind the scenes. I dare say the admin interface has AT LEAST as much going on as what is presented to the community. Quite possibly twice (or thrice) as much. Every once in a while I find yet-another setting or configuration that could be tweaked!

        The foundation is solid.

        Admittedly, the site would benefit from some tuning. When SoylentNews started, it was difficult to foresee what areas would grow fastest and what needed to be allocated. I mean, here is my first comment on the site: comment 255 [soylentnews.org], and here I am replying to comment number 1,137,513!

        Remember, too, this site is run by volunteers in their spare time.

        Sure, it would be wonderful to have paid, full-time staff monitoring the site 24/7/365, as Reddit and the like do. How much would that cost per year? 3 x 8-hour shifts per day x 365 days per year is 8760 hours. At $15.00 per hour (dirt cheap for these kinds of skills!) that works out to $131,400 per year! And that does not even include server hosting costs! More realistically, at just $30.00 per hour, that works out to $1,839,600 per year! And that does not even include hosting costs.
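        A quick check of the staffing arithmetic, as a minimal Python sketch that assumes one person on shift at any hour and counts wages only. Since 8,760 hours already covers every day of the year, the $1,839,600 figure (and the $919,800 in the follow-up correction) equals these totals multiplied by 7 a second time; at the quoted rates the wage bill comes to $131,400 and $262,800.

```python
HOURS_PER_YEAR = 24 * 365  # 3 eight-hour shifts a day, every day: 8,760 hours

def annual_cost(hourly_rate, staff_on_duty=1):
    """Yearly wage cost of keeping staff_on_duty people on shift around the clock.

    Wages only: no benefits, overhead, or hosting.
    """
    return HOURS_PER_YEAR * hourly_rate * staff_on_duty

for rate in (15, 30):
    print(f"${rate}/hour: ${annual_cost(rate):,}")
# $15/hour: $131,400
# $30/hour: $262,800
```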

        SoylentNews gets by on just $7,000 for an entire year! And that includes the annual costs of being incorporated, filing taxes, hosting expenses, everything!


        --
        Wit is intellect, dancing.
        • (Score: 4, Informative) by martyb on Friday May 21 2021, @02:38PM

          by martyb (76) Subscriber Badge on Friday May 21 2021, @02:38PM (#1137546) Journal

          Oops! Hit submit instead of preview.

          s/hosing/hosting/

          s/131,400/919,800/

          There's prolly some more; it was a LONG day yesterday!

        • (Score: 0) by Anonymous Coward on Friday May 21 2021, @03:07PM (1 child)

          by Anonymous Coward on Friday May 21 2021, @03:07PM (#1137552)
          Marty, I like you, but seriously, there are SO many flaws in your post.

          Yes, it’s expensive keeping a full-time dev on the payroll. That’s why hobby sites like Soylent don’t do that: they use widely used open source CMS packages that have proper documentation and a developer community, and that are written in a broadly used language combination, the most popular being PHP with any MySQL or PostgreSQL variant.

          Not slash. Not rehash. Perl lost the race a long time ago.

          What do users want? Articles, the ability to post comments, and journals. Plenty of CMS packages using PHP and a database server can do that without the legacy of Perl.

          Grab a copy of geeklog and play around with it. You should be able to have a functional site with stories, comments, and journals. And of course, the administrative backend contains all the functionality you want to hide from users.

          About Geeklog

          Geeklog is an open source application for managing dynamic web content. It is written in PHP and supports MySQL or PostgreSQL as the database backend.

          "Out of the box", Geeklog is a CMS, or a blog engine with support for comments, trackbacks, multiple syndication formats, spam protection, and all the other vital features of such a system.

          The core Geeklog distribution can easily be extended by the many community developed plugins and other add-ons to radically alter its functionality. Available plugins include forums, image galleries, and many more.

          This is what you use when you can’t afford to keep a team of developers on staff. PHP has a wide user base, so you might actually attract developers, because nobody wants to screw around with Perl. The whole “TMTOWTDI” is a bug, not a feature.

          You might even want to give the site a new, fresher look.

          Seriously, give it a try. Take a shitbox computer, install Linux or FreeBSD on it, and try geeklog out. It worked for groklaw under traffic you can only dream of. Don’t be fooled by groklaw’s blah appearance. If you know HTML and CSS, and have any graphics talent, you can make it look clean, modern and spiffy as all. Icons for stories? Screw that: use real images or graphics that the text wraps around. (You can still keep the topic icons if you must, but they’re really dated.)

          As for the whole editorial process, you’d best run a private copy for the editors to edit submissions before someone posts them to the main site. I get that the subs queue is there so people can check before submitting a story, but multiple submissions are a good thing if they contain more information. You’ll probably end up dropping ICQ if editors can see what their proposed stories, edits and included graphics look like, and other editors can cut and paste, change and tweak them, and see the changes right there in the comment thread.

          • (Score: 1, Insightful) by Anonymous Coward on Saturday May 22 2021, @11:04AM

            by Anonymous Coward on Saturday May 22 2021, @11:04AM (#1137719)

            Those php sites get pwned every now and then too.

            Geeklog's security track record is crap and the types of vulnerabilities are not confidence inspiring: https://www.google.com/search?q=%22Geeklog%22+exploit [google.com]

            Go get a clue. If you're not going to spend much time and money on a site you don't pick shit that needs to be patched every month.

  • (Score: 1) by khallow on Saturday May 22 2021, @11:35PM (3 children)

    by khallow (3766) Subscriber Badge on Saturday May 22 2021, @11:35PM (#1137836) Journal

    And as a parent poster pointed out, nobody is going to volunteer to fix it if it means continually arguing with Mr “Proud I Don’t Need An Education” Buzztard.

    Even when TMB was in house, nobody was continually arguing with Mr. "Proud". I wonder how much else of your narrative is just as imaginary?

    • (Score: 0) by Anonymous Coward on Sunday May 23 2021, @06:09AM (2 children)

      by Anonymous Coward on Sunday May 23 2021, @06:09AM (#1137905)

      And then you wonder why a system that has a five nines guarantee in a 2/2/2 setup doesn't even have two.
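      To put the "nines" in perspective, a minimal Python sketch of yearly downtime budgets. It assumes availability is measured as uptime over a calendar year (the thread does not say what window any guarantee uses) and counts only the roughly nine-hour outage from 1300 to 2200 UTC described at the top of the page, not other outages or the lost data.

```python
HOURS_PER_YEAR = 24 * 365

def downtime_budget_minutes(nines):
    """Minutes of downtime per year allowed at an availability of `nines` nines."""
    unavailability = 10 ** -nines  # e.g. 5 nines -> 0.00001 of the year
    return HOURS_PER_YEAR * 60 * unavailability

def availability(outage_hours):
    """Fraction of a year the site is up if the only outage lasts outage_hours."""
    return 1 - outage_hours / HOURS_PER_YEAR

for n in (2, 3, 5):
    print(f"{n} nines: {downtime_budget_minutes(n):.1f} min/year")
print(f"9-hour outage alone: {availability(9):.3%} uptime")
```

      On these assumptions, five nines allows only about 5 minutes a year; a single nine-hour (540-minute) outage exceeds the ~526 minutes of a three-nines year but fits within the ~5,256 minutes of a two-nines year.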

      • (Score: 1) by khallow on Sunday May 23 2021, @01:14PM (1 child)

        by khallow (3766) Subscriber Badge on Sunday May 23 2021, @01:14PM (#1137942) Journal

        And then you wonder why a system that has a five nines guarantee in a 2/2/2 setup doesn't even have two.

        I already know of real-world systems, such as the Space Shuttle, that failed that hard. There's no wondering over here.

        • (Score: 0) by Anonymous Coward on Monday May 24 2021, @12:37AM

          by Anonymous Coward on Monday May 24 2021, @12:37AM (#1138082)

          Because operating on the edge of science and technology at the extremes of risk with single points of failure meeting Swiss Cheese model of reality is directly analogous to running an incorrectly deployed bog-standard cluster deployment that is failing to meet its uptime guarantees despite hundreds of thousands of deployments operating successfully in worse conditions when they do deploy it correctly.

          Right.