
SoylentNews is people

Meta
posted by martyb on Friday May 21 2021, @12:25AM

As many of you noticed, we had a site crash today, lasting from around 1300 until 2200 UTC (2021-05-20).

A HUGE thank you goes to mechanicjay who spent the whole time trying to get our ndb (cluster) working again. It's an uncommon configuration, which made recovery especially challenging... there's just not a lot of documentation about it on the web.

I reached out and got hold of The Mighty Buzzard on the phone, then put him in touch with mechanicjay, who got us back up and running using backups.

Unfortunately, we had to go all the way back to April 14 to get a working backup. (I don't know all the details, but it appears something went sideways on neon.)

We're all wiped out right now. When we have rested and had a chance to discuss things, we'll post an update.

In the meantime, please join me in thanking mechanicjay and TMB for all they did to get us up and running again!

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by janrinok on Friday May 21 2021, @11:37AM (6 children)

    by janrinok (52) Subscriber Badge on Friday May 21 2021, @11:37AM (#1137520) Journal

    Rehash was not the cause of the latest crash (as far as we can ascertain) - it is not where the focus is at present. That is not to say it will never be replaced but, for the time being, it is still working as expected. Currently, we have not got the resources to replace Rehash with a different language or package. If it ain't broke, don't fix it.

    You are focussing on an area that is not causing us a problem at the moment. The system configuration is where we continue to encounter problems and that is where mechanicjay is currently concentrating his efforts.

  • (Score: 0) by Anonymous Coward on Friday May 21 2021, @01:58PM (5 children)

    by Anonymous Coward on Friday May 21 2021, @01:58PM (#1137534)

    If it ain't broke, don't fix it.

    More often than not, this actually means "the fix is too hard, so it cannot possibly be broken".

    • (Score: 2) by janrinok on Friday May 21 2021, @02:08PM (4 children)

      by janrinok (52) Subscriber Badge on Friday May 21 2021, @02:08PM (#1137536) Journal

      Rehash is working today as advertised; the system configuration isn't. With very limited resources, which one would you work on first?

      • (Score: 0) by Anonymous Coward on Friday May 21 2021, @02:34PM (3 children)

        by Anonymous Coward on Friday May 21 2021, @02:34PM (#1137544)
        It broke … again. And nobody else is using the code anyway. Why not? Because it’s fragile, and too much Perl causes brain damage. Someone mentioned pipedot.org as an example - a site that has been inactive since April 2017.

        Another post mentioned using a separate process to update story counts. Someone needs to learn to code better, and to brush up on SQL as well. Just because the original devs didn’t know how to do it right is no excuse to preserve shit like that. This is 2021, not 1995.

        TMB fücked up by not using a LIMIT clause in SQL that would have avoided time-outs under load. Experienced devs will ALWAYS seek ways that guarantee the most efficient use of resources because they don’t want intermittent bugs. Rehash is a total hash. Either learn to code or get something that other people are maintaining because it’s widely used, in a language that is widely used for web development. But you won’t. You will continue to ignore the red flags.

        Why the resistance to a clean-sheet rethink of the site? Articles, user comments, and user journals are the only essentials. The polls suck, but most CMS packages contain poll functionality, so keep polls if you must. But do you really want to waste part of your life dealing with stupid complaints about unfair moderation? What a time sink! Dump it. It’s far from essential, and keeping it didn’t preserve slashdot’s ability to generate the slashdot effect.

        If you think that user moderation is the killer feature that keeps people on the site, well, it ain’t working here, same as it didn’t on the green site. Is it SO hard to grab a copy of geeklog and skin it so it looks the way you want while still allowing the essentials - stories, comments, and journals? It’s a one-day job (with breaks).

        What do you have to lose at this point?

        • (Score: 3, Insightful) by janrinok on Friday May 21 2021, @04:39PM (1 child)

          by janrinok (52) Subscriber Badge on Friday May 21 2021, @04:39PM (#1137584) Journal

          We are, at this very moment, discussing options on a private channel. And ALL of our resources are currently working on recovering from yesterday, or keeping the site going today.

          The only thing that is causing a problem (repeatedly) is one element of the system configuration that is not providing us with any benefit whatsoever - so that is what we are currently working on removing. The rest of the site is working just as we want it to. Let me explain it in an auto analogy - which is the traditional way of doing things around here. We currently have a flat tire, but you are recommending that we also paint the car, change the upholstery and fit a new engine too.

          If it can be done in a day I will await your contribution by, shall we say, Sunday evening? Show me something working to convince me - don't just make ridiculous suggestions that we haven't got the resources to complete anyway.

          • (Score: 0) by Anonymous Coward on Friday May 21 2021, @10:46PM

            by Anonymous Coward on Friday May 21 2021, @10:46PM (#1137643)

            The only thing that is causing a problem (repeatedly) is one element of the system configuration that is not providing us with any benefit whatsoever - so that is what we are currently working on removing.

            Perhaps that is for the best since you all apparently don't know how to use it properly. Quite a number of people use it under higher loads with better uptimes, after all.

        • (Score: 0) by Anonymous Coward on Saturday May 22 2021, @05:03PM

          by Anonymous Coward on Saturday May 22 2021, @05:03PM (#1137762)

          ...and too much Perl causes brain damage.

          Ah yes, but only when it comes to inferior brains