So, to say the last week has been a dumpster fire is drastically underselling what I've been through. This, combined with having to put things in place to migrate off Twitter, and otherwise deal with all the fallout of that hot mess has, to put it frankly, put free time at something of a premium, hence why this post took so long. For those who missed it, I did fairly long overhaul of our backend, upgrading boxes from Ubuntu 14.04, and rebuilding and replacing others.
At the moment, the site is mostly working, with two exceptions, site search is still down, and IRC is still down. Deucalion has taken up the task of rebuilding the IRCd on modern server software, so it's time to lay down the road going forward past this point.
Read past the fold for more information ...
Right now, the backend is mostly built on an outdated version of mod_perl 2.2, and MySQL cluster, which is very much not a good place to be. Originally as envisioned, I planned this site to be able to be easily scalable, with a larger user base. That's why the infrastructure was designed to be as scalable as it was, with the downside of having a much higher overhead than a more traditional setup has. Furthermore, rehash (the code that powers this site) is, uh, to put it frankly, a beast to work on. It's a 90s era Perl code base and pretty much everything else that implies; if it wasn't for the fact that rehash is one of the main reasons to use SoylentNews, I'd argue it might be time to replace it.
Right now, I'm working on doing another round of server hardening. As it is at the moment, I've got rehash and Apache running in an AppArmor jail, and everything is pretty well sandboxed from everything else, but I still need to go through and adjust a lot of firewalls, and finish decommissioning out a bunch of the boxes. That said, the site is running faster than it has in a long while since a lot of small things got corrected as we went. Sometime this weekend, I'm going to finish adjusting the firewalls to lock it down further, and that should mostly get back to the point where I might have restful sleep again. That being said, there's still a fair bit more to do.
Moving ahead, we need to get off MySQL cluster, and either onto the current mod_perl, or, ideally, FastCGI, to end the Apache dependency entirely. Unfortunately, working on Rehash is quite difficult, and it requires a very specific setup to be viable. My current plan here is to basically get it working in Docker, so its easy to spin up and spin down instances, and return to a less cursed variant of MySQL. This is probably a few hours of work, but I'm hoping that overall it is going to be easy and straightforward to do since most of the backend is fairly well documented at this point. This also leaves me in a decent position to implement a couple of long overdue features, but modernization efforts come first. I'm hoping to livestream my efforts on this on the weeks to come, and I will make stream announcements as I go along.
My intent, based off the policy changes that were made to disallow ACs to post on stories is to sunlight the feature entirely, including in journals and more. The decision to have ACs on SoylentNews was made in 2014, when the Snowden leaks were only a few months old. Furthermore, we've seen from experience that the karma system doesn't go far enough at keeping bad actors from still getting a +2 status. By and large, the numbers underpinning the system need a rework. My general thought is to cap karma at either 10 or 15, and drastically decrease how far into the basement you can go, as well as uncapping posts in moderation to be able to go to -5.
As a rule, incredibly bad takes do get moderated out of existence, but because there's no real penalty for doing so, we get constant shitposts. Time to make this a bit harder to abuse. I've documented the antispam measures on the site before, but the site keeps track of IP addresses and subnets in the form of hashed /24, and /16s (/64 and /48 for IPv6), which has a karma number attached to them. If an IP range goes too far into the basement, it ends up posting at 0 or -1. By adjusting the caps, it should allow this threshold to be reached much more easily, and help bring the signal to noise ratio back to something more "positive".
Furthermore, I believe its generally in the site's interests to allow editors to delete comments. This functionality is actually built into rehash, but has been long disabled. At the time, I felt the community was best self-moderating, but I think on the whole, its better to treat this like a moderated subreddit, and have messages get a notice that they've in-fact been deleted ala reddit. This is a fairly large departure for the site as a whole, but I think one justified given the state of the Internet on 2022. I am open to discussions on all of this, but let me see what all your thoughts are like.
I do intend to keep livestreaming my progress with the site as we go along; and we raised another ~500 dollars towards Trevor Project during the last livestream. I've left that stream unlisted until I've had a chance to finish implementing all the hardening measures I've discussed, but I'm hoping at the end of it, I'll have a pretty good documentary on what it takes to modernize an aging website. As usual, if you want to support me directly: Ko-fi is available for one time donations, or Patreon for a recurring donation.
[ If you are an AC and wish to make a constructive comment, please see my journal. janrinok ]
fingers working faster than my brain:
DELETE: After all, each connection to the server provides the id_rsa.pub anyway.
Far outside my area too, but I remember something that was discussed once here that might be similar?
Assign ACs a unique identifier -- AC001, AC002, ... -- By Article. So if I'm AC005 in this discussion of "Site Updates and The Road Forward ...", my additional posts in this discussion are also AC005. Would help untangle parents and g-parents, etc under the more hotly discussed articles.
That doesn't make them accountable for their actions.
I have no problem with people being entirely anonymous. But the negative side of this is that they all share one account and thus all ACs get penalised for the bad behaviour of the few. I am suggesting that we devise a method of creating accounts where no personally identifiable information is required at all. One method is perhaps use something that is unique to each user but does not identify them, They do not need to have special names they can just use any nickname they choose as every other user does. The benefit is that we will hold no personally identifiable information on ACs, which they say is the the main reason that they do not want to create accounts. We would no longer need Anonymous Coward because everybody can have an account that cannot be linked to them personally in real life, but it is their persistent identity which does not change with each login.
By having invidual accounts only those that refuse to accept the site rules need by sanctioned rather that all accounts. We do not need nor want to know who people actually are, or where they live etc.
The one piece of information that every computer account can have is an id_rsa.pub which is used for SSH connections. the '.pub' element is designed to be given away freely so we are not asking for something that gives away personal information. However, it is not without its own problems. How do we make that item of data transferrable during the log in process to our site? That is something that I haven't solved and would welcome technical advice from the community.
Perhaps we can use the SSL/TLS data that is part of each HTTPS connection but I do not know whether this is feasible. Perhaps somebody with a better understanding of TLS can advise?