Meta
posted by martyb on Tuesday October 20 2015, @11:12AM   Printer-friendly
from the wish-us-luck! dept.

Hello fellow Soylentils!

[Update:] We survived all three days of reboots without major issues. Many thanks to all who prepped the systems, prodded things along, and were on standby to deal with any unforeseen issues!

We were informed by Linode (our hosting provider) that they needed to perform some maintenance on their servers. This forces a reboot of our virtual servers, which may cause the site (and other services) to be temporarily unavailable.

Here is the three-day reboot schedule along with what runs on each server:

Status  Day  Date        Time      Server     Affects
Done    Tue  2015-10-20  0200 UTC  boron      DNS, Hesiod, Kerberos, Staff Slash
Done    Tue  2015-10-20  0500 UTC  beryllium  IRC, MySQL, Postfix, Mailman, Yourls
Done    Wed  2015-10-21  0500 UTC  sodium     Primary Load Balancer
Done    Wed  2015-10-21  0500 UTC  magnesium  Backup Load Balancer
Done    Wed  2015-10-21  0700 UTC  neon       Production Back End, MySQL NDB cluster
Done    Thu  2015-10-22  0200 UTC  hydrogen   Production Front End, Varnish, MySQL, Apache, Sphinx
Done    Thu  2015-10-22  0500 UTC  helium     Production Back End, MySQL NDB, DNS, Hesiod, Kerberos
Done    Thu  2015-10-22  0900 UTC  fluorine   Production Front End, slashd, Varnish, MySQL, Apache, ipnd
Done    Thu  2015-10-22  1000 UTC  lithium    Development Server, slashd, Varnish, MySQL, Apache

We apologize in advance for any inconvenience and appreciate your understanding as we try to get things up and running following each reboot.


Original Submission

This discussion has been archived. No new comments can be posted.
  • (Score: 2) by NCommander on Monday October 19 2015, @06:27PM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday October 19 2015, @06:27PM (#251929) Homepage Journal

This should have limited impact on the site itself; automatic IP failover with Linode is a bit dodgy, and it doesn't work with IPv6, so when our LBs go down, the site is going to go down. soylentnews.org otherwise has no single point of failure, so other than decreased performance, only our secondary services should be impacted by this.

    --
    Still always moving
    • (Score: 2) by isostatic on Monday October 19 2015, @07:41PM

      by isostatic (365) on Monday October 19 2015, @07:41PM (#251974) Journal

      Isn't the fact you've got both load balancers going down at the same time an issue?

      Wed 2015-10-21 5:00:00 AM UTC sodium Primary Load Balancer
      Wed 2015-10-21 5:00:00 AM UTC magnesium Backup Load Balancer

      • (Score: 2) by NCommander on Monday October 19 2015, @08:37PM

        by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday October 19 2015, @08:37PM (#252017) Homepage Journal

It's more complicated than that. The secondary load balancer is a manual failover, not automatic. Heartbeat and other hot-failover solutions don't appear to work on Linode's internal network. The secondary is mostly there for when we need to take the primary load balancer offline for an extended period.

        Furthermore, Linode still doesn't support IPv6 failover, and about 10% of site traffic is v6, which means the only way to redirect that traffic is to change the AAAA record.
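        A minimal sketch of scripting that AAAA swap, assuming a nameserver that accepts RFC 2136 dynamic updates (Linode's DNS manager has its own interface, so this is illustrative only); the TSIG key, server address, and target address below are all made up:

```python
# Repoint an AAAA record via an RFC 2136 dynamic update (dnspython).
# The TSIG key, nameserver address, and target address are placeholders.
import dns.query
import dns.tsigkeyring
import dns.update

keyring = dns.tsigkeyring.from_text({
    "update-key.": "bWVyZWx5IGFuIGV4YW1wbGUga2V5Cg==",  # placeholder secret
})

update = dns.update.Update("soylentnews.org", keyring=keyring,
                           keyname="update-key.")
# Replace the record so v6 traffic follows the backup load balancer.
update.replace("www", 300, "AAAA", "2001:db8::2")  # hypothetical address

response = dns.query.tcp(update, "203.0.113.53")   # hypothetical nameserver
print(response.rcode())  # 0 (NOERROR) on success
```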

        --
        Still always moving
        • (Score: 2) by sjames on Tuesday October 20 2015, @09:26PM

          by sjames (2882) on Tuesday October 20 2015, @09:26PM (#252477) Journal

          Reboot magnesium, verify it, do the failover (reversing their roles), let the DNS propagate, and then reboot sodium (now the secondary)?
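          That ordering can be written down as a sketch; reboot(), healthy(), and promote() below are hypothetical stand-ins for steps the admins would perform by hand:

```python
# The rolling reboot proposed above, as pseudocode made runnable.
import time

def reboot(host): print(f"rebooting {host}")          # stub
def healthy(host): return True                        # stub health probe
def promote(host): print(f"{host} is now primary")    # stub DNS/AAAA swap

def rolling_lb_reboot(primary="sodium", secondary="magnesium", dns_ttl=300):
    reboot(secondary)                # 1. reboot the idle backup first
    while not healthy(secondary):    # 2. verify it came back up cleanly
        time.sleep(10)
    promote(secondary)               # 3. fail over, reversing the roles
    time.sleep(dns_ttl)              # 4. let the DNS change propagate
    reboot(primary)                  # 5. the old primary is now the idle one

rolling_lb_reboot()
```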

  • (Score: 2, Interesting) by jon3k on Monday October 19 2015, @06:36PM

    by jon3k (3718) Subscriber Badge on Monday October 19 2015, @06:36PM (#251931)

    What's the point of "the cloud" and all this redundancy when you still have reboots that cause outages?

    • (Score: 0) by Anonymous Coward on Monday October 19 2015, @06:46PM

      by Anonymous Coward on Monday October 19 2015, @06:46PM (#251938)

      At least they don't take weekends off (no new stories for 2 days) like that other site does.

      • (Score: 1, Flamebait) by wonkey_monkey on Monday October 19 2015, @06:49PM

        by wonkey_monkey (279) on Monday October 19 2015, @06:49PM (#251940) Homepage

        like that other site does.

        What, Slashdot? You did mean Slashdot, didn't you?

        Huh. Look at that. I said Slashdot twice (three times now) and nothing bad happened.

        --
        systemd is Roko's Basilisk
        • (Score: 2, Funny) by Anonymous Coward on Monday October 19 2015, @06:55PM

          by Anonymous Coward on Monday October 19 2015, @06:55PM (#251943)

          Does Commander Taco pop up behind you and murder you?

        • (Score: 4, Funny) by DECbot on Monday October 19 2015, @07:03PM

          by DECbot (832) on Monday October 19 2015, @07:03PM (#251946) Journal

          You said the word that the knights of Soylent cannot stand to hear. You said the word again! Stop saying the Worrrd!

          --
          cats~$ sudo chown -R us /home/base
        • (Score: 5, Funny) by maxwell demon on Monday October 19 2015, @07:13PM

          by maxwell demon (1608) on Monday October 19 2015, @07:13PM (#251952) Journal

          and nothing bad happened.

          Say you. But the moment you typed that word, a freak wormhole opened up in the fabric of the space-time continuum and carried these words far far back in time across almost infinite reaches of space to a distant Galaxy where strange and warlike beings were poised on the brink of frightful interstellar battle.
          The two opposing leaders were meeting for the last time.

          A dreadful silence fell across the conference table as the commander of the Vl'Hurgs, resplendent in his black jewelled battle shorts, gazed levelly at the G'Gugvuntt leader squatting opposite him in a cloud of green sweet-smelling steam, and, with a million sleek and horribly beweaponed star cruisers poised to unleash electric death at his single word of command, challenged the vile creature to take back what it had said about his mother.

          The creature stirred in his sickly broiling vapour, and at that very moment the word naming that other site drifted across the conference table.

          Unfortunately, in the Vl'Hurg tongue this was the most dreadful insult imaginable, and there was nothing for it but to wage terrible war for centuries.

          Congratulations, you started an interstellar war.

          --
          The Tao of math: The numbers you can count are not the real numbers.
          • (Score: 0) by Anonymous Coward on Monday October 19 2015, @08:36PM

            by Anonymous Coward on Monday October 19 2015, @08:36PM (#252015)

            Ha! I had a 6-hour drive last Thursday and listened to the original radio play version of HHGTTG. Hadn't heard it in years; still very good! A friend digitized and de-noised my original set (12 half-hour episodes), which I originally taped off the local NPR FM station (in the USA).

            Note that this is different/better than the commonly available radio play which was remade at some point. We suspect (but don't know for sure) that the original version contains material/music that is (C) by others, which the BBC didn't want to license for public sale...

            • (Score: 2) by wonkey_monkey on Monday October 19 2015, @09:48PM

              by wonkey_monkey (279) on Monday October 19 2015, @09:48PM (#252068) Homepage

              Note that this is different/better than the commonly available radio play which was remade at some point.

              It wasn't remade, as far as I can ascertain, but bits of episode three were cut because Marvin hums/sings copyrighted tunes(!). Various releases have had the opening theme replaced with a re-recording, and/or been otherwise remastered, but I can't see that they've ever gone back and remade any of it.

              Also the original commercial releases had their pitch altered slightly by mistake.

              --
              systemd is Roko's Basilisk
        • (Score: 3, Funny) by DeathMonkey on Monday October 19 2015, @07:17PM

          by DeathMonkey (1380) on Monday October 19 2015, @07:17PM (#251956) Journal

          Bloody Mary only comes out at night. (Won't) see you tomorrow!

        • (Score: 2) by isostatic on Monday October 19 2015, @08:10PM

          by isostatic (365) on Monday October 19 2015, @08:10PM (#251997) Journal

          You know, that's what Hermione used to say, before they put a Taboo on the name and Death Eaters trapped her in an alley and raped her.

          Do you really want Dice to do that?

        • (Score: 2) by kurenai.tsubasa on Tuesday October 20 2015, @03:05PM

          by kurenai.tsubasa (5227) on Tuesday October 20 2015, @03:05PM (#252320) Journal

          Candlejac^hj%$#@+++NO CARRIER

      • (Score: 2) by takyon on Monday October 19 2015, @08:23PM

        by takyon (881) <takyonNO@SPAMsoylentnews.org> on Monday October 19 2015, @08:23PM (#252009) Journal

        I just looked at Slashdot and they had stories this weekend. Is it random weekends?

        --
        [SIG] 10/28/2017: Soylent Upgrade v14 [soylentnews.org]
        • (Score: 2) by bryan on Tuesday October 20 2015, @12:01AM

          by bryan (29) <bryan@pipedot.org> on Tuesday October 20 2015, @12:01AM (#252114) Homepage Journal

          The AC was likely referring to the other other site. The one run by that shady character which, heaven forbid, did have a 2-day gap in the stories.

    • (Score: 3, Informative) by isostatic on Monday October 19 2015, @07:57PM

      by isostatic (365) on Monday October 19 2015, @07:57PM (#251990) Journal

      The cloud means many things. If you're trying to get funding from a PHB, it means a VM that you run on your laptop (a private mobile cloud to leverage the synergies of multi-modal value-adding fusion!).

      Then there's traditional hosting, but virtualised, where it's cheaper to buy a VM from someone who can benefit from economies of scale than to run your own servers in a colo. There's no real difference between SN having 9 real HP DL360s using up 9RU and 18 network ports in someone's rack and SN having 9 VMs, other than the VMs having less unexpected downtime (hardware failures) and more known downtime (VM security upgrades, in this case).

      Then there's what I'd call real cloud, which is designing your entire software stack end-to-end to avoid any single point of failure, and having it automatically scale and heal itself (and kill bits of itself too, to avoid nasty surprises down the line). This means having multiple servers in multiple physical locations, ideally with multiple providers. You have 3 or 4 machines running on Amazon's east coast, 3 or 4 on Google's west coast, a few on Linode in Singapore, and a couple in Rackspace in London.

      DNS (distributed across multiple servers as normal) points to any of those locations that have "checked in" recently, with a short TTL, and is answered by load balancers. If a location fails to check in, the record is removed and you're back up in a few seconds.
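      A toy version of that check-in loop; the endpoints here are invented, and publish_records() stands in for whatever DNS provider API a real deployment would call:

```python
# Poll each location's health endpoint and publish records only for
# the ones that "checked in". All addresses are documentation ranges.
import time
import urllib.request

LOCATIONS = {
    "aws-east":  "198.51.100.10",   # hypothetical per-location LBs
    "gcp-west":  "198.51.100.20",
    "linode-sg": "198.51.100.30",
}

def checked_in(ip, timeout=2):
    """Treat an answering /health endpoint as a recent check-in."""
    try:
        urllib.request.urlopen(f"http://{ip}/health", timeout=timeout)
        return True
    except OSError:
        return False

def publish_records(name, ips, ttl):
    """Stub: a real version would push the record set to a DNS API."""
    print(f"{name} {ttl} IN A {' '.join(ips) or '(none live!)'}")

while True:
    live = [ip for ip in LOCATIONS.values() if checked_in(ip)]
    publish_records("www.example.org", live, ttl=30)
    time.sleep(15)   # short TTL + frequent polling => failover in seconds
```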

      Tasks are spread across the machines, and you have a distributed management layer that can spin up new servers as load increases and shut them down as load decreases. If you had 3 webservers, for example, you might need more as the number of concurrent hits on your installation heads up to, say, 600, so you spin up a couple more. As it drops back to 200, you drop two servers off.
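      As a toy control loop, with spin_up() and shut_down() standing in for provider API calls (nothing here is a real provider's interface), that rule looks like:

```python
# The 600-up / 200-down rule above; the gap between the two
# thresholds acts as hysteresis so the pool doesn't flap.
SCALE_UP_AT, SCALE_DOWN_AT, MIN_SERVERS = 600, 200, 3

def spin_up():
    print("booting a fresh webserver")      # stub provider call
    return f"web-{id(object()):x}"          # unique-ish node name

def shut_down(node):
    print(f"draining and stopping {node}")  # stub provider call

def rebalance(concurrent_hits, servers):
    if concurrent_hits >= SCALE_UP_AT:
        servers += [spin_up(), spin_up()]   # "spin up a couple more"
    elif concurrent_hits <= SCALE_DOWN_AT and len(servers) > MIN_SERVERS:
        for node in servers[MIN_SERVERS:]:  # drop the extras back off
            shut_down(node)
        servers = servers[:MIN_SERVERS]
    return servers

pool = rebalance(650, ["web-1", "web-2", "web-3"])   # grows to five
pool = rebalance(180, pool)                          # shrinks back to three
```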

      If a server dies (power outage, software update, etc.), that's fine, as new ones spin up automatically in 20 seconds in another area, and depending on your budget you'll overprovision. In the example above, if Amazon goes titsup, a new server is automatically created on Linode and the service continues. If an earthquake hits the west coast, traffic is diverted in seconds to the other 3 locations.

      At least that's what I understand as cloud; open to any dissenting views. After a few years in broadcast, mainly using old-school IT (real iron, with most of the kit having physical SDI, GPI, and audio interfaces) as a small part of my job, I'm now back in a department that is solely infrastructure and devops. I'm trying to learn enough about things like EC2, OpenStack, Docker, Puppet, Ansible, RabbitMQ, etc. to understand the best way to go about things.

    • (Score: 3, Informative) by NCommander on Monday October 19 2015, @08:39PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday October 19 2015, @08:39PM (#252018) Homepage Journal

      The cloud is cheaper for small businesses who can't afford a colo solution or similar, or who need additional processing power on demand. I've always been skeptical of moving from a private server room to the cloud, but as SN has no physical assets, we don't have much of a choice.

      --
      Still always moving
  • (Score: 2) by Subsentient on Monday October 19 2015, @09:02PM

    by Subsentient (1111) on Monday October 19 2015, @09:02PM (#252031) Homepage Journal

    I don't understand why SN needs so many servers. It's not a particularly *high traffic* site, nor is it running some giant computational framework, so I don't know why we'd have more than 3 at the very most.

    --
    "It is no measure of health to be well adjusted to a profoundly sick society." -Jiddu Krishnamurti
    • (Score: 5, Informative) by NCommander on Monday October 19 2015, @09:23PM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday October 19 2015, @09:23PM (#252044) Homepage Journal

      Redundancy. With the exception of the load balancer, we can pull the plug on any machine and the site stays up and functioning. At a minimum, that requires two web front-ends and two databases. We've had machines go out of service for an extended period (hydrogen was down for a long stretch, ultimately requiring a full rebuild). Being able to repair a server without the stress of knowing the site is completely down is worth its weight in gold.

      The rest has been driven by the fact that you can only cram so much into 2-4 GiB of RAM. In total, five machines drive the site normally: one load balancer, two web frontends, and two DB backends. The rest are the mail+IRC server, an independent development server, and a misc services box for things like Tor.
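      As a toy illustration of that redundancy (the server names come from the schedule above; the "down" set is invented for the demo), losing any one box still leaves each tier serving:

```python
# With two nodes per tier, any single machine can drop without taking
# the site down; only losing a whole tier is fatal.
import random

TIERS = {
    "frontend": ["hydrogen", "fluorine"],   # Varnish/Apache boxes
    "backend":  ["neon", "helium"],         # MySQL NDB cluster nodes
}
DOWN = {"hydrogen"}   # pretend hydrogen is out for its rebuild

def pick(tier):
    live = [h for h in TIERS[tier] if h not in DOWN]
    if not live:
        raise RuntimeError(f"entire {tier} tier down: site offline")
    return random.choice(live)

print(pick("frontend"), pick("backend"))   # e.g. "fluorine neon"
```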

      --
      Still always moving
    • (Score: 2) by VLM on Tuesday October 20 2015, @12:56PM

      by VLM (445) on Tuesday October 20 2015, @12:56PM (#252269)

      At $workplace I enjoy being able to halt DB server #7 with transparent failover, clone it in the NAS and rename the clone to "test DB", upgrade or otherwise F around with the test DB, then swap it in for DB #2 in production once I trust the changes, or delete the test DB image and start over; then more or less do the same with Puppet, then unleash the puppet.
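      A rough sketch of that clone-test-swap cycle; every helper below is a hypothetical stand-in for NAS/hypervisor tooling, not VLM's actual stack:

```python
# Clone a halted production DB, experiment on the clone, and either
# promote it into production or throw it away and start over.
def halt(db): print(f"halting {db} (failover keeps production serving)")
def snapshot(db): print(f"NAS clone of {db}"); return f"{db}.img"
def clone_from(img, name): print(f"{name} created from {img}"); return name
def upgrade(db): print(f"upgrading / F-ing around with {db}")
def trusted(db): return True                  # stands in for manual sign-off
def swap_into_prod(db, slot): print(f"{db} now serving as {slot}")
def destroy(db): print(f"deleting {db}")

def test_then_swap(source="DB7", slot="DB2"):
    halt(source)
    image = snapshot(source)
    test_db = clone_from(image, name="test-DB")
    upgrade(test_db)
    if trusted(test_db):
        swap_into_prod(test_db, slot)   # swap it in for DB #2
    else:
        destroy(test_db)                # delete the image and start over

test_then_swap()
```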

      You could integrate it into one box vertically, but things get complicated when you mix multiple people doing multiple things on multiple projects and then do IP address level operations to swap test/dev/prod all around.

      At legacy companies and sites, the middlemen get in the way of virtual stuff just as much as they used to in the physical era, but cloud-i-ness doesn't have to be as screwed up as the old days. It's "no big deal" to spin up virtual servers as part of day-to-day operations, unless the legacy middlemen are still standing in the way and try to turn creating a simple little image into some kind of insane capex purchase project.

      A good analogy from the old days, when I worked in a dinosaur pen: purchasing more mainframe DASD was a major departmental project, but this is the era of a secretary picking up a box of blank floppy disks on the way to work, so there are very different mindsets in how you swap things around or otherwise operate.

  • (Score: 4, Funny) by Anonymous Coward on Tuesday October 20 2015, @01:57AM

    by Anonymous Coward on Tuesday October 20 2015, @01:57AM (#252141)

    Physicists hurry to name elements as Soylent's server load increases.

    After experiencing phenomenal growth for the past several years, and now reaching a few hundred million unique visitors daily, SoylentNews' server farm has expanded from 9 of the 114 named elements to 113 of the 114. Physicists looked on with concern as SoylentNews suggested powering on a 114th server, logically named livermorium. "After the 114 named elements, there are just stupid placeholder names like unununium," explained a physicist who chose to be called Anonymous Coward. "And something stupid like 'unununium' just wouldn't do for a server name."

    • (Score: 4, Funny) by NCommander on Tuesday October 20 2015, @05:30AM

      by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Tuesday October 20 2015, @05:30AM (#252181) Homepage Journal

      When we run out of elements, we'll switch to fruit.

      --
      Still always moving
    • (Score: 2) by Dr Spin on Tuesday October 20 2015, @09:51AM

      by Dr Spin (5239) on Tuesday October 20 2015, @09:51AM (#252226)

      Clearly, using Unobtanium as a temporary replacement isn't working well either!

      --
      Warning: Opening your mouth may invalidate your brain!
  • (Score: 2) by zugedneb on Tuesday October 20 2015, @12:23PM

    by zugedneb (4556) on Tuesday October 20 2015, @12:23PM (#252260)

    I thought you ran the site from a Pentium 3 with an SSD (why not?)...

    Pending Wed 2015-10-21 0500 UTC sodium Primary Load Balancer
    Pending Wed 2015-10-21 0500 UTC magnesium Backup Load Balancer
    What loads are you guys balancing?

    P.S. With 9 servers there is room for many more trolls. I will do what I can.

    --
    old saying: "a troll is a window into the soul of humanity" + also: https://en.wikipedia.org/wiki/Operation_Ajax