An idea that came to mind during a discussion about Soylent hosting on IRC today.
Thanks to Titanium, prospectacle, stderr, useless, swiss, FoobarBazbot and MrBluze for a lively discussion :-)
It started with:
[19:43] * crutchy wonders if a distributed service could be developed... divide the load, build in redundancy and if anyone's host goes down others will pick up the slack
Times are AEDT.
The idea has been developed a little bit further since the IRC discussion.
General System
==============
- independent of DNS
- with a distributed model anyone can volunteer to host a node (no single person is relied on to front hosting costs)
- the system consists of a network of apache host nodes set up by volunteers willing to cover the costs of their own host; users connect to the web service with their web browser, with the remote host selected by a launcher program
Host Node
=========
- not required in order to access the web service (only for those who choose to offer hosting)
- apache web server configured for web service (mysql, mod_perl, etc as required)
- must periodically execute a script (using crontab?) that requests nodelists from the listed nodes and updates the local nodelist as required (adding/removing based on some kind of agreement algorithm)
- must respond to nodelist requests from launchers; this can be an isolated php/pl/etc script and need not be built into the hosted service (a rough sketch follows this list)
- must contain scripts to synchronize data and site source code updates securely with other nodes as required (this will be the tricky bit)
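To make the nodelist mechanics a bit more concrete, here's a rough PHP sketch of the kind of isolated script a host node could expose over http and also run from cron. The file name (nodelist.txt), the one-address-per-line format, the nodelist.php URL and the "keep whoever answers" merge rule are all just assumptions for illustration, not any kind of agreed protocol.

<?php
// nodelist.php - hypothetical sketch only; file name, format and merge
// rule are assumptions, not part of any agreed protocol.
$nodelistFile = '/var/lib/nodelist/nodelist.txt'; // one address per line (assumed)

// When hit over http, just serve our current nodelist to whoever asks
// (launchers or other host nodes).
if (php_sapi_name() !== 'cli') {
    header('Content-Type: text/plain');
    readfile($nodelistFile);
    exit;
}

// When run from cron (CLI), ask every known node for its nodelist,
// merge the results, and keep only nodes that actually respond.
$known = array_filter(array_map('trim', file($nodelistFile)));
$candidates = $known;
foreach ($known as $node) {
    $remote = @file_get_contents("http://$node/nodelist.php");
    if ($remote !== false) {
        $candidates = array_merge($candidates,
            array_filter(array_map('trim', explode("\n", $remote))));
    }
}

// "agreement algorithm" placeholder: a node stays on the list if it answers
$alive = array();
foreach (array_unique($candidates) as $node) {
    if (@file_get_contents("http://$node/nodelist.php") !== false) {
        $alive[] = $node;
    }
}
file_put_contents($nodelistFile, implode("\n", $alive) . "\n");

A real agreement algorithm would obviously need to be smarter than "answers once = trusted", but the general shape of the script is the point.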
Launcher
========
- user executes launcher to access web service
- no gui
- executable or source downloaded from trusted location (such as debian repository or github) along with nodelist containing one or more known host IP addresses
- purpose is to select a remote host node and open web browser pointed to selected remote host IP address
- before opening the browser, a nodelist request is sent to every host in the local nodelist, and the local nodelist is updated in the same way as a server's nodelist is updated (see above; there's also a sketch after this list)
- possibly a simple settings file if required
- could be a bash script for Linux and a small Delphi or C program for Windows
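For Linux the launcher could literally be a few lines of bash as mentioned above, but since PHP is one of my bread-and-butter languages, here's a quick CLI sketch of the same logic just to show how small it is. The nodelist format, the nodelist.php URL and the use of xdg-open are assumptions for illustration only.

<?php
// launcher sketch (hypothetical): refresh the local nodelist, pick a live
// node, open the default browser at it. File/URL names are assumptions.
$nodelistFile = __DIR__ . '/nodelist.txt';
$nodes = array_filter(array_map('trim', file($nodelistFile)));

// refresh the local nodelist the same way a host node would (see above)
$merged = $nodes;
foreach ($nodes as $node) {
    $remote = @file_get_contents("http://$node/nodelist.php");
    if ($remote !== false) {
        $merged = array_merge($merged,
            array_filter(array_map('trim', explode("\n", $remote))));
    }
}
$merged = array_values(array_unique($merged));
file_put_contents($nodelistFile, implode("\n", $merged) . "\n");

// point the browser at the first node that answers
foreach ($merged as $node) {
    if (@file_get_contents("http://$node/nodelist.php") !== false) {
        exec('xdg-open ' . escapeshellarg("http://$node/"));
        exit(0);
    }
}
fwrite(STDERR, "no live nodes found\n");
exit(1);

A Windows version would do exactly the same thing, just with whatever browser-launching call Delphi or C provides.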
On IRC there was concern expressed about security and verification of host nodes.
Since you're using a browser as a client (with all the security features that come with it) and you're only receiving normal http responses (otherwise your browser would throw an error), there's only so much bad stuff a host node can do.
Worst case scenario might be that it redirects to goat.cx or some site with driveby downloads (which most browsers will block anyway).
If required a trusted network of host nodes could be formed using signed certificates (perhaps using OpenSSL).
Nodelists may not be that big since there isn't likely to be a huge number of hosts for the same service (such as SoylentNews) but if need be the list could be gzipped. As mentioned earlier, the tricky bit will be synchronizing website data and service application source code, but I don't think it is an insurmountable challenge.
edit: data could be distributed, but would need to be synchronized on all host nodes (the tricky bit mentioned above)
edit: thinking about data synchronizing... it would either require modifying the service application to execute a script when data changes (the script would do the work of sending data to the other hosts), or a shell script with a loop that checks data file timestamps for changes and sends the changed files to the other hosts when it detects one. for max efficiency it would be ideal to just post a single mysql insert/update query whenever data changes, but that would require integration into the main application (slashcode in Soylent's case); you don't want to be sending entire database files around the place whenever there is a change. a good place to start might be to host the data on one or two high-performance 'supernodes' until an improved sync system can be developed. a rough sketch of the timestamp-loop approach is below.
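Here's roughly what I mean by the timestamp-loop approach, sketched in PHP rather than shell. The data directory, the sync.php endpoint on the other hosts and the idea of POSTing whole files are placeholders only; as noted above, the better end goal would be shipping individual insert/update queries from inside the application.

<?php
// sync sketch (hypothetical): watch data files for mtime changes and push
// changed files to the other hosts. Paths and URLs are assumptions.
$dataDir = '/srv/service/data';
$peers   = array_filter(array_map('trim', file('/var/lib/nodelist/nodelist.txt')));
$seen    = array(); // filename => last known mtime (everything gets pushed once on startup)

while (true) {
    foreach (glob("$dataDir/*") as $file) {
        clearstatcache(true, $file);      // php caches stat results otherwise
        $mtime = filemtime($file);
        if (!isset($seen[$file]) || $seen[$file] < $mtime) {
            $seen[$file] = $mtime;
            $payload = file_get_contents($file);
            foreach ($peers as $peer) {
                // naive push: POST the whole file to each peer's sync endpoint
                $ctx = stream_context_create(array('http' => array(
                    'method'  => 'POST',
                    'header'  => "Content-Type: application/octet-stream\r\n" .
                                 'X-Filename: ' . basename($file) . "\r\n",
                    'content' => $payload,
                    'timeout' => 5,
                )));
                @file_get_contents("http://$peer/sync.php", false, $ctx);
            }
        }
    }
    sleep(3);
}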
There are also some things to avoid:
As many might know, SoylentNews resides on http://li694-22.members.linode.com/¹ and because of this some people were talking and joking about using li694-22 as a new name. It's a cool name; I was tempted myself! Perhaps an even "weirder" inside joke than http//:/..org :)
No need to be tempted any more; a Mr. Watt (not me!) of Washington bought it and pointed it at SoylentNews¹ :)
¹ naturally your cookies are in different jars
Edit: just to practice safe surfing, don't log in through the redirection or move your cookies manually or anything like that. Not that I'd expect anything bad to happen in this case, but one never knows until it's too late (maybe Mr. Watt suddenly develops an appetite for collecting low-UID accounts).
Since I've moderated and can't be bothered to log out I'll write some thoughts here for my own interest. By no means is this meant to be any kind of complete answer or anything of the sort, just some idle thoughts/speculation.
0.a. It is an entirely unknown failure mode that is sudden and immediately cripples everything. Very unlikely.
0.b. It is an entirely unknown phenomenon that is sudden and immediately cripples everything. Extremely unlikely but not zero.
0.c. A confluence of simultaneous and lasting shoddy operation and systems malfunction in two culturally different countries (Malaysia and Viet Nam). This one is hard to judge; I wouldn't think so on behalf of Viet Nam but they hadn't yet taken airspace control/responsibility for the plane and might not have paid much if any attention to it. Malaysia is fully able to fuck anything up beyond rational belief (*cough* bigoted apartheid-style legislation on the use of a word *cough*) but even so Viet Nam should still have the radar records and be fully able to find anything if there in fact was a more normal disaster.
I guess the simplest ad hoc explanation would be 0.b. with some kind of unusual simultaneous failure of radar range for whatever reason: the signal would then simply disappear, giving no clues about anything. If this was caused by some freak meteorological event local to the aircraft, it might explain the total lack of everything except debris, which might be found later. It might not have to last all that long if the electronics in the plane are knocked out before any remaining related blips on now-functioning radars disappear among the noise. Still extremely unlikely. Inverted clear-sky sprite plasma bolts (no such thing is known to exist) or time-space warp bubbles (sorry, no link to the paper handy, and no such thing is known to exist) or alians!!1 (etc.) or whatever, but who knows.
1. Whether or not some terrorist organization claims responsibility doesn't mean much. Some YKW (You Know Who) organizations claim just about everything, or are created solely to claim credit for anything new (as happened for the attacks in Oslo before those claims were discredited), and all it takes for the opposite to happen is a few things:
1.a.1. Whoever did it has discovered and understood the meaning of tactics, and the incident while public in nature is also long term in nature (there are several possibilities here, I'm not comfortable with spelling it out). Somewhat likely.
1.a.2. Whoever is responsible simply (and without any deeper thought) doesn't want to draw attention to something that is still ongoing. Fairly likely.
and
1.b.1. For whatever reason(s) the incident fails to trigger knee-jerk claims. Doing something to a flight from a YKW nation to China should naturally avoid most if not all such attention because China is kind of outside the horizon of most YKW despite the recent YKW attacks both in Beijing and western China. Not too unlikely.
1.b.2. Someone figured it was stupid and counterproductive to make bullshit claims and has the clout to stop those who still don't get it. Very unlikely but not impossible.
For a 1.a.2. that passes 1.b.1 it seems very likely that some YKW "Chinese" did this to simply kill as many Chinese as possible. Such YKW "Chinese" aren't known to be big on making public statements of responsibility, in fact they seldom say anything at all (probably it makes them very easy to catch and kill) so that fits.
Oil slicks don't mean much on their own but are often the first thing spotted. If nothing else is spotted (lots of debris floats for a fairly long time) then 0.b. increases.
Sometimes there isn't an answer.
had a go at scripting a little quick & dirty irc bot for soylent
requires sic (http://tools.suckless.org/sic)
if you're using debian: sudo apt-get install sic
#!/bin/bash

chan="#test"
log="test.log"
pipe="log-pipe"

trap "rm -f $pipe" EXIT

if [[ -f $log ]]; then
  rm $log
fi

if [[ ! -p $pipe ]]; then
  mkfifo $pipe
fi

substr="End of /MOTD command"
joined=""

sic -h "irc.sylnt.us" -n "log-bot" <> $pipe | while read line; do
  if [[ -n "$line" ]]; then
    echo $line >> $log
  fi
  if [[ -z "$joined" ]] && [[ -z "${line##*$substr*}" ]]; then
    joined="1"
    echo ":j $chan" > $pipe
  fi
done

exit 0
also posted on the wiki @ http://wiki.soylentnews.org/wiki/index.php/User:Crutchy#IRC_logging_bot
After a minor problem with virtualbox (f*ck you nvidia) I got the slashdev virtual machine going. If you're running a 32-bit host OS (as I do), you can probably still run the 64-bit slashdev VM. You just need to make sure your CPU supports it (Intel VT-x or AMD-V) and that it's enabled in your BIOS (usually disabled by default). GIYF.
When you're importing the vm, gotta make sure you don't hit the checkbox that reassigns mac addresses on network interfaces, cos eth0 won't show up in ifconfig and you won't have internet access.
After a quick flick through the bash history I realised that sudo works with the "slash" user.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install gnome
*hides* (cli is awesome, but on its own is claustrophobic for me)
log in under the gnome classic session (the default ubuntu session fails to log in, not that i mind)
Epiphany works as a web browser, but I prefer firefox/iceweasel:
sudo apt-get install iceweasel
Can also use synaptic with the same password as the slash user.
To start apache (compiled per slashcode install instructions, not from repositories), open a terminal:
./apache/bin/apachectl start
Full command is (just for the curious):
/srv/slashdev/apache/bin/apachectl start
Start the slashd (slash daemon) - gleaned from bash history:
sudo /etc/init.d/slash start
Close the slashd terminal window (it will continue to run in the background).
Open Firefox:
http://localhost:1337/
Apache public directory:
/srv/slashdev/slash/themes/slashcode/htdocs/
It contains mostly links to files in the /srv/slashdev/slash/ directory.
It was nice of NCommander to make the slash user home directory as /srv/slashdev... thanks for that
Tried to register a new user, but it doesn't seem to work. Looked like maybe the MTA wasn't configured. I use exim4 normally on my debian boxen (it removes postfix):
sudo apt-get install exim4
sudo dpkg-reconfigure exim4-config
During configuration it's mostly self-explanatory (select the defaults for everything, except make sure to select the option "internet site; mail is sent and received directly using SMTP"). Tested password retrieval with exim4 OK. As per usual, check your junk folder in hotmail etc.
Sagasu is an awesome search tool:
sudo apt-get install sagasu
After install, you'll find it under Applications -> Accessories
Change the file pattern to *.pl or whatever (you can just use * if you want), select "/srv/slashdev/slash" as your search directory, uncheck match case, enter a search string such as "sub displayComments" and click Search.
Couldn't find sub createEnvironment though (it's called at the bottom of a lot of perl files). Anyone got any ideas?
Also recommend installing mysql-workbench.
If anyone finds anything wrong with any of this stuff please let me know.
edit: the other reason why i prefer to install gnome is cos gedit is a great little development tool.
edit: thanks heaps to paulej72 for the git advice. here's the script provided by paulej (i just added the git pull, as also mentioned by paulej):
#!/bin/sh
cd /srv/slashdev/slashcode
git pull
make USER=slash GROUP=slash SLASH_PREFIX=/srv/slashdev/slash install
rm -rf /srv/slashdev/slash/site/slashdev/htdocs/*.css
/srv/slashdev/slash/bin/symlink-tool -U
/srv/slashdev/slash/bin/template-tool -U
/srv/slashdev/apache/bin/apachectl restart
Note: This produced a couple of errors for me. Don't run this under sudo cos the script has a hissy fit (I had to do a "sudo chown slash:slash -R ./slashcode" to recover).
Also, I use this command to execute the script:
bash ./Desktop/deployslash.sh > ./Desktop/deployslash.log
more so that I can have a squiz at what happened if it goes pear shaped.
9-mar-14
paulej72: If you hand install to /srv/slashdev/slash/themes/slashcode/templates/dispComment;misc;default you need to run /srv/slashdev/slash/bin/template-tool -U to update the templates in the database. Should also restart apache when touching the templates.
work in progress
a minor difficulty i'm having with wrapping my head around slashcode is figuring out where functions are declared. i can use a search tool like sagasu, but i've done something similar to this for php so i thought it would be a fun perl project.
objective: parse code files in a directory tree and output page with linked index of files and functions
doc.pl
#!/usr/bin/perl
print "Content-Type: text/html\n\n";
use strict;
use warnings;

##########################
sub doc__main {
print "<!DOCTYPE HTML>\n";
print "<html>\n";
print "<head>\n";
print "<title>Slashcode Doc</title>\n";
print "<meta name=\"description\" content=\"\">\n";
print "<meta name=\"keywords\" content=\"\">\n";
print "<meta http-equiv=\"Content-Type\" content=\"text/html;charset=utf-8\">\n";
print "</head>\n";
print "<body>\n";
print "<p>blah</p>\n";
print "</body>\n";
print "</html>\n";
}

##########################
sub doc__functionTree {
my($structure, $allDeclaredFunctions, $allFunctions, $allFiles) = @_;
}

##########################
sub doc__recurse {
my($structure, $allDeclaredFunctions, $allFunctions, $allFiles, $allTreeItems, $caption, $type, $level, $id) = @_;
}

##########################
sub doc__aboutFile {
my($structure, $allFunctions, $allFiles, $fileName) = @_;
}

##########################
sub doc__aboutFunction {
my($structure, $allFunctions, $allFiles, $functionName) = @_;
}

##########################
sub doc__linkFile {
my($allFiles, $fileName) = @_;
}

##########################
sub doc__linkFunction {
my($allFunctions, $functionName) = @_;
}

##########################
sub doc__allFiles {
my($structure) = @_;
}

##########################
sub doc__allFunctions {
my($structure) = @_;
}

##########################
sub doc__declaredFunctions {
my($structure) = @_;
}

##########################
sub doc__loadStructure {
}

##########################
sub doc__parseFile {
my($structure, $fileName) = @_;
}

##########################
doc__main();
1;
I'm a perl noob. Hopefully if I do some journal writing on my experience it will help keep me motivated.
Got some sort of perl server configuration going. Google wasn't very helpful, since most guides are for mod_perl pre 2.0 and the apache foundation docs are gibberish to me (maybe I'm just stupid).
Anyway, here's a conf that I kinda butchered up based on a bunch of different sources:
<VirtualHost *:80>
ServerName slash
DocumentRoot /var/www/slash/
Redirect 404 /favicon.ico
<Directory />
Order Deny,Allow
Deny from all
Options None
AllowOverride None
</Directory>
<Directory /var/www/slash/>
SetHandler perl-script
PerlResponseHandler ModPerl::Registry
PerlOptions +ParseHeaders
Options +ExecCGI
Order Allow,Deny
Allow from all
</Directory>
LogLevel warn
ErrorLog /var/www/log/slash/error.log
CustomLog /var/www/log/slash/access.log combined
</VirtualHost>
By the way, this is for Debian Squeeze.
My first hello world script was also a bit more of an adventure than expected. Most tutorials leave the header out of their examples.
/var/www/slash/test.pl
#!/usr/bin/perl
print "Content-Type: text/html\n\n";
use strict;
use warnings;
print "Hello world.\n";
I could (probably should) have used a text/plain mime header, but it worked nonetheless.
Also, I can apparently use the following to add a path to @INC:
use lib "/var/www/slash/Slash";
I downloaded the soylent/slashcode master branch from https://github.com/SoylentNews/slashcode/archive/master.zip so that I could have a squiz and see if I could be of any help with debugging etc, but although I can read some of it, I need to go to perl school before I can contribute.
My bread and butter programming languages are Delphi and PHP.
This explains a lot about the beginning of slashcode functions that aren't familiar to me:
http://stackoverflow.com/questions/17151441/perl-function-declaration
Perl does not have type signatures or formal parameters, unlike other languages like C:

// C code
int add(int, int);

int sum = add(1, 2);

int add(int x, int y) {
    return x + y;
}

Instead, the arguments are just passed as a flat list. Any type validation happens inside your code; you'll have to write this manually. You have to unpack the arglist into named variables yourself. And you don't usually predeclare your subroutines:

my $sum = add(1, 2);

sub add {
    my ($x, $y) = @_; # unpack arguments
    return $x + $y;
}
Is it possible to do pass by reference in Perl?
http://www.perlmonks.org/?node_id=6758
Subroutines:
http://perldoc.perl.org/perlsub.html
Lately I've been working on a little tool to allow remote access to some intranet applications I've built. Would be interesting to see what others here think about the concept.
The applications are normally only accessible on a LAN, with the usual NAT router to the internet.
The aim is to be able to access the applications from the internet without port forwarding in the router.
I've heard of things like BOSH (http://en.wikipedia.org/wiki/BOSH) but haven't found much in the way of specifics and I'm not sure if it does what I want.
The general idea I've been working on is to use a publicly accessible host as a relay between the client (connected to the internet) and the application server (connected to a LAN).
This is kinda how it works at the moment:
To allow remote access, a workstation on the LAN must have a browser open to a URL that uses iframe RPC to periodically poll the relay server. I've set this interval to 3 seconds, which seems OK for testing purposes (it would need to be reduced for production). Every 3 seconds the LAN server sends an HTTP request (using php's fsockopen/fwrite/fgets/fclose) and the relay server responds with a list of remote client requests. Most of these responses are empty unless a remote client has requested something.
From the remote client's perspective, if a user opens their browser to a URL on the relay server, they would normally be presented with some kind of authentication process (I've neglected that for testing purposes) and then be able to click a link to access an application that would normally be restricted to the LAN. When they click that link, the relay server creates an empty request file. To respond to the LAN server with a list of requests, the relay server reads the filenames from a directory and constructs the request list based on files with a certain filename convention (for testing I'm just using "request__0.0.0.0_blah", where 0.0.0.0 is the IP address of the remote client and blah is the raw url-encoded request (special chars replaced with % codes)).
So one job of the relay server is to maintain a list of remote client request files (including deleting them when the requests have been fulfilled). It would probably be best to use a simple mysql table for this, but for testing I've just used a simple text file in a location that can be written to by apache.
After saving the request, the relay server script instance initiated by the remote client doesn't die, but loops until the request file isn't empty. So while the following is going on, this instance is just looping (although it has a timeout of 5 secs).
After a remote client requests an application from the relay server, and the LAN client requests the remote client requests from the relay server (asynchronously, hence the need to use a file or database), the LAN server (through the LAN client iframe and a bit of js) constructs an HTTP request and sends it to the application server (for testing purposes the RPC stub sends the request to its own server, which is processed by the application through a dispatch handler). The application response is returned by the fgets call and is processed to modify hyperlinks and img sources etc to suit the relay server instead of the LAN server (still working on this bit for testing), and then another request is posted to the relay server with the application page content.
The relay server then takes the page content and saves it to a text file.
The relay server script instance mentioned earlier, which is busy looping away, is checking for the existence of this page content in the request file. I tried doing this check with a call to php's filesize function, but it didn't seem to work (probably because php caches stat results between calls; a clearstatcache() before each check might fix it). I found that reading the file using file_get_contents and checking if the content length is greater than zero seemed to work (though not very efficiently, I'll admit).
So if the LAN server HTTP request to the relay server containing the application page content gets written to the remote client request file on the relay server, the remote client process on the relay server will read it and output it to the remote client.
If the application page content is output, or the content checking loop times out, the request file is deleted.
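To make the relay side a bit more concrete, here's a stripped-down PHP sketch of the remote-client handler described above: record the request as an empty file, loop until the LAN side fills it in, then output the content and clean up. The spool path and the request__ naming convention are just the testing values mentioned earlier, and the real code does more than this.

<?php
// relay.php (remote-client side) - simplified sketch of the mechanism
// described above; the spool path and naming convention are the testing ones.
$spool   = '/var/spool/relay'; // must be writable by apache (assumed)
$client  = $_SERVER['REMOTE_ADDR'];
$page    = isset($_GET['page']) ? rawurlencode($_GET['page']) : 'index';
$reqFile = "$spool/request__{$client}_{$page}";

// 1. record the remote client's request as an empty file; the LAN server
//    will discover it on its next poll and eventually write the
//    application's page content into it
file_put_contents($reqFile, '');

// 2. loop until the LAN side has filled the file, or give up after 5 seconds
//    (filesize() alone can mislead here because php caches stat results,
//    hence the clearstatcache(); reading the file also works, just less
//    efficiently)
$deadline = time() + 5;
$content  = '';
while (time() < $deadline) {
    clearstatcache(true, $reqFile);
    if (filesize($reqFile) > 0) {
        $content = file_get_contents($reqFile);
        break;
    }
    usleep(200000); // 200 ms between checks
}

// 3. return whatever we got (possibly nothing on timeout) and clean up
unlink($reqFile);
echo $content;

The LAN-side poller is the mirror image of this: ask the relay for the list of request__* files every few seconds, fetch each requested page from the intranet application, and POST the result back so the relay can drop it into the matching request file.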
Except for link/img targets everything works in testing; I can request a page and it renders on the remote client browser as it would on the LAN (minus images).
Does anyone have any thoughts on this?
The code is fairly simple and short; there's a single routine on the relay server with about 150-odd lines of very sparse code, and there's a single routine on the LAN server with about 100 lines of code (will grow a bit when I get the link/img replacement and get/post param forwarding working, but not much). The application that generates the page content being relayed is thousands of lines of code but I've kept the remote stuff separate.
I'm pretty sure there are dedicated appliances that do this kind of stuff, but does anyone have any experience with them?
There are no doubt other ways to skin this cat, but I'm interested in security, simplicity and of course cost. Aspects I liked about this approach were that I didn't have to punch a hole in the router and that the process is controllable and monitorable from the client within the LAN (every poll outputs a request status summary).
Would be interesting to find out if you think the idea is good or shit, or if there are aspects that could be improved (no doubt there are plenty). Feel free to comment or not.
Thanks to all those who made SoylentNews a reality!
edit: the setup in this case is a little different from the usual dmz/port forwarding case in that there aren't any ports exposed in the LAN router; i get through because the relay server only ever responds to outbound requests originating from the LAN server. there aren't ever any outbound requests originating from the relay server directly