Meta
posted by NCommander on Monday August 08 2016, @12:00PM   Printer-friendly
from the now-with-a+-scores dept.

So, after an extended period of inactivity, I've finally decided to jump back into working on SoylentNews and rehash (the code that powers the site). As such, I've decided to scratch some long-standing itches. The first (and easiest) itch to scratch was deploying HSTS on SoylentNews. What is HSTS, you may ask?

HSTS stands for HTTP Strict Transport Security. It is a special HTTP response header that tells the browser a site should only ever be reached over HTTPS, causing it to automatically load the encrypted version of the site even when given a plain http:// URL. We've forbidden non-SSL connections to SN for over a year, but without HSTS in place, a man-in-the-middle downgrade attack was still possible by intercepting the initial insecure page load.
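
For the curious, the header itself is just a single line in the server's response. Here's a rough sketch of checking for it from a shell with curl (the max-age value shown is purely illustrative, not necessarily what SN sends):

    $ curl -sI https://soylentnews.org/ | grep -i strict-transport-security
    Strict-Transport-Security: max-age=31536000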

One of my big goals for SoylentNews is that we should be representative of "best practices" on the internet. To that end, we deployed IPv6 publicly last year, and went HTTPS-by-default not long after that. Deploying HSTS continues this trend, and I'm working towards implementing other good ideas that rarely seem to see the light of day.

Check past the break for more technical details.

[Continues...]

As part of prepping for HSTS deployment, I went through every site in our public DNS records and made sure they all have valid SSL certificates and redirect to HTTPS by default. Much to my embarrassment, I found that several of our public-facing sites lacked SSL support entirely, or had self-signed certificates and broken SSL configurations. This has been rectified.
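
(If you want to run the same audit yourself, a quick spot-check is to request the plain-HTTP version of each hostname and confirm it answers with a redirect to HTTPS. A minimal sketch; the hostname and the response shown here are illustrative examples only:)

    $ curl -sI http://soylentnews.org/ | grep -iE "^(HTTP|Location)"
    HTTP/1.1 301 Moved Permanently
    Location: https://soylentnews.org/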

Let this be a lesson to everyone. While protecting your "main site" is always a good idea, make sure, when going through and securing your infrastructure, that you check every public IP and public hostname so nothing slips through the cracks. If you're running SSLLabs against your website, I highly recommend you scan all the subjectAlternativeNames listed in your certificate. Apache and nginx can provide different SSL options for different VHosts, and it's very important to make sure all of them have a sane and consistent configuration.
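
A handy way to enumerate every subjectAlternativeName worth scanning is to pull them straight out of the live certificate with openssl. A rough sketch, using our hostname as the example target:

    $ openssl s_client -connect soylentnews.org:443 -servername soylentnews.org </dev/null 2>/dev/null \
        | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"

Every DNS: entry that prints is a hostname that deserves its own scan.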

Right now, HSTS is deployed only on the main site, without "includeSubDomains". The reason for this is I wanted to make sure I didn't miss any non-SSL-capable sites, and I'm still working on getting our CentOS 6.7 box up to best practices (unfortunately, the version of Apache it ships with is rather dated and doesn't support OCSP stapling. I'll be fixing this, but just haven't gotten around to it yet).
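
(If you're curious whether a given server staples OCSP responses today, openssl's s_client can request the status as part of the handshake. This is only a rough sketch with our hostname as a placeholder; a server that doesn't staple will typically report that no response was sent:)

    $ openssl s_client -connect soylentnews.org:443 -status </dev/null 2>/dev/null | grep -i -A2 "OCSP response"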

Once I've fixed that and am happy with the state of the site, SN and her subdomains will be submitted for inclusion in the browser preload lists. I'll run an article when that submission happens and again when we're accepted. I hope to have another article this week on backend tinkering and proposed site updates.

Until then, happy hacking!
~ NCommander

 
This discussion has been archived. No new comments can be posted.
  • (Score: 2) by number6 on Tuesday August 09 2016, @05:06PM

    by number6 (1831) on Tuesday August 09 2016, @05:06PM (#385850) Journal

    @martyb - WOW! your script gives a perfect result and completes the task like a bullet flying out of a gun!

    Comparing the completion times of your script to mine:

        * Your script --> start time: "xx:xx:27.17" | end time: "xx:xx:28.85" --> total time: 1.68 seconds
        * My script --> start time: "xx:50:35.68" | end time: "xx:51:42.68" --> total time: 67.00 seconds

    The program causing the most delay in my script was 'httrack'; it takes forever to download the page, while 'wget' does the same thing in an instant!

    Thanks a lot martyb, that was educational.

  • (Score: 2) by number6 on Tuesday August 09 2016, @07:05PM

    by number6 (1831) on Tuesday August 09 2016, @07:05PM (#385911) Journal

    Upon further testing of your code, I see wrong output...

    Querying the IP address "216.34.181.45" (which is Slashdot),
    your script outputs this to the console:

    slashdot.biz
    0vertime.com
    nillos.com
    slashbi.com
    slashdot.com
    slashdottv.com
    2f2e.me
    slashbi.net
    slashdot.net
    slashbi.org
    slashdot.org
    slash.tv
    2f2e.us
    slashdot.

    But if you visit the robtex webpage and look at the items, you will see there are many more!
    Here is the webpage: https://www.robtex.com/?dns=216.34.181.45&rev=1 [robtex.com]

    • (Score: 2) by number6 on Wednesday August 10 2016, @05:00AM

      by number6 (1831) on Wednesday August 10 2016, @05:00AM (#386127) Journal

      OK, I found the problem: the Windows 'find' command has limitations!
      Read this:

      http://ss64.com/nt/find.html
      Limitations: "Although FIND can be used to scan large files, it will not detect any string that is positioned more than 1070 characters along a single line (with no carriage return) This makes it of limited use in searching binary or XML file types."

       
      So, when I tried @martyb's code on IP address "45.56.12.192" (SoylentNews.org), the code worked and I got the correct result, because the string to be extracted from the HTML file by the 'find' command was small (a 223-character string containing 2 hostnames).

      But when I tried @martyb's code on IP address "216.34.181.45" (Slashdot.org), the code ran but I got an incorrect result, because the string to be extracted from the HTML file by the 'find' command was very large (a 1469-character string containing 19 hostnames).

       
      PROOF: let's run some commands and display the output (for IP address "216.34.181.45"):

       
      This result is incorrect, only part of the string has been extracted from the HTML file:

      $ type temp.html | find.exe "Reverse DNS record lookup"
       
      <h2>Reverse DNS record lookup</h2><table><tbody><tr><td>1</td><td><a href="/?dns=slashdot.biz">slashdot.biz</a></td></tr><tr><td>2</td><td><a href="/?dns=0vertime.com">0vertime.com</a></td></tr><tr><td>3</td><td><a href="/?dns=nillos.com">nillos.com</a></td></tr><tr><td>4</td><td><a href="/?dns=slashbi.com">slashbi.com</a></td></tr><tr><td>5</td><td><a href="/?dns=slashdot.com">slashdot.com</a></td></tr><tr><td>6</td><td><a href="/?dns=slashdottv.com">slashdottv.com</a></td></tr><tr><td>7</td><td><a href="/?dns=2f2e.me">2f2e.me</a></td></tr><tr><td>8</td><td><a href="/?dns=slashbi.net">slashbi.net</a></td></tr><tr><td>9</td><td><a href="/?dns=slashdot.net">slashdot.net</a></td></tr><tr><td>10</td><td><a href="/?dns=slashbi.org">slashbi.org</a></td></tr><tr><td>11</td><td><a href="/?dns=slashdot.org">slashdot.org</a></td></tr><tr><td>12</td><td><a href="/?dns=slash.tv">slash.tv</a></td></tr><tr><td>13</td><td><a href="/?dns=2f2e.us">2f2e.us</a></td></tr><tr><td>14</td><td><a href="/?dns=slashdot.us">slashdot.

       
      This result is correct, all of the string has been extracted from the HTML file:

      $ sed.exe -n "/Reverse DNS record lookup/p" temp.html
       
      <h2>Reverse DNS record lookup</h2><table><tbody><tr><td>1</td><td><a href="/?dns=slashdot.biz">slashdot.biz</a></td></tr><tr><td>2</td><td><a href="/?dns=0vertime.com">0vertime.com</a></td></tr><tr><td>3</td><td><a href="/?dns=nillos.com">nillos.com</a></td></tr><tr><td>4</td><td><a href="/?dns=slashbi.com">slashbi.com</a></td></tr><tr><td>5</td><td><a href="/?dns=slashdot.com">slashdot.com</a></td></tr><tr><td>6</td><td><a href="/?dns=slashdottv.com">slashdottv.com</a></td></tr><tr><td>7</td><td><a href="/?dns=2f2e.me">2f2e.me</a></td></tr><tr><td>8</td><td><a href="/?dns=slashbi.net">slashbi.net</a></td></tr><tr><td>9</td><td><a href="/?dns=slashdot.net">slashdot.net</a></td></tr><tr><td>10</td><td><a href="/?dns=slashbi.org">slashbi.org</a></td></tr><tr><td>11</td><td><a href="/?dns=slashdot.org">slashdot.org</a></td></tr><tr><td>12</td><td><a href="/?dns=slash.tv">slash.tv</a></td></tr><tr><td>13</td><td><a href="/?dns=2f2e.us">2f2e.us</a></td></tr><tr><td>14</td><td><a href="/?dns=slashdot.us">slashdot.us</a></td></tr><tr><td>15</td><td><a href="/?dns=www.slashdot.biz">www.slashdot.biz</a></td></tr><tr><td>16</td><td><a href="/?dns=%2A.slashdot.net">*.slashdot.net</a></td></tr><tr><td>17</td><td><a href="/?dns=science.slashdot.net">science.slashdot.net</a></td></tr><tr><td>18</td><td><a href="/?dns=tv.slashdot.org">tv.slashdot.org</a></td></tr><tr><td>19</td><td><a href="/?dns=www.2f2e.us">www.2f2e.us</a></td></tr></tbody></table><br/><br/>

       
      So, to get the correct output for use in my batch script, we modify @martyb's code, replacing "type .. | find .." with a "sed" string extraction command,
      giving us this final result, which is correct (19 hostnames):

      $ sed.exe -n "/Reverse DNS record lookup/p" temp.html | sed.exe "s#<[^>]*># #g; s#  *#\n#g" | find.exe "."
       
      slashdot.biz
      0vertime.com
      nillos.com
      slashbi.com
      slashdot.com
      slashdottv.com
      2f2e.me
      slashbi.net
      slashdot.net
      slashbi.org
      slashdot.org
      slash.tv
      2f2e.us
      slashdot.us
      www.slashdot.biz
      *.slashdot.net
      science.slashdot.net
      tv.slashdot.org
      www.2f2e.us

      • (Score: 2) by martyb on Wednesday August 10 2016, @02:44PM

        by martyb (76) Subscriber Badge on Wednesday August 10 2016, @02:44PM (#386270) Journal

        I'm glad you found the code I offered to be helpful... and witness the value of comments expressing the *intent* of the code, too!

        I failed to note that what I provided was NOT a general solution, but instead intended to solve only that specific case. Further, I was mindful of your being in a Windows environment and not having ready access to the full stack of Unixland commands — hence my using find.exe.

        NOTE: Using regular expressions (regexps) to parse HTML is a known folly in the general case [stackoverflow.com]. It can be useful in specific circumstances, and I have done so in many one-off, controlled situations. But, you may well find yourself removing hair in vast quantities if you fail to understand this limitation!

        I did not see the results you did, but on further exploration with other IP addresses, I found that my copy of find.exe (Windows 7 Pro x64) limited the returned string to 4095 bytes. (For a more extreme example, I tried "98.138.253.109", yahoo.com, which had 62 domain names.) Using your sed pattern did work; I was not aware of that use for sed. Many thanks!

        That said, I would much prefer to find a native, command-line solution using userland commands (even GNU/Linux would be fine) instead of needing to query a web site. How DOES robtex do what they do?

        --
        Wit is intellect, dancing.