

posted by NCommander on Monday August 08 2016, @12:00PM
from the now-with-a+-scores dept.

So after an extended period of inactivity, I've finally decided to jump back into working on SoylentNews and rehash (the code that powers the site). As such, I've been scratching some long-standing itches. The first (and easiest) was deploying HSTS on SoylentNews. What is HSTS, you may ask?

HSTS stands for HTTP Strict Transport Security. It is a special HTTP response header that tells the browser a site should only ever be reached over HTTPS, and causes the browser to automatically load the encrypted version of the site whenever it sees a plain-HTTP URL. We've forbidden non-SSL connections to SN for over a year, but without HSTS in place, a man-in-the-middle downgrade attack was still possible by intercepting the initial insecure page load.
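
Mechanically, the whole thing is one response header. As a rough illustration (not our exact production configuration), the Apache mod_headers one-liner looks something like this, with max-age controlling how long browsers remember the policy:

    Header always set Strict-Transport-Security "max-age=31536000"

Once a browser has seen that header over a valid HTTPS connection, it will quietly upgrade any plain-HTTP request to the site to HTTPS until the max-age expires.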

One of my big goals for SoylentNews is that we should be representative of "best practices" on the internet. To that end, we deployed IPv6 publicly last year, and went HTTPS-by-default not long after that. Deploying HSTS continues this trend, and I'm working towards implementing other good ideas that rarely seem to see the light of day.

Check past the break for more technical details.

[Continues...]

As part of prepping for HSTS deployment, I went through every site in our public DNS records and made sure they all have valid SSL certificates and redirect to HTTPS by default. Much to my embarrassment, I found that several of our public-facing sites lacked SSL support entirely, or had self-signed certificates and broken SSL configurations. This has been rectified.
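
For the sites that weren't redirecting, the fix is a small bit of web server configuration. As a sketch (nginx syntax, hostname illustrative), the port-80 vhost simply bounces every request to its HTTPS equivalent:

    server {
        listen 80;
        listen [::]:80;
        server_name soylentnews.org;
        # send every plain-HTTP request to the same URL over HTTPS
        return 301 https://$host$request_uri;
    }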

Let this be a lesson to everyone. While protecting your "main site" is always a good idea, make sure when going through and securing your infrastructure that you check every public IP and public hostname so nothing slips through the cracks. If you're running SSL Labs against your website, I highly recommend you scan every subjectAltName listed in your certificate. Apache and nginx can apply different SSL options to different vhosts, and it's very important to make sure all of them have a sane and consistent configuration.
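
If you want to do a quick pass of your own, something along these lines (standard openssl client tooling; substitute your hostname) will print every name a live certificate actually covers, and each of those names deserves its own scan:

    echo | openssl s_client -connect soylentnews.org:443 -servername soylentnews.org 2>/dev/null \
        | openssl x509 -noout -text | grep -A1 "Subject Alternative Name"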

Right now, HSTS is deployed only on the main site, without "includeSubDomains". The reason is that I want to be sure I haven't missed any non-SSL-capable sites, and I'm still working on getting our CentOS 6.7 box up to best practices (unfortunately, the version of Apache it ships with is rather dated and doesn't support OCSP stapling; I'll be fixing this, but just haven't gotten around to it yet).
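
For reference, once we're on a newer Apache, OCSP stapling itself is only a couple of mod_ssl directives; this is just the general shape from the Apache 2.4 documentation, not our final configuration:

    # in the main server config, outside any <VirtualHost>
    SSLStaplingCache shmcb:/var/run/ocsp(128000)

    # in each SSL-enabled vhost
    SSLUseStapling on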

Once I've fixed that and am happy with the state of the site, SN and her subdomains will be submitted for inclusion in browser HSTS preload lists. I'll run an article when that submission happens, and again when we're accepted. I hope to have another article this week on backend tinkering and proposed site updates.
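
For reference, the preload lists expect the final header to carry both extra directives and a long max-age, something along the lines of:

    Strict-Transport-Security: max-age=31536000; includeSubDomains; preload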

Until then, happy hacking!
~ NCommander

 
  • (Score: 2) by number6 on Monday August 08 2016, @04:35PM

    by number6 (1831) on Monday August 08 2016, @04:35PM (#385355) Journal

    **** Preamble: I'm on a Windows computer but the CLI tools I used below are available on *nix systems too ****

     
    I spent hours and hours last night trying to find a way, using command-line tools, to get "soylentnews.org" as output when given the IP address 45.56.123.192 as input.

    I gave up; I just couldn't do it.

    I have created a batch script ("NetworkInfo-Query-IPAddress-DNS.bat") which I run on demand;
    it runs two commands and displays the info in the console:

    command 1 is:  curl.exe ipinfo.io/%IPADDRESS%
    command 2 is:  dig.exe +short -x %IPADDRESS%     // see note [1]

    Here is what the console displays when I use "45.56.123.192" as input:

    $ curl.exe ipinfo.io/45.56.123.192
    {
      "ip": "45.56.123.192",
      "hostname": "li941-192.members.linode.com",
      "city": "Akhtar Colony",
      "region": "Sindh",
      "country": "PK",
      "loc": "24.8449,67.0727",
      "org": "AS63949 Linode, LLC"
    }
     
    $ dig.exe +short -x 45.56.123.192
    {
      "Reverse DNS": li941-192.members.linode.com.
    }

     
    I am trying to find a third command to add to my script which will allow me to extract all hostnames/aliases of the IP address.

    If I visit this webpage for reference: https://www.robtex.com/?dns=45.56.123.192 [robtex.com]

    I see that "45.56.123.192" points to two host names: "grepnews.org" and "soylentnews.org" .

    So what command-line tool can I use to extract those host names?

     
    [1] --------------------------------------------------------------------------

    The "curl.exe ipinfo.io" command outputs its info formatted with indentation
    and braces. The "dig.exe +short -x" command does not give the same output
    format; so to make it have the same display formatting I used this command:

      for /f "tokens=*" %%A in ('%CD%\dig.exe +short -x %IPADDRESS%') do (
      echo {
      echo "Reverse DNS": %%A
      echo }
      )

    You may think that my use of the "dig" command is redundant given that the "curl" command
    gave the same "hostname" info, but in practice the "curl" query often gives me blank
    results for the "hostname" field, whereas the "dig" command _always_ gets this info.

  • (Score: 2) by Scruffy Beard 2 on Monday August 08 2016, @05:00PM

    by Scruffy Beard 2 (6030) on Monday August 08 2016, @05:00PM (#385361)

    You do know that you can have more than one domain pointing at the same IP, right?

    (Straying outside my knowledge here): The only time you really need the reverse DNS to match is for mail (mainly to combat spam). I suspect there are other use-cases that I am not aware of.
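
    For instance, using the addresses from the posts above: if both names really do have A records for that IP, the two forward lookups land on the same box, but the single PTR record only gives back the Linode machine name, something like:

    $ dig +short soylentnews.org
    45.56.123.192
    $ dig +short grepnews.org
    45.56.123.192
    $ dig +short -x 45.56.123.192
    li941-192.members.linode.com.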

  • (Score: 2) by NCommander on Monday August 08 2016, @05:13PM

    by NCommander (2) Subscriber Badge <michael@casadevall.pro> on Monday August 08 2016, @05:13PM (#385369) Homepage Journal

    Using powershell:

    PS C:\Users\mcasa> resolve-dnsname soylentnews.org | ForEach-Object { Write-Host $_.IPAddress }
    2600:3c00::f03c:91ff:fe98:b8fe
    45.56.123.192

    --
    Still always moving
    • (Score: 2) by number6 on Monday August 08 2016, @05:31PM

      by number6 (1831) on Monday August 08 2016, @05:31PM (#385378) Journal

      Thanks for the reply NCommander +++++

      unfortunately for me....
      I'm on Windows XP; I hate PowerShell and don't have it on my system, and I also hate Microsoft .NET and don't have it either.
      I hate all post-XP/2003 Microsoft operating systems.....purely from a control/configuration/GUI-preference perspective; it's nothing to do with improvements in the kernel/memory/graphics/security subsystems.

      When my system eventually dies, I will be moving to FOSS and the less I have anything to do with Microsoft-centric or Apple-centric solutions the better.

      Do you know of a standalone, standard CLI tool, compiled for all platforms, that can extract this information?
      Thanks in advance.

  • (Score: 2) by number6 on Tuesday August 09 2016, @01:02AM

    by number6 (1831) on Tuesday August 09 2016, @01:02AM (#385572) Journal

    ***** Continuation of my first post *****

    Okay..... I eventually found a way to extract the canonical hostnames ("grepnews.org" and "soylentnews.org")
    which point to "45.56.123.192".

     
    The command-line tools I needed were:
      * httrack.exe - website copier, spider
      * sed.exe - famous powerful text processing utility
      * UnixToDos.exe - to convert line-endings from Unix (LF) to Windows (CR/LF) after processing with 'sed'.

     
    Here is a preliminary draft of what I will be adding to the batch file (mentioned at first post):

    :: REM :: THIS BATCH SCRIPT AND ALL PROGRAMS ARE LOCATED IN THE SAME WORKING FOLDER
     
    :: REM :: GET A SINGLE HTML WEB PAGE WITHOUT RESOURCES AND OUTPUT IT TO THE WORKING FOLDER
    httrack.exe "https://www.robtex.com/?dns=45.56.123.192&rev=1" -O "." -s0 --depth=1 -C0 -Q -I0 %I0 -q -p1 -N "_output.html"
     
    :: REM :: RENAME THE HTML FILE TO "_OUTPUT.HTML" (IN CASE HTTRACK APPENDED MORE TEXT TO THE NAME).
    ren _output*.html _output.html
     
    :: REM :: ISOLATE THE LINE IN THE HTML FILE HAVING THE CANONICAL HOSTNAMES, REMOVING ALL
    :: REM :: OTHER LINES AND SEND OUTPUT TO A NEW TEXT FILE
    for /f "delims=" %%A in ('type "_output.html" ^|find.exe /i "Reverse DNS record lookup"') do echo %%A>_output.txt"
     
    :: REM ::::::::::: contents of _output.txt will look like this (one line):
    :: REM ::::::::::: <h2>Reverse DNS record lookup</h2><table><tbody><tr><td>1</td><td><a href="https://www.robtex.com/?dns=grepnews.org">grepnews.org</a></td></tr><tr><td>2</td><td><a href="https://www.robtex.com/?dns=soylentnews.org">soylentnews.org</a></td></tr></tbody></table><br/><br/>
     
    :: REM :: PROCESS THE TEXT WITH "SED" .........................................
     
    ::REM :: STRIP ALL HTML TAGS
    sed.exe -i -e "s/<[^>]*>//g" _output.txt
     
    :: REM ::::::::::: contents of _output.txt will look like this (one line):
    :: REM ::::::::::: Reverse DNS record lookup1grepnews.org2soylentnews.org
     
    ::REM :: REMOVE STRING "REVERSE DNS RECORD LOOKUP"
    sed.exe -i "s/Reverse DNS record lookup//g" _output.txt
     
    :: REM ::::::::::: contents of _output.txt will look like this (one line):
    :: REM ::::::::::: 1grepnews.org2soylentnews.org
     
    ::REM :: THE REMAINING TEXT WILL BE THE HOSTNAME ITEMS EACH PREFIXED WITH AN
    ::REM :: ASCENDING NUMBER, SO WE SPLIT THE STRINGS TO NEW LINES USING THE
    ::REM :: NUMBERS AS DELIMITERS, ALSO STRIPPING OFF THE NUMBERS.
    sed -i "s/[0-9]/\n/g" _output.txt
     
    :: REM ::::::::::: contents of _output.txt will look like this (one line):
    :: REM ::::::::::: grepnews.org[LF]soylentnews.org[LF]
    :: REM :::::::::::
    :: REM :::::::::::      // note: [LF] = Unix Line Break, therefore the lines will display
    :: REM :::::::::::      // correctly on *nix systems but not on Windows.
     
    :: REM :: REMOVE BLANK LINES
    sed -i -e "/^$/d" _output.txt
     
    :: REM :: CONVERT LINE-ENDINGS FROM UNIX TO DOS .
    :: REM :: UNFORTUNATELY FOR ME 'SED' HAS COMMANDS TO DO THIS BUT THEY DID NOT WORK!
    :: REM :: SO I NEEDED ANOTHER TOOL
    UnixToDOS.exe -u _output.txt
     
    :: REM ::::::::::: finally, contents of _output.txt will look like this:
    :: REM ::::::::::: grepnews.org
    :: REM ::::::::::: soylentnews.org

     
    Using the code and tools above I have captured the canonical hostnames for IP address "45.56.123.192".

    I think the code needs more testing, because splitting the string at numeric delimiters may not always work;
    for instance, it will mangle any hostname that itself contains digits.

    I think I need some improvement to the regex pattern for splitting the string.
    Do any of you guys have a better pattern?

    • (Score: 2) by martyb on Tuesday August 09 2016, @02:25AM

      by martyb (76) Subscriber Badge on Tuesday August 09 2016, @02:25AM (#385596) Journal
      You might find this to be a bit easier:

      :: Retrieve the main page into a file:
      wget --no-check-certificate --output-document=temp.html "https://www.robtex.com/?dns=45.56.123.192&rev=1"

      :: Find the line in the HTML file which contains the data we want,
      :: replace each HTML element with a space,
      :: replace each sequence of one-or-more spaces with a new-line character,
      :: display only those lines that contain at least one period ("."):
      TYPE temp.html | find "Reverse DNS record lookup" | sed "s#<[^>]*># #g; s#  *#\n#g" | find "."

      The wget utility is available at: eternallybored.org [eternallybored.org]

      Please let me know how that works for you.

      --
      Wit is intellect, dancing.
      • (Score: 2) by number6 on Tuesday August 09 2016, @05:06PM

        by number6 (1831) on Tuesday August 09 2016, @05:06PM (#385850) Journal

        @martyb - WOW! your script gives a perfect result and completes the task like a bullet flying out of a gun!

        Comparing the completion times of your script to mine:

            * Your script --> start time: "xx:xx:27.17" | end time: "xx:xx:28.85" --> total time: 1.68 seconds
            * My script --> start time: "xx:50:35.68" | end time: "xx:51:42.68" --> total time: 67.00 seconds

        The program causing the most delay in my script was 'httrack'; it takes forever to download the page; 'wget' does the same thing in an instant!

        Thanks a lot martyb, that was educational.

        • (Score: 2) by number6 on Tuesday August 09 2016, @07:05PM

          by number6 (1831) on Tuesday August 09 2016, @07:05PM (#385911) Journal

          Upon further testing of your code, I see wrong output...

          Querying the IP address "216.34.181.45" (it is Slashdot),
          your script outputs this to the console:

          slashdot.biz
          0vertime.com
          nillos.com
          slashbi.com
          slashdot.com
          slashdottv.co
          2f2e.me
          slashbi.net
          slashdot.net
          slashbi.org
          slashdot.org
          slash.tv
          2f2e.us
          slashdot.

          but if you visit the robtex webpage to look at the items you will see there are many more!
          here is the webpage: https://www.robtex.com/?dns=216.34.181.45&rev=1 [robtex.com]

          • (Score: 2) by number6 on Wednesday August 10 2016, @05:00AM

            by number6 (1831) on Wednesday August 10 2016, @05:00AM (#386127) Journal

            Ok, I found the problem .....the Windows 'find' command has limitations!
            Read this:

            http://ss64.com/nt/find.html
            Limitations: "Although FIND can be used to scan large files, it will not detect any string that is positioned more than 1070 characters along a single line (with no carriage return) This makes it of limited use in searching binary or XML file types."

             
            So, when I tried @martyb's code on IP address "45.56.123.192" (SoylentNews.org), the code worked and I got the correct result, because the string to be extracted from the HTML file by the 'find' command was small (a 223-character string containing 2 hostnames).

            But when I tried @martyb's code on IP address "216.34.181.45" (Slashdot.org), the code ran but I got an incorrect result, because the string to be extracted from the HTML file by the 'find' command was very large (a 1469-character string containing 19 hostnames).

             
            PROOF .......let's run some commands and display the output (for IP address "216.34.181.45"):

             
            This result is incorrect, only part of the string has been extracted from the HTML file:

            $ type temp.html | find.exe "Reverse DNS record lookup"
             
            <h2>Reverse DNS record lookup</h2><table><tbody><tr><td>1</td><td><a href="/?dns=slashdot.biz">slashdot.biz</a></td></tr><tr><td>2</td><td><a href="/?dns=0vertime.com">0vertime.com</a></td></tr><tr><td>3</td><td><a href="/?dns=nillos.com">nillos.com</a></td></tr><tr><td>4</td><td><a href="/?dns=slashbi.com">slashbi.com</a></td></tr><tr><td>5</td><td><a href="/?dns=slashdot.com">slashdot.com</a></td></tr><tr><td>6</td><td><a href="/?dns=slashdottv.com">slashdottv.com</a></td></tr><tr><td>7</td><td><a href="/?dns=2f2e.me">2f2e.me</a></td></tr><tr><td>8</td><td><a href="/?dns=slashbi.net">slashbi.net</a></td></tr><tr><td>9</td><td><a href="/?dns=slashdot.net">slashdot.net</a></td></tr><tr><td>10</td><td><a href="/?dns=slashbi.org">slashbi.org</a></td></tr><tr><td>11</td><td><a href="/?dns=slashdot.org">slashdot.org</a></td></tr><tr><td>12</td><td><a href="/?dns=slash.tv">slash.tv</a></td></tr><tr><td>13</td><td><a href="/?dns=2f2e.us">2f2e.us</a></td></tr><tr><td>14</td><td><a href="/?dns=slashdot.us">slashdot.

             
            This result is correct, all of the string has been extracted from the HTML file:

            $ sed.exe -n "/Reverse DNS record lookup/p" temp.html
             
            <h2>Reverse DNS record lookup</h2><table><tbody><tr><td>1</td><td><a href="/?dns=slashdot.biz">slashdot.biz</a></td></tr><tr><td>2</td><td><a href="/?dns=0vertime.com">0vertime.com</a></td></tr><tr><td>3</td><td><a href="/?dns=nillos.com">nillos.com</a></td></tr><tr><td>4</td><td><a href="/?dns=slashbi.com">slashbi.com</a></td></tr><tr><td>5</td><td><a href="/?dns=slashdot.com">slashdot.com</a></td></tr><tr><td>6</td><td><a href="/?dns=slashdottv.com">slashdottv.com</a></td></tr><tr><td>7</td><td><a href="/?dns=2f2e.me">2f2e.me</a></td></tr><tr><td>8</td><td><a href="/?dns=slashbi.net">slashbi.net</a></td></tr><tr><td>9</td><td><a href="/?dns=slashdot.net">slashdot.net</a></td></tr><tr><td>10</td><td><a href="/?dns=slashbi.org">slashbi.org</a></td></tr><tr><td>11</td><td><a href="/?dns=slashdot.org">slashdot.org</a></td></tr><tr><td>12</td><td><a href="/?dns=slash.tv">slash.tv</a></td></tr><tr><td>13</td><td><a href="/?dns=2f2e.us">2f2e.us</a></td></tr><tr><td>14</td><td><a href="/?dns=slashdot.us">slashdot.us</a></td></tr><tr><td>15</td><td><a href="/?dns=www.slashdot.biz">www.slashdot.biz</a></td></tr><tr><td>16</td><td><a href="/?dns=%2A.slashdot.net">*.slashdot.net</a></td></tr><tr><td>17</td><td><a href="/?dns=science.slashdot.net">science.slashdot.net</a></td></tr><tr><td>18</td><td><a href="/?dns=tv.slashdot.org">tv.slashdot.org</a></td></tr><tr><td>19</td><td><a href="/?dns=www.2f2e.us">www.2f2e.us</a></td></tr></tbody></table><br/><br/>

             
            So, to get the correct output for use in my batch script, we modify @martyb's code, replacing "type .. | find .." with a "sed" string extraction command,
            giving us this final result, which is correct (19 hostnames):

            $ sed.exe -n "/Reverse DNS record lookup/p" temp.html | sed.exe "s#<[^>]*># #g; s#  *#\n#g" | find.exe "."
             
            slashdot.biz
            0vertime.com
            nillos.com
            slashbi.com
            slashdot.com
            slashdottv.com
            2f2e.me
            slashbi.net
            slashdot.net
            slashbi.org
            slashdot.org
            slash.tv
            2f2e.us
            slashdot.us
            www.slashdot.biz
            *.slashdot.net
            science.slashdot.net
            tv.slashdot.org
            www.2f2e.us

            • (Score: 2) by martyb on Wednesday August 10 2016, @02:44PM

              by martyb (76) Subscriber Badge on Wednesday August 10 2016, @02:44PM (#386270) Journal

              I'm glad you found the code I offered to be helpful... and witness the value of comments expressing the *intent* of the code, too!

              I failed to note that what I provided was NOT a general solution, but instead intended to solve only that specific case. Further, I was mindful of your being in a Windows environment and not having ready access to the full stack of Unixland commands — hence my using find.exe.

              NOTE: Using regular expressions (regexps) to parse HTML is a known folly in the general case [stackoverflow.com]. It can be useful in specific circumstances, and I have done so in many one-off, controlled situations. But, you may well find yourself removing hair in vast quantities if you fail to understand this limitation!

              I did not see the results you did, but on further exploration with other IP addresses, I found that my copy of find.exe (Windows 7 Pro x64) limited the returned string to 4095 bytes. (For a more extreme example, I tried "98.138.253.109", yahoo.com, which had 62 domain names.) Using your sed pattern did work; I was not aware of that use for sed. Many thanks!

              That said, I would much prefer to find a native, command-line solution using userland commands (even GNU/Linux would be fine) instead of needing to query a web site. How DOES robtex do what they do?

              --
              Wit is intellect, dancing.