
SoylentNews is people

The Fine print: The following are owned by whoever posted them. We are not responsible for them in any way.

Journal by juggs

For a while now I've had a little bot running in the #rss-bot channel on irc.soylentnews.org, and a few people have asked how to contact me to suggest additional RSS feeds or possible improvements. (Thanks, Bytram, for jogging my memory to do something about that.)

So to that end: if you have any such suggestions, please reply to this Journal entry with them. Provided they are within the bounds of sanity, reality and my ability, I will endeavour to incorporate them.

  • (Score: 2) by crutchy on Tuesday June 10 2014, @11:50AM

    by crutchy (179) on Tuesday June 10 2014, @11:50AM (#53707) Homepage Journal

    i was checking out some google groups feeds.

    url syntax is: https://groups.google.com/forum/feed/$group/msgs/atom.xml?num=15 [google.com]
    i didn't get very far in my browsing, but some $group suggestions that seemed promising are:
    comp.ai.philosophy
    comp.ai.neural-nets
    machine-learning
    comp.ai.genetic
    comp.ai.fuzzy
    comp.ai.nat-lang
    comp.ai.shells
    semantic_web
    comp.ai.games
    networked-robots
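The URL syntax above can be expanded for each suggested group with a few lines of Python. This is only a sketch: the template and group names are copied from this comment, while `group_feed_url` and `GROUPS` are hypothetical names, and nothing here checks that the feeds still exist.

```python
from urllib.parse import quote

# Template copied from the comment above; $group becomes {group}.
FEED_TEMPLATE = "https://groups.google.com/forum/feed/{group}/msgs/atom.xml?num={num}"

# Group names as suggested in the comment.
GROUPS = [
    "comp.ai.philosophy",
    "comp.ai.neural-nets",
    "machine-learning",
    "comp.ai.genetic",
    "comp.ai.fuzzy",
    "comp.ai.nat-lang",
    "comp.ai.shells",
    "semantic_web",
    "comp.ai.games",
    "networked-robots",
]


def group_feed_url(group: str, num: int = 15) -> str:
    """Build the Atom feed URL for one group (hypothetical helper)."""
    # quote() leaves letters, digits, '.', '-' and '_' untouched.
    return FEED_TEMPLATE.format(group=quote(group, safe=""), num=num)


if __name__ == "__main__":
    for name in GROUPS:
        print(group_feed_url(name))
```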

    i'm still working on feed parsing atm so might be a while before it's ready, but if you want i can eventually join exec.bot to #rss-bot channel to supplement any features (like getting titles from url redirects etc).

    • (Score: 2) by martyb on Wednesday June 11 2014, @02:50PM

      by martyb (76) Subscriber Badge on Wednesday June 11 2014, @02:50PM (#54168) Journal

      Hi Crutchy!

      Those sound interesting. I'm interested in your thoughts on feed scraping. It seems to me that many of the RSS feed items contain redirects before you get to the destination article.

      For example, take a look at this feed item from The Register:

      http://go.theregister.com/feed/www.theregister.co.uk/2014/06/11/privacy_invasion_by_the_state_is_far_worse_than_by_private_firms_worstall_weds/

      It would be nice to see that de-referenced to be just:

      http://www.theregister.co.uk/2014/06/11/privacy_invasion_by_the_state_is_far_worse_than_by_private_firms_worstall_weds/

      I don't know whether this can be abstracted to just following the link and watching return codes (e.g. 301, 302, etc. (IIRC)) or whether a feed-specific filter would be needed.

      In case of any parsing problems, it might be nice to show the before and after URLs, just in case.

      Maybe send as a channel message? Processing: "raw-url"; it has title: "title-text"; it is [not redirected|redirected to "cleaned-url"].

      This would make the URLs more useful when including in a story based on a feed item.
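Both approaches mentioned in this comment can be sketched in Python. HTTP redirects are reported with 3xx status codes such as 301 and 302, and `urllib.request.urlopen` follows them by default. The function names below are hypothetical, the Register prefix rule is inferred only from the single example URL above, and neither bot's actual code is known.

```python
import urllib.request

# Prefix seen in the example feed item above (assumption: one example only).
REGISTER_PREFIX = "http://go.theregister.com/feed/"


def strip_register_prefix(url: str) -> str:
    """Feed-specific filter: drop the tracking host from a Register link."""
    if url.startswith(REGISTER_PREFIX):
        return "http://" + url[len(REGISTER_PREFIX):]
    return url


def follow_redirects(url: str, timeout: float = 10.0) -> str:
    """Generic approach: fetch the URL and report where the request ended up.

    urlopen() follows 3xx responses itself; geturl() returns the final URL.
    This needs network access and is not exercised here.
    """
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.geturl()


def describe(raw_url: str, title: str, cleaned_url: str) -> str:
    """Format the channel message proposed in the comment."""
    if cleaned_url == raw_url:
        outcome = "not redirected"
    else:
        outcome = f'redirected to "{cleaned_url}"'
    return f'Processing: "{raw_url}"; it has title: "{title}"; it is {outcome}.'
```

Showing both URLs in one message keeps the original available if the cleaning step ever gets it wrong.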

      --
      Wit is intellect, dancing.
  • (Score: 2) by martyb on Tuesday June 10 2014, @12:01PM

    by martyb (76) Subscriber Badge on Tuesday June 10 2014, @12:01PM (#53713) Journal

    Thanks a lot for providing this bot! I've pointed it out to site editors and it's proving really useful for finding stories!

    Could you please post here a list of the RSS feeds that you are already following? Much obliged!

    --
    Wit is intellect, dancing.
    • (Score: 2) by juggs on Tuesday June 10 2014, @03:20PM

      by juggs (63) on Tuesday June 10 2014, @03:20PM (#53802) Journal

      If you use !rss in channel the bot will list the feeds it's currently pulling in thus:

      !rss
      -Regurgitator- Available feeds: !darpa, !wired-enterprise, !wired-science, !nist-chemistry, !physorg, !securityweek, !taosecurity, !arstechnica, !nist-math, !nist-energy, !nist-electronics, !bugtraq, !nist-standards, !nist-manufacturing, !nist-forensics, !nist-bioscience, !krebs, !computerworld, !nist-it, !bbc-tech, !nist-buildfire, !cnet, !forbes-tech, !nasa, !itworld, !nist-physics, !theregister, !nist-nano.

      Hopefully the names are self-explanatory. You can then go on to use any of those listed to prompt the bot to give you the last 5 items from any given feed:

      !darpa
      -Regurgitator- [DARPA News RSS] 2014/06/09 From Close Air Support to Fire Suppression - http://www.darpa.mil/NewsEvents/Releases/2014/06/09.aspx [darpa.mil]
      -Regurgitator- [DARPA News RSS] 2014/06/05 DARPA Z-Man Program Demonstrates Human Climbing Like Geckos - http://www.darpa.mil/NewsEvents/Releases/2014/06/05.aspx [darpa.mil]
      -Regurgitator- [DARPA News RSS] 2014/06/03 Cyber Grand Challenge Announces 1st Group of Teams, Final Event at DEF CON - http://www.darpa.mil/NewsEvents/Releases/2014/06/03.aspx [darpa.mil]
      -Regurgitator- [DARPA News RSS] 2014/05/28 Microsystems Technologies Office: Creating A New Electronics Revolution For National Defense - http://www.darpa.mil/NewsEvents/Releases/2014/05/28.aspx [darpa.mil]
      -Regurgitator- [DARPA News RSS] 2014/05/27 Journey of Discovery Starts toward Understanding and Treating Networks of the Brain - http://www.darpa.mil/NewsEvents/Releases/2014/05/27a.aspx [darpa.mil]

      The bot's responses to you are done as a NOTICE so as not to flood the channel for others.

      Hope that helps.
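For anyone wondering what replying by NOTICE looks like on the wire: in the IRC protocol a notice to one user is a line of the form `NOTICE <nick> :<text>`, terminated by CR LF. The helper below is a hypothetical sketch of that framing, not the bot's code.

```python
def notice_lines(nick: str, items: list[str]) -> list[str]:
    """Frame each item as an IRC NOTICE addressed to a single nick."""
    # Sending to a nick rather than a channel name keeps the reply private.
    return [f"NOTICE {nick} :{item}\r\n" for item in items]


if __name__ == "__main__":
    for line in notice_lines("martyb", ["[DARPA News RSS] example item"]):
        print(repr(line))
```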

      • (Score: 2) by martyb on Wednesday June 11 2014, @01:03PM

        by martyb (76) Subscriber Badge on Wednesday June 11 2014, @01:03PM (#54103) Journal

        Solved before I asked - that's quick feedback!

        1.) If it's not too much trouble, could you please alphabetize the list of feeds returned by !rss?

        2.) I'm curious if there was a technical reason for choosing the last 5 items in response to !feedname? I'm supposing you're keeping them in memory, so there is that limitation. That said, it would be nice to be able to see more entries. Maybe accept an optional parameter for the number of entries to display? Default to 5 to maintain the current behavior, but allow up to say 50? The idea being to be able to retrieve the last day or so of feed entries. If a user requests more entries than exist, then just display what you have (possibly with a message indicating that.)

        Granted, I could go to the log of the entries, but it's nice to be able to see "everything" from just one source.

        Whatever you decide, this has been a really big help; thanks so much for making it available!

        --
        Wit is intellect, dancing.
        • (Score: 2) by juggs on Thursday June 12 2014, @01:02AM

          by juggs (63) on Thursday June 12 2014, @01:02AM (#54364) Journal

          1.) Shouldn't be a big effort. I took a quick look at that section and I haven't yet figured out what is deciding the order in that response. It certainly isn't based on the order the RSS feeds exist in the config. It appears to read the feed IDs into a list before concatenating them and spitting the message out. So I should be able to sort the list at some point.

          2.) No real reason for 5. I've upped that to 99 for now, served most recent first. I agree it would be good to have it determined by the requestor (so something like !feedid 25 to get the latest 25 for example). It's on my list to do :)
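The requested `!feedid 25` behaviour could be parsed roughly as follows. This is a sketch under assumptions: the default of 5 and the cap of 50 come from the request above, and `parse_feed_command` and `latest` are hypothetical names rather than anything in the bot.

```python
DEFAULT_COUNT = 5   # current behaviour, per the request
MAX_COUNT = 50      # upper limit suggested in the request


def parse_feed_command(message: str):
    """Split '!feedid [count]' into (feed_id, count), or None if not a command."""
    parts = message.split()
    if not parts or not parts[0].startswith("!") or len(parts[0]) == 1:
        return None
    feed_id = parts[0][1:]
    count = DEFAULT_COUNT
    if len(parts) > 1:
        try:
            count = int(parts[1])
        except ValueError:
            count = DEFAULT_COUNT  # ignore a non-numeric argument
    # Clamp to the allowed range.
    count = max(1, min(count, MAX_COUNT))
    return feed_id, count


def latest(entries: list, count: int):
    """Return up to `count` entries (newest first) plus an optional note."""
    shown = entries[:count]
    note = None
    if len(entries) < count:
        note = f"Only {len(entries)} entries available."
    return shown, note
```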

          • (Score: 2) by martyb on Thursday June 12 2014, @02:44AM

            by martyb (76) Subscriber Badge on Thursday June 12 2014, @02:44AM (#54396) Journal

            That's GREAT news!

            We had a nice chat on IRC, but just for safety's sake, here are the feeds I requested.

            1. http://www.sciencedaily.com/newsfeeds.htm
            2. http://www.nasa.gov/content/nasa-rss-feeds/
            3. http://www.jpl.nasa.gov/rss/index.php
            4. http://feeds.nature.com/news/rss/news
            5. http://www.sciencemag.org/rss/

            Thanks again!

            --
            Wit is intellect, dancing.
            • (Score: 2) by juggs on Thursday June 12 2014, @04:44AM

              by juggs (63) on Thursday June 12 2014, @04:44AM (#54429) Journal

              ADDED:
              Science Daily ALL
              URL: http://feeds.sciencedaily.com/sciencedaily?format=xml [sciencedaily.com]
              Trigger: !sciencedaily_all

              NASA Jet Propulsion Lab
              URL: http://www.jpl.nasa.gov/multimedia/rss/news.xml [nasa.gov]
              Trigger: !nasa_jpl

              Nature
              URL: http://feeds.nature.com/news/rss/news [nature.com]
              Trigger: !nature

              Science Mag (wasn't sure which feed to grab so I chose the Daily News feed)
              URL: http://news.sciencemag.org/rss/current.xml [sciencemag.org]
              Trigger: !sciencemag

              • (Score: 2) by juggs on Thursday June 12 2014, @05:10AM

                by juggs (63) on Thursday June 12 2014, @05:10AM (#54434) Journal

                -juggs- !rss
                -Regurgitator- Available feeds: !arstechnica !bbc-tech !bugtraq !cnet !computerworld !darpa !forbes-tech !itworld !krebs !nasa !nasa_jpl !nature !nist-bioscience !nist-buildfire !nist-chemistry !nist-electronics !nist-energy !nist-forensics !nist-it !nist-manufacturing !nist-math !nist-nano !nist-physics !nist-standards !physorg !sciencedaily_all !sciencemag !securityweek !taosecurity !theregister !wired-enterprise !wired-science.

                Sorting sorted and additions included.

                There's something a bit off with the NASA JPL feed though, when a new article is announced in channel it has a load of linebreaks (or something) in it so it outputs like this:

                -Regurgitator- [News and Features - NASA's Jet Propulsion Laboratory]
                -Regurgitator-
                -Regurgitator- NASA Beams 'Hello, World!' Video from Space via Laser
                -Regurgitator-
                -Regurgitator- - http://www.jpl.nasa.gov/news/news.php?release=2014-177&rn=news.xml&rst=4169 [nasa.gov]

                That's just ugly.

                But it parses OK in response to !nasa_jpl oddly enough.

                I'll keep an eye on that and have put it in my todo to investigate. But for now, I must rest!
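If the stray line breaks come from whitespace inside the feed's title elements (an assumption, since the cause was not confirmed here), collapsing all runs of whitespace before announcing would keep each item on one line. A sketch with hypothetical helper names:

```python
def collapse_whitespace(text: str) -> str:
    """Replace every run of spaces, tabs and newlines with a single space."""
    return " ".join(text.split())


def announce(feed_title: str, item_title: str, link: str) -> str:
    """Build a one-line announcement in the bot's '[feed] title - url' style."""
    return f"[{collapse_whitespace(feed_title)}] {collapse_whitespace(item_title)} - {link}"


if __name__ == "__main__":
    print(announce(
        "News and Features - NASA's Jet Propulsion Laboratory\n",
        "\n\nNASA Beams 'Hello, World!' Video from Space via Laser\n\n",
        "http://www.jpl.nasa.gov/news/news.php?release=2014-177&rn=news.xml&rst=4169",
    ))
```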

                • (Score: 2) by juggs on Thursday June 12 2014, @05:45AM

                  by juggs (63) on Thursday June 12 2014, @05:45AM (#54444) Journal

                  Had to remove the nasa_jpl feed for now, the bot kept spewing the same articles into channel again and again.

                  NOW I shall rest :D
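The cause of the repeats is not stated here. One common guard against re-announcing items is to remember an identifier for everything already posted and skip anything seen before. The sketch below assumes each parsed item is a dict with a stable `"id"` key (for example the GUID or link), which is hypothetical and may not match how the bot stores items.

```python
class SeenItems:
    """Remember which item identifiers have already been announced."""

    def __init__(self) -> None:
        self._seen: set[str] = set()

    def is_new(self, item_id: str) -> bool:
        """Return True the first time an id is offered, False afterwards."""
        if item_id in self._seen:
            return False
        self._seen.add(item_id)
        return True


def new_items(items: list[dict], seen: SeenItems) -> list[dict]:
    """Filter a freshly fetched feed down to items not announced before."""
    return [item for item in items if seen.is_new(item["id"])]
```

This only helps if the identifier stays the same between fetches; a feed that changes its GUIDs or links on every poll would still slip through.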

  • (Score: 2) by martyb on Tuesday June 10 2014, @12:14PM

    by martyb (76) Subscriber Badge on Tuesday June 10 2014, @12:14PM (#53717) Journal

    I don't know if you are the one to contact? Would really appreciate it if the channel were logged to a file, much like what is already being done for #Soylent, #staff, etc.

    Logging in first thing in the morning when the story and submissions queue are "light", would be nice to be able to see what's been posted to this channel over the past several hours.

    I tried to do a "/invite Loggie" (guessing that is the bot which does the work), but got an error message: "rss-bot :You're not a channel operator". I tried to ".op" and got the same message.

    Would much appreciate your assistance!

    --
    Wit is intellect, dancing.
  • (Score: 2) by martyb on Tuesday June 17 2014, @10:09PM

    by martyb (76) Subscriber Badge on Tuesday June 17 2014, @10:09PM (#56674) Journal

    It may sound silly, at first, but would you please add the RSS feed for SoylentNews.org into #rss-bot?

    I was trying to compare what time we posted a story compared to when another site posted their story on a certain subject. Realized that if we logged our feed into the #rss-bot channel, the task would be much simplified!

    So, please add: http://soylentnews.org/index.rss [soylentnews.org] as feed: !SoylentNews

    Thanks in advance!

    --
    Wit is intellect, dancing.
    • (Score: 2) by juggs on Friday June 20 2014, @01:35AM

      by juggs (63) on Friday June 20 2014, @01:35AM (#57728) Journal

      Tis done.

      -Regurgitator- Available feeds: !SoylentNews !arstechnica !bbc-tech !bugtraq !cnet !computerworld !darpa !forbes-tech !itworld !krebs !nasa !nature !nist-bioscience !nist-buildfire !nist-chemistry !nist-electronics !nist-energy !nist-forensics !nist-it !nist-manufacturing !nist-math !nist-nano !nist-physics !nist-standards !physorg !sciencedaily_all !sciencemag !securityweek !taosecurity !theregister !wired-enterprise !wired-science.
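A side note on why !SoylentNews lands at the front of that list: a plain string sort compares character codes, and uppercase letters come before lowercase ones. The snippet below shows the difference using Python's built-in `sorted`; it is illustrative only and says nothing about how the bot itself sorts.

```python
feeds = ["nasa", "SoylentNews", "arstechnica", "taosecurity"]

# Default sort: 'S' (code 83) sorts before 'a' (code 97).
print(sorted(feeds))
# ['SoylentNews', 'arstechnica', 'nasa', 'taosecurity']

# Case-insensitive sort: compare lowercased copies instead.
print(sorted(feeds, key=str.lower))
# ['arstechnica', 'nasa', 'SoylentNews', 'taosecurity']
```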

      • (Score: 2) by juggs on Friday June 20 2014, @02:03AM

        by juggs (63) on Friday June 20 2014, @02:03AM (#57736) Journal

        I didn't do it originally as #rss-bot was originally designed to be a source for news for would be submitters to SN - and SN articles are announced in #Soylent anyway.

        However, you made a good case so there it is :D

        • (Score: 1) by martyb on Friday June 20 2014, @06:44PM

          by martyb (76) Subscriber Badge on Friday June 20 2014, @06:44PM (#58111) Journal

          To this (and your other reply to my request) I can only say "Wow... thanks for the quick response!"

          --
          Wit is intellect, dancing.
  • (Score: 2) by martyb on Saturday June 21 2014, @10:03PM

    by martyb (76) Subscriber Badge on Saturday June 21 2014, @10:03PM (#58531) Journal

    Hi! I have a couple more suggestions for RSS feeds:

    This is a feed from a web site: !mosaicscience = http://mosaicscience.com/feed/rss.xml [mosaicscience.com]

    This is a feed offered by pipedot: !pipedot-feed = http://pipedot.org/feed/ [pipedot.org]

    NOTE: I'm not sure how well that will work with your code. It might be best to look at what feeds IT follows, and add those?

    Thanks!

    --
    Wit is intellect, dancing.
    • (Score: 2) by juggs on Sunday June 22 2014, @04:21AM

      by juggs (63) on Sunday June 22 2014, @04:21AM (#58592) Journal

      That pipedot page is a conglomeration of different feeds parsed into an html page. I'll look through them and add some in.

      For now I've added:
      !mosaicscience = http://mosaicscience.com/feed/rss.xml [mosaicscience.com]
      !pipedot = http://pipedot.org/atom [pipedot.org]

      The latter being pipedot's own atom feed.

      • (Score: 1) by martyb on Sunday June 22 2014, @12:17PM

        by martyb (76) Subscriber Badge on Sunday June 22 2014, @12:17PM (#58673) Journal

        Juggs wrote:

        That pipedot page is a conglomeration of different feeds parsed into an html page. I'll look through them and add some in.

        Yeah, I realized that *after* I submitted the request. Thanks for doing what I meant, instead of what I asked!

        --
        Wit is intellect, dancing.
  • (Score: 2) by martyb on Tuesday October 14 2014, @02:16PM

    by martyb (76) Subscriber Badge on Tuesday October 14 2014, @02:16PM (#105938) Journal

    It's a seemingly minor thing, but I would appreciate it if the bracketed feed names were not so long. I use "Unifont CSUR" as my font in HexChat (so I can see all the international characters in UTF-8) and for it to be legible, I need to set it to 14 point. Even with a relatively large display (1920x1200), I very occasionally get line-wrap which makes it much harder to scan down what has been posted.

    For example: "[Phys.org - latest science and technology news stories]" takes up about half of the width available.

    It seems to me that the "long version" of the source does not need to be displayed every time; maybe have it displayed only when a user does: "!rss phys.org" — otherwise, "[phys.org]" should suffice.

    Thanks in advance!

    --
    Wit is intellect, dancing.
    • (Score: 2) by juggs on Wednesday October 15 2014, @05:29PM

      by juggs (63) on Wednesday October 15 2014, @05:29PM (#106328) Journal

      Adding your other suggestion here too...

      Regurgitator> [Latest Science News -- ScienceDaily]

      "[ScienceDaily]" would be sufficient

      These descriptions are coming from the feeds themselves. But I can surely do something, even if it's to add a config option for each feed to specify this instead of pull it out of the rss / atom feed itself.
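The per-feed config option could be as small as a lookup table that overrides the title taken from the feed. A sketch, assuming feeds are keyed by their trigger names; `DISPLAY_NAMES` and `display_name` are hypothetical and not the bot's actual config format.

```python
# Optional short names keyed by trigger; feeds not listed keep their own title.
DISPLAY_NAMES = {
    "physorg": "PhysOrg",
    "sciencedaily_all": "ScienceDaily",
}


def display_name(feed_id: str, feed_title: str) -> str:
    """Prefer the configured short name, else fall back to the feed's title."""
    return DISPLAY_NAMES.get(feed_id, feed_title)


if __name__ == "__main__":
    print(f"[{display_name('sciencedaily_all', 'Latest Science News -- ScienceDaily')}]")
    print(f"[{display_name('darpa', 'DARPA News RSS')}]")
```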

    • (Score: 2) by juggs on Wednesday October 15 2014, @05:56PM

      by juggs (63) on Wednesday October 15 2014, @05:56PM (#106335) Journal

      [PhysOrg] and [ScienceDaily] now shortened as requested.

    • (Score: 2) by juggs on Wednesday October 15 2014, @10:33PM

      by juggs (63) on Wednesday October 15 2014, @10:33PM (#106444) Journal

      Additionally:
      [SecurityFocus Vulnerabilities] is now [Bugtraq] to match its !trigger

      [SecurityWeek RSS Feed] is now just [Security Week]

      Hope that keeps things cleaner.

      • (Score: 1) by martyb on Thursday October 16 2014, @11:30AM

        by martyb (76) Subscriber Badge on Thursday October 16 2014, @11:30AM (#106576) Journal

        That is *so* much better! Thank-you!!!

        --
        Wit is intellect, dancing.
        • (Score: 2) by juggs on Friday October 17 2014, @11:23PM

          by juggs (63) on Friday October 17 2014, @11:23PM (#107189) Journal

          I noticed all the !nist-xxx feeds were prefixed with:
          [The National Institute of Standards and Technology]

          now:
          [NIST]

  • (Score: 2) by juggs on Tuesday January 06 2015, @03:26AM

    by juggs (63) on Tuesday January 06 2015, @03:26AM (#132096) Journal

    Regurgitator has been retired in favour of Bender. Now Bender will serve you all your rss goodies in #rss-bot.

    (Regurgitator had trouble regurgitating https feeds, Bender didn't. Bender won as it just got on with it rather than demand attention, life's too short etc.)

    Still.. #rss-bot suggestions / improvement ideas here please.

  • (Score: 2) by martyb on Thursday January 22 2015, @10:19PM

    by martyb (76) Subscriber Badge on Thursday January 22 2015, @10:19PM (#137068) Journal
    Hi!

    First off, I apologize for the formatting, but wanted to ensure that there was no inadvertent translation of any text pasted in this comment; hence my use of "code" for this reply.

    I see that Bender has taken over the #rss-bot duties.  Anything that makes things easier is a great idea!

    An observation, and a request, if I may?  Items appearing in the feed present a link that often requires a number of redirects before it gets to the final destination.  Oftentimes, I've seen the redirection being used for tracking purposes.  I'd really appreciate it if the bot could pre-resolve any redirections and present just the final destination path in the entry which is presented in the feed.

    Taking a look in the #Soylent channel, chromas' "Hedonism" bot does this whenever it resolves a URL -- maybe his code/methodology can be merged in?

    As a concrete example, I just now saw this presented in the #rss-bot channel:

       [SecurityWeek] - Anonymous-linked Journalist Barrett Brown Gets Five Years' Prison - http://feedproxy.google.com/~r/Securityweek/~3/d3fQig0JEb4/anonymous-linked-journalist-barrett-brown-gets-five-years-prison

    I just copied that entire text, went to the #Soylent channel, and entered it there.  The "Hedonism" bot responded with:

    ^ Barrett Brown Gets Five Years' Prison | SecurityWeek.Com ( http://www.securityweek.com/anonymous-linked-journalist-barrett-brown-gets-five-years-prison?&utm_medium=feed&utm_campaign=Feed%3A+Securityweek+%28SecurityWeek+RSS+Feed%29 )

    You can see it here: http://logs.sylnt.us/%23soylent/2015-01-22.html#22:11:42

    If there were any way to also remove the &utm_medium... text, that would be an added bonus and muchly appreciated!
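Removing the `utm_` parameters is doable with Python's standard `urllib.parse` module. This is a sketch rather than either bot's code: `strip_tracking` is a hypothetical name, and treating every parameter whose name starts with `utm_` as tracking is an assumption.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url: str) -> str:
    """Drop query parameters whose names start with 'utm_'; keep the rest."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith("utm_")
    ]
    # urlencode([]) is '', and urlunsplit omits the '?' for an empty query.
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


if __name__ == "__main__":
    print(strip_tracking(
        "http://www.securityweek.com/anonymous-linked-journalist-barrett-brown-gets-five-years-prison"
        "?&utm_medium=feed&utm_campaign=Feed%3A+Securityweek+%28SecurityWeek+RSS+Feed%29"
    ))
```

Note that the remaining parameters are re-encoded by `urlencode`, so unusual escaping in the original query may come out in a different but equivalent form.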

    --
    Wit is intellect, dancing.