For a while now I've had a little bot running in the #rss-bot channel on irc.soylentnews.org, and a few people have asked how to contact me to suggest additional RSS feeds, possible improvements, etc. (Thanks, Bytram, for jogging my memory to do something about that.)
So to that end - if you have any such suggestions please reply to this Journal entry with them and provided they are something within the bounds of sanity, reality and my ability then I will endeavour to incorporate them.
(Score: 2) by crutchy on Tuesday June 10 2014, @11:50AM
i was checking out some google groups feeds.
url syntax is: https://groups.google.com/forum/feed/$group/msgs/atom.xml?num=15 [google.com]
i didn't get very far in my browsing, but some $group suggestions that seemed promising are:
comp.ai.philosophy
comp.ai.neural-nets
machine-learning
comp.ai.genetic
comp.ai.fuzzy
comp.ai.nat-lang
comp.ai.shells
semantic_web
comp.ai.games
networked-robots
i'm still working on feed parsing atm so it might be a while before it's ready, but if you want i can eventually join exec.bot to the #rss-bot channel to supplement any features (like getting titles from url redirects etc).
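As a rough illustration of the URL syntax quoted above, the suggested groups can be expanded into feed URLs like this. It is only a sketch: the helper name `group_feed_url` and the `GROUPS` list are made up for the example, and it assumes the URL pattern still behaves as described in the comment.

```python
from urllib.parse import quote

# URL pattern as quoted in the comment; {group} replaces $group.
FEED_TEMPLATE = "https://groups.google.com/forum/feed/{group}/msgs/atom.xml?num={num}"

# The groups suggested in the comment.
GROUPS = [
    "comp.ai.philosophy",
    "comp.ai.neural-nets",
    "machine-learning",
    "comp.ai.genetic",
    "comp.ai.fuzzy",
    "comp.ai.nat-lang",
    "comp.ai.shells",
    "semantic_web",
    "comp.ai.games",
    "networked-robots",
]


def group_feed_url(group, num=15):
    """Return the Atom feed URL for one group (hypothetical helper)."""
    # quote() leaves letters, digits, '.', '-' and '_' untouched.
    return FEED_TEMPLATE.format(group=quote(group, safe=""), num=num)


ALL_FEED_URLS = [group_feed_url(group) for group in GROUPS]
```

The `num` parameter mirrors the `num=15` in the quoted URL; what values the service accepts is not stated in the thread.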
(Score: 2) by martyb on Wednesday June 11 2014, @02:50PM
Hi Crutchy!
Those sound interesting. I'm interested in your thoughts on feed scraping. It seems to me that many of the RSS feeds contain redirects before you actually get to the destination article.
For example, take a look at this feed item from The Register:
It would be nice to see that de-referenced to be just:
I don't know whether this can be abstracted to just following the link and watching return codes (e.g. 301, 302, etc. (IIRC)) or whether a feed-specific filter would be needed.
In case of any parsing problems, it might be nice to show the before and after URLs, just in case.
Maybe send as a channel message? Processing: "raw-url"; it has title: "title-text"; it is [not redirected|redirected to "cleaned-url"].
This would make the URLs more useful when including in a story based on a feed item.
Wit is intellect, dancing.
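A generic "follow the link and watch the return codes" approach might look roughly like the sketch below. HTTP signals a redirect with a 3xx status (301, 302, 303, 307, 308) plus a Location header, so no feed-specific filter is needed for that part. Everything named here is an assumption for illustration: `fetch` stands in for whatever HTTP client the bot uses and is expected to return a `(status, location)` pair, and `describe` just fills in the channel message format proposed above.

```python
from urllib.parse import urljoin

# Status codes that carry a Location header pointing at the next hop.
REDIRECT_CODES = {301, 302, 303, 307, 308}


def resolve_redirects(url, fetch, max_hops=10):
    """Follow redirects and return the last URL reached.

    `fetch` is a hypothetical callable: fetch(url) -> (status, location).
    """
    seen = {url}
    for _ in range(max_hops):
        status, location = fetch(url)
        if status not in REDIRECT_CODES or not location:
            return url
        # Location may be relative, so resolve it against the current URL.
        next_url = urljoin(url, location)
        if next_url in seen:
            # Redirect loop: stop and report where we are.
            return url
        seen.add(next_url)
        url = next_url
    return url


def describe(raw_url, title, fetch):
    """Build the before/after status line suggested in the comment."""
    final_url = resolve_redirects(raw_url, fetch)
    if final_url == raw_url:
        outcome = "not redirected"
    else:
        outcome = 'redirected to "{}"'.format(final_url)
    return 'Processing: "{}"; it has title: "{}"; it is {}.'.format(
        raw_url, title, outcome
    )
```

Passing `fetch` in as a parameter keeps the hop-following logic testable without touching the network; redirects done in JavaScript or meta-refresh tags would not be caught by this.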
(Score: 2) by martyb on Tuesday June 10 2014, @12:01PM
Thanks a lot for providing this bot! I've pointed it out to the site editors and it's proving really useful for finding stories!
Could you please post here a list of the RSS feeds that you are already following? Much obliged!
Wit is intellect, dancing.
(Score: 2) by juggs on Tuesday June 10 2014, @03:20PM
If you use !rss in channel the bot will list the feeds it's currently pulling in thus:
Hopefully the names are self-explanatory. You can then go on to use any of those listed to prompt the bot to give you the last 5 items from any given feed:
The bot's responses to you are done as a NOTICE so as not to flood the channel for others.
Hope that helps.
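For anyone curious about the NOTICE detail: at the IRC protocol level a private reply and a channel announcement differ only in the command and the target. The sketch below is illustrative and is not the bot's actual code; the function names are invented for the example.

```python
def irc_line(command, target, text):
    """Build one raw IRC line, without the trailing CRLF."""
    # Replace CR/LF so the text cannot spill into a second protocol line.
    clean = text.replace("\r", " ").replace("\n", " ")
    return "{} {} :{}".format(command, target, clean)


def reply_privately(nick, text):
    """A NOTICE addressed to a nick is seen only by that user."""
    return irc_line("NOTICE", nick, text)


def announce(channel, text):
    """A PRIVMSG addressed to a channel is seen by everyone in it."""
    return irc_line("PRIVMSG", channel, text)
```

So a `!rss` request would be answered with something like `reply_privately("martyb", "Available feeds: ...")`, while new feed items go out through `announce("#rss-bot", ...)`.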
(Score: 2) by martyb on Wednesday June 11 2014, @01:03PM
Solved before I asked - that's quick feedback!
1.) If it's not too much trouble, could you please alphabetize the list of feeds returned by !rss?
2.) I'm curious if there was a technical reason for choosing the last 5 items in response to !feedname? I'm supposing you're keeping them in memory, so there is that limitation. That said, it would be nice to be able to see more entries. Maybe accept an optional parameter for the number of entries to display? Default to 5 to maintain the current behavior, but allow up to say 50? The idea being to be able to retrieve the last day or so of feed entries. If a user requests more entries than exist, then just display what you have (possibly with a message indicating that).
Granted, I could go to the log of the entries, but it's nice to be able to see "everything" from just one source.
Whatever you decide, this has been a really big help; thanks so much for making it available!
Wit is intellect, dancing.
(Score: 2) by juggs on Thursday June 12 2014, @01:02AM
1.) Shouldn't be a big effort. I took a quick look at that section and I haven't yet figured out what is deciding the order in that response. It certainly isn't based on the order the RSS feeds exist in the config. It appears to read the feed IDs into a list before concatenating them and spitting the message out. So I should be able to sort the list at some point.
2.) No real reason for 5. I've upped that to 99 for now, served most recent first. I agree it would be good to have it determined by the requestor (so something like !feedid 25 to get the latest 25 for example). It's on my list to do :)
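Both items could be handled along the lines of the sketch below. It is not the bot's code: the function names are invented, the default of 5 and cap of 50 are the numbers suggested in the request (the cap is an assumption, not a decision), and the sort is case-insensitive by choice.

```python
DEFAULT_COUNT = 5   # current behaviour described in the thread
MAX_COUNT = 50      # cap suggested in the request; an assumption


def parse_feed_command(message):
    """Turn '!feedid 25' into ('feedid', 25); return None for non-commands."""
    parts = message.split()
    if not parts or not parts[0].startswith("!") or len(parts[0]) == 1:
        return None
    feed_id = parts[0][1:]
    count = DEFAULT_COUNT
    if len(parts) > 1:
        try:
            # Clamp the requested number into the range 1..MAX_COUNT.
            count = max(1, min(int(parts[1]), MAX_COUNT))
        except ValueError:
            # Not a number: keep the default.
            count = DEFAULT_COUNT
    return feed_id, count


def latest_items(items, count):
    """Return (selected, note) from a list ordered most recent first."""
    selected = items[:count]
    note = None
    if len(items) < count:
        note = "Only {} of the {} requested items are available.".format(
            len(items), count
        )
    return selected, note


def feed_listing(feed_ids):
    """Alphabetized reply for !rss, ignoring upper/lower case."""
    ordered = sorted(feed_ids, key=str.lower)
    return "Available feeds: " + " ".join("!" + f for f in ordered) + "."
```

A plain `sorted(feed_ids)` would put capitalised names ahead of all lowercase ones, which is why the sketch sorts on `str.lower`.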
(Score: 2) by martyb on Thursday June 12 2014, @02:44AM
That's GREAT news!
We had a nice chat on IRC, but just for safety's sake, here are the feeds I requested.
Thanks again!
Wit is intellect, dancing.
(Score: 2) by juggs on Thursday June 12 2014, @04:44AM
ADDED:
Science Daily ALL
URL: http://feeds.sciencedaily.com/sciencedaily?format=xml [sciencedaily.com]
Trigger: !sciencedaily_all
NASA Jet Propulsion Lab
URL: http://www.jpl.nasa.gov/multimedia/rss/news.xml [nasa.gov]
Trigger: !nasa_jpl
Nature
URL: http://feeds.nature.com/news/rss/news [nature.com]
Trigger: !nature
Science Mag (wasn't sure which feed to grab so I chose the Daily News feed)
URL: http://news.sciencemag.org/rss/current.xml [sciencemag.org]
Trigger: !sciencemag
(Score: 2) by juggs on Thursday June 12 2014, @05:10AM
Sorting sorted and additions included.
There's something a bit off with the NASA JPL feed though, when a new article is announced in channel it has a load of linebreaks (or something) in it so it outputs like this:
That's just ugly.
But it parses OK in response to !nasa_jpl oddly enough.
I'll keep an eye on that and have put it in my todo to investigate. But for now, I must rest!
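If the stray line breaks turn out to be whitespace inside the feed's title text (a guess; the cause has not been confirmed in this thread), collapsing whitespace before announcing would tidy the output. A minimal sketch, with invented function names and an announcement layout assumed for the example:

```python
import re


def single_line(text):
    """Collapse every run of whitespace, line breaks included, to one space."""
    return re.sub(r"\s+", " ", text).strip()


def announcement(feed_name, title, link):
    """Format one channel announcement on a single line (layout assumed)."""
    return "[{}] - {} - {}".format(feed_name, single_line(title), single_line(link))
```

Applying the same clean-up to both the announce path and the `!nasa_jpl` reply path would keep the two outputs consistent.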
(Score: 2) by juggs on Thursday June 12 2014, @05:45AM
Had to remove the nasa_jpl feed for now, the bot kept spewing the same articles into channel again and again.
NOW I shall rest :D
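The thread does not say why the same articles kept being announced. One common guard against repeats, whatever the cause, is to remember a stable key for each item and skip anything already seen. The sketch below is only that idea; the class name and the choice of key (the item's own id, falling back to its link) are assumptions.

```python
class SeenItems:
    """Remember which feed items have already been announced."""

    def __init__(self):
        self._seen = set()

    def is_new(self, guid, link):
        """Return True the first time an item is offered, False afterwards."""
        # Prefer the feed's own item id; fall back to the link if it has none.
        key = (guid or link or "").strip()
        if not key or key in self._seen:
            return False
        self._seen.add(key)
        return True
```

If a feed changes its item ids or links on every fetch, this guard would not help and the key would need to be built from something steadier, such as the title.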
(Score: 2) by martyb on Tuesday June 10 2014, @12:14PM
I don't know if you are the one to contact? Would really appreciate it if the channel were logged to a file, much like what is already being done for #Soylent, #staff, etc.
When I log in first thing in the morning, when the story and submission queues are "light", it would be nice to be able to see what's been posted to this channel over the past several hours.
I tried to do a "/invite Loggie" (guessing that is the bot which does the work), but got an error message: "rss-bot :You're not a channel operator". I tried to ".op" and got the same message.
Would much appreciate your assistance!
Wit is intellect, dancing.
(Score: 2) by juggs on Tuesday June 10 2014, @03:23PM
Loggie is now there :)
(Score: 2) by martyb on Wednesday June 11 2014, @01:04PM
Wow, that was quick! Thank-you!
Wit is intellect, dancing.
(Score: 2) by martyb on Tuesday June 17 2014, @10:09PM
It may sound silly, at first, but would you please add the RSS feed for SoylentNews.org into #rss-bot?
I was trying to compare what time we posted a story compared to when another site posted their story on a certain subject. Realized that if we logged our feed into the #rss-bot channel, the task would be much simplified!
So, please add: http://soylentnews.org/index.rss [soylentnews.org] as feed: !SoylentNews
Thanks in advance!
Wit is intellect, dancing.
(Score: 2) by juggs on Friday June 20 2014, @01:35AM
Tis done.
-Regurgitator- Available feeds: !SoylentNews !arstechnica !bbc-tech !bugtraq !cnet !computerworld !darpa !forbes-tech !itworld !krebs !nasa !nature !nist-bioscience !nist-buildfire !nist-chemistry !nist-electronics !nist-energy !nist-forensics !nist-it !nist-manufacturing !nist-math !nist-nano !nist-physics !nist-standards !physorg !sciencedaily_all !sciencemag !securityweek !taosecurity !theregister !wired-enterprise !wired-science.
(Score: 2) by juggs on Friday June 20 2014, @02:03AM
I didn't do it originally as #rss-bot was designed to be a source of news for would-be submitters to SN - and SN articles are announced in #Soylent anyway.
However, you made a good case so there it is :D
(Score: 1) by martyb on Friday June 20 2014, @06:44PM
To this (and your other reply to my request) I can only say "Wow... thanks for the quick response!"
Wit is intellect, dancing.
(Score: 2) by martyb on Saturday June 21 2014, @10:03PM
Hi! I have a couple more suggestions for RSS feeds:
This is a feed from a web site: !mosaicscience = http://mosaicscience.com/feed/rss.xml [mosaicscience.com]
This is a feed offered by pipedot: !pipedot-feed = http://pipedot.org/feed/ [pipedot.org]
NOTE: I'm not sure how well that will work with your code. Might be best to look at what feeds IT follows, and add those?
Thanks!
Wit is intellect, dancing.
(Score: 2) by juggs on Sunday June 22 2014, @04:21AM
That pipedot page is a conglomeration of different feeds parsed into an html page. I'll look through them and add some in.
For now I've added:
!mosaicscience = http://mosaicscience.com/feed/rss.xml [mosaicscience.com]
!pipedot = http://pipedot.org/atom [pipedot.org]
The latter being pipedot's own atom feed.
(Score: 1) by martyb on Sunday June 22 2014, @12:17PM
Juggs wrote:
Yeah, I realized that *after* I submitted the request. Thanks for doing what I meant, instead of what I asked!
Wit is intellect, dancing.
(Score: 2) by martyb on Tuesday October 14 2014, @02:16PM
It's a seemingly minor thing, but I would appreciate it if the bracketed feed names were not so long. I use "Unifont CSUR" as my font in HexChat (so I can see all the international characters in UTF-8) and for it to be legible, I need to set it to 14 point. Even with a relatively large display (1920x1200), I very occasionally get line-wrap which makes it much harder to scan down what has been posted.
For example: "[Phys.org - latest science and technology news stories]" takes up about half of the width available.
It seems to me that the "long version" of the source does not need to be displayed every time; maybe have it displayed only when a user does: "!rss phys.org" — otherwise, "[phys.org]" should suffice.
Thanks in advance!
Wit is intellect, dancing.
(Score: 2) by juggs on Wednesday October 15 2014, @05:29PM
Adding your other suggestion here too...
Regurgitator> [Latest Science News -- ScienceDaily]
"[ScienceDaily]" would be sufficient
These descriptions are coming from the feeds themselves. But I can surely do something, even if it's to add a config option for each feed to specify this instead of pulling it out of the RSS / Atom feed itself.
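A per-feed override of that kind could be as small as a lookup with a fallback. This is a sketch under assumptions: the dictionary stands in for whatever config format the bot uses, and the feed ids shown are taken from the trigger names listed in this thread.

```python
# Hypothetical per-feed config: trigger name -> short label to display.
DISPLAY_NAMES = {
    "physorg": "PhysOrg",
    "sciencedaily_all": "ScienceDaily",
}


def display_name(feed_id, feed_title):
    """Return the bracketed label, preferring the configured short name."""
    override = DISPLAY_NAMES.get(feed_id)
    # Fall back to the title supplied by the feed when nothing is configured.
    return "[{}]".format(override or feed_title)
```

Feeds without an entry keep the behaviour described above, so only the overly long labels need configuring.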
(Score: 2) by juggs on Wednesday October 15 2014, @05:56PM
[PhysOrg] and [ScienceDaily] now shortened as requested.
(Score: 2) by juggs on Wednesday October 15 2014, @10:33PM
Additionally:
[SecurityFocus Vulnerabilities] is now [Bugtraq] to match its !trigger
[SecurityWeek RSS Feed] is now just [Security Week]
Hope that keeps things cleaner.
(Score: 1) by martyb on Thursday October 16 2014, @11:30AM
That is *so* much better! Thank-you!!!
Wit is intellect, dancing.
(Score: 2) by juggs on Friday October 17 2014, @11:23PM
I noticed all the !nist-xxx feeds were prefixed with:
[The National Institute of Standards and Technology]
now:
[NIST]
(Score: 2) by juggs on Tuesday January 06 2015, @03:26AM
Regurgitator has been retired in favour of Bender. Now Bender will serve you all your RSS goodies in #rss-bot.
(Regurgitator had trouble regurgitating https feeds, Bender didn't. Bender won as it just got on with it rather than demand attention, life's too short etc.)
Still.. #rss-bot suggestions / improvement ideas here please.
(Score: 2) by martyb on Thursday January 22 2015, @10:19PM
First off, I apologize for the formatting, but wanted to ensure that there was no inadvertent translation of any text pasted in this comment; hence my use of "code" for this reply.
I see that Bender has taken over the #rss-bot duties. Anything that makes things easier is a great idea!
An observation, and a request, if I may? Items appearing in the feed present a link that often requires a number of redirects before it gets to the final destination. Oftentimes, I've seen the redirection being used for tracking purposes. I'd really appreciate it if the bot could pre-resolve any redirections and present just the final destination path in the entry which is presented in the feed.
Taking a look in the #Soylent channel, chromas' "Hedonism" bot does this whenever it resolves a URL -- maybe his code/methodology can be merged in?
As a concrete example, I just now saw this presented in the #rss-bot channel:
[SecurityWeek] - Anonymous-linked Journalist Barrett Brown Gets Five Years' Prison - http://feedproxy.google.com/~r/Securityweek/~3/d3fQig0JEb4/anonymous-linked-journalist-barrett-brown-gets-five-years-prison
I just copied that entire text, went to the #Soylent channel, and entered it there. The "Hedonism" bot responded with:
^ Barrett Brown Gets Five Years' Prison | SecurityWeek.Com ( http://www.securityweek.com/anonymous-linked-journalist-barrett-brown-gets-five-years-prison?&utm_medium=feed&utm_campaign=Feed%3A+Securityweek+%28SecurityWeek+RSS+Feed%29 )
You can see it here: http://logs.sylnt.us/%23soylent/2015-01-22.html#22:11:42
If there were any way to also remove the &utm_medium... text, that would be an added bonus and muchly appreciated!
Wit is intellect, dancing.
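Removing the utm_ text once the final URL is known can be done with the standard library's URL tools. The sketch below is one possible approach, not how Bender or Hedonism work; the function name and the idea of matching on a `utm_` prefix are assumptions for illustration.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def strip_tracking(url, prefixes=("utm_",)):
    """Return `url` without query parameters whose names start with a prefix."""
    parts = urlsplit(url)
    kept = [
        (name, value)
        for name, value in parse_qsl(parts.query, keep_blank_values=True)
        if not name.lower().startswith(prefixes)
    ]
    # An empty query drops the '?' entirely when the URL is reassembled.
    return urlunsplit(parts._replace(query=urlencode(kept)))
```

Applied to the resolved SecurityWeek URL quoted above, this leaves just the article path with no query string. Any parameters that are kept get re-encoded by `urlencode`, so their escaping may differ slightly from the original.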