SoylentNews is people

posted by martyb on Wednesday July 31 2019, @06:36AM
from the Arachnophobia dept.

"Come on, I worked so hard on this project! And this is publicly accessible data! There's certainly a way around this, right? Or else, I did all of this for nothing... Sigh..."

Yep - this is what I said to myself, just after realizing that my ambitious data analysis project could get me into hot water. I intended to deploy a large-scale web crawler to collect data from multiple high profile websites. And then I was planning to publish the results of my analysis for the benefit of everybody. Pretty noble, right? Yes, but also pretty risky.

Interestingly, I've been seeing more and more projects like mine lately, and even more tutorials encouraging some form of web scraping or crawling. But what troubles me is the appalling and widespread ignorance of the legal aspects of it.

So this is what this post is all about - understanding the possible consequences of web scraping and crawling. Hopefully, this will help you avoid any potential problems.

Disclaimer: I'm not a lawyer. I'm simply a programmer who happens to be interested in this topic. You should seek out appropriate professional advice regarding your specific situation.

https://benbernardblog.com/web-scraping-and-crawling-are-perfectly-legal-right/


  • (Score: 2) by looorg (578) on Wednesday July 31 2019, @02:21PM (#873547)

    I'm not some fancy lawyer either, but I'm fairly certain they'll be shit out of luck trying to uphold that. If I can browse it, I can scrape it. Even if using the program, script, bot or whatnot was somehow illegal, how is ctrl+a ctrl+c ctrl+v going to be deemed illegal? Seriously, if you don't want people to use your information then don't put it out there. All scraping does is save me time, grabbing only what I really need and saving it in a format I like. It's like selective browsing, if anything.

    I just glanced at the blog post and it seems to be summed up as one giant perhaps, perhaps, perhaps, and that it's some kind of grey area. I gather that LinkedIn isn't happy that all the data from its drone followers who want to "network" is falling into the competition's hands, but there's only so much they can do about that. Asking for data and then saying it's all secret and shit? That must be some kind of a joke. Still, they might be in a better spot than others, since I seem to recall them requiring some kind of account and login to take part in certain actions -- that is really as much as I know about LinkedIn. It seemed like a stupid idea overall and I didn't fancy giving them any free information, just as I don't think it's a great idea to feed Facebook et al any data either.

    That said, if you have data, someone will scrape it, so you might as well make it easier on yourself and offer an API for accessing said data. 'Cause that publicly available data is getting scraped one way or another, so why make it harder than you have to, really. I care about their robots.txt and scraping speeds etc. as much as search engines etc. do (i.e. not at all).
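
    For anyone who does want to honor robots.txt and a crawl rate, here is a minimal Python sketch using the standard library's urllib.robotparser. The robots.txt content, the "demo-bot" user agent, the example.com URLs, and the helper names build_parser and polite_fetch_plan are all made up for illustration; a real crawler would download the site's actual robots.txt and sleep for the returned delay between requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the sketch runs without network access.
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 2
Disallow: /private/
"""


def build_parser(robots_text):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


def polite_fetch_plan(parser, user_agent, urls, default_delay=1.0):
    """Return (url, delay_in_seconds) pairs for the URLs the rules allow.

    Falls back to default_delay (an assumed value) when the site
    declares no Crawl-delay for this user agent.
    """
    delay = parser.crawl_delay(user_agent)
    if delay is None:
        delay = default_delay
    return [(url, delay) for url in urls if parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    candidates = [
        "https://example.com/public",
        "https://example.com/private/x",
    ]
    for url, delay in polite_fetch_plan(parser, "demo-bot", candidates):
        # A real crawler would fetch the URL here, then time.sleep(delay).
        print(url, delay)
```

    Whether following these rules changes anything legally is a separate question; the sketch only shows the mechanics of reading them.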
