In a case involving LinkedIn, a federal appeals court reaffirmed Monday that web scraping likely doesn't violate the Computer Fraud and Abuse Act (CFAA).
The ruling by the US Court of Appeals for the Ninth Circuit drew a distinction between data that is password-protected and data that is publicly available. That means hiQ Labs—a data analytics company that uses automated technology to scrape information from public LinkedIn profiles—can continue accessing LinkedIn data, a three-judge panel at the appeals court ruled:
LinkedIn Can't Use Anti-Hacking Law to Block Web Scraping, Judges Rule
(Score: 5, Insightful) by Revek on Thursday April 21 2022, @12:58PM (4 children)
I've always felt LinkedIn was rather scummy. I started to sign up a few times, but overall the whole site and its ethos leave me feeling like I just met a boss who likes to steal lunches from the breakroom fridge.
(Score: 5, Informative) by Anonymous Coward on Thursday April 21 2022, @01:08PM (3 children)
They are a Microsoft company.
(Score: 5, Insightful) by ikanreed on Thursday April 21 2022, @02:56PM
I think more than that, they're a social media company.
All the big ones have moved past "How can we attract more people" to "how can we further exploit the people we've got".
At this point, everyone who's been wary of joining Facebook or Twitter has been vindicated a dozen times over.
(Score: 2, Informative) by pTamok on Thursday April 21 2022, @03:19PM (1 child)
LinkedIn was bought by Microsoft in 2016. [wikipedia.org]
This is not to say they were pure as the driven snow before then, but several less-than-beneficial behaviours have appeared since Microsoft bought the company. I said at the time that Microsoft's long-term strategy would be to monetise the information people had lodged with LinkedIn and use it to increase control and influence in other areas, which is what Microsoft has done. Microsoft bought LinkedIn for USD 26.2 billion, so presumably it believes it can get more than that back, probably considerably more.
Who would have thought that a CV/resumé publishing service could be 'worth' so much?
(Score: 1) by anubi on Thursday April 21 2022, @11:58PM
A database of people's social status is invaluable to those who sell things to them.
Why offer a price right above the marginal cost of production if a few seconds lookup time will tell you your mark will gladly pay orders of magnitude more for the thing?
Conversely, in hiring, it gives you a heads-up on who has likely gotten used to an extravagant lifestyle, forgotten his roots, and is apt to be a huge liability should you hire him.
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
(Score: 3, Insightful) by DannyB on Thursday April 21 2022, @01:27PM (4 children)
I want to publish information on the web. People should only be allowed to use that information in ways that I approve of.
I want to post information on the bulletin board. People should only be allowed to use that information in ways that I approve of.
I want to post flapdoodle piffle on SN. People should only be allowed to use that blarney gobbledygook in ways that I approve of.
How often should I have my memory checked? I used to know but...
(Score: 4, Informative) by canopic jug on Thursday April 21 2022, @01:48PM (2 children)
The linked article is at a crap site, but it did manage to point to what it claims is the opinion by Judge Berzon [courtlistener.com]. However, skimming through that PDF, I do not see a single reference to the IETF's RFC 2616 on HTTP/1.1 [ietf.org]. Nor do I see RFC, Request for Comments, Internet Engineering Task Force, or IETF mentioned anywhere. The court's decision doesn't even mention either Hypertext Transfer Protocol or HTTP. In fact, the string "protocol" does not appear in the body of the document at all, only in a few footnotes.
RFC 2616 and the others should have been featured first and foremost in a short discussion leading to dismissal of the case against "scraping". Microsoft's LinkedIn published the material via the web. It is then up to the visitors to choose what kind of HTTP client to use.
Money is not free speech. Elections should not be auctions.
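For readers less familiar with the protocol side: from the server's point of view, a scraper is just another HTTP client sending a request line and some headers. The Python sketch below (standard library only) builds, but does not send, two GET requests that differ only in the client-chosen User-Agent header. The URL and agent strings are hypothetical placeholders, not anything taken from the case.

```python
from urllib.request import Request

# Hypothetical public profile URL; not taken from the ruling or from LinkedIn.
PROFILE_URL = "https://www.example.com/in/some-public-profile"


def build_request(url: str, user_agent: str) -> Request:
    """Build (but do not send) a plain HTTP GET request.

    A browser and a scraper both end up sending something like this;
    the User-Agent header is chosen freely by whoever writes the client.
    """
    return Request(url, headers={"User-Agent": user_agent}, method="GET")


browser_req = build_request(PROFILE_URL, "Mozilla/5.0 (X11; Linux x86_64)")
scraper_req = build_request(PROFILE_URL, "hypothetical-scraper/0.1")  # made-up bot name

for req in (browser_req, scraper_req):
    # urllib stores header names via str.capitalize(), hence "User-agent".
    print(req.get_method(), req.full_url, req.get_header("User-agent"))
```

Nothing in either request proves what kind of program produced it; any distinction the server draws is based on what the client chooses to send.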
(Score: 3, Insightful) by pTamok on Thursday April 21 2022, @03:36PM (1 child)
I agree.
An HTTP request is precisely that, a request, and it is up to the server what it does with it. It doesn't have to reply, or it can reply with one of several status codes [wikipedia.org], e.g.:
- 401 Unauthorized
- 403 Forbidden
Things get murkier if you have to register for an account with attached terms and conditions, and can only access a website after authenticating yourself in some manner: for example, if an account is set up for a bot, which is then able to view and scrape things that only members with accounts could see.
There is a general problem of 'what is public?'. For example, in the past in the UK, if you wanted to check whether someone was on the Electoral Roll, you needed to go to the nearest reference library and physically check a paper printout. It took effort and didn't scale. However, the information was public: anyone could go and look.
Publishing on the internet makes the same information far more accessible and amenable to bulk searching. Sometimes this enables unwanted behaviours (e.g. checking every Electoral Roll in the country to find someone who doesn't want to be stalked), and there is a difference between information that is public but takes non-trivial effort to obtain, and information that is public and trivial to process in bulk. Sometimes this increased accessibility is good, and sometimes it enables problematic behaviour. As far as I can tell, many, if not all, societies have not yet come to terms with it: how to deal with its bad consequences while still taking advantage of the good ones.
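The point in that comment, that a request is only a request and the server decides the answer, can be sketched as a small decision function. This is a hypothetical illustration in Python, not how LinkedIn or any real site works: the path sets and the banned User-Agent are made up, and a real server would also have to send a WWW-Authenticate header along with any 401.

```python
from http import HTTPStatus
from typing import Optional

# Hypothetical site layout: which paths are public and which need a login.
PUBLIC_PATHS = {"/", "/in/public-profile"}
MEMBER_PATHS = {"/feed", "/in/full-profile"}
# Hypothetical client the operator has decided to refuse outright.
BANNED_CLIENTS = {"hypothetical-bad-bot/1.0"}


def respond(path: str, session_user: Optional[str], user_agent: str) -> HTTPStatus:
    """Decide how a server might answer a GET request for `path`."""
    if user_agent in BANNED_CLIENTS:
        # Refused regardless of credentials.
        return HTTPStatus.FORBIDDEN          # 403
    if path in PUBLIC_PATHS:
        # Anyone may read it, logged in or not.
        return HTTPStatus.OK                 # 200
    if path in MEMBER_PATHS:
        if session_user is None:
            # Strictly, a 401 must also carry a WWW-Authenticate header; omitted here.
            return HTTPStatus.UNAUTHORIZED   # 401
        return HTTPStatus.OK
    return HTTPStatus.NOT_FOUND              # 404


status = respond("/feed", None, "curl/8.0")
print(status.value, status.phrase)
```

The design choice worth noticing is that every branch is the server's own policy; the client's request carries no obligation to be answered with 200.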
(Score: 2) by krishnoid on Thursday April 21 2022, @04:39PM
And then there is what is "public" versus what is "copyrighted" (which would probably run afoul of another law). Still, some people feel [wikipedia.org] that everything not expressly 403 Forbidden is 200 OK, and some feel that anything not expressly 200 OK is 403 Forbidden.
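The two attitudes in that comment amount to two different default policies. A minimal Python sketch, using made-up path lists and 403 Forbidden as the refusal code, shows that they only disagree about paths nobody thought to list:

```python
from http import HTTPStatus

# Hypothetical rule sets; a real site would have far more paths.
BLOCKED = {"/admin"}        # explicit "keep out" list, used by default-allow
ALLOWED = {"/", "/about"}   # explicit "come in" list, used by default-deny


def default_allow(path: str) -> HTTPStatus:
    """Everything not expressly forbidden is OK."""
    return HTTPStatus.FORBIDDEN if path in BLOCKED else HTTPStatus.OK


def default_deny(path: str) -> HTTPStatus:
    """Everything not expressly OK is forbidden."""
    return HTTPStatus.OK if path in ALLOWED else HTTPStatus.FORBIDDEN


for p in ("/", "/admin", "/some/unlisted/page"):
    print(p, default_allow(p).value, default_deny(p).value)
```

Here "/" and "/admin" get the same answer under both policies; only "/some/unlisted/page" differs (200 under default-allow, 403 under default-deny).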
(Score: 2, Interesting) by Anonymous Coward on Thursday April 21 2022, @08:22PM
On the one hand, you almost HAVE to maintain some online presence for career or business, especially in the tech or creative fields.
On the other hand, scraping is going to happen and then you lose control of what you put out there.
I was encouraged to join LinkedIn around 2006, but a couple of years later I started finding my entire profile copied to some other website. At that point I closed my account, and I will not touch LinkedIn or any similar 'service' ever again.
But humans are dumb, and I am one of them. A second lesson, with Unsplash, followed in 2018. I put up some images and got ZERO business leads, ONE person who got in touch to OK using my image in a certain context, and otherwise nothing upon nothing.
Then came a surprise: I found two of my images being used on a website. That in itself was nothing new; it has happened as per the license, and normally I smile: "there's one of mine, enjoy." BUT these images were being linked to from a third-party site that had scraped Unsplash and now monetizes the images on a pay-per-click basis. Unethical? Absolutely. Genius, in a perverse way? Yes. I've bailed on Unsplash; please spread the word. The images already out there are loose, and lost. What bothers me about this pay-per-click scheme: are Unsplash contributors perhaps funding ISIS or the Russian mafia?
(Score: 5, Interesting) by bradley13 on Thursday April 21 2022, @02:38PM
"Look at all this great information! See!! Look at it!!!" Now pay me.
No. If you want to put information on the public internet, guess what, it's going to be public. Automated processes should pay attention to the "robots.txt" file, but even that is really only a courtesy.
It's really no different from Google using snippets from web sites. If they don't want that to happen, they should put the information behind a paywall. Where, of course, no one at all will see it, but at least their copyrights will be safe.
Everyone is somebody else's weirdo.
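On the robots.txt point: Python's standard library includes urllib.robotparser, which a polite crawler can use to check what a site asks it not to fetch. The sketch below parses a made-up robots.txt from a string rather than fetching a real one; the site, paths, and agent name are hypothetical. Nothing in the file is enforced: can_fetch() only reports what the site requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; a real crawler would fetch https://<site>/robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 10
"""


def make_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content supplied as a string."""
    rp = RobotFileParser()
    rp.parse(text.splitlines())
    return rp


rp = make_parser(ROBOTS_TXT)
agent = "hypothetical-scraper/0.1"  # made-up crawler name

for url in ("https://www.example.com/in/someone",
            "https://www.example.com/private/settings"):
    # Purely advisory: this reports the site's wishes, it does not block anything.
    print(url, rp.can_fetch(agent, url))

print("crawl delay:", rp.crawl_delay(agent))
```

A real crawler would call set_url() and read() on the parser instead of parse(), and would wait the requested crawl delay between requests. Whether it does so is entirely up to whoever wrote it, which is the "only a courtesy" point.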