Microsoft's CEO of AI said that content on the open web can be copied and used to create new content:
Microsoft may have opened a can of worms with recent comments made by the tech giant's CEO of AI, Mustafa Suleyman. The CEO spoke with CNBC's Andrew Ross Sorkin at the Aspen Ideas Festival earlier this week. In his remarks, Suleyman claimed that all content shared on the web is available to be used for AI training unless a content producer specifically says otherwise.
"With respect to content that is already on the open web, the social contract of that content since the 90s has been that it is fair use. Anyone can copy it, recreate with it, reproduce with it. That has been freeware, if you like. That's been the understanding," said Suleyman.
"There's a separate category where a website or a publisher or a news organization had explicitly said, 'do not scrape or crawl me for any other reason than indexing me so that other people can find that content.' That's a gray area and I think that's going to work its way through the courts."
[...] Generative AI is one of the hottest topics in tech in 2024. It's also a hot button topic among creators. Some claim that AI trained on other people's work is a form of theft. Others equate training AI on existing work to artists studying at school. Contention often circles around monetizing work that's derivative of other content.
YouTube has reportedly offered "lumps of cash" to train its AI models on music libraries from major record labels. The difference in that situation is that record labels and YouTube will have agreed to terms. Suleyman claims that a company could use any content on the web to train AI, as long as there was not an explicit statement demanding that not be done.
[...] Assuming I've understood Suleyman correctly, the CEO claimed that any content is freeware that anyone can use to make new content, unless the creator says otherwise. I'm not a lawyer, but Suleyman's claims sound a lot like those viral chain messages that get forwarded around Facebook and Instagram saying, "I DO NOT CONSENT TO MY CONTENT BEING USED." I always assumed copyright law was more complicated than a Facebook post.
(Score: 4, Insightful) by Anonymous Coward on Sunday June 30 2024, @01:47PM (8 children)
If Microsoft's AI stuff truly can't be doing any infringing they can go train their AI on their own source code (windows, office etc) and then guarantee to the world their AI's output is 100% AOK to use. Even if one day the output resembles their own copyrighted stuff (especially if someone is using it to make a Windows and/or Office clone).
They want to be the ones to copy stuff for free BUT everyone else has to subscribe aka pay them rent.
(Score: 5, Insightful) by stormreaver on Sunday June 30 2024, @02:02PM (1 child)
Microsoft has concluded that OpenAI is massively infringing copyrights, so wants to try to redefine it.
(Score: 3, Interesting) by anubi on Monday July 01 2024, @01:32AM
Sounds like the distinction between "plagiarism" and "research".
"Steal from ONE", that's "plagiarism".
"Steal from MANY", that's "research".
"Prove all things; hold fast that which is good." [KJV: I Thessalonians 5:21]
(Score: 4, Disagree) by RamiK on Sunday June 30 2024, @04:53PM (4 children)
It's fair use to mine open web content as well as copyrighted works when training AI models in the same way it's fair use to cite excerpts from books and build search engines.
The one potential problem for generative AI companies in the US* was the various TRIPS clauses dealing with commercial use and harm. e.g. https://www.cambridge.org/core/books/big-data-and-global-trade-law/trips-meets-big-data/5BBC9D440FC583634A6C08B72CB0FA30#B-sec-035 [cambridge.org]
However, open web content is freely available so mining it for building up datasets isn't a violation.
The more annoying issue is that even most copyrighted books and music can be mined. Publishers have spent the last 2-3 decades arguing in court that they need instant mass take-downs for new releases because they make no money off anything but their top-1000 best sellers, and make that money only during the first couple of years. That establishes there's no commercial harm in mining 99.999% of what's out there. So, at the very least, they can't claim mass infringement and must show specific damages per infringed work.
*there's more explicit rules in the EU for all of this but they have their own share of problems: http://eulawanalysis.blogspot.com/2024/05/will-ai-act-bring-more-clarity-to.html [blogspot.com]
compiling...
(Score: 0) by Anonymous Coward on Sunday June 30 2024, @11:49PM (3 children)
(Score: 2) by RamiK on Monday July 01 2024, @11:23AM (2 children)
The problem is that Google's search already established this as fair use.
Regardless, there are other legal shenanigans companies can pull to work around the commercial clause. e.g. OpenAI isolated its non-profit data mining arm from its for-profit ChatGPT developing arm so its data mining efforts will be exempted as "scientific research", which is an explicit clause in TRIPS and the EU laws. Then Microsoft positioned itself as a customer of their for-profit arm, a donor of their non-profit arm and a minority shareholder, so now they too are isolated (possibly even at the tax level...).
Again, it's not that legislators aren't aware of all this. It's just that the geopolitics prevents fixing the loopholes at the treaty level and trying to regulate at the national level will disadvantage western liberal democracies at the expense of China & co. And seeing how we're already marching through sanctions and proxy wars towards direct military confrontation anyhow, nothing is going to get done on minor civil rights issues like intellectual property.
compiling...
(Score: 0) by Anonymous Coward on Tuesday July 02 2024, @02:15AM (1 child)
So as long as the AI bunch allow people to use their AI for free (maybe ad supported just like the websites they scrape from) then that seems fair to me.
e.g. if the content is free for them, then it should remain free when they re-publish/cite it to others.
Of course in many cases the stuff on the internet is already infringing - lots of warez, novels, music, etc out there. So you shouldn't be able to convert it to free just because "AI".
(Score: 2) by RamiK on Tuesday July 02 2024, @06:39AM
You're making principled arguments on letter-of-the-law issues. Again, everyone knows the laws are broken on this issue and need amending.
Third time's the charm: I'm not saying what the data miners are doing is right. I'm saying it's legal.
compiling...
(Score: 3, Insightful) by mcgrew on Monday July 01 2024, @02:34PM
Indeed. It's only freeware if its creator says it is. Courts have already ruled that just because it's on the internet doesn't make it public domain.
Donald Trump isn't the only billionaire felon, they're all felons, just unindicted. They just haven't gotten themselves caught like the idiot did.
Yes, Trump is a fucking idiot.
Have you read the Nooze [nooze.org]?
(Score: 2, Interesting) by pTamok on Sunday June 30 2024, @02:40PM
"If wishes were horses, beggars would ride."
What you wish to be so ain't necessarily so.
Is there any case law that shows social contracts to be enforceable?
(Score: 4, Insightful) by Anonymous Coward on Sunday June 30 2024, @02:42PM (2 children)
Great. That is how I have always viewed any and all Microsoft products. They are all freeware since I can find them out and about in the open. They are not worth the sticker price. The only reason to use it is that either it's free, or you can get it for free or your company provides it to you. It's not worth buying or paying for otherwise.
Software piracy or information piracy (or AI training material by another name) is the same then. All free if it's out in the open. Thanks!
(Score: 5, Insightful) by Gaaark on Sunday June 30 2024, @02:49PM
And Jim Jones gave out free Kool-aid... doesn't mean it's good to drink it.
Avoid MS products like it's the cancer it is.
--- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
(Score: 2, Funny) by Runaway1956 on Sunday June 30 2024, @03:03PM
I was going to say very much the same thing. Microsoft products are freeware, readily available via Bittorrent. We have paid for exactly two Microsoft licenses in my home, one of them WinME, the other Win7. Well, my wife paid for those, not me. Everything else, we've "pirated", but you can't pirate freeware, can you? Fair use is fair use.
“I have become friends with many school shooters” - Tampon Tim Walz
(Score: 4, Touché) by Revek on Sunday June 30 2024, @03:20PM (2 children)
I bet their products found online are not part of the daft calculation.
This page was generated by a Swarm of Roaming Elephants
(Score: 5, Insightful) by Thexalon on Sunday June 30 2024, @05:51PM (1 child)
Like all corporations: What's theirs is theirs, and what's yours is also theirs unless you too can hire a lawyer and spend years in litigation to prove it's yours.
"Think of how stupid the average person is. Then realize half of 'em are stupider than that." - George Carlin
(Score: 0) by Anonymous Coward on Sunday June 30 2024, @11:43PM
(Score: 3, Touché) by Anonymous Coward on Sunday June 30 2024, @06:03PM
I've been using Windows as "freeware" for the better part of 25 years. Seems like a fair trade.
(Score: 2) by Spamalope on Monday July 01 2024, @01:54AM
If you're Getty, pay or hands off.
If you're Getty's freelance competition with your own website to see your work with a price list? Why - that's 'free'.
(Score: 2) by JustNiz on Monday July 01 2024, @04:50PM
You've heard it directly from Microsoft's CEO that it's OK to copy and freely use anything Microsoft has on the internet.
(Score: 2) by hendrikboom on Monday July 01 2024, @11:17PM
But how does one make such an explicit statement to ensure that Suleyman's web crawler will recognise it?
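The usual machine-readable way to make such a statement is a robots.txt file at the site root (the Robots Exclusion Protocol), though honoring it is voluntary on the crawler's side. Below is a minimal sketch in Python, using the standard library's `urllib.robotparser`, of how a well-behaved crawler would read it. The user-agent names `ExampleSearchBot` and `ExampleAITrainingBot`, the `example.com` URLs, and the helper `may_fetch` are all hypothetical, made up for illustration; a real crawler announces its own token, and nothing here says which tokens Microsoft's crawlers use.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: allow a search indexer, refuse an AI-training crawler.
# The user-agent names are illustrative assumptions, not real crawler tokens.
ROBOTS_TXT = """\
User-agent: ExampleSearchBot
Allow: /

User-agent: ExampleAITrainingBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "https://example.com/articles/1"
    for agent in ("ExampleSearchBot", "ExampleAITrainingBot", "SomeOtherBot"):
        print(agent, may_fetch(agent, page))
```

Whether ignoring such a file has legal consequences is exactly the "gray area" Suleyman refers to; the file only expresses the site owner's wishes, it does not enforce them.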
(Score: 3, Insightful) by ShovelOperator1 on Tuesday July 02 2024, @09:41AM
First of all, there is the Web and the web. I totally understand Microsoft pushing Internet users to walled gardens, as more and more legislative requirements apply to these, so MS will start to offer legal services, usually in the form of ready-made scripts (see cookie consents in the EU), with the data stealing (err... analytics!, analytics!) addons of course. This is the ordinary way of any company - create a need, bribe for it and profit from it.
However, the web is not in its 90s anymore, unfortunately. I have a small website under CC-BY-NC-SA. So, if it's not commercial and there's a reference somewhere, I allow copying it, even mirroring it. More - as local providers for small websites are really poor in terms of uptime, I even encourage saving it, as I mirror websites too. After a significant downtime when I was not sure whether it would come back, I uploaded a copy to the IA... which I encourage too.
IMHO the problem starts when it's a commercial operation, like AI training whose resulting datasets are locked behind ANY commercial paywall. The paywall may be a literal contract, as with proprietary software, but the dataset may also be posted online and still be locked because some components needed to use it are proprietary. Then, it's no different from ripping a disc or stream, compressing it with another codec and torrenting it to kingdom come. This is in fact what these algorithms are doing - they compress data on a specific level.
And it was still not a problem! In the 90s and early 2000s, mirroring of sites, useful documents and media was quite widespread. However, now a specific thing has been introduced - turning ISPs and various content searching utilities into tools of corporate censorship. This introduced the capitalism known from American gangster movies - I have cash, the law is not for me - to the Net. The imbalance has been set and it's clearly visible.
Unfortunately, nobody seems to be doing anything about it.