The day after The New York Times sued OpenAI for copyright infringement, the author and systems architect Daniel Jeffries wrote an essay-length tweet arguing that the Times "has a near zero probability of winning" its lawsuit. As we write this, it has been retweeted 288 times and received 885,000 views.
"Trying to get everyone to license training data is not going to work because that's not what copyright is about," Jeffries wrote. "Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain. Period. Anyone who tells you otherwise is lying or simply does not understand how copyright works."
[...] Courts are supposed to consider four factors in fair use cases, but two of these factors tend to be the most important. One is the nature of the use. A use is more likely to be fair if it is "transformative"—that is, if the new use has a dramatically different purpose and character from the original. Judge Rakoff dinged MP3.com as non-transformative because songs were merely "being retransmitted in another medium."
In contrast, Google argued that a book search engine is highly transformative because it serves a very different function than an individual book. People read books to enjoy and learn from them. But a search engine is more like a card catalog; it helps people find books.
The other key factor is how a use impacts the market for the original work. Here, too, Google had a strong argument since a book search engine helps people find new books to buy.
[...] In 2015, the Second Circuit ruled for Google. An important theme of the court's opinion is that Google's search engine was giving users factual, uncopyrightable information rather than reproducing much creative expression from the books themselves.
[...] Recently, we visited Stability AI's website and requested an image of a "video game Italian plumber" from its image model Stable Diffusion.
[...] Clearly, these models did not just learn abstract facts about plumbers—for example, that they wear overalls and carry wrenches. They learned facts about a specific fictional Italian plumber who wears white gloves, blue overalls with yellow buttons, and a red hat with an "M" on the front.
These are not facts about the world that lie beyond the reach of copyright. Rather, the creative choices that define Mario are likely covered by copyrights held by Nintendo.
We are not the first to notice this issue. When one of us (Tim) first wrote about these lawsuits last year, he illustrated his story with an image of Mickey Mouse generated by Stable Diffusion. In a January piece for IEEE Spectrum, cognitive scientist Gary Marcus and artist Reid Southen showed that generative image models produce a wide range of potentially infringing images—not only of copyrighted characters from video games and cartoons but near-perfect copies of stills from movies like Black Widow, Avengers: Infinity War, and Batman v Superman.
In its lawsuit against OpenAI, The New York Times provided 100 examples of GPT-4 generating long, near-verbatim excerpts from Times articles.
[...] Those who advocate a finding of fair use like to split the analysis into two steps, which you can see in OpenAI's blog post about The New York Times lawsuit. OpenAI first categorically argues that "training AI models using publicly available Internet materials is fair use." Then in a separate section, OpenAI argues that "'regurgitation' is a rare bug that we are working to drive to zero."
But the courts tend to analyze a question like this holistically; the legality of the initial copying depends on details of how the copied data is ultimately used.
Previously on SoylentNews:
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement - 20231228
Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over - 20230821
Related stories on SoylentNews:
Microsoft in Deal With Semafor to Create News Stories With Aid of AI Chatbot - 20240206
AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead - 20240112
Writers and Publishers Face an Existential Threat From AI: Time to Embrace the True Fans Model - 20230415
Related Stories
Writers and publishers face an existential threat from AI: time to embrace the true fans model:
Walled Culture has written several times about the major impact that generative AI will have on the copyright landscape. More specifically, these systems, which can quickly and cheaply create written material on any topic and in any style, are likely to threaten the publishing industry in profound ways. Exactly how is spelled out in this great post by Suw Charman-Anderson on her Word Count blog. The key point is that large language models (LLMs) are able to generate huge quantities of material. The fact that much of it is poorly written makes things worse, because it becomes harder to find the good stuff[.]
[...] One obvious approach is to try to use AI against AI. That is, to employ automated vetting systems to weed out the obvious rubbish. That will lead to an expensive arms race between competing AI software, with unsatisfactory results for publishers and creators. If anything, it will only cause LLMs to become better and to produce material even faster in an attempt to fool or simply overwhelm the vetting AIs.
The real solution is to move to an entirely different business model, which is based on the unique connection between human creators and their fans. The true fans approach has been discussed here many times in other contexts, and once more reveals itself as resilient in the face of change brought about by rapidly-advancing digital technologies.
OpenAI could be fined up to $150,000 for each piece of infringing content:
Weeks after The New York Times updated its terms of service (TOS) to prohibit AI companies from scraping its articles and images to train AI models, it appears that the Times may be preparing to sue OpenAI. The result, experts speculate, could be devastating to OpenAI, including the destruction of ChatGPT's dataset and fines up to $150,000 per infringing piece of content.
NPR spoke to two people "with direct knowledge" who confirmed that the Times' lawyers were mulling whether a lawsuit might be necessary "to protect the intellectual property rights" of the Times' reporting.
Neither OpenAI nor the Times immediately responded to Ars' request for comment.
If the Times were to follow through and sue ChatGPT-maker OpenAI, NPR suggested that the lawsuit could become "the most high-profile" legal battle yet over copyright protection since ChatGPT's explosively popular launch. This speculation comes a month after Sarah Silverman joined other popular authors suing OpenAI over similar concerns, seeking to protect the copyright of their books.
[...] In April, the News Media Alliance published AI principles, seeking to defend publishers' intellectual property by insisting that generative AI "developers and deployers must negotiate with publishers for the right to use" publishers' content for AI training, AI tools surfacing information, and AI tools synthesizing information.
Previously:
Sarah Silverman Sues OpenAI, Meta for Being "Industrial-Strength Plagiarists" - 20230711
Related:
The Internet Archive Reaches An Agreement With Publishers In Digital Book-Lending Case - 20230815
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works:"
Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.
The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."
[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should be used for commercial gain with permission from the original source.
"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.
Media outlets are calling foul play over AI companies using their content to build chatbots. They may find friends in the Senate:
More than a decade ago, the normalization of tech companies carrying content created by news organizations without directly paying them — cannibalizing readership and ad revenue — precipitated the decline of the media industry. With the rise of generative artificial intelligence, those same firms threaten to further tilt the balance of power between Big Tech and news.
On Wednesday, lawmakers in the Senate Judiciary Committee referenced their failure to adopt legislation that would've barred the exploitation of content by Big Tech in backing proposals that would require AI companies to strike licensing deals with news organizations.
Richard Blumenthal, Democrat of Connecticut and chair of the committee, joined several other senators in supporting calls for a licensing regime and to establish a framework clarifying that intellectual property laws don't protect AI companies using copyrighted material to build their chatbots.
[...] The fight over the legality of AI firms eating content from news organizations without consent or compensation is split into two camps: Those who believe the practice is protected under the "fair use" doctrine in intellectual property law that allows creators to build upon copyrighted works, and those who argue that it constitutes copyright infringement. Courts are currently wrestling with the issue, but an answer to the question is likely years away. In the meantime, AI companies continue to use copyrighted content as training materials, endangering the financial viability of media in a landscape in which readers can bypass direct sources in favor of search results generated by AI tools.
[...] A lawsuit from The New York Times, filed last month, pulled back the curtain on negotiations over the price and terms of licensing its content. Before suing, it said that it had been talking for months with OpenAI and Microsoft about a deal, though the talks produced no agreement. With AI companies crawling the internet for high-quality written content, news organizations have been backed into a corner, having to decide whether to accept lowball offers to license their content or spend the time and money to sue. Some companies, like Axel Springer, took the money.
It's important to note that under intellectual property laws, facts are not protected.
Also at Courthouse News Service and Axios.
Related:
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
- Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
- Writers and Publishers Face an Existential Threat From AI: Time to Embrace the True Fans Model
Microsoft is working with media startup Semafor to use its artificial intelligence chatbot to help develop news stories—part of a journalistic outreach that comes as the tech giant faces a multibillion-dollar lawsuit from the New York Times.
As part of the agreement, Microsoft is paying an undisclosed sum of money to Semafor to sponsor a breaking news feed called "Signals." The companies would not share financial details, but the amount of money is "substantial" to Semafor's business, said a person familiar with the matter.
[...] The partnerships come as media companies have become increasingly concerned over generative AI and its potential threat to their businesses. News publishers are grappling with how to use AI to improve their work and stay ahead of technology, while also fearing that they could lose traffic, and therefore revenue, to AI chatbots—which can churn out humanlike text and information in seconds.
The New York Times in December filed a lawsuit against Microsoft and OpenAI, alleging the tech companies have taken a "free ride" on millions of its articles to build their artificial intelligence chatbots, and seeking billions of dollars in damages.
[...] Semafor, which is free to read, is funded by wealthy individuals, including 3G Capital founder Jorge Paulo Lemann and KKR co-founder Henry Kravis. The company made more than $10 million in revenue in 2023 and has more than 500,000 subscriptions to its free newsletters. Semafor CEO Justin Smith said Semafor was "very close to a profit" in the fourth quarter of 2023.
Related stories on SoylentNews:
AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead - 20240112
New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement - 20231228
Microsoft Shamelessly Pumping Internet Full of Garbage AI-Generated "News" Articles - 20231104
Google, DOJ Still Blocking Public Access to Monopoly Trial Docs, NYT Says - 20231020
After ChatGPT Disruption, Stack Overflow Lays Off 28 Percent of Staff - 20231017
Security Risks Of Windows Copilot Are Unknowable - 20231011
Microsoft AI Team Accidentally Leaks 38TB of Private Company Data - 20230923
Microsoft Pulls AI-Generated Article Recommending Ottawa Food Bank to Tourists - 20230820
A Jargon-Free Explanation of How AI Large Language Models Work - 20230805
The Godfather of AI Leaves Google Amid Ethical Concerns - 20230502
The AI Doomers' Playbook - 20230418
Ads Are Coming for the Bing AI Chatbot, as They Come for All Microsoft Products - 20230404
Deepfakes, Synthetic Media: How Digital Propaganda Undermines Trust - 20230319
Suchir Balaji, a former OpenAI employee, helped gather and organize the enormous amounts of internet data used to train the startup's ChatGPT chatbot:
A former OpenAI researcher known for blowing the whistle on the blockbuster artificial intelligence company, which faces a swell of lawsuits over its business model, has died, authorities confirmed this week.
Suchir Balaji, 26, was found dead inside his Buchanan Street apartment on Nov. 26, San Francisco police and the Office of the Chief Medical Examiner said. Police had been called to the Lower Haight residence at about 1 p.m. that day, after receiving a call asking officers to check on his well-being, a police spokesperson said.
The medical examiner's office determined the manner of death to be suicide and police officials this week said there is "currently, no evidence of foul play."
[...] In a Nov. 18 letter filed in federal court, attorneys for The New York Times named Balaji as someone who had "unique and relevant documents" that would support their case against OpenAI. He was among at least 12 people — many of them past or present OpenAI employees — the newspaper had named in court filings as having material helpful to their case, ahead of depositions.
Previously:
- AI Companies Are Finally Being Forced To Cough Up For Training Data
- OpenAI Says New York Times 'Hacked' ChatGPT to Build Copyright Lawsuit
- Why the New York Times Might Win its Copyright Lawsuit Against OpenAI
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
[We have had several complaints recently (polite ones, not a problem) regarding the number of AI stories that we are printing. I agree, but that reflects the number of submissions that we receive on the subject. So I have compiled a small selection of AI stories into one and you can read them or ignore them as you wish. If you are making a comment please make it clear exactly which story you are referring to unless your comment is generic. The submitters each receive the normal karma for a submission. JR]
Image-scraping Midjourney bans rival AI firm for scraping images
On Wednesday, Midjourney banned all employees from image synthesis rival Stability AI from its service indefinitely after it detected "botnet-like" activity suspected to be a Stability employee attempting to scrape prompt and image pairs in bulk. Midjourney advocate Nick St. Pierre tweeted about the announcement, which came via Midjourney's official Discord channel.
[...] Siobhan Ball of The Mary Sue found it ironic that a company like Midjourney, which built its AI image synthesis models using training data scraped off the Internet without seeking permission, would be sensitive about having its own material scraped. "It turns out that generative AI companies don't like it when you steal, sorry, scrape, images from them. Cue the world's smallest violin."
[...] Shortly after the news of the ban emerged, Stability AI CEO Emad Mostaque said that he was looking into it and claimed that whatever happened was not intentional. He also said it would be great if Midjourney reached out to him directly. In a reply on X, Midjourney CEO David Holz wrote, "sent you some information to help with your internal investigation."
[...] When asked about Stability's relationship with Midjourney these days, Mostaque played down the rivalry. "No real overlap, we get on fine though," he told Ars and emphasized a key link in their histories. "I funded Midjourney to get [them] off the ground with a cash grant to cover [Nvidia] A100s for the beta."
Midjourney stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Midjourney&sort=2
Stable Diffusion (Stability AI) stories on SoylentNews: https://soylentnews.org/search.pl?tid=&query=Stable+Diffusion&sort=2
OpenAI has asked a federal judge to dismiss parts of the New York Times' copyright lawsuit against it, arguing that the newspaper "hacked" its chatbot ChatGPT and other artificial-intelligence systems to generate misleading evidence for the case:
OpenAI said in a filing in Manhattan federal court on Monday that the Times caused the technology to reproduce its material through "deceptive prompts that blatantly violate OpenAI's terms of use."
"The allegations in the Times's complaint do not meet its famously rigorous journalistic standards," OpenAI said. "The truth, which will come out in the course of this case, is that the Times paid someone to hack OpenAI's products."
OpenAI did not name the "hired gun" who it said the Times used to manipulate its systems and did not accuse the newspaper of breaking any anti-hacking laws.
[...] Courts have not yet addressed the key question of whether AI training qualifies as fair use under copyright law. So far, judges have dismissed some infringement claims over the output of generative AI systems based on a lack of evidence that AI-created content resembles copyrighted works.
Also at The Guardian, MSN and Forbes.
Previously:
- New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement
- Report: Potential NYT lawsuit could force OpenAI to wipe ChatGPT and start over
- Why the New York Times Might Win its Copyright Lawsuit Against OpenAI
- AI Threatens to Crush News Organizations. Lawmakers Signal Change Is Ahead
(Score: 2) by DrkShadow on Thursday February 22 2024, @07:03AM (5 children)
So, requesting that a tool create an image based on your query means that the entity that created the tool is guilty of infringement?
What does this say about a 5th grader who draws a superhero? (optionally: very well?)
(Score: 4, Interesting) by Anonymous Coward on Thursday February 22 2024, @08:46AM
Possibly if the image and/or the usage of the image is infringing. In some places the copyright laws are such that downloading isn't infringing if it's for private and domestic use, but it's infringing if you distribute.
It still could be copyright infringement. Whether you try to launder your copyright infringement via a 5th grader or AI shouldn't make it legal.
Same goes for getting a 5th grader to retype GPLed code and change the variable names and comments. It's still copyright infringement and you should not be able to defeat GPL that way.
Seems like various companies are trying to push for a future where they can get away with copyright infringement while the rest of us can't unless we pay them a subscription.
(Score: 2) by stormreaver on Thursday February 22 2024, @02:19PM (2 children)
If that superhero is Superman (or other well-known characters), it says that the 5th-grader had better not sell copies on the open market if he doesn't want to defend against an expensive copyright suit he will likely lose.
(Score: 2) by DrkShadow on Saturday February 24 2024, @05:38AM (1 child)
How is that like what OpenAI is doing? It's not selling generated things on the open market.
It's not selling anything more than Gimp's support subscription would be selling. What you make with it is your own responsibility.
(Score: 2) by stormreaver on Sunday February 25 2024, @12:49AM
It's trawling the Internet, sucking in copyrighted data, and regurgitating said copyrighted data for profit.
(Score: 3, Interesting) by Freeman on Thursday February 22 2024, @02:48PM
To fall under fair use, you essentially can't be making a profit from the infringing content. Thus, someone who makes their own stuffed Twilight Sparkle pony is fine (generally). Someone selling them is infringing on copyright and can be held liable, i.e. they're going to get sued. Even sue-happy companies like Nintendo don't sue frivolously. The moment you start distributing their digital content without permission/selling anything Nintendo without their permission, you're definitely going to get sued.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 5, Insightful) by bzipitidoo on Thursday February 22 2024, @08:26AM (2 children)
One thing to clear up: the "I" in OpenAI is a lie. It's typical corporate hype, aided by an overexcited press too eager to swallow such corporate dramatizing, for their own purposes of generating more clicks. These chatbots are NOT intelligent. Not even artificially intelligent. LLMs are not the sole secret sauce needed to create intelligence. Daniel Jeffries' whole argument is based on the idea that these bots are AI, and are being creative in their reuse of copyrighted material. No, they are merely copying it, and their human masters are refusing to see that. If the bots had any intelligence, they'd change things up.
Having said that, this is yet another case where too much is being made of copyright. So the bot copies things, so what? The problem is more one of plagiarism. The bot ought to make and show citations. If OpenAI makes that change, then I think they can avoid this lawsuit altogether, and maybe not even have to settle. As it is, they're headed towards a rude surprise.
(Score: 0) by Anonymous Coward on Thursday February 22 2024, @08:52AM (1 child)
maybe OpenAI should be charged with slavery, and also murder every time they destroy an instance of each chatbot.
(Score: 3, Funny) by Anonymous Coward on Thursday February 22 2024, @01:18PM
> maybe OpenAI should be charged with slavery, and also murder every time they destroy an instance of each chatbot.
Only in Alabama.
(Score: 5, Funny) by stormreaver on Thursday February 22 2024, @02:21PM
Daniel's response is hilarious:
"Copyright law is about preventing people from producing exact copies or near exact copies of content and posting it for commercial gain"
Translated to: "Copyright law is about preventing people from [doing exactly what we're doing]."
(Score: 4, Interesting) by ElizabethGreene on Thursday February 22 2024, @02:34PM (1 child)
A specific image or frame from a moving image of Mario can be copyrighted. AFAIK, "Italian plumber who wears white gloves, blue overalls with yellow buttons, and a red hat with an "M" on the front" would be a trademark, not a copyright.
Trademark is a whole different set of legal battles, and IMHO the studios will have a much stronger case against that. The copyright case should focus on the training data used for the models. They pirated those; they built a billion-dollar product on stolen IP. That is not okay. If we go back to the RIAA lawsuits, they went after individuals for $1,500 per song, but the law gave them the option to go for 100 times that. Those are the kind of damages that, multiplied by 187,000 works*, will bring OpenAI to the negotiating table with a very open mind for settlements.
(*This is the number of full-text works in the now-DMCA'd books3 dataset. If the models were trained on the sci-hub dataset, the number could be ten times that.)
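The damages arithmetic in the comment above can be sketched out directly. This is only a back-of-the-envelope illustration: the $1,500-per-work figure, the 100x statutory option, and the 187,000-work books3 count are taken from the comment itself, not from any court filing.

```python
# Back-of-the-envelope statutory damages sketch, using the figures
# cited in the comment above (illustrative only, not legal analysis).

PER_WORK_LOW = 1_500       # RIAA-era per-song figure cited above
WILLFUL_MULTIPLIER = 100   # statutory option mentioned above (100x)
WORKS_BOOKS3 = 187_000     # full-text works in the books3 dataset

low_end = PER_WORK_LOW * WORKS_BOOKS3
high_end = PER_WORK_LOW * WILLFUL_MULTIPLIER * WORKS_BOOKS3

print(f"Low end:  ${low_end:,}")    # $280,500,000
print(f"High end: ${high_end:,}")   # $28,050,000,000
```

Even the low end runs to hundreds of millions of dollars, which is the commenter's point about settlement leverage; if the training set were ten times larger (the sci-hub scenario in the footnote), every figure scales by ten.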
(Score: 2) by Freeman on Thursday February 22 2024, @02:57PM
Assuming they used actual pirate sites like Sci-Hub, they could easily have trained on tens of millions of infringing works.
https://en.wikipedia.org/wiki/Sci-Hub [wikipedia.org]
Hit 'em with $1,000 per infringing asset, easily 80 Million plus items, that translates to 80 billion dollars worth of damages. That's just for scholarly publications.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 5, Insightful) by Freeman on Thursday February 22 2024, @03:01PM (5 children)
Change a single data point, instead of the company being based in the USA, have it based in China/Russia. Literally the entirety of the "Free World" would be upset and be putting sanctions on China/Russia. Why is this one stupid company based in the USA any different? They hoovered up the data from basically every single thing they could and then created a giant dataset with it. If that's not infringing content, I don't know what is.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 0, Disagree) by Anonymous Coward on Thursday February 22 2024, @08:50PM (4 children)
Because it is copyright. If they are merely using data, that is not infringing. Copyright controls the right to make copies, not the right to possess a copy.
Copyright is not the ownership of information. The very concept is nonsensical. Copyright is the legal right to make and sell copies of a particular expression of information.
(Score: 3, Interesting) by Freeman on Friday February 23 2024, @02:26PM (3 children)
So what you're saying is that they're not selling ChatGPT services? Seriously, the content was copied. Just because they obfuscated the data doesn't mean they didn't use it all. It also doesn't mean that they're not also reproducing said infringing content. They didn't just scrape what was freely available. They trained it on copyrighted works. They shouldn't get a free pass just because their dataset is big and they don't know its internal workings.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 0) by Anonymous Coward on Saturday February 24 2024, @06:29AM (2 children)
So if you write a book detailing the use of woodworking tools and techniques, does any carpenter who reads it owe you royalties on their work?
Obviously not.
The USE of information is not controlled, only the production of literal copies of a work.
(Score: 2) by Freeman on Monday February 26 2024, @02:23PM (1 child)
What I'm saying is that they just reformatted the data. All the while making wholesale copies of an unknown number of copyrighted works.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 0) by Anonymous Coward on Tuesday February 27 2024, @07:58AM
Here's another one. That carpenter reads that book, and a lot of other ones over his career. He becomes a real expert in all aspects of carpentry. When he retires, he writes a book with all the best techniques he's learned.
Is he infringing the copyright of the owners of the books he learned from ?
(Score: 3, Interesting) by DannyB on Thursday February 22 2024, @04:50PM (4 children)
Opinion:
Humans do copyright infringement. Not machines.
Just because I, one person in billions, make a query to either an AI or a search engine and get a result about some copyrighted character, story, music, art, poetry, etc., does not mean that my search result, or an AI-generated result, is copyright infringement.
What is copyright infringement is if I take that result I obtain, and publish or use it in some way that violates copyright law.
My privately viewing a query result does not mean any copyright infringement occurred. Neither the search engine nor the AI is widely distributing that result unbidden. I, and not the search engine, initiated the action which resulted in me getting a plumber with the letter M.
I have asked for pictures of puppies sleeping on a rug on a wood floor. Or I ask for pictures of kittens driving miniature race cars on a wood floor. In both cases I get amazing results. Astonishing even. It is my assumption that these actual images do not exist anywhere in the real world. The AI knows what kittens and puppies are. It knows what carpets are. It knows what miniature race cars are. It knows what wood floors are. Etc.
It is up to a human to avoid publishing something that IS copyright infringement. Or if notified of such infringement to determine their response to such a notification. It is not up to a machine.
In the case of AI, the AI learns about the world through images, similarly to how your human meat based neural network does. Why should we discriminate against AI learning from images? It seems okay for you and I to be informed by looking at paintings in a museum. Or images in a book of art. Or movies and games, etc. Why is it copyright infringement if a machine looks at and learns from an image, but it is okay if I look at and learn from that same image? The way we all know about the real world is how our brains were trained by seeing everything in the world, including things under copyright.
--
Remember that March is the week when we all remember and celebrate the blessings and joys that procrastination brings to our lives. It is intended to be celebrated in the first week of March, but people don't always get around to it right away.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 2, Touché) by Anonymous Coward on Friday February 23 2024, @02:06AM (1 child)
You missed out on applying your own reasoning on one important step: the humans in charge in OpenAI are using AIs to publish potentially infringing stuff to the public (and perhaps try to make money from doing so).
(Score: 2) by DannyB on Friday February 23 2024, @03:24PM
That should definitely be prevented. However I see it as equivalent to asking a human a question and getting a near-verbatim reply without a notice that this is a quotation from a specific Times article. That is something that should be fixed in the AI.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.
(Score: 2) by Freeman on Friday February 23 2024, @02:35PM (1 child)
So what you're saying is that as long as I use "AI" models, I can freely copy all of the academic journals I want? So long as no one can prove that my "AI" is making verbatim copies, we're good? Any verbatim regurgitation is "accidental" and should be fine? How about: the act of copying said content in the first place was likely infringement enough, no matter what I ended up doing with it? Your honor, the "script" is the one that stole all the data. I am completely innocent.
Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
(Score: 2) by DannyB on Friday February 23 2024, @03:21PM
Should it be a problem if an individual does a query and the response in whole or in part is a quotation from an academic journal, magazine, newspaper, book, etc? Don't ancient search engines from the previous millennium do that?
Should it be a problem if I asked a professor instead of an AI, and the professor made use of things he read in an academic journal in order to formulate his response? Why should an AI be any different? IMO any questions like this should not discriminate between meat based brains and simulated neural networks. Machines simply amplify what humans can do. Machines can lift heavier things. Dig holes faster. Wash dishes better. Process payrolls more efficiently.
I could see a case where AIs should be filtered to not produce quotations without clearly labeling them as such. Similarly, an image-generating AI should not produce an outright copy of an image it was trained on.
Hypothetically, it might be interesting if the weights of the interconnections between neurons, during training, recorded what percentage of every input training example was part of setting this particular synapse weight. Then it might be possible to take the result of puppies sleeping on carpet on wood floor and see which input images were used, and what percent of the resulting image was influenced from that training image data. Similarly imagine if a chatbot AI could produce which text works influenced the probabilities in generating a text response to a prompt, and by how much each source material influenced it.
As for scraping the internet to train AIs, I have no problem with that. I see it no differently than what search engines do. Anyone is free to browse the web, including with automation. It's what you do with material online that matters.
It will be interesting to see how all this works out over the coming years. A lot of new questions. Just as seems to happen with many new technologies.
The Centauri traded Earth jump gate technology in exchange for our superior hair mousse formulas.