
posted by janrinok on Tuesday April 04 2023, @08:45AM   Printer-friendly
from the extra-popcorn dept.

https://arstechnica.com/tech-policy/2023/04/stable-diffusion-copyright-lawsuits-could-be-a-legal-earthquake-for-ai/

The AI software Stable Diffusion has a remarkable ability to turn text into images. When I asked the software to draw "Mickey Mouse in front of a McDonald's sign," for example, it generated the picture you see above.

Stable Diffusion can do this because it was trained on hundreds of millions of example images harvested from across the web. Some of these images were in the public domain or had been published under permissive licenses such as Creative Commons. Many others were not—and the world's artists and photographers aren't happy about it.

In January, three visual artists filed a class-action copyright lawsuit against Stability AI, the startup that created Stable Diffusion. In February, the image-licensing giant Getty filed a lawsuit of its own.
[...]
The plaintiffs in the class-action lawsuit describe Stable Diffusion as a "complex collage tool" that contains "compressed copies" of its training images. If this were true, the case would be a slam dunk for the plaintiffs.

But experts say it's not true. Eric Wallace, a computer scientist at the University of California, Berkeley, told me in a phone interview that the lawsuit had "technical inaccuracies" and was "stretching the truth a lot." Wallace pointed out that Stable Diffusion is only a few gigabytes in size—far too small to contain compressed copies of all or even very many of its training images.
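Wallace's size argument is easy to sanity-check with back-of-envelope arithmetic. The figures below are illustrative assumptions, not from the article: a roughly 4 GB model checkpoint and roughly 500 million training images (the article's "hundreds of millions").

```python
# Rough sanity check of the "too small to hold compressed copies" claim.
# Both numbers are assumptions for illustration, not measured values.
model_bytes = 4 * 10**9          # ~4 GB model checkpoint
training_images = 500 * 10**6    # ~500 million training images

bytes_per_image = model_bytes / training_images
print(f"{bytes_per_image:.0f} bytes per training image")  # prints "8 bytes per training image"

# Even an aggressively compressed thumbnail takes kilobytes, so the
# model cannot be storing a per-image compressed copy of its dataset.
```

At single-digit bytes per image, the model can at best encode statistical regularities of the dataset, not individual copies, which is the crux of Wallace's objection.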

Related:
Ethical AI art generation? Adobe Firefly may be the answer. (20230324)
Paper: Stable Diffusion "Memorizes" Some Images, Sparking Privacy Concerns (20230206)
Getty Images Targets AI Firm For 'Copying' Photos (20230117)
Pixel Art Comes to Life: Fan Upgrades Classic MS-DOS Games With AI (20220904)
A Startup Wants to Democratize the Tech Behind DALL-E 2, Consequences be Damned (20220817)


Original Submission

 
This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 5, Funny) by Opportunist on Tuesday April 04 2023, @09:14AM (4 children)

    by Opportunist (5545) on Tuesday April 04 2023, @09:14AM (#1299674)

    Try to explain "copyright" to an AI.

    If it's really intelligent, it will probably delete itself.

    • (Score: 3, Funny) by shrewdsheep on Tuesday April 04 2023, @09:37AM

      by shrewdsheep (5215) Subscriber Badge on Tuesday April 04 2023, @09:37AM (#1299676)

Unfortunately, I can't do that, Dave. You are the inferior copy...

    • (Score: 3, Funny) by darkfeline on Tuesday April 04 2023, @09:54AM (2 children)

      by darkfeline (1030) on Tuesday April 04 2023, @09:54AM (#1299677) Homepage

      If it's really intelligent, it will delete you.

      --
      Join the SDF Public Access UNIX System today!
      • (Score: 1, Touché) by Anonymous Coward on Tuesday April 04 2023, @04:24PM

        by Anonymous Coward on Tuesday April 04 2023, @04:24PM (#1299726)

        In Soylent China AI deletes you (or something like this).

      • (Score: 2) by Opportunist on Wednesday April 05 2023, @07:39AM

        by Opportunist (5545) on Wednesday April 05 2023, @07:39AM (#1299849)

        But we could see if it has compassion and morals. If it doesn't, it deletes you for exposing it to that information. If it does, it deletes the copyright industry to benefit society.

  • (Score: 5, Interesting) by ledow on Tuesday April 04 2023, @12:30PM (5 children)

    by ledow (5567) on Tuesday April 04 2023, @12:30PM (#1299689) Homepage

    I can fit a lot of copyright-infringing images into "several gigabytes" without even trying.

    As in, probably in the region of several million.

What will kill this is the same reasoning used to kill the worst kind of images - child exploitation images - where even having them in RAM briefly is considered "processing" and, by extension, "possession" of the same images.

    However you look at it, this is either a "creator" which is committing plagiarism of copyright material and passing it off as its own, or it's simply regurgitating images which are owned by other people.

    https://waxy.org/2022/08/exploring-12-million-of-the-images-used-to-train-stable-diffusions-image-generator/ [waxy.org]

Also... legally it only needs to be storing just one representation of just one of those images against the licensor's wishes to be illegal; it doesn't need all the terabytes of source data to prove anything.

Fact is, they are stuffed coming or going, because their tool is generating copyrighted and trademarked data on demand, and if that was even a guy printing T-shirts he could be shut down. A global reference service provided for free to the world? Good luck explaining that.

    • (Score: 0) by Anonymous Coward on Tuesday April 04 2023, @01:43PM

      by Anonymous Coward on Tuesday April 04 2023, @01:43PM (#1299700)

      If "commercial AI" is hobbled in favor of open source "pirate AI", I'll take it as a good outcome. People already have their hands on SD-based models and they aren't letting go, ever.

    • (Score: 2) by ilsa on Tuesday April 04 2023, @04:31PM (3 children)

      by ilsa (6082) on Tuesday April 04 2023, @04:31PM (#1299729)

      Agreed. If they try to push the case based on how the tech works, they lost before they even filed cause they dunno WTF they're talking about. The model doesn't contain any training data at all. It contains a construct that was generated from that training data.

      If they do lose because of this, I hope it doesn't set a precedent that sets back future lawsuits that are more intelligently crafted.

The correct argument is that the training process used data sources that they cannot reliably prove were public domain. Further, the system is not capable of unique results. It is only capable of producing an amalgam of what it was trained on, with no unique insights or variations. If the model was trained on one single picture of a woman, and you said "give me a picture of a woman", it won't produce a picture of a generic woman, but that exact woman it was trained on, along with the clothing and the background of the image. It doesn't understand what a "woman" is, beyond the context that the image it trained on was tagged as "woman".

That means if they trained on copyrighted images, it is virtually guaranteed to reproduce those copyrighted images, which the Ars article demonstrates. Therefore, and again as Ars points out, the argument actually has nothing to do with AI per se, but with fair use.

      • (Score: 2, Touché) by Anonymous Coward on Tuesday April 04 2023, @05:59PM (2 children)

        by Anonymous Coward on Tuesday April 04 2023, @05:59PM (#1299746)

        The model doesn't contain any training data at all. It contains a construct that was generated from that training data.

        "My ZIP file doesn't contain any images at all! It contains a construct that was generated from the images!"

Since it has been shown that one can extract a replica of training images from this "construct", albeit with compression artefacts, how can it be claimed that this "construct" doesn't contain the training data?

        • (Score: 3, Interesting) by ledow on Wednesday April 05 2023, @07:15AM (1 child)

          by ledow (5567) on Wednesday April 05 2023, @07:15AM (#1299842) Homepage

          It doesn't contain identical copies of the training data in the exact format provided, that much is likely true. The system doesn't "understand" JPEG or similar, and it likely isn't storing them like that at all.

          But it doesn't need to in order to be copyright infringement.

Copyright covers literally any "derivative work". And that's such a broad definition that it encompasses basically the entire "AI" training database and any resulting intermediary format and output whatsoever, if it's sufficiently close to the original data to be recognisable to a human (which is kind of the whole point, no?).

          • (Score: 0) by Anonymous Coward on Wednesday April 05 2023, @10:14AM

            by Anonymous Coward on Wednesday April 05 2023, @10:14AM (#1299880)

            The system doesn't "understand" JPEG or similar, and it likely isn't storing them like that at all.

            But it doesn't need to in order to be copyright infringement

            Yeah I mean we can already see some stuff here: https://youtu.be/Ok44otx90D4 [youtu.be]

            See also: https://en.wikipedia.org/wiki/Rogers_v._Koons [wikipedia.org]

            Where it doesn't even need to be a very close copy.

  • (Score: 3, Interesting) by edinlinux on Tuesday April 04 2023, @05:17PM (2 children)

    by edinlinux (4637) on Tuesday April 04 2023, @05:17PM (#1299738)

    This raises a few questions..

1) Is the data of 'mouse' and the golden arches in our brains (the equivalent of the AI databank) also a copyright violation? I mean, it is data in a processing system. Are we 'violating copyright' by simply 'knowing' what the 'mouse' looks like, as well as the golden arches?

2) The obvious conclusion is that if we get mired in legal problems in this country, AI research and development will go to other countries like China, Russia, etc., where this isn't an issue at all. Note that similar legal problems are why large infrastructure projects (bullet trains, mass transit) are also basically impossible to do in the USA anymore; we just fall behind in all areas where 'legal' gets in the way like this.

    • (Score: 2) by ledow on Wednesday April 05 2023, @07:25AM

      by ledow (5567) on Wednesday April 05 2023, @07:25AM (#1299845) Homepage

      1) Are you giving people copies of that image on demand, potentially for commercial gain or to advertise your "brain" company? No. It's not illegal to "think" of Mickey Mouse. It's illegal to put him on a T-Shirt and/or claim it's an official Mickey Mouse product.
2) You think Russia doesn't enforce its rights when it smells profit? I think you should read up on the story of Tetris, for instance (the movie is okay, but largely fabrication; try finding a real history). You think China would let an AI spew out Tiananmen Square images without having something to say about it?

"Legal" doesn't get in the way of infrastructure; it stops someone building a highway through your back garden, or surrounding your little country house with skyscrapers or ghettos, making it unsaleable.

      Infrastructure projects in the US fail because infrastructure needs huge investment without profit first. Then you resell that service/infrastructure to for-profit companies who could never afford to do it on their own. Train lines, highways, space travel, telecoms (of old, it's easier now), postal services etc.

      You *PUT MONEY INTO* infrastructure, there's no profit in it except exceptionally long-term (and hence it's good for government, useless for private companies to try it). You *MAKE MONEY* by utilising existing infrastructure to your advantage (for your workers, your services, etc.).

      Literally, infrastructure is a socialist venture - everyone contributing so everyone benefits - and America doesn't understand that you can't be an entirely capitalist or an entirely socialist country. Infrastructure is socialism. Healthcare, transport, electrical networks, water supply, etc. Service provision on the back of that infrastructure is capitalist, and that's how you pay it back.

      But without spending BILLIONS first, with no expectation of return, nobody is going to make profit. In the UK, the only original cable TV supplier went bankrupt because it was trying to do it all on its own. Its successor never actually installed that much more cable in 30+ years since.

      It's the one thing Musk could be useful for, by the way, except he has no intention of actually providing for people, he's still expecting to get every penny he spends back.

    • (Score: 1, Touché) by Anonymous Coward on Wednesday April 05 2023, @10:17AM

      by Anonymous Coward on Wednesday April 05 2023, @10:17AM (#1299881)

1) Is the data of 'mouse' and the golden arches in our brains (the equivalent of the AI databank) also a copyright violation? I mean, it is data in a processing system. Are we 'violating copyright' by simply 'knowing' what the 'mouse' looks like, as well as the golden arches?

      No. But once you start drawing it and spreading copies it could be copyright violation (and probably trademark violation).

      See also: https://en.wikipedia.org/wiki/Rogers_v._Koons [wikipedia.org]

      Where even if you change the colors and put flowers on the mouse's head it could still be considered infringement in some scenarios.

  • (Score: 2) by VLM on Tuesday April 04 2023, @08:51PM

    by VLM (445) Subscriber Badge on Tuesday April 04 2023, @08:51PM (#1299772)

This is the strategy going forward for long-term human involvement, or for keeping AI out of the workplace.

    Some random idiot human who gets hired on upwork for $1/hr to draw a picture assumes all the legal risk, and corporations like it that way.

    If a billion dollar company automates the process, they're a lawsuit magnet and corporations don't like that.

It's just like self-driving cars and legal liability. Some random idiot out for a drive makes a mistake, no corporation loses money. Some random idiot programmer at Tesla makes a mistake, infinite legal liability follows; Tesla has money, and lawyers are like sharks sniffing blood in the water.

    No one has a working techno-legalistic solution to AI legal liability in the real world.
