
SoylentNews is people

posted by janrinok on Friday January 06 2023, @03:51PM
from the 17-USC-§§-1201-1205 dept.

As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violation through its use of the M$ GitHub Copilot tool. The software project, Copilot, strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit insists that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law nor are the wielders of such tools.

The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2M users receive such output, the licenses were breached 12k times, which translates to 36k DMCA violations, and that is a very low-ball estimate.

"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.

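The damages arithmetic in the two paragraphs above can be checked with a few lines of Python. This is a sketch that takes the summary's inputs at face value (1.2M users, three Section 1202 violations per offending output, $2,500 minimum statutory damages per violation); the function name `dmca_exposure` is made up for illustration.

```python
# Reproduces the back-of-the-envelope damages arithmetic quoted above.
# All inputs are taken from the summary and the plaintiffs' statement;
# "three violations per output" is the summary's claim, not a legal finding.

USERS = 1_200_000          # Copilot users cited in the summary
VIOLATIONS_PER_OUTPUT = 3  # DMCA violations per stripped output (per the summary)
STATUTORY_MINIMUM = 2_500  # minimum statutory damages per violation, USD


def dmca_exposure(share_of_users: float, outputs_per_user: int = 1) -> tuple[int, int]:
    """Return (violation count, minimum statutory damages in USD)."""
    affected_outputs = round(USERS * share_of_users) * outputs_per_user
    violations = affected_outputs * VIOLATIONS_PER_OUTPUT
    return violations, violations * STATUTORY_MINIMUM


# Summary's low-ball case: only 1% of users ever receive one offending output.
print(dmca_exposure(0.01))  # (36000, 90000000)
# Plaintiffs' case: every user receives exactly one offending output.
print(dmca_exposure(1.0))   # (3600000, 9000000000)
```

The 1% case reproduces the summary's 36k figure; the every-user case reproduces the plaintiffs' 3.6M violations and $9B.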
Besides the open-source licenses and the DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violations of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.
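To make "copyright-management information" concrete for code, here is a rough sketch that pulls SPDX license tags and copyright lines out of a source file and reports which ones are missing from a derived snippet. It assumes those two header forms are the CMI of interest, which is a simplification (the statute's definition is broader), and the helper names and the sample author are hypothetical.

```python
import re

# Illustrative only: two common forms of copyright-management information
# found in source files. Section 1202's legal definition is broader than this.
CMI_PATTERNS = [
    re.compile(r"SPDX-License-Identifier:\s*\S+"),
    re.compile(r"Copyright\s+(?:\(c\)\s*)?\d{4}.*", re.IGNORECASE),
]


def extract_cmi(source: str) -> list[str]:
    """Return every license tag or copyright line found in the source text."""
    found = []
    for line in source.splitlines():
        for pattern in CMI_PATTERNS:
            match = pattern.search(line)
            if match:
                found.append(match.group(0).strip())
    return found


def removed_cmi(original: str, derived: str) -> list[str]:
    """CMI present in the original but missing from the derived snippet."""
    kept = set(extract_cmi(derived))
    return [item for item in extract_cmi(original) if item not in kept]


# Hypothetical author and file contents, for illustration.
ORIGINAL = """/* Copyright (c) 2006 Jane Doe
 * SPDX-License-Identifier: LGPL-2.1-or-later */
int add(int a, int b) { return a + b; }"""
DERIVED = "int add(int a, int b) { return a + b; }"

print(removed_cmi(ORIGINAL, DERIVED))
# ['Copyright (c) 2006 Jane Doe', 'SPDX-License-Identifier: LGPL-2.1-or-later']
```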

The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract (two counts).
– Tortious interference.
– Fraud.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Negligence.
– Civil conspiracy.
– Declaratory relief.

Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and indicate yet another step in ongoing attempts by M$ to undermine and sabotage Free and Open Source Software and the supporting communities.

Previously:
(2022) GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit
(2022) Give Up GitHub: The Time Has Come!
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground


Related Stories

GitHub’s Automatic Coding Tool Rests on Untested Legal Ground

GitHub’s automatic coding tool rests on untested legal ground:

The Copilot tool has been trained on mountains of publicly available code

[...] When GitHub announced Copilot on June 29, the company said that the algorithm had been trained on publicly available code posted to GitHub. Nat Friedman, GitHub’s CEO, has written on forums like Hacker News and Twitter that the company is legally in the clear. “Training machine learning models on publicly available data is considered fair use across the machine learning community,” the Copilot page says.

But the legal question isn’t as settled as Friedman makes it sound — and the confusion reaches far beyond just GitHub. Artificial intelligence algorithms only function due to massive amounts of data they analyze, and much of that data comes from the open internet. An easy example would be ImageNet, perhaps the most influential AI training dataset, which is entirely made up of publicly available images that ImageNet creators do not own. If a court were to say that using this easily accessible data isn’t legal, it could make training AI systems vastly more expensive and less transparent.

Despite GitHub’s assertion, there is no direct legal precedent in the US that upholds publicly available training data as fair use, according to Mark Lemley and Bryan Casey of Stanford Law School, who published a paper last year about AI datasets and fair use in the Texas Law Review.

[...] And there are past cases to support that opinion, they say. They consider the Google Books case, in which Google downloaded and indexed more than 20 million books to create a literary search database, to be similar to training an algorithm. The Second Circuit upheld Google’s fair use claim on the grounds that the new tool was transformative of the original work and broadly beneficial to readers and authors, and the Supreme Court declined to hear an appeal.

Microsoft’s GitHub Copilot Met with Backlash from Open Source Copyright Advocates:

Give Up GitHub: The Time Has Come!

From Software Freedom Conservancy

Those who forget history often inadvertently repeat it. Some of us recall that twenty-one years ago, the most popular code hosting site, a fully Free and Open Source (FOSS) site called SourceForge, proprietarized all their code — never to make it FOSS again. Major FOSS projects slowly left SourceForge since it was now, itself, a proprietary system, and antithetical to FOSS. FOSS communities learned that it was a mistake to allow a for-profit, proprietary software company to become the dominant FOSS collaborative development site.

SourceForge slowly collapsed after the DotCom crash, and today, SourceForge is more advertising link-bait than it is code hosting. We learned a valuable lesson that was a bit too easy to forget — especially when corporate involvement manipulates FOSS communities to its own ends. We now must learn the SourceForge lesson again with Microsoft's GitHub.

GitHub has, in the last ten years, risen to dominate FOSS development. They did this by building a user interface and adding social interaction features to the existing Git technology. (For its part, Git was designed specifically to make software development distributed without a centralized site.) In the central irony, GitHub succeeded where SourceForge failed: they have convinced us to promote and even aid in the creation of a proprietary system that exploits FOSS. GitHub profits from those proprietary products (sometimes from customers who use it for problematic activities).

Specifically, GitHub profits primarily from those who wish to use GitHub tools for in-house proprietary software development. Yet, GitHub comes out again and again seeming like a good actor — because they point to their largess in providing services to so many FOSS endeavors. But we've learned from the many gratis offerings in Big Tech: if you aren't the customer, you're the product. The FOSS development methodology is GitHub's product, which they've proprietarized and repackaged with our active (if often unwitting) help.

GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit

GitHub Copilot may steer Microsoft into a copyright lawsuit:

GitHub Copilot – a programming auto-suggestion tool trained from public source code on the internet – has been caught generating what appears to be copyrighted code, prompting an attorney to look into a possible copyright infringement claim.

On Monday, Matthew Butterick, a lawyer, designer, and developer, announced he is working with Joseph Saveri Law Firm to investigate the possibility of filing a copyright claim against GitHub. There are two potential lines of attack here: is GitHub improperly training Copilot on open source code, and is the tool improperly emitting other people's copyrighted work – pulled from the training data – to suggest code snippets to users?

Butterick has been critical of Copilot since its launch. In June he published a blog post arguing that "any code generated by Copilot may contain lurking license or IP violations," and thus should be avoided.

That same month, Denver Gingerich and Bradley Kuhn of the Software Freedom Conservancy (SFC) said their organization would stop using GitHub, largely as a result of Microsoft and GitHub releasing Copilot without addressing concerns about how the machine-learning model dealt with different open source licensing requirements.

Copilot's capacity to copy code verbatim, or nearly so, surfaced last week when Tim Davis, a professor of computer science and engineering at Texas A&M University, found that Copilot, when prompted, would reproduce his copyrighted sparse matrix transposition code.

Asked to comment, Davis said he would prefer to wait until he has heard back from GitHub and its parent Microsoft about his concerns.

Netflix Stirs Fears by Using AI-Assisted Background Art in Short Anime Film

https://arstechnica.com/information-technology/2023/02/netflix-taps-ai-image-synthesis-for-background-art-in-the-dog-and-the-boy/

Over the past year, generative AI has kicked off a wave of existential dread over potential machine-fueled job loss not seen since the advent of the industrial revolution. On Tuesday, Netflix reinvigorated that fear when it debuted a short film called Dog and Boy that utilizes AI image synthesis to help generate its background artwork.

Directed by Ryotaro Makihara, the three-minute animated short follows the story of a boy and his robotic dog through cheerful times, although the story soon takes a dramatic turn toward the post-apocalyptic. Along the way, it includes lush backgrounds apparently created as a collaboration between man and machine, credited to "AI (+Human)" in the end credit sequence.

[...] Netflix and the production company WIT Studio tapped Japanese AI firm Rinna for assistance with generating the images. They did not announce exactly what type of technology Rinna used to generate the artwork, but the process looks similar to a Stable Diffusion-powered "img2img" process that can take an image and transform it based on a written prompt.

Related:
ChatGPT Can't be Credited as an Author, Says World's Largest Academic Publisher
90% of Online Content Could be 'Generated by AI by 2025,' Expert Says
Getty Images Targets AI Firm For 'Copying' Photos
Controversy Erupts Over Non-consensual AI Mental Health Experiment
Microsoft's New AI Can Simulate Anyone's Voice With Three Seconds of Audio
AI Everything, Everywhere
Microsoft, GitHub, and OpenAI Sued for $9B in Damages Over Piracy
Adobe Stock Begins Selling AI-Generated Artwork
AI Systems Can't Patent Inventions, US Federal Circuit Court Confirms


Robots Let ChatGPT Touch the Real World Thanks to Microsoft

https://arstechnica.com/information-technology/2023/02/robots-let-chatgpt-touch-the-real-world-thanks-to-microsoft/

Last week, Microsoft researchers announced an experimental framework to control robots and drones using the language abilities of ChatGPT, a popular AI language model created by OpenAI. Using natural language commands, ChatGPT can write special code that controls robot movements. A human then views the results and adjusts as necessary until the task gets completed successfully.

The research arrived in a paper titled "ChatGPT for Robotics: Design Principles and Model Abilities," authored by Sai Vemprala, Rogerio Bonatti, Arthur Bucker, and Ashish Kapoor of the Microsoft Autonomous Systems and Robotics Group.

In a demonstration video, Microsoft shows robots—apparently controlled by code written by ChatGPT while following human instructions—using a robot arm to arrange blocks into a Microsoft logo, flying a drone to inspect the contents of a shelf, or finding objects using a robot with vision capabilities.

To get ChatGPT to interface with robotics, the researchers taught ChatGPT a custom robotics API. When given instructions like "pick up the ball," ChatGPT can generate robotics control code just as it would write a poem or complete an essay. After a human inspects and edits the code for accuracy and safety, the human operator can execute the task and evaluate its performance.
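A hedged sketch of the workflow described above: a small robot API, a snippet of the sort a model might produce for "pick up the ball", and a review step before anything runs. The `Robot` class, its methods, and the automated allow-list check (standing in for the human inspection the paper describes) are all hypothetical, not the API from the Microsoft paper.

```python
import ast
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Robot:
    """Hypothetical high-level robot API exposed to the language model."""
    position: tuple = (0.0, 0.0, 0.0)
    holding: Optional[str] = None
    log: list = field(default_factory=list)

    def move_to(self, x: float, y: float, z: float) -> None:
        self.position = (x, y, z)
        self.log.append(f"move_to {x} {y} {z}")

    def grab(self, obj: str) -> None:
        self.holding = obj
        self.log.append(f"grab {obj}")

    def release(self) -> None:
        self.log.append(f"release {self.holding}")
        self.holding = None


ALLOWED_METHODS = {"move_to", "grab", "release"}


def review(code: str) -> bool:
    """Stand-in for the human check: only robot.<allowed method>(...) calls pass."""
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            func = node.func
            if not (isinstance(func, ast.Attribute)
                    and isinstance(func.value, ast.Name)
                    and func.value.id == "robot"
                    and func.attr in ALLOWED_METHODS):
                return False
    return True


# What a model might emit for "pick up the ball" (ball position assumed known).
GENERATED = """
robot.move_to(0.4, 0.1, 0.0)
robot.grab("ball")
robot.move_to(0.0, 0.0, 0.5)
"""

robot = Robot()
if review(GENERATED):
    # No builtins, only `robot` in scope; a narrowed namespace, not a real sandbox.
    exec(GENERATED, {"__builtins__": {}}, {"robot": robot})
print(robot.holding, robot.position)  # ball (0.0, 0.0, 0.5)
```

The allow-list check is deliberately crude; the paper's point is that a person, not a script, inspects the code before it runs.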

In this way, ChatGPT accelerates robotic control programming, but it's not an autonomous system. "We emphasize that the use of ChatGPT for robotics is not a fully automated process," reads the paper, "but rather acts as a tool to augment human capacity."

New York Times Sues Microsoft, ChatGPT Maker OpenAI Over Copyright Infringement

The New York Times on Wednesday filed a lawsuit against Microsoft and OpenAI, the company behind popular AI chatbot ChatGPT, accusing the companies of creating a business model based on "mass copyright infringement," stating their AI systems "exploit and, in many cases, retain large portions of the copyrightable expression contained in those works."

Microsoft both invests in and supplies OpenAI, providing it with access to the Redmond, Washington, giant's Azure cloud computing technology.

The publisher said in a filing in the U.S. District Court for the Southern District of New York that it seeks to hold Microsoft and OpenAI to account for the "billions of dollars in statutory and actual damages" it believes it is owed for the "unlawful copying and use of The Times's uniquely valuable works."

[...] The Times said in an emailed statement that it "recognizes the power and potential of GenAI for the public and for journalism," but added that journalistic material should be used for commercial gain with permission from the original source.

"These tools were built with and continue to use independent journalism and content that is only available because we and our peers reported, edited, and fact-checked it at high cost and with considerable expertise," the Times said.

This discussion was created by janrinok (52) for logged-in users only, but now has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0, Insightful) by Rosco P. Coltrane on Friday January 06 2023, @04:15PM (19 children)

    by Rosco P. Coltrane (4757) on Friday January 06 2023, @04:15PM (#1285487)

    its use of the M$ GitHub Copilot tool

    Dude... Saying M$ was cool amongst hip teens 20 years ago. Using it now only pegs you as immature. Grow up.

    • (Score: 3, Informative) by Anonymous Coward on Friday January 06 2023, @04:29PM (3 children)

      by Anonymous Coward on Friday January 06 2023, @04:29PM (#1285491)

      Sure. But at the same time, for SN readers, M$ disambiguates from "MS" which is used in other ways...

      • (Score: 3, Insightful) by Rosco P. Coltrane on Friday January 06 2023, @05:02PM (2 children)

        by Rosco P. Coltrane (4757) on Friday January 06 2023, @05:02PM (#1285500)

        Yeah. Or you could write Microsoft, to resolve the potential confusion.

        • (Score: 5, Funny) by DannyB on Friday January 06 2023, @05:27PM (1 child)

          by DannyB (5839) Subscriber Badge on Friday January 06 2023, @05:27PM (#1285506) Journal

          Look, it is simple. There are two kinds of MS.

          1. An affliction suffered by millions which makes even the simplest tasks difficult.

          2. A medical condition.

          How to fix pylint error: modify pylint until it no longer complains. Rinse. Repeat.

          --
          The most difficult part of the art of fencing is digging the holes and carrying the fence posts.
    • (Score: 4, Insightful) by sjames on Friday January 06 2023, @05:02PM (7 children)

      by sjames (2882) on Friday January 06 2023, @05:02PM (#1285499) Journal

      Is it really that huge a deal? In addition to disambiguating from the academic achievement and the degenerative disease, it's a long-time nickname that is instantly recognizable and whose commentary is still on point.

      Side note, disambiguating from the degenerative disease may be a sort of false distinction :-)

      • (Score: 1, Redundant) by Rosco P. Coltrane on Friday January 06 2023, @05:08PM (6 children)

        by Rosco P. Coltrane (4757) on Friday January 06 2023, @05:08PM (#1285501)

        No disambiguation needed. TFA is about Microsoft. The abbreviation "MS" in the article - if you really insist on shortening the word "Microsoft" for some reason - can in no way refer to multiple sclerosis. It would only need disambiguation if the article was about Microsoft curing multiple sclerosis or funding multiple sclerosis research. It isn't the case here.

        At any rate, in 2023, when you style it as M$, you sound immature.

        • (Score: 3, Insightful) by HiThere on Friday January 06 2023, @05:59PM

          by HiThere (866) Subscriber Badge on Friday January 06 2023, @05:59PM (#1285513) Journal

          I accept that it has that meaning to you. I didn't even notice it until it got pointed out, the usage is so common.

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 4, Funny) by Anonymous Coward on Friday January 06 2023, @07:08PM (2 children)

          by Anonymous Coward on Friday January 06 2023, @07:08PM (#1285532)

          Thank you gramps for scolding those darned kids. Your harsh and demeaning words have helped make the world a better place.

          • (Score: 3, Funny) by Gaaark on Saturday January 07 2023, @03:41AM (1 child)

            by Gaaark (41) on Saturday January 07 2023, @03:41AM (#1285601) Journal

            HEY! Get off his lawwwwwnnnn.... mumble mumble can't find my hanky....

            --
            --- Please remind me if I haven't been civil to you: I'm channeling MDC. ---Gaaark 2.0 ---
            • (Score: 0) by Anonymous Coward on Sunday January 08 2023, @12:41AM

              by Anonymous Coward on Sunday January 08 2023, @12:41AM (#1285747)

              It's that sticky dripping thing in your, oh nevermind.

        • (Score: 3, Insightful) by Ingar on Saturday January 07 2023, @11:07AM (1 child)

          by Ingar (801) on Saturday January 07 2023, @11:07AM (#1285658) Homepage Journal

          At any rate, in 2023, when you style it as M$, you sound immature.

          If in 2023, you're still not styling it M$, you obviously missed what Microsoft has been doing the past three decades.

          • (Score: 3, Insightful) by RS3 on Sunday January 08 2023, @12:46AM

            by RS3 (6367) on Sunday January 08 2023, @12:46AM (#1285748)

            I didn't want to engage this any more, but since you broke the ice, exactly my thoughts: "M$" references Microsoft being greedy, controlling a-holes. Point being, sometimes they seem to be pretty okay, but I've learned to be very skeptical of them, maybe even cynical, as this whole GitHub thing bears out.

    • (Score: 4, Interesting) by canopic jug on Friday January 06 2023, @06:47PM (2 children)

      by canopic jug (3949) Subscriber Badge on Friday January 06 2023, @06:47PM (#1285526) Journal

      Among the other reasons, some of which are covered above, it is a nod to M$ Team99 and its successors, which used their own custom web spiders to crawl forums and detect the string "Microsoft", case-insensitive. If you avoided that string, then you had a few hours of unimpaired dialog on whatever topic before they came along anyway. However, if someone was foolish enough to name the Beast of Redmond directly, then astroturfers and shills swooped in within minutes and filled the thread with trolling, up to and including gay and/or coprophilic porn text, whatever it took to drag the thread off topic, including whining about the string "m$" too.

      Their teams, whatever they are called now, are sneakier than ever and still doing what they can to disrupt use of FOSS and Open Standards or even discussion of either topic.

      If you're going to troll about the summary, why not pick on the typos or the bad math which looks to be off by three orders of magnitude?

      Either way, you have some companies hiding behind algorithms which they set up and are using to strip both licensing information and author attribution, both of which are DMCA violations. Can we sic the Business Software Alliance after its master? Or will it not bite the hand that feeds it?

      --
      Money is not free speech. Elections should not be auctions.
      • (Score: -1, Troll) by Anonymous Coward on Sunday January 08 2023, @05:10AM (1 child)

        by Anonymous Coward on Sunday January 08 2023, @05:10AM (#1285787)

        So anyone who uses "M$" is an incel.

        • (Score: -1, Troll) by Anonymous Coward on Sunday January 08 2023, @06:17AM

          by Anonymous Coward on Sunday January 08 2023, @06:17AM (#1285794)

          That was supposed to be sarcastic humor. Oh well. Whoosh it is.

    • (Score: 2, Insightful) by Runaway1956 on Friday January 06 2023, @07:24PM

      by Runaway1956 (2926) Subscriber Badge on Friday January 06 2023, @07:24PM (#1285535) Journal

      No one reading M$ mistakes this for an article about multiple sclerosis. M$ is a quite fitting designator for Evil Corp.

      --
      ‘Never trust a man whose uncle was eaten by cannibals’
    • (Score: 0) by Anonymous Coward on Friday January 06 2023, @07:27PM (2 children)

      by Anonymous Coward on Friday January 06 2023, @07:27PM (#1285536)

      Ain't you caught them pesky Duke boys yet?

      • (Score: 3, Funny) by Anonymous Coward on Friday January 06 2023, @08:37PM (1 child)

        by Anonymous Coward on Friday January 06 2023, @08:37PM (#1285547)

        He's pissed off that they keep jumping the General Lee over him and making him look like an idiot trifler.

        • (Score: 0) by Anonymous Coward on Sunday January 08 2023, @07:59PM

          by Anonymous Coward on Sunday January 08 2023, @07:59PM (#1285851)

          shoulda wrote "trifling idiot"

  • (Score: 5, Interesting) by looorg on Friday January 06 2023, @04:35PM (7 children)

    by looorg (578) on Friday January 06 2023, @04:35PM (#1285492)

    Just imagine how long and complex the license text would be if they did somehow manage to attribute everything to everyone that their AI "learned" from. It's not that it should be that hard -- it should remember or could take note where it borrowed all the code-snippets from. But it's not like most people would read it anyway and if they had to click anything they would just click the big OK/Accept button.

    Still, it's amusing that they are now using piracy math on how many violations there have been, even though they are clearly low-balling it instead of high-balling it like copyright lawyers for various entertainment outfits do.

    That said, if Copilot doesn't attribute, then shouldn't the same be said for more or less ANY "AI" out there? ChatGPT? It apparently now even writes papers and everything, and it doesn't cite or attribute anything to anyone. Everything it produces should instantly be classified as plagiarism, a giant lawsuit waiting to happen. How about the "AI" that draws pictures, composes music, creates medicines, etc.? It seems AI tech is basically piracy tech: it rips off everything and everyone and shares none of the glory and credit with anyone, possibly with the exception of whoever made it. But it completely fucks over everyone who provided the learning data.

    • (Score: 4, Insightful) by Rich on Friday January 06 2023, @05:17PM

      by Rich (945) on Friday January 06 2023, @05:17PM (#1285502) Journal

      It's the same for search engines. Search engines make a local copy of the web (remember "Google Cache"?). This is fundamentally required, because a) the indexing needs to do a differential update to the word indexes and b) search needs to access the word sequence for sentence matching. Yet, when the first search engines (including Google) came up, copyright law did not allow for that. Not that anyone cared (or even understood), yet there were even discussions about whether merely copying something into the browser cache might be illegal. But basically they didn't care about any licence and just grabbed everything accessible and copied it to make a profit. "robots.txt" only came after the fact.

      The saying goes something like: "When you do something borderline and novel, you're breaking the law, but if big money does the same, it becomes the law."
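The two reasons given above for keeping a local copy can be illustrated with a toy positional inverted index. This is a simplified sketch, not how any real search engine is built: re-indexing a page needs the stored copy to know which old postings to remove, and phrase search needs word positions to check that the words appear in sequence. The class and method names are hypothetical.

```python
from collections import defaultdict


class PositionalIndex:
    """Toy positional inverted index: word -> {doc_id: [positions]}."""

    def __init__(self):
        self.postings = defaultdict(dict)
        self.pages = {}  # the local copy: doc_id -> list of tokens

    @staticmethod
    def _tokenize(text: str) -> list[str]:
        return text.lower().split()

    def update(self, doc_id: str, text: str) -> None:
        """Re-index a page; removing its old postings needs the stored copy."""
        for word in set(self.pages.get(doc_id, [])):
            self.postings[word].pop(doc_id, None)
        tokens = self._tokenize(text)
        self.pages[doc_id] = tokens
        for pos, word in enumerate(tokens):
            self.postings[word].setdefault(doc_id, []).append(pos)

    def phrase_search(self, phrase: str) -> set[str]:
        """Pages containing the words of `phrase` consecutively, in order."""
        words = self._tokenize(phrase)
        if not words:
            return set()
        candidates = set(self.postings.get(words[0], {}))
        for word in words[1:]:
            candidates &= set(self.postings.get(word, {}))
        hits = set()
        for doc in candidates:
            positions = [set(self.postings[w][doc]) for w in words]
            if any(all(start + i in positions[i] for i in range(len(words)))
                   for start in positions[0]):
                hits.add(doc)
        return hits


index = PositionalIndex()
index.update("a", "copyright law did not allow that")
index.update("b", "that law did allow copyright")
print(index.phrase_search("law did not"))  # {'a'}
index.update("a", "robots.txt came after the fact")
print(index.phrase_search("law did"))      # {'b'}
```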

    • (Score: 2) by HiThere on Friday January 06 2023, @06:08PM (5 children)

      by HiThere (866) Subscriber Badge on Friday January 06 2023, @06:08PM (#1285516) Journal

      On the one hand, this suit looks absolutely proper by the existing laws, and like if cases were decided on their merits it should win.
      On the other hand, the actions being protested look perfectly reasonable.

      So, to me, this is more an argument that the existing copyright law should be totally thrown out. And I rather agree. Go back to the copyright laws of 1823, and then update as needed. But explicitly include a working definition of "fair use" so it isn't just "whatever the lawyers can convince the court is reasonable". (I would have said "1723", but the US didn't exist at that point, and the worst changes happened after the Civil War. But maybe 1800 would be a better date.)

      --
      Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
      • (Score: 5, Insightful) by canopic jug on Friday January 06 2023, @06:57PM (4 children)

        by canopic jug (3949) Subscriber Badge on Friday January 06 2023, @06:57PM (#1285531) Journal

        On the other hand, the actions being protested look perfectly reasonable.

        How is taking copyrighted code and removing references to both the author and the original license ok? If it is ok with a machine learning algorithm, would it still be ok with another algorithm, like a complex regex with a few conditional statements thrown in? What about hiring unskilled staff who don't understand the code but can read enough to strip both attribution and licensing information? Where is the line drawn?

        --
        Money is not free speech. Elections should not be auctions.
        • (Score: 2) by HiThere on Friday January 06 2023, @09:40PM

          by HiThere (866) Subscriber Badge on Friday January 06 2023, @09:40PM (#1285567) Journal

          The idea that there should be a line is a real problem, but learning, whether human or machine, involves processing lots and lots of stuff and extracting what is deemed useful. How can you argue against this and yet support spelling bees? Some of the words being spelled are less than 30 years old, so SOMEBODY invented the spelling.

          Consider, e.g., the "Hello, World" program. There are lots of printed variations of it. Should you need to respect the copyrights of their authors? But most of them are trivial variations of the original, so maybe only the original should be deemed worthy of copyright. But this would mean that you couldn't illustrate a new language by doing a translation of "Hello, World" into that language.

          Basically, what I think ChatGPT *should* be able to argue is that its productions are functional, and therefore not deserving of copyright. But that kind of argument requires expensive lawyers, and you've got to be ready to pay for appeals. It would be much better if copyright clearly didn't cover that, and didn't cover anything over 20 years old. (Including renewals! Make it 10 years of copyright and up to two renewals of 5 years each, and they've got to be a continuous span of time.)

          --
          Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
        • (Score: 2) by wisnoskij on Saturday January 07 2023, @04:28AM (2 children)

          by wisnoskij (5149) <jonathonwisnoskiNO@SPAMgmail.com> on Saturday January 07 2023, @04:28AM (#1285611)

          How do you think human programmers do their job?

          I went to school for years to have professors write code on the board for me to copy without attribution.

          • (Score: 4, Insightful) by maxwell demon on Saturday January 07 2023, @06:38AM (1 child)

            by maxwell demon (1608) on Saturday January 07 2023, @06:38AM (#1285627) Journal

            But your professor carefully selected the code he wrote on the blackboard. He didn't copy some random code from the internet without looking at its copyright.

            --
            The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 0) by Anonymous Coward on Saturday January 07 2023, @02:54PM

              by Anonymous Coward on Saturday January 07 2023, @02:54PM (#1285683)

              He didn't copy some random code from the internet without looking at its copyright.

              AND if he did, he could get sued for it.

              So if Microsoft is illegally copying random code without permission and getting sued for it, then I'm all for them getting sued.

              That said I'm the one who wants people in Microsoft to be imprisoned for "upgrading" people's Windows machines to Windows 10 without their permission. After all I'd probably end up in prison if I did a similar thing to hundreds or thousands of other people's PCs.

  • (Score: 4, Interesting) by sjames on Friday January 06 2023, @04:59PM (14 children)

    by sjames (2882) on Friday January 06 2023, @04:59PM (#1285497) Journal

    Just what we need, another lawyer more than happy to support a greedy or butthurt individual who doesn't even understand what they're suing over.

    This is just another take of the patent trend years ago of claiming that 'on a computer' makes an old thing entirely novel. AI learns from existing things, much like human students. If I write a book, I have done so based on learning from the many books I have read, but I have not violated copyright. Neither have the various art generators or AI text tools, or CoPilot.

    It's understandable for a layman to misunderstand and believe these things store everything they learned from and do a cut-and-paste job, UNLESS they want to take legal action based on that woeful misunderstanding.

    This is right up there with idiot corporations claiming #include "stdio.h" is a copyright violation.

    I say this as anything but a fan of MS.

    Note that in practice, CoPilot doesn't seem to be all that good (even marketing only claims it to produce the right code 50% of the time) and it sounds like even an improved version is a great way to introduce a nearly ubiquitous exploitable security flaw.

    • (Score: 3, Interesting) by shrewdsheep on Friday January 06 2023, @05:24PM (6 children)

      by shrewdsheep (5215) on Friday January 06 2023, @05:24PM (#1285505)

      Your point is well taken and I tend to agree. Just to point out the counter-argument: Language models tend to be so good because they are in part nearest-neighbor learners. They literally copy portions from the training data over to the output which can be deduced from mistakes they make. Certainly this argument is not water-tight as there is no investigation (to my knowledge) showing how widespread this behavior is for any response, but it can certainly be posited that this behavior does happen. I am actually surprised RMS has not yet jumped up to claim all auto-pilot code is GPL.
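A toy illustration of the nearest-neighbour point: a "completer" that simply returns the stored snippet most similar to the prompt will reproduce training text verbatim, mistakes included. It assumes a crude token-overlap (Jaccard) similarity and hypothetical training snippets; it is not how Copilot or any large language model works internally, and only shows why copied errors hint at copying.

```python
import re

# Hypothetical "training data"; the typo in the last snippet is deliberate.
TRAINING = [
    "def transpose(m): return [list(r) for r in zip(*m)]",
    "def mean(xs): return sum(xs) / len(xs)",
    "def greet(name): print('Helo, ' + name)",
]


def tokens(text: str) -> set[str]:
    """Lower-cased word tokens of a snippet."""
    return set(re.findall(r"\w+", text.lower()))


def complete(prompt: str) -> str:
    """Return the training snippet most similar to the prompt (Jaccard overlap)."""
    query = tokens(prompt)

    def similarity(snippet: str) -> float:
        words = tokens(snippet)
        return len(query & words) / len(query | words)

    return max(TRAINING, key=similarity)


print(complete("def greet(name):"))
# def greet(name): print('Helo, ' + name)
```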

      • (Score: 3, Interesting) by sjames on Friday January 06 2023, @05:43PM (5 children)

        by sjames (2882) on Friday January 06 2023, @05:43PM (#1285510) Journal

        I have heard the counter-argument, but both programming and natural language are filled with pat phrases. For example, I'll bet you read "Your point is well taken" at some point before the first time you said or wrote it. That's not an accusation, it's just how language works. It's also to be expected of a synthetic neural network.

        The person who wrote the code that CoPilot learned from probably did so because their own naturally occurring neural net distilled down many variants that it was exposed to and identified that particular variation as a canonical phrase.

        I'm almost 100% certain that I have written a line of code at some point in my career that was identical to a line someone else wrote and that neither of us is aware of it, simply because it was a good concise way to express the thought.

        • (Score: 2, Interesting) by shrewdsheep on Friday January 06 2023, @05:52PM (4 children)

          by shrewdsheep (5215) on Friday January 06 2023, @05:52PM (#1285512)

          Indeed, one line wouldn't qualify, and it would be impossible to find the primordial line anyhow. This is precisely what will be tested in court: how many lines qualify as plagiarism. For me, it would be about 10 lines of code, but it is really a question of statistics: how many lines make a snippet of code unique?
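A line-count test like the one described above can be sketched as a "longest shared run of lines" check. This is only a minimal illustration under stated assumptions: the function names are hypothetical, lines are compared after stripping whitespace with blank lines ignored, and the threshold of 10 is the commenter's personal figure, not any legal standard.

```python
def longest_shared_run(a: str, b: str) -> int:
    """Length of the longest run of consecutive lines that appear,
    in the same order, in both snippets (longest common substring
    over lines, computed by dynamic programming)."""
    # Assumption: strip surrounding whitespace and drop blank lines,
    # so re-indenting copied code does not hide a match.
    lines_a = [ln.strip() for ln in a.splitlines() if ln.strip()]
    lines_b = [ln.strip() for ln in b.splitlines() if ln.strip()]
    best = 0
    # prev[j] = length of the matching run ending at lines_a[i-2], lines_b[j-1]
    prev = [0] * (len(lines_b) + 1)
    for i in range(1, len(lines_a) + 1):
        curr = [0] * (len(lines_b) + 1)
        for j in range(1, len(lines_b) + 1):
            if lines_a[i - 1] == lines_b[j - 1]:
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Hypothetical cut-off taken from the comment above ("about 10 lines").
THRESHOLD = 10


def looks_copied(a: str, b: str, threshold: int = THRESHOLD) -> bool:
    """Flag a pair of snippets whose shared run reaches the threshold."""
    return longest_shared_run(a, b) >= threshold
```

For example, `longest_shared_run("def add(a, b):\n    return a + b\n", "def add(a, b):\n  return a + b\nprint(add(1, 2))\n")` returns 2, well under the threshold. Real clone detectors usually compare token streams rather than raw lines, so renamed variables would defeat this naive version.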

          • (Score: 4, Interesting) by HiThere on Friday January 06 2023, @06:13PM

            by HiThere (866) Subscriber Badge on Friday January 06 2023, @06:13PM (#1285518) Journal

            It probably will end up being something that stupid, but that's a really stupid measure of whether anything significant was copied, and even more so of whether anything significant that was original to the author was copied.

            And if it's successful, Knuth can sue every programmer in existence for their entire worth.

            --
            Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
          • (Score: 1, Informative) by Anonymous Coward on Friday January 06 2023, @06:44PM

            by Anonymous Coward on Friday January 06 2023, @06:44PM (#1285524)

            > about 10 lines of code

            Or about a half-line of APL...

          • (Score: 3, Interesting) by turgid on Friday January 06 2023, @09:30PM

            by turgid (4318) Subscriber Badge on Friday January 06 2023, @09:30PM (#1285564) Journal

            Something that troubles me is the concept of accessors in OOP languages: getters and setters. They're boilerplate, yet they consume many lines of code. Does copying one of those constitute plagiarism? My next question is: "What have we been doing for the last 40 years?" Why don't OOP languages provide operators for this purpose? Why are we writing this code by hand?
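To make the accessor boilerplate described above concrete, here is a minimal Python sketch with hypothetical class names. The first class spells out Java-style getters and setters by hand. The second uses the standard `dataclasses` module, which generates `__init__`, `__repr__` and `__eq__` and leaves fields as plain attributes. The third shows the built-in `property`, which keeps attribute syntax while still allowing a check in the setter. This is one language's approach, not a general answer for all OOP languages.

```python
from dataclasses import dataclass


class PointManual:
    """Hand-written accessors: four methods that only move values around."""

    def __init__(self, x: int, y: int) -> None:
        self._x = x
        self._y = y

    def get_x(self) -> int:
        return self._x

    def set_x(self, value: int) -> None:
        self._x = value

    def get_y(self) -> int:
        return self._y

    def set_y(self, value: int) -> None:
        self._y = value


@dataclass
class Point:
    """@dataclass generates __init__, __repr__ and __eq__;
    fields are read and written directly, with no accessor methods."""
    x: int
    y: int


class Temperature:
    """property keeps plain attribute syntax but routes writes
    through a setter, so validation can be added later."""

    def __init__(self, celsius: float) -> None:
        self.celsius = celsius  # goes through the setter below

    @property
    def celsius(self) -> float:
        return self._celsius

    @celsius.setter
    def celsius(self, value: float) -> None:
        if value < -273.15:
            raise ValueError("below absolute zero")
        self._celsius = value
```

With these definitions, `p = Point(1, 2); p.x = 5` leaves `repr(p)` as `Point(x=5, y=2)`, while `Temperature(20).celsius = -300` raises `ValueError`.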

          • (Score: 3, Interesting) by RS3 on Sunday January 08 2023, @05:27AM

            by RS3 (6367) on Sunday January 08 2023, @05:27AM (#1285790)

            All this is making me wonder: are they looking at source code, or at object / binary executables? Because you could copy a binary, toss in some NOPs here and there, recalculate the checksum, and I think it'd be difficult to detect. Or make the source look very different. I dunno, it's a mess; there are no easy or simple answers.

    • (Score: 2) by SomeRandomGeek on Friday January 06 2023, @06:01PM (1 child)

      by SomeRandomGeek (856) on Friday January 06 2023, @06:01PM (#1285514)

      You need to think about it legally rather than technically. I would like to hear Microsoft's legal theory of the case. Do they think that an AI trained from data is not a derived work of that data? Do they think that their AI is based on so many different pieces of stolen code that attribution can't be traced to any one source and therefore they don't owe anyone anything? Did they hide a license to do this in the GitHub ToS? Do they think they have deeper pockets and they can wait out any challenge regardless of its validity? The details matter. Legal stuff is funny that way.

      • (Score: 4, Insightful) by sjames on Friday January 06 2023, @06:27PM

        by sjames (2882) on Friday January 06 2023, @06:27PM (#1285520) Journal

        How about this: if an AI trained on publicly available source code is infringing copyright, then so is literally every single human programmer who has ever lived except Lady Ada. Extending to other copyrightable works, it's copyright infringement all the way down.

        And since that is literally the only known way of learning to write or code, we can either return to the caves (but NO DRAWING!) or decide perhaps this is not actually copyright violation.

        As an amusing note, just imagine how many pat legal phrases have been ruthlessly copied without attribution in the court filing!

    • (Score: 2) by mcgrew on Friday January 06 2023, @08:40PM (4 children)

      by mcgrew (701) <publish@mcgrewbooks.com> on Friday January 06 2023, @08:40PM (#1285549) Homepage Journal

      Yet another reason I don't believe that computer code should be covered by copyright at all. That despite (because of?) the fact that I registered two copyrights for computer programs four decades ago, neither of which can be run on any existing computer today. I'll still hold those copyrights ninety five years after I'm buried.

      You can't copyright a food recipe, and a computer program is a recipe of sorts, unlike a book or a painting. You can't copyright a dance. I've never seen any reason, let alone a good one, why computer programs should be covered by copyright.

      Why not patents? I will agree that they're too expensive and have way too much bureaucracy, but that's a completely different set of problems.

      --
      mcgrewbooks.com mcgrew.info nooze.org
      • (Score: 4, Insightful) by maxwell demon on Saturday January 07 2023, @06:44AM (3 children)

        by maxwell demon (1608) on Saturday January 07 2023, @06:44AM (#1285628) Journal

        Why not patents?

        Patents are much worse than copyright for software. With patents, you are in violation even if you provably never saw the patent and came up with the same solution independently.

        The only advantage of patents is that their duration is not as excessively long as copyright. But the solution to that is to reduce copyright to a reasonable time; that would not only benefit software.

        --
        The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 2) by mcgrew on Saturday January 07 2023, @02:39PM (2 children)

          by mcgrew (701) <publish@mcgrewbooks.com> on Saturday January 07 2023, @02:39PM (#1285679) Homepage Journal

          The only advantage of patents is that their duration is not as excessively long as copyright. But the solution to that is to reduce copyright to a reasonable time

          About as easy as flying to the moon by flapping your arms in our plutocratic society.

          --
          mcgrewbooks.com mcgrew.info nooze.org
          • (Score: 2) by maxwell demon on Saturday January 07 2023, @05:00PM (1 child)

            by maxwell demon (1608) on Saturday January 07 2023, @05:00PM (#1285705) Journal

            Well, still easier than getting completely rid of copyright, don't you think?

            --
            The Tao of math: The numbers you can count are not the real numbers.
            • (Score: 3, Interesting) by mcgrew on Sunday January 08 2023, @01:24AM

              by mcgrew (701) <publish@mcgrewbooks.com> on Sunday January 08 2023, @01:24AM (#1285755) Homepage Journal

              Actually, the law now hardly qualifies as copyright, which was never about copying, it was about publishing. In America, for example, foreign works' copyrights were legally ignored, so American authors simply couldn't get published. Congress changed copyright to cover foreign works just to get Americans published.

              Real copyright isn't evil; present copyright is simply ridiculous. 25 years ago it had a 20-year monopoly, as opposed to the author's lifetime plus 95 years like now.

              --
              mcgrewbooks.com mcgrew.info nooze.org
  • (Score: 3, Funny) by Opportunist on Friday January 06 2023, @05:28PM (1 child)

    by Opportunist (5545) on Friday January 06 2023, @05:28PM (#1285507)

    Well, MS, how does paying for that DMCA feel now?

    Money well spent, I guess? It's the gift that keeps on giving. And the spending that keeps on spending, it seems.

    • (Score: 3, Funny) by RS3 on Friday January 06 2023, @08:45PM

      by RS3 (6367) on Friday January 06 2023, @08:45PM (#1285550)

      Karma!

  • (Score: 3, Insightful) by crafoo on Friday January 06 2023, @05:32PM (2 children)

    by crafoo (6639) on Friday January 06 2023, @05:32PM (#1285508)

    learning from someone else's work is not a copyright violation. holding a copy of their work in your head is not a copyright violation. producing works that are quite similar to another's work is not a copyright violation.

    copyright is fake and useless anyway. without money and power you cannot realistically defend it. it's largely a tool for scam artists, grifters, and middlemen to "earn" comfortable lives off of actual productive and inventive people instead of digging in the cobalt mines where they belong.

    • (Score: 5, Funny) by canopic jug on Friday January 06 2023, @06:36PM (1 child)

      by canopic jug (3949) Subscriber Badge on Friday January 06 2023, @06:36PM (#1285522) Journal

      It's not learning in the way you or I or anyone else would consider to be learning. There is no understanding. There is no knowledge. There is no insight. Just blind, arbitrary recombination of code snippet after code snippet of various size. In order to do that, it's stripping the licenses and stripping the attribution and combining random pieces from random projects millions and hundreds of millions of times until some conditions are met and the result is spewed out. There's no intelligence there, just the ability to chew through an inhuman number of combinations of code snippets arbitrarily harvested from other people's code, in a very short time.

      A Machine Learning algorithm walks into a bar. The bartender asks, "What'll you have?"
      The algorithm says, "I'll have what everyone else is having."

      I haven't looked under the hood on M$ Copilot, but the generic topic to look up is genetic programming, aka evolutionary programming, and perhaps neural networks.

      --
      Money is not free speech. Elections should not be auctions.
      • (Score: 3, Touché) by choose another one on Friday January 06 2023, @11:52PM

        by choose another one (515) Subscriber Badge on Friday January 06 2023, @11:52PM (#1285583)

        There is no understanding. There is no knowledge. There is no insight. Just blind, arbitrary recombination of code snippet after code snippet of various size.

        Maybe I'm just too old and jaded and cranky, but when I look at code these days (especially the piles of garbage that pass for "web pages" today) that is what it frequently looks like - whoever or whatever wrote it.

        The only attribute you missed was maybe "arbitrary library after library, package after package". I vaguely recall someone using more library include lines than the lines of actual code that would have been needed to do the job with a little thought. Dependencies seem to maketh a project: the more you include, the more important your project must be (and the more complex the dependency / version management in future, thus keeping you in a job, I guess).

        To my mind this "AI" has just evolved to the point where human programmers are going anyway, but its evolution is automated and faster.

  • (Score: 1, Insightful) by Anonymous Coward on Friday January 06 2023, @07:36PM

    by Anonymous Coward on Friday January 06 2023, @07:36PM (#1285539)

    Are typographers
    https://matthewbutterick.com/

  • (Score: 2) by mcgrew on Friday January 06 2023, @08:47PM

    by mcgrew (701) <publish@mcgrewbooks.com> on Friday January 06 2023, @08:47PM (#1285551) Homepage Journal

    Microsoft has been trying to kill open source since its existence became known. They're back to their old Internet Exploiter (with a new name) monopoly tricks, too: random full-screen pop-up ads for their renamed IE on boot. And Google is doing it too; try loading Google News on an Android tablet using Firefox.

    "The love of money is the root of all evil."

    --
    mcgrewbooks.com mcgrew.info nooze.org
  • (Score: 2) by loonycyborg on Sunday January 08 2023, @01:38AM

    by loonycyborg (6905) on Sunday January 08 2023, @01:38AM (#1285758)

    I'm not fully convinced that GitHub Copilot generating code for you can be considered distribution. However, the resulting code can still be considered a derivative work of whatever the AI was trained on, which may or may not be fair use depending on the particular code taken. Naturally, the AI engine cannot determine this automatically. So even if this class action fails, people who use Copilot still have to check the resulting code for license compliance.
