As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violation through its use of the M$ GitHub Copilot tool. The software project, Copilot, strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit insists that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law nor are the wielders of such tools.
The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2M users receive such output, the licenses were breached 12k times, which translates to 36k DMCA violations, at a very low-ball estimate.
"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.
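The two estimates above can be checked with a few lines of arithmetic. This is only a sketch using the figures quoted in the summary and in the complaint excerpt (1.2M users, three violations per infringing output, $2,500 minimum statutory damages); the function name and the percentage parameter are illustrative and do not come from the filing.

```python
# Figures quoted in the summary and the complaint excerpt above.
USERS = 1_200_000                # Copilot users
VIOLATIONS_PER_OUTPUT = 3        # DMCA violations per infringing output
MIN_STATUTORY_DAMAGES = 2_500    # US dollars per violation


def dmca_estimate(percent_of_users: int) -> tuple[int, int]:
    """Return (violations, minimum damages in dollars).

    Assumes each affected user receives exactly one infringing output.
    """
    affected_users = USERS * percent_of_users // 100
    violations = affected_users * VIOLATIONS_PER_OUTPUT
    return violations, violations * MIN_STATUTORY_DAMAGES


# 1% is the summary's scenario; 100% is the complaint's "each user" scenario.
for percent in (1, 100):
    violations, damages = dmca_estimate(percent)
    print(f"{percent}% of users: {violations:,} violations, ${damages:,}")
```

Under these assumptions, the 1% scenario gives 36,000 violations and $90 million, while the $9 billion figure only follows from the complaint's scenario in which every one of the 1.2M users receives one infringing output.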
Besides open-source licenses and DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violation of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.
The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract. x2
– Tortious interference.
– Fraud.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Negligence.
– Civil conspiracy.
– Declaratory relief.
Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and indicate yet another step in ongoing attempts by M$ to undermine and sabotage Free and Open Source Software and the supporting communities.
Previously:
(2022) GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit
(2022) Give Up GitHub: The Time Has Come!
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground
(Score: 0, Insightful) by Rosco P. Coltrane on Friday January 06 2023, @04:15PM (19 children)
Dude... Saying M$ was cool amongst hip teens 20 years ago. Using it now only pegs you as immature. Grow up.
(Score: 3, Informative) by Anonymous Coward on Friday January 06 2023, @04:29PM (3 children)
Sure. But at the same time, for SN readers, M$ disambiguates from "MS" which is used in other ways...
(Score: 3, Insightful) by Rosco P. Coltrane on Friday January 06 2023, @05:02PM (2 children)
Yeah. Or you could write Microsoft, to resolve the potential confusion.
(Score: 5, Funny) by DannyB on Friday January 06 2023, @05:27PM (1 child)
Look, it is simple. There are two kinds of MS.
1. An affliction suffered by millions which makes even the simplest tasks difficult.
2. A medical condition.
How to fix pylint error: modify pylint until it no longer complains. Rinse. Repeat.
Consider the savings in human effort if PC keyboards had originally had a single CTRL-ALT-DEL key.
(Score: 2, Interesting) by Anonymous Coward on Friday January 06 2023, @06:37PM
Lots more, here's a famous one,
https://en.wikipedia.org/wiki/MS._Found_in_a_Bottle [wikipedia.org]
(Score: 4, Insightful) by sjames on Friday January 06 2023, @05:02PM (7 children)
Is it really that huge a deal? In addition to disambiguating from the academic achievement and the degenerative disease, it's a long-time nickname that is instantly recognizable and whose commentary is still on point.
Side note, disambiguating from the degenerative disease may be a sort of false distinction :-)
(Score: 1, Redundant) by Rosco P. Coltrane on Friday January 06 2023, @05:08PM (6 children)
No disambiguation needed. TFA is about Microsoft. The abbreviation "MS" in the article - if you really insist on shortening the word "Microsoft" for some reason - can in no way refer to multiple sclerosis. It would only need disambiguation if the article was about Microsoft curing multiple sclerosis or funding multiple sclerosis research. It isn't the case here.
At any rate, in 2023, when you style it as M$, you sound immature.
(Score: 3, Insightful) by HiThere on Friday January 06 2023, @05:59PM
I accept that it has that meaning to you. I didn't even notice it until it got pointed out, the usage is so common.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 4, Funny) by Anonymous Coward on Friday January 06 2023, @07:08PM (2 children)
Thank you gramps for scolding those darned kids. Your harsh and demeaning words have helped make the world a better place.
(Score: 3, Funny) by Gaaark on Saturday January 07 2023, @03:41AM (1 child)
HEY! Get off his lawwwwwnnnn.... mumble mumble can't find my hanky....
--- Please remind me if I haven't been civil to you: I'm channeling MDC. I have always been here. ---Gaaark 2.0 --
(Score: 0) by Anonymous Coward on Sunday January 08 2023, @12:41AM
It's that sticky dripping thing in your, oh nevermind.
(Score: 3, Insightful) by Ingar on Saturday January 07 2023, @11:07AM (1 child)
If in 2023, you're still not styling it M$, you obviously missed what Microsoft has been doing the past three decades.
Understanding is a three-edged sword: your side, their side, and the truth.
(Score: 3, Insightful) by RS3 on Sunday January 08 2023, @12:46AM
I didn't want to engage this any more, but since you broke the ice, exactly my thoughts: "M$" references Microsoft being greedy controlling a-holes. Point being, sometimes they seem to be pretty okay, but I've learned to be very skeptical, maybe moreso cynical of them, as this whole GitHub thing bears out.
(Score: 4, Interesting) by canopic jug on Friday January 06 2023, @06:47PM (2 children)
Among the other reasons, some of which are covered above, it is a nod to M$ Team99 and its successors which used their own custom web spiders to crawl forums and detect the use of the string "Microsoft", case-insensitive. If you avoided that string, then you had a few hours of unimpaired dialog on whatever topic before they came along anyway. However, if someone was foolish enough to use the string naming the Beast of Redmond directly, then astroturfers and shills swooped in within minutes and filled the thread with trolling up to and including gay and/or coprophilic porn text, whatever it took to drag the threads off topic -- including whining about the string "m$" too.
Their teams, whatever they are called now, are sneakier than ever and still doing what they can to disrupt use of FOSS and Open Standards or even discussion of either topic.
If you're going to troll about the summary, why not pick on the typos or the bad math which looks to be off by three orders of magnitude?
Either way, you have some companies hiding behind algorithms which they set up and are using to strip both licensing information and author attribution, both of which are DMCA violations. Can we sic the Business Software Alliance after its master? Or will it not bite the hand that feeds it?
Money is not free speech. Elections should not be auctions.
(Score: -1, Troll) by Anonymous Coward on Sunday January 08 2023, @05:10AM (1 child)
So anyone who uses "M$" is an incel.
(Score: -1, Troll) by Anonymous Coward on Sunday January 08 2023, @06:17AM
That was supposed to be sarcastic humor. Oh well. Whoosh it is.
(Score: 2, Insightful) by Runaway1956 on Friday January 06 2023, @07:24PM
No one confuses M$ for an article about multiple sclerosis. M$ is a quite fitting designator for Evil Corp.
“I have become friends with many school shooters” - Tampon Tim Walz
(Score: 0) by Anonymous Coward on Friday January 06 2023, @07:27PM (2 children)
Ain't you caught them pesky Duke boys yet?
(Score: 3, Funny) by Anonymous Coward on Friday January 06 2023, @08:37PM (1 child)
He's pissed off that they keep jumping the General Lee over him and making him look like an idiot trifler.
(Score: 0) by Anonymous Coward on Sunday January 08 2023, @07:59PM
shoulda wrote "trifling idiot"
(Score: 5, Interesting) by looorg on Friday January 06 2023, @04:35PM (7 children)
Just imagine how long and complex the license text would be if they did somehow manage to attribute everything to everyone that their AI "learned" from. It's not that it should be that hard -- it should remember or could take note where it borrowed all the code-snippets from. But it's not like most people would read it anyway and if they had to click anything they would just click the big OK/Accept button.
Still it's amusing now that they are using the piracy-math on how many violations there have been, even tho they are clearly low-balling it instead of high-balling it like copyright lawyers for various entertainment-outfits do.
That said if Copilot doesn't attribute then shouldn't the same be said for more or less ANY "AI" out there? ChatGPT? It apparently even writes papers and everything now, and it doesn't cite or attribute anything to anyone. It should instantly classify anything it produces as plagiarism and be a giant lawsuit waiting to happen. How about the "AI" that draws pictures, composes music, creates medicines etc? Seems AI-tech is basically piracy-tech as it rips off everything and everyone and shares none of the glory and credit with anyone, possibly with the exception of the one that made them. But fucks everyone that provided the learning data over completely.
(Score: 4, Insightful) by Rich on Friday January 06 2023, @05:17PM
It's the same for search engines. Search engines make a local copy of the web (remember "Google Cache"?). This is fundamentally required, because a.) the indexing needs to do a differential update to the word indexes and b.) search needs to access the sequence for sentence matching. Yet, when the first search engines (including Google) came up, copyright law did not allow for that. Not that anyone cared (or even understood), yet there were even discussions whether merely copying something into the browser cache might be illegal. But basically they didn't care about any licence and just grabbed everything accessible and copied it to make a profit. "robots.txt" only came after the fact.
The saying goes something like "when you do something borderline that's novel, you infringe the law, but if big money does the same, it becomes the law."
(Score: 2) by HiThere on Friday January 06 2023, @06:08PM (5 children)
On the one hand, this suit looks absolutely proper by the existing laws, and like if cases were decided on their merits it should win.
On the other hand, the actions being protested look perfectly reasonable.
So, to me, this is more an argument that the existing copyright law should be totally thrown out. And I rather agree. Go back to the copyright laws of 1823, and then update as needed. But explicitly include a working definition of "fair use" so it isn't just "whatever the lawyers can convince the court is reasonable". (I would have said "1723", but the US didn't exist at that point, and the worst changes happened after the civil war. But maybe 1800 would be a better date.)
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 5, Insightful) by canopic jug on Friday January 06 2023, @06:57PM (4 children)
On the other hand, the actions being protested look perfectly reasonable.
How is taking copyrighted code and removing references to both the author and the original license ok? If it is ok with a machine learning algorithm, would it still be ok with another algorithm like a complex regex with a few conditional statements thrown in? What about hiring unskilled staff that don't understand the code but can read enough to strip both attribution and licensing information? Where is the line drawn?
Money is not free speech. Elections should not be auctions.
(Score: 2) by HiThere on Friday January 06 2023, @09:40PM
The idea that there should be a line is a real problem, but learning, whether human or machine, involves processing lots and lots of stuff, and extracting what is deemed useful. How can you argue against this and yet support spelling bees? Some of those words being spelled are less than 30 years old, so SOMEBODY invented the spelling.
Consider, e.g., the program "Hello, World". There are lots of variations of it printed. Should you need to respect the copyrights of the authors? But most of them are trivial variations of the original, so maybe only the original should be deemed worthy of copyright. But this would mean that you couldn't illustrate a new language by doing a translation of "Hello, World" into that language.
Basically, what I think ChatGPT *should* be able to argue is that its productions are functional, and therefore not deserving of copyright. But that kind of argument requires expensive lawyers, and you've got to be ready to pay for appeals. It would be much better if copyright clearly didn't cover that. And didn't cover anything over 20 years old. (Including renewals! Make it 10 years of copyright and up to two renewals for 5 years each, and they've got to be a continuous span of time.)
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 2) by wisnoskij on Saturday January 07 2023, @04:28AM (2 children)
How do you think human programmers do their job?
I went to school for years to have professors write code on the board for me to copy without attribution.
(Score: 4, Insightful) by maxwell demon on Saturday January 07 2023, @06:38AM (1 child)
But your professor carefully selected the code he wrote on the blackboard. He didn't copy some random code from the internet without looking at its copyright.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 0) by Anonymous Coward on Saturday January 07 2023, @02:54PM
AND if he did, he could get sued for it.
So if Microsoft is illegally copying random code without permission and getting sued for it, then I'm all for them getting sued.
That said I'm the one who wants people in Microsoft to be imprisoned for "upgrading" people's Windows machines to Windows 10 without their permission. After all I'd probably end up in prison if I did a similar thing to hundreds or thousands of other people's PCs.
(Score: 4, Interesting) by sjames on Friday January 06 2023, @04:59PM (14 children)
Just what we need, another lawyer more than happy to support a greedy or butthurt individual who doesn't even understand what they're suing over.
This is just another take of the patent trend years ago of claiming that 'on a computer' makes an old thing entirely novel. AI learns from existing things, much like human students. If I write a book, I have done so based on learning from the many books I have read, but I have not violated copyright. Neither have the various art generators or AI text tools, or CoPilot.
It's one thing for a layman to mis-understand and believe the things store everything they learned from and do a cut-paste job, UNLESS they want to take legal action based on that woeful mis-understanding.
This is right up there with idiot corporations claiming #include "stdio.h" is a copyright violation.
I say this as anything but a fan of MS.
Note that in practice, CoPilot doesn't seem to be all that good (even marketing only claims it to produce the right code 50% of the time) and it sounds like even an improved version is a great way to introduce a nearly ubiquitous exploitable security flaw.
(Score: 3, Interesting) by shrewdsheep on Friday January 06 2023, @05:24PM (6 children)
Your point is well taken and I tend to agree. Just to point out the counter-argument: Language models tend to be so good because they are in part nearest-neighbor learners. They literally copy portions from the training data over to the output which can be deduced from mistakes they make. Certainly this argument is not water-tight as there is no investigation (to my knowledge) showing how widespread this behavior is for any response, but it can certainly be posited that this behavior does happen. I am actually surprised RMS has not yet jumped up to claim all auto-pilot code is GPL.
(Score: 3, Interesting) by sjames on Friday January 06 2023, @05:43PM (5 children)
I have heard the counter-argument, but both programming and natural language are filled with pat phrases. For example, I'll bet you read "Your point is well taken" at some point before the first time you said or wrote it. That's not an accusation, it's just how language works. It's also to be expected of a synthetic neural network.
The person who wrote the code that CoPilot learned from probably did so because their own naturally occurring neural net distilled down many variants that it was exposed to and identified that particular variation as a canonical phrase.
I'm almost 100% certain that I have written a line of code at some point in my career that was identical to a line someone else wrote and that neither of us is aware of it, simply because it was a good concise way to express the thought.
(Score: 2, Interesting) by shrewdsheep on Friday January 06 2023, @05:52PM (4 children)
Indeed one line wouldn't qualify and it would be impossible to find the primordial line anyhow. This is precisely what will be tested in court: how many lines qualify as plagiarism. For me, it would be about 10 lines of code but it is really statistics: how many lines make a unique snippet of code.
(Score: 4, Interesting) by HiThere on Friday January 06 2023, @06:13PM
It probably will end up being something that stupid, but that's a really stupid measure of whether anything significant was copied. And even more of whether anything significant that was original to the author was copied.
And if it's successful, Knuth can sue every programmer in existence for their entire worth.
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 1, Informative) by Anonymous Coward on Friday January 06 2023, @06:44PM
> about 10 lines of code
Or about a half-line of APL...
(Score: 3, Interesting) by turgid on Friday January 06 2023, @09:30PM
Something that troubles me is the concept of accessors in OOP languages, getters and setters. They're boiler plate, yet they consume many lines of code. Does copying one of those constitute plagiarism? My next question is "What have we been doing for the last 40 years?" Why don't OOP languages provide operators for this purpose? Why are we writing this code by hand?
I refuse to engage in a battle of wits with an unarmed opponent [wikipedia.org].
(Score: 3, Interesting) by RS3 on Sunday January 08 2023, @05:27AM
All this is making me wonder: are they looking at source code? Or object / binary executable? Cause you could copy binary, toss in some nops here and there, recalculate checksum, and I think it'd be difficult to detect? But make the source look very different. I dunno, it's a mess; there are no easy or simple answers.
(Score: 2) by SomeRandomGeek on Friday January 06 2023, @06:01PM (1 child)
You need to think about it legally rather than technically. I would like to hear Microsoft's legal theory of the case. Do they think that an AI trained from data is not a derived work of that data? Do they think that their AI is based on so many different pieces of stolen code that attribution can't be traced to any one source and therefore they don't owe anyone anything? Did they hide a license to do this in the GitHub ToS? Do they think they have deeper pockets and they can wait out any challenge regardless of its validity? The details matter. Legal stuff is funny that way.
(Score: 4, Insightful) by sjames on Friday January 06 2023, @06:27PM
How about if an AI trained on publicly available source code is infringing copyright then so is literally every single human programmer who has ever lived except Lady Ada. Extending to other copyrightable works, it's copyright infringement all the way down.
And since that is literally the only known way of learning to write or code, we can either return to the caves (but NO DRAWING!) or decide perhaps this is not actually copyright violation.
As an amusing note, just imagine how many pat legal phrases have been ruthlessly copied without attribution in the court filing!
(Score: 2) by mcgrew on Friday January 06 2023, @08:40PM (4 children)
Yet another reason I don't believe that computer code should be covered by copyright at all. That despite (because of?) the fact that I registered two copyrights for computer programs four decades ago, neither of which can be run on any existing computer today. I'll still hold those copyrights ninety five years after I'm buried.
You can't copyright a food recipe, and a computer program is a recipe of sorts, unlike a book or a painting. You can't copyright a dance. I've never seen any reason, let alone a good one, why computer programs should be covered by copyright.
Why not patents? I will agree that they're too expensive and have way too much bureaucracy, but that's a completely different set of problems.
Impeach Donald Palpatine and his sidekick Elon Vader
(Score: 4, Insightful) by maxwell demon on Saturday January 07 2023, @06:44AM (3 children)
Patents are much worse than copyright for software. With patents, you are in violation even if you provably never saw the patent and came up with the same solution independently.
The only advantage of patents is that their duration is not as excessively long as copyright. But the solution to that is to reduce copyright to a reasonable time; that would not only benefit software.
The Tao of math: The numbers you can count are not the real numbers.
(Score: 2) by mcgrew on Saturday January 07 2023, @02:39PM (2 children)
The only advantage of patents is that their duration is not as excessively long as copyright. But the solution to that is to reduce copyright to a reasonable time
About as easy as flying to the moon by flapping your arms in our plutocratic society.
Impeach Donald Palpatine and his sidekick Elon Vader
(Score: 2) by maxwell demon on Saturday January 07 2023, @05:00PM (1 child)
Well, still easier than getting completely rid of copyright, don't you think?
The Tao of math: The numbers you can count are not the real numbers.
(Score: 3, Interesting) by mcgrew on Sunday January 08 2023, @01:24AM
Actually, the law now hardly qualifies as copyright, which was never about copying, it was about publishing. In America, for example, foreign works' copyrights were legally ignored, so American authors simply couldn't get published. Congress changed copyright to cover foreign works just to get Americans published.
Real copyright isn't evil, present copyright is simply ridiculous. 25 years ago it had a 20 year monopoly as opposed to the author's lifetime plus 95 years like now.
Impeach Donald Palpatine and his sidekick Elon Vader
(Score: 3, Funny) by Opportunist on Friday January 06 2023, @05:28PM (1 child)
Well, MS, how does paying for that DMCA feel now?
Money well spent, I guess? It's the gift that keeps on giving. And the spending that keeps on spending, it seems.
(Score: 3, Funny) by RS3 on Friday January 06 2023, @08:45PM
Karma!
(Score: 3, Insightful) by crafoo on Friday January 06 2023, @05:32PM (2 children)
learning from someone else's work is not a copyright violation. holding a copy of their work in your head is not a copyright violation. producing works that are quite similar to another's work is not a copyright violation.
copyright is fake and useless anyway. without money and power you cannot realistically defend it. it's largely a tool for scam artists, grifters, and middlemen to "earn" comfortable lives off of actual productive and inventive people instead of digging in the cobalt mines where they belong.
(Score: 5, Funny) by canopic jug on Friday January 06 2023, @06:36PM (1 child)
It's not learning in the way you or I or anyone else would consider to be learning. There is no understanding. There is no knowledge. There is no insight. Just blind, arbitrary recombination of code snippet after code snippet of various size. In order to do that, it's stripping the licenses and stripping the attribution and combining random pieces from random project millions and hundreds of millions of times until some conditions are met and the result spewed out. There's no intelligence there, just the ability to chew through an inhuman amount of combinations of code snippets arbitrarily harvested from other people's code, but in a very short time.
I haven't looked under the hood on M$ Copilot but the generic topic to look up is genetic programming aka evolutionary programming, and perhaps neural networks.
Money is not free speech. Elections should not be auctions.
(Score: 3, Touché) by choose another one on Friday January 06 2023, @11:52PM
Maybe I'm just too old and jaded and cranky, but when I look at code these days (especially the piles of garbage that pass for "web pages" today) that is what it frequently looks like - whoever or whatever wrote it.
Only attribute you missed was maybe "arbitrary library after library, package after package". Vaguely recall someone using more library include lines than the lines of actual code that would have been needed to do the job with a little thought. Dependencies seem to maketh a project - the more you include the more important your project must be (and the more complex the dependency / version management in future, thus keeping you in a job, I guess).
To my mind this "AI" has just evolved to the point where human programmers are going anyway, but its evolution is automated and faster.
(Score: 1, Insightful) by Anonymous Coward on Friday January 06 2023, @07:36PM
Are typographers
https://matthewbutterick.com/ [matthewbutterick.com]
(Score: 2) by mcgrew on Friday January 06 2023, @08:47PM
Microsoft has been trying to kill open source since its existence became known. They're back to their old Internet Exploiter (with a new name) monopoly tricks, too. Random full screen pop up ads for their renamed IE on boot, and Google is, too; try loading Google News on an Android tablet using Firefox.
"The love of money is the root of all evil."
Impeach Donald Palpatine and his sidekick Elon Vader
(Score: 2) by loonycyborg on Sunday January 08 2023, @01:38AM
I'm not fully convinced that github copilot generating code for you can be considered distribution. However, the resulting code can still be considered a derivative work of whatever the AI was trained on, which may or may not be fair use depending on the particular code taken. Naturally, an AI engine cannot automatically determine this. So even if this class action fails, people who use autopilot still have to double-check the resulting code for license compliance.