from the we-violate-all-open-source-licenses-equally dept.
GitHub’s automatic coding tool rests on untested legal ground:
The Copilot tool has been trained on mountains of publicly available code
[...] When GitHub announced Copilot on June 29, the company said that the algorithm had been trained on publicly available code posted to GitHub. Nat Friedman, GitHub’s CEO, has written on forums like Hacker News and Twitter that the company is legally in the clear. “Training machine learning models on publicly available data is considered fair use across the machine learning community,” the Copilot page says.
But the legal question isn’t as settled as Friedman makes it sound, and the confusion reaches far beyond GitHub. Artificial intelligence algorithms only function because of the massive amounts of data they analyze, and much of that data comes from the open internet. An easy example is ImageNet, perhaps the most influential AI training dataset, which is made up entirely of publicly available images that ImageNet’s creators do not own. If a court were to rule that using this easily accessible data isn’t legal, it could make training AI systems vastly more expensive and less transparent.
Despite GitHub’s assertion, there is no direct legal precedent in the US that upholds publicly available training data as fair use, according to Mark Lemley and Bryan Casey of Stanford Law School, who published a paper last year about AI datasets and fair use in the Texas Law Review.
[...] And there are past cases to support that opinion, they say. They consider the Google Books case, in which Google downloaded and indexed more than 20 million books to create a literary search database, to be similar to training an algorithm. The courts upheld Google’s fair use claim on the grounds that the new tool was transformative of the original work and broadly beneficial to readers and authors: the Second Circuit ruled in Google’s favor, and the Supreme Court declined to hear the appeal.
Microsoft’s GitHub Copilot Met with Backlash from Open Source Copyright Advocates:
The GitHub Copilot system runs on a new AI platform developed by OpenAI known as Codex. Copilot is designed to help programmers across a wide range of languages, including popular ones like JavaScript, Ruby, Go, Python, and TypeScript, as well as many more.
“GitHub Copilot understands significantly more context than most code assistants. So, whether it’s in a docstring, comment, function name, or the code itself, GitHub Copilot uses the context you’ve provided and synthesizes code to match. Together with OpenAI, we’re designing GitHub Copilot to get smarter at producing safe and effective code as developers use it.”
One of the main criticisms of Copilot is that it goes against the ethos of open source because it is a paid service. Microsoft could arguably justify this by saying the resources needed to train the AI are costly. Still, some people find the training problematic because, they argue, Copilot is trained on snippets of other people's code and then charges users for it.
Is it fair use to auto-suggest snippets of code that are under an open source copyright license? Does that potentially bring your code under that license by using Copilot?
One glorious day code will write itself without developers developers.
See Also:
CoPilot on GitHub
Twitter: GitHub Support just straight up confirmed in an email that yes, they used all public GitHub code, for Codex/Copilot regardless of license.
Hacker News: GitHub confirmed using all public code for training copilot regardless license
OpenAI warns AI behind GitHub’s Copilot may be susceptible to bias
Related Stories
The Free Software Foundation (FSF) has published five of the white papers it funded regarding questions about Microsoft Copilot. After Microsoft acquired GitHub, it set up a machine learning system, called Copilot, to cull through its archive of software. The approach chosen, and even the basic activity, raise many questions, starting with those of licensing.
Microsoft GitHub's announcement of an AI-driven Service as a Software Substitute (SaaSS) program called Copilot -- which uses machine learning to autocomplete code for developers as they write software -- immediately raised serious questions for the free software movement and our ability to safeguard user and developer freedom. We felt these questions needed to be addressed, as a variety of serious implications were foreseen for the free software community and developers who use GitHub. These inquiries -- and others possibly yet to be discovered -- needed to be reviewed in depth.
In our call for papers, we set forth several areas of interest. Most of these areas centered around copyright law, questions of ownership for AI-generated code, and legal impacts for GitHub authors who use a GNU or other copyleft license(s) for their works. We are pleased to announce the community-provided research into these areas, and much more.
First, we want to thank everyone who participated by sending in their papers. We received a healthy response of twenty-two papers from members of the community. The papers weighed in on the multiple areas of interest we had indicated in our announcement. Using an anonymous review process, we concluded there were five papers that would be best suited to inform the community and foster critical conversations to help guide our actions in the search for solutions.
As projected here back in October, there is now a class action lawsuit, albeit in its earliest stages, against Microsoft over its blatant license violation through its use of the M$ GitHub Copilot tool. The software project, Copilot, strips copyright licensing and attribution from existing copyrighted code on an unprecedented scale. The class action lawsuit insists that machine learning algorithms, often marketed as "Artificial Intelligence", are not exempt from copyright law nor are the wielders of such tools.
The $9 billion in damages is arrived at through scale. When M$ Copilot rips code without attribution and strips the copyright license from it, it violates the DMCA three times. So if only 1% of its 1.2M users receive such output, the licenses were breached 12k times, which translates to 36k DMCA violations, at a very low-ball estimate.
"If each user receives just one Output that violates Section 1202 throughout their time using Copilot (up to fifteen months for the earliest adopters), then GitHub and OpenAI have violated the DMCA 3,600,000 times. At minimum statutory damages of $2500 per violation, that translates to $9,000,000,000," the litigants stated.
Besides open-source licenses and DMCA (§ 1202, which forbids the removal of copyright-management information), the lawsuit alleges violation of GitHub's terms of service and privacy policies, the California Consumer Privacy Act (CCPA), and other laws.
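The damages arithmetic quoted above is simple multiplication, and it can be reproduced in a few lines. This is a minimal Python sketch using only the figures from the text (1.2M users, three DMCA violations per infringing output, $2,500 minimum statutory damages per violation); the function name `dmca_exposure` is hypothetical, and the dollar total for the 1% scenario is not stated in the text but is simply the same formula applied to 36k violations. It is an illustration of the plaintiffs' arithmetic, not a legal calculation.

```python
# Reproduce the damages arithmetic quoted above, using only integers.
USERS = 1_200_000               # Copilot users cited in the complaint
VIOLATIONS_PER_OUTPUT = 3       # DMCA violations counted per infringing output
MIN_STATUTORY_DAMAGES = 2_500   # minimum statutory damages per violation, USD


def dmca_exposure(percent_of_users: int) -> tuple[int, int, int]:
    """Return (outputs, violations, damages_usd), assuming `percent_of_users`
    percent of users each receive exactly one infringing output."""
    outputs = USERS * percent_of_users // 100
    violations = outputs * VIOLATIONS_PER_OUTPUT
    damages = violations * MIN_STATUTORY_DAMAGES
    return outputs, violations, damages


if __name__ == "__main__":
    for pct in (100, 1):
        outputs, violations, damages = dmca_exposure(pct)
        print(f"{pct:>3}% of users: {outputs:,} outputs, "
              f"{violations:,} violations, ${damages:,}")
```

Running it prints `100% of users: 1,200,000 outputs, 3,600,000 violations, $9,000,000,000` and `  1% of users: 12,000 outputs, 36,000 violations, $90,000,000`, matching the complaint's $9 billion figure and the 1% scenario's 36k violations.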
The suit is on twelve (12) counts:
– Violation of the DMCA.
– Breach of contract (two counts).
– Tortious interference.
– Fraud.
– False designation of origin.
– Unjust enrichment.
– Unfair competition.
– Violation of privacy act.
– Negligence.
– Civil conspiracy.
– Declaratory relief.
Furthermore, these actions are contrary to what GitHub stood for prior to its sale to M$ and indicate yet another step in ongoing attempts by M$ to undermine and sabotage Free and Open Source Software and the supporting communities.
Previously:
(2022) GitHub Copilot May Steer Microsoft Into a Copyright Lawsuit
(2022) Give Up GitHub: The Time Has Come!
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground
(Score: 1, Funny) by Anonymous Coward on Friday July 09 2021, @12:56AM (2 children)
It did not turn out well.
(Score: 1, Insightful) by Anonymous Coward on Friday July 09 2021, @03:02AM
Tay Did Nothing Wrong
(Score: 2) by DannyB on Friday July 09 2021, @02:47PM
Adjust your training method. Your test data set should consist of ONLY the Anonymous Coward posts.
Universal health care is so complex that only 32 of 33 developed nations have found a way to make it work.
(Score: 4, Informative) by darkfeline on Friday July 09 2021, @01:12AM (28 children)
https://docs.github.com/en/github/site-policy/github-terms-of-service#4-license-grant-to-us [github.com]
No untested legal ground here. By using GitHub, you agree to their ToS and give them permission to do this.
Join the SDF Public Access UNIX System today!
(Score: 2) by c0lo on Friday July 09 2021, @02:05AM (9 children)
Doesn't mean a thing. If any provision of a contract is illegal, it doesn't matter whether you agreed to it or not; it is still illegal.
Yes, they can show your code to others if it's not hosted in a private repository. Others who see your code cannot use it outside the license under which your code is released.
https://www.youtube.com/watch?v=aoFiw2jMy-0 https://soylentnews.org/~MichaelDavidCrawford
(Score: 2) by darkfeline on Friday July 09 2021, @02:13AM (8 children)
> If any provision of a contract is illegal, it doesn't matter whether you agreed to it or not; it is still illegal.
Those provisions are not only not illegal, but standard for most online services that host user content.
> Others who see your code cannot use it outside the license under which your code is released.
And that is relevant how? Github can use your code because you granted Github a license to do so. Whether non-Github entities can use your code is irrelevant to whether Github can use your code for their Copilot product.
(Score: 2) by c0lo on Friday July 09 2021, @02:32AM (2 children)
Until you trip over a corner case, and a lawsuit carves out an exception and creates a precedent. You can't implicitly assume ToSes are legal in all cases for all time.
Not if someone sues and wins on the grounds that, in those particular circumstances, GitHub doing so facilitated copyright infringement. Or outright committed infringement by creating a derivative work that substantially uses yours (beyond what fair use provisions allow).
The probability of this happening? Low indeed. But not impossible, especially if your work falls within a narrow specialty where not much other code exists to train that AI.
(Score: 2) by darkfeline on Friday July 09 2021, @09:52AM (1 child)
https://docs.github.com/en/github/site-policy/github-terms-of-service#o-limitation-of-liability [github.com]
By using Github, you indemnified Github for any liability that may arise from "facilitated copyright infringement" through an AI black box. You would have to prove intent or gross negligence recognized by others in the nascent ML field. Sure, there's a minuscule chance that a court may find Github liable in the future, but now we are extremely far out from the "Untested Legal Ground" claim (disregarding the nitpick that any situation could be considered "Untested Legal Ground" due to the unique configuration of matter in the universe in that moment).
(Score: 2) by c0lo on Saturday July 10 2021, @04:56AM
I don't see where I'm giving up my copyright, especially if Github were instrumental in infringing it, no matter how they did it: by human operator or by running an algorithm. It is their AI that creates a derivative work from a copyrighted one; unless they receive an explicit license from the author to do so, there's no indemnification for them.
Mind you, it's not only the GPLed software that they potentially infringe. MIT license says "you can do whatever you want as long as you reproduce this very license in your code" - if they strip the license in the process of AI-fycation (creating a derivative work), they are in trouble straight away.
(Score: 4, Informative) by http on Friday July 09 2021, @03:31AM (4 children)
Try reading what you posted a second time, paying careful attention:
Using hosted code to train an AI does not count towards providing the service of making a repository publicly available.
I browse at -1 when I have mod points. It's unsettling.
(Score: 2) by bradley13 on Friday July 09 2021, @05:34AM
Sure it does. If they claim co-pilot is part of their service, then it's covered by the ToS.
Everyone is somebody else's weirdo.
(Score: 2) by darkfeline on Friday July 09 2021, @06:00AM (2 children)
https://docs.github.com/en/github/site-policy/github-terms-of-service#a-definitions [github.com]
It's ironic how accurate the subject of this thread is.
(Score: 2) by PiMuNu on Friday July 09 2021, @09:31AM
> It's ironic how accurate the subject of this thread is.
I think rather it means you have to be *ultra* careful when reading ToS in order to understand what it really means. Which is exactly why everyone just clicks "Accept".
(Score: 1, Insightful) by Anonymous Coward on Sunday July 11 2021, @09:11AM
Say at some point in time you read the ToS, review the services offered at that time, and agree to the ToS based on those. Have you given GitHub permission to use your content for those services, or for any services they may think of later on? I don't think the latter is legal everywhere in the world. Perhaps it is in the US, but as far as I'm aware EU law is based on ideas of what is reasonable that don't include things like this: you're supposed to be able to oversee what you agree to, and an "anything we can think of in the future" clause, explicit or implicit, conflicts with that.
(Score: -1, Flamebait) by Anonymous Coward on Friday July 09 2021, @02:18AM
Unenforceable ToS means shit.
You are the sort that make people dislike autistics.
(Score: 0) by Anonymous Coward on Friday July 09 2021, @03:57AM (2 children)
Though why anyone would use github, knowing it's going to be abused, is beyond me.
(Score: 2) by HiThere on Friday July 09 2021, @03:08PM (1 child)
The question then appears to be "Can they choose to add new features and call them part of the same service?". Certainly it wasn't part of the service when most people agreed to it, but if their "new AI application" is offered by the same organization to those capable of using the prior service, can they define it as part of the same service?
It's not as if people never used code repositories as examples of how to do things before. They've just automated that as a new feature of their service. Or is that stretching things beyond where a court would agree?
Javascript is what you use to allow unknown third parties to run software you have no idea about on your computer.
(Score: 3, Insightful) by JoeMerchant on Friday July 09 2021, @03:44PM
Depends on the court, of course.
What I wonder is: if this goes on for 5, 20, or 100 years without being tested in court, at what point is it immune from contest? I mean, of course as the practice spreads over time and various service providers there will be fewer and fewer courts willing to find against it, but Mickey Mouse made a mockery of fair use for nearly 100 years before the political climate denied him (and the industry as a whole) another copyright extension.
🌻🌻 [google.com]
(Score: 0) by Anonymous Coward on Friday July 09 2021, @06:54AM (2 children)
I think there is a slight distinction to make. GitHub, under the terms of their license that you agreed to when you signed up, is perfectly within their rights to use your software in this way. The problem is that the people and organizations that receive these suggestions may not have the legal right to actually use them.
(Score: 2) by DannyB on Friday July 09 2021, @03:02PM (1 child)
What if someone else puts your GPL code on GitHub? YOU did not authorize this use of your code under non GPL terms. YOU now have cause to sue someone.
(Score: 0) by Anonymous Coward on Friday July 09 2021, @09:44PM
Even if someone else puts your code on GitHub in violation of its license, GitHub is still able to rely reasonably on that user's warranties to cover their own ass, similar to every other hosting service. The really interesting part is that your remedy is a DMCA notice. Once they receive such a notice, they have to remove the code from everywhere under their control. That also includes the AI data sets and the trained AI output. Essentially, they would have to retrain the entire AI in order to be sure they got it all. Only after the DMCA notice would GitHub get anywhere near liability for themselves.
(Score: 2, Insightful) by Anonymous Coward on Friday July 09 2021, @07:06AM (1 child)
But I think the main point TFA is missing is that the ones most likely at risk are its users, who are copying snippets of other people's copyrighted code into their own code, snippets that might require attribution or even distribution of the entire codebase they end up in under a specific license. Considering this is a paid service, it is likely to be used for proprietary software that is quite unlikely to satisfy the terms of open source licenses.
So github is turning their customers into copyright infringers, and might be sued for facilitating and making a profit off it (like piratebay or megaupload). They are just relying on the fact that its other users, those providing the content, are unlikely to have proof the infringement happened.
(Score: 2) by HiThere on Friday July 09 2021, @03:11PM
Except that, just as with music, individuals are generally not viable targets for an expensive suit. So a different approach will be taken.
(Score: 1, Insightful) by Anonymous Coward on Friday July 09 2021, @11:07AM (1 child)
Yes, and so does the GPL license. And the GPL license then has provisions that the code cannot be re-licensed under terms not compatible with GPL.
If their training regurgitates GPL code, then that code is still GPL. It doesn't matter who or what plagiarized it.
The law is the law, no matter what TOS of some random company (or even the great Microsoft) you agree to.
(Score: 2) by HiThere on Friday July 09 2021, @03:13PM
Additionally, if they offer GPL code, they are obliged to include the GPL license.
Yes, they have the right to use and share that code, but they don't have the right to hide the license.
(Score: 4, Insightful) by DannyB on Friday July 09 2021, @03:01PM (6 children)
John writes a GPL program.
Later, Jane, without John's knowledge, puts this program's source code on GitHub.
John did not authorize Microsoft / GitHub to publish this code under any terms other than the GPL. This includes when snippets of this GPL code are inserted into proprietary closed source projects because the developer is using CoPilot to auto-suggest snippets.
I think John has cause to sue someone. Jane, GitHub, Microsoft, a developer using CoPilot who now has GPL code locked up in their proprietary code.
This also brings up another important point. If you work on a proprietary product (as I do) then you better be sure you have a proper license for every last bit of code that is in your project which you didn't write yourself. If you include a commercial library, have a proper commercial license and are in compliance with it. If you have open source in your project, make sure you are in full compliance with the license. (eg, Apache 2, BSD, MIT, etc)
In the Java world there is an embarrassing number of high-quality third-party libraries. The licenses on these are very commercial-friendly. But ALWAYS review the license. Get management approval.
The commercial-friendly licensing is because the users of these are large commercial interests writing large commercial closed source Java programs. And many of the same corporations that consume this open source also sponsor various open source Java projects. Why does Red Hat spend significant resources developing a state of the art Garbage Collector (Shenandoah) for Java? Because their biggest customers are on Java. Why do IBM and Microsoft invest in Java? Same reason.
CoPilot brings a whole new vector by which unlicensed code can make its way into your code base. This is similar to, but worse than, copy/pasting some code you googled from, say, Stack Overflow. If you don't have a license for it, then don't copy/paste it in. Understand it, and then do what it does in your own code.
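Part of the "review every license" advice can be automated as a first pass. The comment is about Java, where a build-tool plugin would normally play this role; the minimal sketch below does the same for a Python environment using the standard library's `importlib.metadata`. The `APPROVED` allowlist and the function names are hypothetical, and a package's declared License field is free text that is often missing or imprecise, so anything flagged (and arguably anything not flagged) still needs a human review and management approval. Note that it cannot catch snippets pasted in from Stack Overflow or a Copilot suggestion, which is exactly the gap described above.

```python
from importlib import metadata

# Hypothetical allowlist. Which licenses are acceptable is a policy decision
# for management and legal review; these three are only placeholders.
APPROVED = {"Apache-2.0", "BSD-3-Clause", "MIT"}


def installed_licenses() -> dict[str, str]:
    """Map each installed distribution's name to its self-declared License field."""
    found: dict[str, str] = {}
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        if name:
            # The field is free text and often absent, so treat it as a hint only.
            found[name] = dist.metadata.get("License") or "UNKNOWN"
    return found


def needs_review(licenses: dict[str, str], approved: set[str]) -> list[str]:
    """Sorted names of packages whose declared license is not on the allowlist."""
    return sorted(name for name, lic in licenses.items() if lic not in approved)


if __name__ == "__main__":
    for name in needs_review(installed_licenses(), APPROVED):
        print("review manually:", name)
```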
(Score: 0) by Anonymous Coward on Friday July 09 2021, @04:31PM
But it's AI, as in "Intelligent": it's not copy-pasting code - it "learned" the code and is using its "knowledge" to synthesize new code on the spot!
(Score: 2) by darkfeline on Friday July 09 2021, @07:39PM (3 children)
John does not have cause to sue for monetary damages as likely he cannot demonstrate any monetary damages. He can demand cease and desist for Jane to stop unauthorized re-licensing of his code, if Jane is still doing so. He can demand Github remove the code, as the copyright holder did not agree to Github's ToS. Microsoft, as the owner of Github as a subsidiary, is not involved at all. I highly doubt John would be able to prove a developer somewhere got the exact same code that he wrote from CoPilot, and enough of it to constitute copyright infringement. If he could, then he could also demand said developer to cease and desist.
For some reason, people seem to be assuming CoPilot is straight up copying sections of code fed into it. That's not how AI trained on broad datasets works, unless the code is generic enough that copyright would no longer be applicable in the first place (e.g. a function that adds two arguments together).
(Score: 0) by Anonymous Coward on Saturday July 10 2021, @04:35AM
In the United States, John may choose to seek statutory damages which (if successful) entitles him to monetary relief of no less than $750 per work infringed and does not require him to demonstrate any actual damages whatsoever. John must have registered his copyright prior to the alleged infringement in order to be eligible for statutory relief.
Other jurisdictions may have similar mechanisms.
(Score: 0) by Anonymous Coward on Saturday July 10 2021, @04:55AM
GitHub's own research [github.com] says "once, GitHub Copilot suggested starting an empty file with something it had even seen more than a whopping 700,000 different times during training -- that was the GNU General Public License."
So yes it does sound like this tool is absolutely capable of straight up regurgitating significant amounts of nontrivial and copyrightable text, verbatim, that was part of its training set.
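"Regurgitating verbatim" can be made concrete by measuring the longest run of identical tokens that a suggestion shares with a known licensed text. The sketch below does this with Python's standard `difflib.SequenceMatcher`; it is not how GitHub measured it, and the whitespace tokenizer, the function names, and the 20-token threshold are hypothetical choices. A long match is evidence of copying, not a legal determination of infringement.

```python
import difflib


def longest_shared_run(suggestion: str, licensed_text: str) -> list[str]:
    """Longest run of whitespace-separated tokens that appears verbatim,
    in the same order, in both texts."""
    a, b = suggestion.split(), licensed_text.split()
    # autojunk=False: don't silently discard frequent tokens such as "the" or "=".
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    m = matcher.find_longest_match(0, len(a), 0, len(b))
    return a[m.a : m.a + m.size]


def looks_regurgitated(suggestion: str, licensed_text: str, threshold: int = 20) -> bool:
    """Flag a suggestion whose longest verbatim overlap is `threshold` tokens
    or more. The default of 20 is an arbitrary, hypothetical cut-off."""
    return len(longest_shared_run(suggestion, licensed_text)) >= threshold
```

Fed a suggestion that opens with the first sentence of the GPL notice, it returns that whole sentence as the shared run; fed something generic like `int i = 0;`, it finds no meaningful overlap.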
(Score: 0) by Anonymous Coward on Sunday July 11 2021, @09:29AM
If GitHub (Microsoft) argues there is no problem with copyright there is a simple solution to make everybody happy: train the AI on the huge proprietary code base Microsoft owns.
It's supposed to be code developed by the very best developers in the world, according to what Microsofties told me during a conversion project to their platform I was part of once, so it must be excellent, and without any copyright problems there is no reason not to use it.
(Score: 0) by Anonymous Coward on Friday July 09 2021, @09:26PM
i want my cut of github and MS's hide! yeehaaaawwww!
(Score: 2, Interesting) by Anonymous Coward on Friday July 09 2021, @01:13AM (8 children)
Since it's been trained on open source code, I'm going to assume a lot of it isn't public domain, and that it's going to end up suggesting code that is covered by the GPL, or code that is "you can look but need a license to actually use in non-free products" and such.
Should be "interesting" (as in "highly profitable") to fire it up and catch it offering to use non-public-domain code snippets, which can potentially make their way into proprietary code. Or even to open code that you forget to credit the author for.
(Score: 1, Interesting) by Anonymous Coward on Friday July 09 2021, @02:24AM (4 children)
If this gets to be something important, it will have to be GPLed.
And I LIKE IT.
(Score: 0) by Anonymous Coward on Friday July 09 2021, @03:59AM (3 children)
(Score: 0) by Anonymous Coward on Friday July 09 2021, @04:03AM
For once, MS wasn't wrong. It's by design.
And it's FUCKING Brilliant.
(Score: -1, Redundant) by Anonymous Coward on Friday July 09 2021, @04:16AM
MS wasn't wrong.
And it's FUCKING BRILLIANT.
(Score: 0) by Anonymous Coward on Friday July 09 2021, @11:17AM
What do you mean "claimed"? GPL *is* a viral license by design. That is its sole purpose and why it was designed like it. It's also a reason why the LGPL has not been viewed positively by the FSF since day 1, but it was deemed necessary to allow non-free software to actually run on a free-software-based OS.
The viral nature of GPL is a detriment of it as a license. Microsoft legal was just shit scared that it would embed itself into something by accident (or maybe malice by some disgruntled employee) and then they would have some trolls sue it like SCO vs. IBM. I think they are more relaxed over it now.
https://cloudblogs.microsoft.com/opensource/2018/03/19/microsoft-open-source-licensing-gplv3/ [microsoft.com]
(Score: 2) by HiThere on Friday July 09 2021, @03:17PM
Just about NO code is public domain. Copyright laws have practically eliminated the existence of new public domain works. That's the reason for licenses such as Artistic, MIT, and BSD. And part of the reason for GPL.
(Score: 2) by JoeMerchant on Friday July 09 2021, @03:47PM (1 child)
At what point is a code snippet identifiable as non-public-domain? I'm sure that:
int i = 0;
appears in lots of GPL code, but does that make it covered by the GPL license?
How about a long paragraph of code that appears in both GPL code and MIT code? Does the MIT license take precedence only if it was published first?
(Score: 0) by Anonymous Coward on Friday July 09 2021, @05:35PM
In order for a work to be copyrightable, it has to meet a minimum level of creativity. This is for courts to decide. The bar is pretty low but something like "int i = 0;", by itself, is unlikely to be considered a creative work eligible for copyright protection.
If a work is not protected by copyright then the terms of a copyright license like the GPL are irrelevant.
Like two people who independently, and unaware of each other's work, write exactly the same program in exactly the same way?
The creativity requirement should in principle prevent this from ever happening. If the work is considered copyrightable, and in the absence of any other information, the person who published first would have a pretty convincing argument that the other party used their work.
(Score: 2, Flamebait) by bmimatt on Friday July 09 2021, @01:17AM
Seems like a horrible idea to use things like that. Why would I want to help train Microsoft machines how to code? So they can push developer salaries down by making coding "more accessible" to the average Joe, who will write shit code with "Microsoft AI's" assistance. All of that so then, some versions later, eventually they'll try to dick me out of my own job with a piece of shitty software? Screw that.
(Score: 0) by Anonymous Coward on Friday July 09 2021, @04:39AM
Imagine an AI trained on pornhub.
(Score: 2) by PiMuNu on Friday July 09 2021, @10:23AM (1 child)
"It looks like you're trying to write a for loop"
(Score: 0) by Anonymous Coward on Friday July 09 2021, @02:47PM
Clippy: "It looks like you are trying to write for a loop."
(Score: 1) by shrewdsheep on Friday July 09 2021, @11:26AM (9 children)
Seems like software to automatically insert boilerplate code. I do not like languages where I have to write boilerplate code to begin with. The next thing that bothers me: why on earth would you use a language model (as in NLP) for this? Maybe perform the template selection based on statistical modeling, but never generate actual code with the model.
Thanks, but no thanks.
(Score: 0) by Anonymous Coward on Friday July 09 2021, @03:04PM
In Python the closest to boilerplate that I ever felt I was writing was:
for key, value in Dict.items():
    ...  # loop body
(Score: 2) by DannyB on Friday July 09 2021, @03:13PM (7 children)
There is nothing wrong with boilerplate per se.
An IDE may insert common templates for you, such as a for or while loop. The entire structure is there. If I then rename the variable in the for loop, all references to that local variable within the loop are renamed before I even start filling in the body.
for (int i = 0; i < 30; i++) {
    // ... insert body here ...
}
If I change "i" to "z" (by doing ctrl-shift-R to rename the variable), then all of the i's within the loop change to z's, just within the template. It is an intelligent rename, not a stupid search/replace: it is based on the compiler's understanding of the scope of variable i. The compiler is deeply integrated with the editor. That's what makes an IDE powerful. The editor understands code on a conceptual level, not just as characters on a page. Rather than using silly regex tricks to color-code the source, it color-codes the source based on the compiler's understanding of the code in the editor.
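The difference between a scope-aware rename and a blind search/replace can be sketched with Python's standard `ast` module. This is a simplified illustration, not how Eclipse or any other IDE implements refactoring: the hypothetical `rename_local` renames matching variable names inside one named function only, ignores nested scopes, parameters, and `global` declarations, and relies on `ast.unparse` (Python 3.9+), which discards comments and original formatting that a real refactoring tool must preserve.

```python
import ast

SOURCE = """
def squares():
    total = 0
    for i in range(30):
        total += i * i
    return total


def unrelated():
    i = "left alone"
    return i
"""


class RenameInFunction(ast.NodeTransformer):
    """Rename a variable, but only inside the function called `func_name`."""

    def __init__(self, func_name: str, old: str, new: str) -> None:
        self.func_name, self.old, self.new = func_name, old, new

    def visit_FunctionDef(self, node: ast.FunctionDef) -> ast.FunctionDef:
        if node.name == self.func_name:
            # Every Name node (reads and writes) inside this function.
            for child in ast.walk(node):
                if isinstance(child, ast.Name) and child.id == self.old:
                    child.id = self.new
        return node  # other functions come back untouched


def rename_local(source: str, func_name: str, old: str, new: str) -> str:
    tree = RenameInFunction(func_name, old, new).visit(ast.parse(source))
    return ast.unparse(tree)  # Python 3.9+; drops comments and formatting


if __name__ == "__main__":
    print(rename_local(SOURCE, "squares", "i", "z"))
    # A blind text replace also hits the keyword "in" ("for z zn range")
    # and the unrelated function's variable.
    print(SOURCE.replace("i", "z"))
```

The tree-based version changes only the loop variable in `squares`; the plain `str.replace` breaks the `in` keyword and touches `unrelated` as well, which is the "stupid search/replace" failure described above.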
What's wrong with boilerplate? You're using boilerplate every time you write various simple structures. while loop. for loop. if-then-else. A single keystroke should generate the template so you can tab to the various sections of the structure and fill them in without extraneous typing.
(Score: 2) by HiThere on Friday July 09 2021, @03:25PM (4 children)
Well, ideally there would be a less verbose way to write that, such that the code was both more compact and easier to read and understand. But simple boilerplate like that is already pretty close. Think what the same statements would look like in assembler...or even just rewrite it as a while loop.
That said, every time you use a templated class you're using automated boilerplate. Every time you use an inline function you're using custom boilerplate. Etc.
But "non-standardized boilerplate" is annoying. I've been known to write a function to deal with it even at the cost of some efficiency.
(Score: 2) by DannyB on Friday July 09 2021, @03:33PM (3 children)
A good IDE lets you create your own templates. (eg, boilerplate) If you have some construction that you frequently type, you can make it a template, complete with variables. It works the same. A keystroke generates the template at the point where you are typing. You can tab through the variables to name them differently as you wish, but renaming one variable renames it everywhere within that template -- until you start filling in the body, if it has a body.
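A template whose placeholder appears several times, so that one value fills every copy, can be sketched with Python's `string.Template`. This is only a static, one-shot substitution; an IDE's live templates let you tab between the linked fields and edit them interactively. The `FOR_LOOP` template (which generates the Java loop from the comment above) and the `expand` helper are hypothetical.

```python
from string import Template

# Hypothetical snippet modeled on the Java loop above. `$var` appears three
# times; the single value supplied for it fills every occurrence.
FOR_LOOP = Template(
    "for (int $var = 0; $var < $limit; $var++) {\n"
    "    // ... insert body here ...\n"
    "}"
)


def expand(template: Template, **fields: str) -> str:
    """Fill every placeholder; raises KeyError if a field is missing."""
    return template.substitute(**fields)


if __name__ == "__main__":
    print(expand(FOR_LOOP, var="z", limit="30"))
```

With `var="z"` and `limit="30"` it produces `for (int z = 0; z < 30; z++) {` followed by the body placeholder and the closing brace; leaving out a field raises `KeyError` instead of emitting a half-filled template.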
Some people don't like the noise and complexity of modern IDEs and prefer a simple text editor.
Some people don't like the noise and complexity of a backhoe and prefer to dig a ditch using a shovel. And much more worser is that a backhoe requires a bit of learning to use. Best to stick to the shovel.
(Score: 2) by hendrikboom on Friday July 09 2021, @05:37PM (1 child)
And sometimes the language has features that allow the compiler to expand the boilerplate instead of the editor. Yes, complete with proper handling of bound variables.
(Score: 2) by DannyB on Friday July 09 2021, @08:43PM
The compiler and language may already have ways of hiding various scopes of variables. The problem is that if I want to change the name of variables A and B to be named X and Y, I don't want to have to go change every instance of them by hand. How would a compiler let you do that to your source code?
In an IDE, I can click on a variable, press Ctrl-Shift-R, and rename it, and the IDE changes exactly the occurrences of that variable and not any other identifiers that happen to have the same name in other scopes or contexts (for example, a function, a variable, a class, and a type all named A). And if that variable is visible in other parts of the project, in other files, it changes them there too! It is not some dumb search/replace. It is based on the compiler's understanding of the scope and visibility of that identifier throughout the entire project. The compiler and editor are deeply integrated.
I happen to use Eclipse. No matter what language I'm editing, the editor is integrated with the proper compiler or language server.
(Score: 2) by HiThere on Friday July 09 2021, @06:04PM
If the IDE/editor ends up writing verbose text via a boilerplate template, it's nearly as hard to read a month later as if you had written it by hand. And if you need to customize the variables... well, I've been known to miss some of those, or to do a replace that wasn't limited to the appropriate areas of text. Usually that causes an immediate error, but sometimes it's quite difficult to track down.
This, I feel, is the appropriate use case for templated/generic classes. But, of course, those can't handle all the cases that a template substitution can. OTOH, custom macros are REALLY dangerous. Used appropriately, they're very useful. Overused or used inappropriately, they render the program text nearly unreadable.
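A minimal sketch of the generic-class alternative in Python, using typing.Generic: one Stack definition serves every element type instead of a copy-pasted IntStack, StrStack, and so on. The Stack class is a hypothetical example. Note that Python's type parameters are only checked by an external type checker (such as mypy), not at run time.

```python
from typing import Generic, List, TypeVar

T = TypeVar("T")


class Stack(Generic[T]):
    """One definition, parameterized by element type T."""

    def __init__(self) -> None:
        self._items: List[T] = []

    def push(self, item: T) -> None:
        self._items.append(item)

    def pop(self) -> T:
        # Raises IndexError on an empty stack, like list.pop().
        return self._items.pop()

    def __len__(self) -> int:
        return len(self._items)


ints: Stack[int] = Stack()
ints.push(1)
ints.push(2)
```

Unlike a macro, the shared logic lives in one readable place, and a type checker can flag pushing a str onto a Stack[int].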
Back in the day (Fortran IV days), I was once really attracted to macro-templated code. (Look at Mortran https://en.wikipedia.org/wiki/Mortran or DYSTAL https://pubmed.ncbi.nlm.nih.gov/14284294/, though DYSTAL was really more of a library.) But they rendered the code unreadable by anyone else, and after a while unreadable by me.
(Score: 1) by shrewdsheep on Friday July 09 2021, @03:54PM (1 child)
To some degree, it is a matter of opinion, so nothing really to disagree about.
As the topic seems to be somewhat sensitive, a definition would be an appropriate starting point: if the information expressed in N lines of code can be given in N/K lines, I call the code boilerplate. For me, K is somewhere between 5 and 10. By this definition, your for loop would not count. My programming style revolves around the don't-repeat-yourself principle and small units of code. For me, a function containing 20 lines of code is long (at least in high-level languages like R/Python/Perl; I don't manage that in C++). In most cases, I would have factored the examples given for Copilot out into smaller functions.
Copilot goes about it the wrong way, IMO. Instead of suggesting boilerplate code, we should avoid boilerplate altogether. One very concrete example is packaging, where many tools exist to create skeletons for you (be it R/Python/Perl/whatever). This is the wrong way round. The only information needed is the code (with inline documentation) and a single dictionary containing the required meta-information. Basically, adding 10-20 lines of description to an existing code directory should be enough to create a package; the entire packaging can proceed from that. The actual package is always temporary code.
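A rough sketch of the "single dictionary of metadata" idea, assuming Python packaging with a pyproject.toml [project] table and a setuptools build backend. render_pyproject is a hypothetical helper, not part of any packaging tool, and it assumes the metadata values are plain strings containing no quotes or backslashes that would need TOML escaping.

```python
def render_pyproject(meta: dict) -> str:
    """Generate pyproject.toml text from one metadata dict (hypothetical helper).

    Assumes every value is a plain string (or list of plain strings)
    that needs no TOML escaping.
    """
    lines = [
        "[build-system]",
        'requires = ["setuptools>=61"]',
        'build-backend = "setuptools.build_meta"',
        "",
        "[project]",
    ]
    # Copy over the simple string fields that are present.
    for key in ("name", "version", "description", "requires-python"):
        if key in meta:
            lines.append(f'{key} = "{meta[key]}"')
    deps = ", ".join(f'"{d}"' for d in meta.get("dependencies", []))
    lines.append(f"dependencies = [{deps}]")
    return "\n".join(lines) + "\n"


print(render_pyproject({"name": "mytool", "version": "0.1.0",
                        "dependencies": ["requests"]}))
```

The generated file is disposable output: regenerate it from the dictionary rather than editing it by hand.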
(Score: 2) by DannyB on Friday July 09 2021, @08:50PM
I think everyone is in favor of making things simple and less verbose. As simple as possible, but not any simpler.
Now how simple it should be depends on how big your projects are. Java is used for very large source code bases. Many diverse teams may write many different modules or libraries that end up in a single executable.
I think most languages could benefit from some form of IDE assistance to help you type out repetitive templates of code. I used the for() loop for an example, because the typical pattern of a for loop is to have a single variable that is referenced three times. You should only have to type in that variable name once, not three times. When you change the first occurrence of the variable, it should change the others, keystroke by keystroke.
The DRY principle is something I strongly embrace. But in a for() loop, you typically repeat the variable name three times. Or more times if you reference that variable within the body of the loop and not just the initialization, increment, and test of the loop construct. It sure is nice if I can change that variable name one time and have it change everywhere it is used within that loop construct.
I'm not against something like CoPilot, if it works well. But the copyright and license issues are a genuine concern.
(Score: 2) by richtopia on Friday July 09 2021, @03:12PM (17 children)
I'll let lawyers argue over the legality here. I'm curious if the copilot is any good. I'm completely self-taught and only have a couple of scripting projects every year, so having something to look over my shoulder and guide me towards faster results and best practices would be awesome.
(Score: 2) by DannyB on Friday July 09 2021, @03:14PM (2 children)
I wonder if CoPilot helps you write code of the same quality as that produced by programmers who combine Google with copy/paste.
I'm sure you know the type.
(Score: 2) by HiThere on Friday July 09 2021, @03:29PM (1 child)
Who doesn't do that? Every time I start programming in a new area I do that. Want to write a "thread-safe dequeue"? I break out the books, but then I also do a Google search for examples.
The thing is, you've got to analyze the examples to decide how good they are. Whenever I switch languages, I hit Google heavily for several weeks as I refresh my knowledge of how this particular language does that particular thing.
(Score: 2) by DannyB on Friday July 09 2021, @03:35PM
DING! DING!
That's the difference. If you can understand a solution you googled and then write it in your own code, you've learned something and made yourself a better programmer.
If you just copy/paste and it seems to work, then you have learned nothing and are as bad as, or possibly even more dangerous than, when you started programming.
(Score: 2) by JoeMerchant on Friday July 09 2021, @03:50PM (12 children)
Look into static analysis tools - they are available for just about everything including bash scripts. You won't always want to follow their suggestions / squash their warnings, but once in a while they'll point something out that will make you go: Oh, hell, how did I ever do anything so bone-headed?
🌻🌻 [google.com]
(Score: 2) by DannyB on Friday July 09 2021, @08:53PM (11 children)
In an IDE like the one I use, that analysis and those warnings happen keystroke by keystroke.
If I change something that breaks a function in another file, then on the exact keystroke which does this, that file name turns red in the tree structure of files in the project.
(Score: 2) by JoeMerchant on Saturday July 10 2021, @01:29AM (10 children)
We had a vendor's software bug turn out to be use of an uninitialized variable. My comment on the matter was that my editor flags those in realtime, not to mention the compiler warnings.
Ignoring warnings like that is how we end up with procedures demanding zero compiler warnings at max settings.
(Score: 2) by DannyB on Monday July 12 2021, @01:39PM (9 children)
It should not be a warning. It should be a fatal error that prevents a successful compile.
If the language definition is going to allow uninitialized variables, then it should define what they actually get initialized to, and that should be something sane that follows the principle of least astonishment.
But I would strongly prefer uninitialized variables be a fatal error. If the programmer cannot be bothered to specify the initial value, leading to a nondeterministic result, then maybe they are not that good of a programmer. If the language cannot prevent this from occurring, then maybe it's not that good of a language.
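For contrast, Python doesn't reject this at compile time either, but it at least fails deterministically: reading a local that was never bound raises UnboundLocalError at run time instead of yielding whatever happened to be in memory. Java, by comparison, refuses to compile a read of a local variable that isn't definitely assigned. The two functions below are illustrative only.

```python
def first_negative(values):
    for v in values:
        if v < 0:
            result = v
            break
    # If no negative value was found, `result` was never bound, and this
    # line raises UnboundLocalError rather than returning garbage.
    return result


def first_negative_safe(values):
    result = None  # explicit, documented initial value
    for v in values:
        if v < 0:
            result = v
            break
    return result
```

The error only shows up when the bad path actually runs, which is exactly the gap a compile-time check closes.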
I remember, forty years ago when I was using Pascal, the debates about how that language forced you to write safe code; the "bondage and discipline" language users pointed to the sad loss of an interplanetary mission (sorry, I no longer remember which one) due to a type mismatch error in FORTRAN.
I simply cannot understand the mindset of people who think compilers, by default, should allow you to write unsafe code. Now I don't have a problem with having some kind of declaration or annotation around a block of lines, or module to say "I know what I'm doing, leave me alone and compile this". But that should be the exception, not the rule.
(Score: 2) by JoeMerchant on Monday July 12 2021, @02:45PM (3 children)
I agree, in principle. In practice: programmers are human, as are code reviewers, testers, especially managers, and the rest of us. It happens, which is why we now have a procedure to document checking for it. People are still human. Legend has it that there was a documented procedure for the Space Shuttle that required no fewer than 50 people to sign off that a support beam had been removed from the cargo bay before the shuttle was rotated to the vertical position. Nonetheless, after 50 people had signed off that the beam was removed, it wasn't; the shuttle was rotated vertical, and the beam fell and did millions in damage and weeks of schedule slip.
When our procedure fails to catch the next one, we will up the game to require all compilers to be set with warnings as errors, but that's still no guarantee...
(Score: 2) by DannyB on Monday July 12 2021, @03:16PM (2 children)
If the compiler checks for it, and it is a fatal error, then problem solved! Us poor fallible humans will get a message we cannot ignore when our program does not compile. This compile error will not make it to the review or testing stage.
The compiler is your first line of defense! Actually it is the language that is the first line of defense. The language should simply make it impossible to do things that have no possible meaning. All variables must be initialized.
About unit testing: the compiler is also your first line of unit testing. If it won't compile, it fails the first line of tests. There's no need to write all sorts of silly unit tests to check things the compiler should have checked. I always laugh at languages where people write unit tests for things the compiler should have checked.
(Score: 2) by JoeMerchant on Monday July 12 2021, @04:03PM (1 child)
Nothing is idiot proof. Never underestimate the ability of idiots to circumvent safety mechanisms.
(Score: 3, Interesting) by DannyB on Monday July 12 2021, @04:31PM
You can write bad code in any language. However it doesn't hurt for a language to have safety so that fallible humans don't make silly mistakes. Uninitialized variables are an excellent example of something that doesn't make sense. The compiler should be able to prove that you are accessing a variable prior to assigning it a value.
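As a sketch of what "prove" can mean here, the following toy checker (hypothetical, built on Python's standard ast module) performs a very small version of the definite-assignment analysis that Java's compiler does: after an if/else, a name counts as assigned only if both branches assign it. It handles only plain assignments and if/else; loops, augmented assignments, nested functions, and module globals are out of scope, so treat it as an illustration rather than a usable linter.

```python
import ast
import builtins


def _loaded_names(node):
    """Every bare name read (Load context) anywhere inside `node`."""
    return {n.id for n in ast.walk(node)
            if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}


def _check_block(stmts, assigned, errors):
    """Check a statement list; return the names definitely assigned after it."""
    assigned = set(assigned)
    for stmt in stmts:
        if isinstance(stmt, ast.If):
            unbound = _loaded_names(stmt.test) - assigned
            errors.extend((stmt.lineno, n) for n in sorted(unbound))
            after_then = _check_block(stmt.body, assigned, errors)
            after_else = _check_block(stmt.orelse, assigned, errors)
            # Definitely assigned only if *both* branches assign it.
            assigned = after_then & after_else
        elif isinstance(stmt, ast.Assign):
            unbound = _loaded_names(stmt.value) - assigned
            errors.extend((stmt.lineno, n) for n in sorted(unbound))
            assigned |= {t.id for t in stmt.targets if isinstance(t, ast.Name)}
        else:
            # Any other statement is treated as a plain read of its names.
            unbound = _loaded_names(stmt) - assigned
            errors.extend((stmt.lineno, n) for n in sorted(unbound))
    return assigned


def check_function(source):
    """Return (line, name) pairs read before they are definitely assigned."""
    func = ast.parse(source).body[0]
    known = {a.arg for a in func.args.args} | set(dir(builtins))
    errors = []
    _check_block(func.body, known, errors)
    return errors
```

For "def f(flag): if flag: x = 1; return x" (written on separate lines), the checker reports the read of x on the return line, because the implicit empty else branch never assigns it.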
I'm not arguing that the compiler should try to deeply analyze the thought process of your code, how it works, and then be a critic. Just don't allow common mistakes, especially when they don't have any sensible meaning.
We could all program in assembly. Or in C. I strongly suspect there is an economic reason why we don't all program in C or assembly. And I further suspect that economic reasoning has to do with both productivity and safety. And safety is also a form of productivity and testing.
(Score: 2) by JoeMerchant on Monday July 12 2021, @02:49PM (4 children)
Oh, I know the type. The ones that hand optimize the assembly because what they are doing is so simple and compact that they think that optimizing the last 0.00001% of performance into their code is worth the risk, they're smart guys, they know what they're doing. I've met them in implantable medical devices (and watched them repeatedly backpedal obvious over-optimizations only after they were thrown in their face as severe real world problems), I've met them in high speed motor controls, I'm sure they're out there in a lot of industries.
(Score: 2) by DannyB on Monday July 12 2021, @03:30PM (3 children)
If it is mission critical to hand optimize something in assembly to the last possible clock cycle, then do that. Whatever it costs.
In reality there are few, if any, cases where that is actually mission critical.
In the 21st century, hardware is way, way cheaper than developer time. Yes, I know this wasn't always true. Once, computers were expensive and developers were cheap. Thus it made huge economic sense to have developers optimize as much as possible to get the best economic use of the hardware. Now hardware is dirt cheap and developers are very expensive. For most things in real life, it is cheaper to just buy better hardware than to do that optimization.
Also compiler optimization has come a long, long way since the 1970s.
In those edge cases, if they actually even exist, where it is mission critical to optimize the last possible cpu cycle, then do so. Because cost comes secondary to accomplishing the mission.
(Score: 2) by JoeMerchant on Monday July 12 2021, @04:11PM (2 children)
In the implantable devices, the perennial excuse was extension of battery life. Approximate real-world battery life was maybe 3.5 years; advertised battery life under the clearly unrealistic specified conditions was 7 years. They would do boneheaded things like using an 8-bit checksum on a communication, which was estimated to add 2 weeks to that 7-year figure compared to a 16-bit checksum. Then the 8-bit checksum would allow painful (and unapproved) levels of stimulation to be programmed in error, with dozens of reports from the field, and they implemented a programmer-side patch that ate 6 weeks off that 7-year figure. They failed to thermally compensate the battery voltage readings because the extra computation would shave 3 weeks off the 7-year figure, the justification being: it's implanted, so its temperature is stable around body temperature. Yeah, well, geniuses, before it gets implanted it does a battery check on itself and reports itself dead if it has been stored below 50°F, which happens (a lot) in the real world. At least that one got caught in validation testing.
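To illustrate why checksum width matters (the device's actual scheme isn't described here, so this is an assumption), compare a plain 8-bit additive checksum with Fletcher-16, a common 16-bit alternative swapped in for illustration. An 8-bit sum can take only 256 values and ignores byte order entirely; Fletcher-16's second running sum makes it sensitive to position.

```python
def checksum8(data: bytes) -> int:
    """Plain 8-bit additive checksum: sum of bytes, truncated to 8 bits."""
    return sum(data) & 0xFF


def fletcher16(data: bytes) -> int:
    """Fletcher-16: two running sums mod 255, packed into 16 bits."""
    sum1 = sum2 = 0
    for b in data:
        sum1 = (sum1 + b) % 255
        sum2 = (sum2 + sum1) % 255  # weights earlier bytes more heavily
    return (sum2 << 8) | sum1


original = bytes([0x10, 0x20])
swapped = bytes([0x20, 0x10])  # same bytes, corrupted order
```

Both byte strings give the same 8-bit sum, so the corruption slips through; Fletcher-16 tells them apart.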
The motors - they were contractors, I can only imagine what internal process led them to save those two bytes of code required to initialize the variable.
(Score: 2) by DannyB on Monday July 12 2021, @04:36PM (1 child)
You mention a very specific use case here.
In a medical device, I strongly suspect that safety is one of the highest priorities.
Hopefully the device has an opportunity to report itself dead prior to reaching the operating table.
(Score: 2) by JoeMerchant on Monday July 12 2021, @06:34PM
That was the problem, the device was reporting itself dead because it was (willfully) ignorant of the thermal effect on its battery voltage - willfully ignorant in the name of saving a few nanojoules of energy.
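A minimal sketch of the missing thermal compensation, assuming a simple linear correction toward a body-temperature reference. Every constant here is a made-up illustration value, not real battery or device data; a real design would use the cell manufacturer's characterization.

```python
# All constants are hypothetical illustration values, not real device data.
REF_TEMP_C = 37.0       # body temperature, where the dead threshold applies
COEFF_V_PER_C = 0.003   # assumed voltage sag per degree C below reference
DEAD_THRESHOLD_V = 2.5  # assumed "battery dead" cutoff at REF_TEMP_C


def compensated_voltage(measured_v: float, temp_c: float) -> float:
    """Linearly correct a reading back to its REF_TEMP_C equivalent."""
    return measured_v + COEFF_V_PER_C * (REF_TEMP_C - temp_c)


def reports_dead(measured_v: float, temp_c: float, compensate: bool = True) -> bool:
    """Battery self-check, optionally with temperature compensation."""
    v = compensated_voltage(measured_v, temp_c) if compensate else measured_v
    return v < DEAD_THRESHOLD_V
```

With these made-up numbers, a device stored at 10 °C (50 °F) that reads 2.45 V would be declared dead without compensation, but passes once the reading is corrected to about 2.53 V.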
(Score: 0) by Anonymous Coward on Friday July 09 2021, @09:29PM
Oh yeeeeaaaah, and you could use it on your Windows PC!