from the acquisition-for-30x-annual-recurring-revenue dept.
The Free Software Foundation (FSF) has published five of the white papers it funded regarding questions about Microsoft Copilot. After Microsoft acquired GitHub, it set up a machine learning system called Copilot to comb through its archive of software. The approach chosen, and even the basic activity, raise many questions, starting with those of licensing.
Microsoft GitHub's announcement of an AI-driven Service as a Software Substitute (SaaSS) program called Copilot -- which uses machine learning to autocomplete code for developers as they write software -- immediately raised serious questions for the free software movement and our ability to safeguard user and developer freedom. We felt these questions needed to be addressed, as a variety of serious implications were foreseen for the free software community and developers who use GitHub. These inquiries -- and others possibly yet to be discovered -- needed to be reviewed in depth.
In our call for papers, we set forth several areas of interest. Most of these areas centered around copyright law, questions of ownership for AI-generated code, and legal impacts for GitHub authors who use a GNU or other copyleft license(s) for their works. We are pleased to announce the community-provided research into these areas, and much more.
First, we want to thank everyone who participated by sending in their papers. We received a healthy response of twenty-two papers from members of the community. The papers weighed in on the multiple areas of interest we had indicated in our announcement. Using an anonymous review process, we concluded there were five papers that would be best suited to inform the community and foster critical conversations to help guide our actions in the search for solutions.
These five submissions are not ranked, and we decided it best to just let the papers speak for themselves. The papers contain opinions with which the Free Software Foundation (FSF) may or may not agree, and any views expressed by the authors do not necessarily represent the FSF. They were selected because we thought they advanced discussion of important questions, and did so clearly. To that end, the FSF is not providing any summaries of the papers or elaborating on our developing positions until we can learn further, through the community, how best to view the situation.
The FSF has also arranged upcoming discussions regarding these white papers. Microsoft bought GitHub in 2018 for $7.5 billion in stock, a price that, had it been paid in real money, would have come to 30 times GitHub's annual recurring revenue.
(2021) GitHub's Automatic Coding Tool Rests on Untested Legal Ground
(2020) GitHub Revamps Copyright Takedown Policy After Restoring YouTube-dl
(2018) Microsoft Agrees to Acquire GitHub... for $7.5 Billion [Updated]
(2014) Atom, GitHub's Editor Now Open Source
Today, we're excited to announce that we are open-sourcing Atom under the MIT License. We see Atom as a perfect complement to GitHub's primary mission of building better software by working together. Atom is a long-term investment, and GitHub will continue to support its development with a dedicated team going forward. But we also know that we can't achieve our vision for Atom alone. As Emacs and Vim have demonstrated over the past three decades, if you want to build a thriving, long-lasting community around a text editor, it has to be open source.
I have been using the Atom beta as my primary editor for the past few weeks and have been very happy with it.
It is currently only available for the Mac, but it is based on Chromium and Node, and "Windows and Linux releases are on the roadmap."
[Update 20180604 @ 14:00 UTC: Acquisition confirmed. Microsoft is paying $7.5 billion in stock. Coverage at Microsoft, Security Week, The Register, and The Verge. Also, see the Microsoft blog post. --martyb]
Microsoft has reportedly acquired GitHub, and could announce the deal as early as Monday. Bloomberg reports that the software giant has agreed to acquire GitHub, and that the company chose Microsoft partly because of CEO Satya Nadella. Business Insider first reported that Microsoft had been in talks with GitHub recently.
Time to move off GitHub?
Previously: Microsoft Holds Acquisition Talks with Github
An AC also submitted Bloomberg's article.
GitHub Revamps Copyright Takedown Policy After Restoring YouTube-dl
The source code for YouTube-dl, a tool you can use to download videos from YouTube, is back up on GitHub after the code repository took it down in October following a DMCA complaint from the Recording Industry Association of America (RIAA). Citing a letter from the Electronic Frontier Foundation (the EFF), GitHub says it ultimately found that the RIAA's complaint didn't have any merit.
This is the best possible outcome of the RIAA's attack on youtube-dl. Good on @GitHub for standing up for developers against DMCA § 1201 abuses.
— Filippo Valsorda 💚🤍❤️ ✊ (@FiloSottile) November 16, 2020
If there's a silver lining to the episode, it's that GitHub is implementing new policies to avoid a repeat situation moving forward. [...]
GitHub is also establishing a $1 million defense fund to provide legal aid to developers against suspect section 1201 claims, as well as doubling down on its lobbying work to amend the DMCA and other similar copyright laws across the world.
The Copilot tool has been trained on mountains of publicly available code
[...] When GitHub announced Copilot on June 29, the company said that the algorithm had been trained on publicly available code posted to GitHub. Nat Friedman, GitHub’s CEO, has written on forums like Hacker News and Twitter that the company is legally in the clear. “Training machine learning models on publicly available data is considered fair use across the machine learning community,” the Copilot page says.
But the legal question isn’t as settled as Friedman makes it sound — and the confusion reaches far beyond just GitHub. Artificial intelligence algorithms only function due to massive amounts of data they analyze, and much of that data comes from the open internet. An easy example would be ImageNet, perhaps the most influential AI training dataset, which is entirely made up of publicly available images that ImageNet creators do not own. If a court were to say that using this easily accessible data isn’t legal, it could make training AI systems vastly more expensive and less transparent.
Despite GitHub’s assertion, there is no direct legal precedent in the US that upholds publicly available training data as fair use, according to Mark Lemley and Bryan Casey of Stanford Law School, who published a paper last year about AI datasets and fair use in the Texas Law Review.
[...] And there are past cases to support that opinion, they say. They consider the Google Books case, in which Google downloaded and indexed more than 20 million books to create a literary search database, to be similar to training an algorithm. A federal appeals court upheld Google’s fair use claim, on the grounds that the new tool was transformative of the original work and broadly beneficial to readers and authors, and the Supreme Court declined to review that ruling.