Torching the Modern-Day Library of Alexandria - “Somewhere at Google there is a database containing 25 million books and nobody is allowed to read them.”
https://www.theatlantic.com/technology/archive/2017/04/the-tragedy-of-google-books/523320/
>Google’s secret effort to scan every book in the world, codenamed “Project Ocean,” began in earnest in 2002 ... In just over a decade, after making deals with Michigan, Harvard, Stanford, Oxford, the New York Public Library, and dozens of other library systems, the company, outpacing Page’s prediction, had scanned about 25 million books. It cost them an estimated $400 million. It was a feat not just of technology but of logistics.
>Every weekday, semi trucks full of books would pull up at designated Google scanning centers ... The books were unloaded from the trucks onto the kind of carts you find in libraries and wheeled up to human operators sitting at one of a few dozen brightly lit scanning stations, arranged in rows about six to eight feet apart. What made the system so efficient is that it left so much of the work to software. Rather than make sure that each page was aligned perfectly, and flattened, before taking a photo, which was a major source of delays in traditional book-scanning systems, cruder images of curved pages were fed to de-warping algorithms
>Authors Guild was a class action lawsuit, and the class included everyone who held an American copyright in one or more books ... What became known as the Google Books Search Amended Settlement Agreement came to 165 pages and more than a dozen appendices. It took two and a half years to hammer out the details. Sarnoff described the negotiations as “four-dimensional chess” between the authors, publishers, libraries, and Google. “Everyone involved,” he said to me, “and I mean everyone—on all sides of this issue—thought that if we were going to get this through, this would be the single most important thing they did in their careers.”
>Under the agreement, Google would be able to preview up to 20 percent of a given book to entice individual users to buy, and it would be able to offer downloadable copies for sale, with the prices determined by an algorithm or by the individual rightsholder, in price bins initially ranging from $1.99 to $29.99. All the out-of-print books would be packaged into an “institutional subscription database” that would be sold to universities, where students and faculty could search and read the full collection for free ... Google would be given wide latitude to display and sell their books, but in return, 63 percent of the revenues would go into escrow with a new entity called the Book Rights Registry.
>Sorting out the details had taken years of litigation and then years of negotiation, but now, in 2011, there was a plan—a plan that seemed to work equally well for everyone at the table. As Samuelson, the Berkeley law professor, put it in a paper at the time, “The proposed settlement thus looked like a win-win-win: the libraries would get access to millions of books, Google would be able to recoup its investment in GBS, and authors and publishers would get a new revenue stream from books that had been yielding zero returns. And legislation would be unnecessary to bring about this result.”
>... however, the legal agreement that would have unlocked a century’s worth of books and peppered the country with access terminals to a universal library was rejected under Rule 23(e)(2) of the Federal Rules of Civil Procedure by the U.S. District Court for the Southern District of New York.
>Amazon, for its part, worried that the settlement allowed Google to set up a bookstore that no one else could. Anyone else who wanted to sell out-of-print books, they argued, would have to clear rights on a book-by-book basis, which was as good as impossible, whereas the class action agreement gave Google a license to all of the books at once. This objection got the attention of the Justice Department, in particular the Antitrust division, who began investigating the settlement. The DOJ objections left the settlement in a double bind: Focus the deal on Google and you get accused of being anticompetitive. Try to open it up and you get accused of stretching the law governing class actions. In the end, the DOJ’s intervention likely spelled the end of the settlement agreement. No one is quite sure why the DOJ decided to take a stand instead of remaining neutral. In his ruling concluding that the settlement was not “fair, adequate, and reasonable” under the rules governing class actions, Judge Denny Chin recited the DOJ’s objections and suggested that to fix them, you’d either have to change the settlement to be an opt-in arrangement—which would render it toothless—or try to accomplish the same thing in Congress.
>When the settlement failed, they pointed to proposals by the U.S. Copyright Office recommending legislation ... Of course, nearly a decade later, nothing of the sort has actually happened. Despite eventually winning Authors Guild v. Google, and having the courts declare that displaying snippets of copyrighted books was fair use, the company all but shut down its scanning operation.
>It was strange to me, the idea that somewhere at Google there is a database containing 25-million books and nobody is allowed to read them. It’s like that scene at the end of the first Indiana Jones movie where they put the Ark of the Covenant back on a shelf somewhere, lost in the chaos of a vast warehouse. It’s there. The books are there. People have been trying to build a library like this for ages—to do so, they’ve said, would be to erect one of the great humanitarian artifacts of all time—and here we’ve done the work to make it real and we were about to give it to the world and now, instead, it’s 50 or 60 petabytes on disk, and the only people who can see it are half a dozen engineers on the project who happen to have access because they’re the ones responsible for locking it up.
>I asked someone who used to have that job, what would it take to make the books viewable in full to everybody? I wanted to know how hard it would have been to unlock them. What’s standing between us and a digital public library of 25 million volumes?
>You’d get in a lot of trouble, they said, but all you’d have to do, more or less, is write a single database query. You’d flip some access control bits from off to on. It might take a few minutes for the command to propagate.
>When the library at Alexandria burned it was said to be an “international catastrophe.” When the most significant humanities project of our time was dismantled in court, the scholars, archivists, and librarians who’d had a hand in its undoing breathed a sigh of relief, for they believed, at the time, that they had narrowly averted disaster.