Every year the works of thousands of authors enter the public domain, but only a small percentage of these end up being widely available. So how do organizations such as Project Gutenberg choose which works to focus on? Allen Riddell has developed an algorithm that automatically generates an independent ranking of notable authors for any given year. It is then a simple task to pick the works to focus on or to spot notable omissions from the past. Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.
Riddell’s algorithm begins with the Wikipedia entries of all authors in the English-language edition, more than a million of them. It extracts information such as article length, article age, estimated views per day, time elapsed since the last revision, and so on. This produces a “public domain ranking” of all the authors that appear on Wikipedia. For example, the author Virginia Woolf has a ranking of 1,081 out of 1,011,304, while the Italian painter Giuseppe Amisani, who died in the same year as Woolf, has a ranking of 580,363. So Riddell’s new ranking clearly suggests that organizations like Project Gutenberg should focus more on digitizing Woolf’s work than Amisani’s. Of the individuals who died in 1965 and whose work will enter the public domain next January in many parts of the world, the new algorithm picks out TS Eliot as the most highly ranked individual. Others highly ranked include Somerset Maugham, Winston Churchill, and Malcolm X.
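To make the idea concrete, here is a minimal sketch of how page features like those listed above could be turned into a ranking. It is not Riddell's actual model, which the summary does not describe. The `AuthorPage` fields, the weights in `score`, and the two sample authors are all made up for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class AuthorPage:
    """Hypothetical per-author features taken from a Wikipedia article."""
    name: str
    article_length: int            # article size in bytes
    article_age_days: int          # days since the article was created
    views_per_day: float           # estimated daily page views
    days_since_last_revision: int  # time elapsed since the last edit


def score(page: AuthorPage) -> float:
    """Combine features into one number; the weights are assumptions."""
    return (
        math.log1p(page.article_length)
        + math.log1p(page.article_age_days)
        + 2.0 * math.log1p(page.views_per_day)
        - 0.5 * math.log1p(page.days_since_last_revision)
    )


def rank_authors(pages: list[AuthorPage]) -> dict[str, int]:
    """Return a mapping of author name to rank, where 1 is the highest score."""
    ordered = sorted(pages, key=score, reverse=True)
    return {page.name: rank for rank, page in enumerate(ordered, start=1)}


# Invented sample data, not real Wikipedia statistics.
pages = [
    AuthorPage("Author B", 1_500, 900, 4.0, 400),
    AuthorPage("Author A", 80_000, 4_000, 3_000.0, 2),
]
ranking = rank_authors(pages)
print(ranking)
```

With a ranking like this, picking the notable authors for a given year is just a matter of filtering the pages by death year before sorting.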
(Score: 0) by Anonymous Coward on Tuesday November 18 2014, @03:20PM
from TFA
Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.
Many of the works that the "world focuses on" are likely to be books frequently assigned in literature and history classes in high school and college. The people doing the "focusing" are kids that don't want to shell out to the local B&N or Amazon.com for the books.
(Score: 2) by mcgrew on Tuesday November 18 2014, @03:43PM
Also, works "the world focuses on" include those of best-selling hacks like James Patterson.
mcgrewbooks.com mcgrew.info nooze.org
(Score: 2) by Thexalon on Tuesday November 18 2014, @04:18PM
It said quite clearly "The most notable", not "The best" (which is even harder to figure out). I wouldn't be surprised if it turned out that more people today read Stephenie Meyer or Tom Clancy than have even touched anything written by Goethe or Proust or Jonathan Swift.
The only thing that stops a bad guy with a compiler is a good guy with a compiler.
(Score: 2) by mcgrew on Sunday November 23 2014, @12:07PM
Is bad writing ever notable? James Patterson is a very popular hack, but he doesn't write well at all.
mcgrewbooks.com mcgrew.info nooze.org