
SoylentNews is people

posted by LaminatorX on Tuesday November 18 2014, @08:47AM
from the Where's-John-Katz? dept.

Every year the works of thousands of authors enter the public domain, but only a small percentage of these end up being widely available. So how do organizations such as Project Gutenberg choose which works to focus on? Allen Riddell has developed an algorithm that automatically generates an independent ranking of notable authors for any given year. It is then a simple task to pick the works to focus on or to spot notable omissions from the past. Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.

Riddell’s algorithm begins with the Wikipedia entries of all authors in the English language edition (PDF), more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on. This produces a “public domain ranking” of all the authors that appear on Wikipedia. For example, the author Virginia Woolf has a ranking of 1,081 out of 1,011,304 while the Italian painter Giuseppe Amisani, who died in the same year as Woolf, has a ranking of 580,363. So Riddell’s new ranking clearly suggests that organizations like Project Gutenberg should focus more on digitizing Woolf’s work than Amisani’s. Of the individuals who died in 1965 and whose work will enter the public domain next January in many parts of the world, the new algorithm picks out TS Eliot as the most highly ranked individual. Others highly ranked include Somerset Maugham, Winston Churchill, and Malcolm X.
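To make the idea concrete, here is a rough Python sketch of ranking authors from the page features the summary lists (article length, article age, views per day, time since last revision). It is not Riddell's actual model: the `AuthorPage` type, the weights, and the log-scaled scoring formula are all assumptions made up for illustration, and the sample pages are invented placeholders rather than real Wikipedia data.

```python
import math
from dataclasses import dataclass


@dataclass
class AuthorPage:
    """Hypothetical record of the per-article features named in the summary."""
    name: str
    article_length: int        # characters in the article
    article_age_days: int      # days since the article was created
    views_per_day: float       # estimated daily page views
    days_since_revision: int   # days since the last edit


# Illustrative weights only; a real system would fit these to data.
W_LENGTH, W_AGE, W_VIEWS, W_STALE = 1.0, 0.5, 2.0, 0.5


def score(page: AuthorPage) -> float:
    """Combine log-scaled features into one number (higher = more notable)."""
    return (
        W_LENGTH * math.log1p(page.article_length)
        + W_AGE * math.log1p(page.article_age_days)
        + W_VIEWS * math.log1p(page.views_per_day)
        - W_STALE * math.log1p(page.days_since_revision)
    )


def rank_authors(pages: list[AuthorPage]) -> dict[str, int]:
    """Return a mapping of author name to rank, where rank 1 scores highest."""
    ordered = sorted(pages, key=score, reverse=True)
    return {page.name: rank for rank, page in enumerate(ordered, start=1)}


# Invented placeholder pages, not real measurements.
sample_pages = [
    AuthorPage("Author A", 60000, 4000, 3000.0, 2),
    AuthorPage("Author B", 1500, 900, 4.0, 400),
]
ranks = rank_authors(sample_pages)
print(ranks)
```

Filtering the resulting ranking by year of death would then give a shortlist of the kind described above for authors who died in 1965.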

 
This discussion has been archived. No new comments can be posted.
  • (Score: 2) by mcgrew (701) <publish@mcgrewbooks.com> on Sunday November 23 2014, @12:07PM (#119076)

    Is bad writing ever notable? James Patterson is a very popular hack, but he doesn't write well at all.

    --
    mcgrewbooks.com mcgrew.info nooze.org