SoylentNews is people

posted by LaminatorX on Tuesday November 18 2014, @08:47AM
from the Where's-John-Katz? dept.

Every year the works of thousands of authors enter the public domain, but only a small percentage of these end up being widely available. So how do organizations such as Project Gutenberg choose which works to focus on? Allen Riddell has developed an algorithm that automatically generates an independent ranking of notable authors for any given year. It is then a simple task to pick the works to focus on or to spot notable omissions from the past. Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.

Riddell’s algorithm begins with the Wikipedia entries of all authors in the English language edition (PDF), more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on. This produces a “public domain ranking” of all the authors that appear on Wikipedia. For example, the author Virginia Woolf has a ranking of 1,081 out of 1,011,304 while the Italian painter Giuseppe Amisani, who died in the same year as Woolf, has a ranking of 580,363. So Riddell’s new ranking clearly suggests that organizations like Project Gutenberg should focus more on digitizing Woolf’s work than Amisani’s. Of the individuals who died in 1965 and whose work will enter the public domain next January in many parts of the world, the new algorithm picks out TS Eliot as the most highly ranked individual. Others highly ranked include Somerset Maugham, Winston Churchill, and Malcolm X.
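
The summary does not say how the extracted article features are combined into a ranking, so the following is only a rough sketch of the general idea, not Riddell's actual model. It assumes a simple weighted sum of log-scaled features. The weights, the `AuthorFeatures` type, the function names, and the two placeholder authors with their numbers are all invented for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorFeatures:
    """Per-author article features of the kind the summary lists."""
    name: str
    article_length: int            # e.g. characters in the article
    article_age_days: int          # days since the article was created
    views_per_day: float           # estimated daily page views
    days_since_last_revision: int  # days since the most recent edit


# Hypothetical weights (an assumption, not taken from the paper):
# longer, older, more-viewed, recently edited articles score higher.
WEIGHTS = {
    "article_length": 1.0,
    "article_age_days": 0.5,
    "views_per_day": 2.0,
    "days_since_last_revision": -0.5,
}


def score(author: AuthorFeatures) -> float:
    """Weighted sum of log-scaled features; higher means more notable."""
    return sum(
        weight * math.log1p(getattr(author, field))
        for field, weight in WEIGHTS.items()
    )


def rank_authors(authors: list[AuthorFeatures]) -> dict[str, int]:
    """Map each author name to a rank, where 1 is the highest score."""
    ordered = sorted(authors, key=score, reverse=True)
    return {a.name: position for position, a in enumerate(ordered, start=1)}


# Placeholder authors with made-up numbers, purely to exercise the code.
authors = [
    AuthorFeatures("Author A", 60000, 4500, 3000.0, 2),
    AuthorFeatures("Author B", 1500, 900, 3.0, 400),
]
ranking = rank_authors(authors)
```

Log-scaling keeps a single very large value, such as page views, from swamping the other features. A real system would presumably fit its weights against which public domain works have already attracted attention, rather than fixing them by hand as done here.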

 
This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 0) by Anonymous Coward on Tuesday November 18 2014, @03:20PM

    by Anonymous Coward on Tuesday November 18 2014, @03:20PM (#117244)

    from TFA

    Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.

    Many of the works that the "world focuses on" are likely to be books frequently assigned in literature and history classes in high school and college. The people doing the "focusing" are kids that don't want to shell out to the local B&N or Amazon.com for the books.

  • (Score: 2) by mcgrew on Tuesday November 18 2014, @03:43PM

    by mcgrew (701) <publish@mcgrewbooks.com> on Tuesday November 18 2014, @03:43PM (#117262) Homepage Journal

    Also, works "the world focuses on" include best-selling hacks like James Patterson.

    --
    mcgrewbooks.com mcgrew.info nooze.org
    • (Score: 2) by Thexalon on Tuesday November 18 2014, @04:18PM

      by Thexalon (636) on Tuesday November 18 2014, @04:18PM (#117280)

      It said quite clearly "The most notable", not "The best" (which is even harder to figure out). I wouldn't be surprised if it turned out that more people today read Stephenie Meyer or Tom Clancy than have even touched anything written by Goethe or Proust or Jonathan Swift.

      --
      The only thing that stops a bad guy with a compiler is a good guy with a compiler.
      • (Score: 2) by mcgrew on Sunday November 23 2014, @12:07PM

        by mcgrew (701) <publish@mcgrewbooks.com> on Sunday November 23 2014, @12:07PM (#119076) Homepage Journal

        Is bad writing ever notable? James Patterson is a very popular hack, but he doesn't write well at all.

        --
        mcgrewbooks.com mcgrew.info nooze.org