
SoylentNews is people

posted by LaminatorX on Tuesday November 18 2014, @08:47AM
from the Where's-John-Katz? dept.

Every year the works of thousands of authors enter the public domain, but only a small percentage of these end up being widely available. So how do organizations such as Project Gutenberg choose which works to focus on? Allen Riddell has developed an algorithm that automatically generates an independent ranking of notable authors for any given year. It is then a simple task to pick the works to focus on or to spot notable omissions from the past. Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.

Riddell’s algorithm begins with the Wikipedia entries of all authors in the English language edition (PDF), more than a million of them. His algorithm extracts information such as the article length, article age, estimated views per day, time elapsed since last revision, and so on. This produces a “public domain ranking” of all the authors that appear on Wikipedia. For example, the author Virginia Woolf has a ranking of 1,081 out of 1,011,304 while the Italian painter Giuseppe Amisani, who died in the same year as Woolf, has a ranking of 580,363. So Riddell’s new ranking clearly suggests that organizations like Project Gutenberg should focus more on digitizing Woolf’s work than Amisani’s. Of the individuals who died in 1965 and whose work will enter the public domain next January in many parts of the world, the new algorithm picks out TS Eliot as the most highly ranked individual. Others highly ranked include Somerset Maugham, Winston Churchill, and Malcolm X.
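
As a rough illustration of how per-article features like those listed above could be turned into a ranking, here is a minimal Python sketch. It is not Riddell's actual model: the `AuthorPage` fields, the log scaling, the hand-picked `WEIGHTS`, and the example numbers are all assumptions made up for this example. The summary suggests the real ranking is guided by which public domain works have attracted attention in the past, so its weighting would presumably be fitted to data rather than chosen by hand.

```python
import math
from dataclasses import dataclass


@dataclass
class AuthorPage:
    """Features of one author's Wikipedia article (hypothetical field names)."""
    name: str
    article_length: int        # size of the article, e.g. in characters
    article_age_days: int      # days since the article was created
    views_per_day: float       # estimated daily page views
    days_since_revision: int   # days since the last edit


# Hand-picked weights, for illustration only (an assumption, not from the paper).
# A long gap since the last revision counts against an author.
WEIGHTS = {
    "article_length": 1.0,
    "article_age_days": 0.5,
    "views_per_day": 2.0,
    "days_since_revision": -0.5,
}


def score(page: AuthorPage) -> float:
    """Weighted sum of log-scaled features; higher means more notable."""
    return sum(
        weight * math.log1p(getattr(page, feature))
        for feature, weight in WEIGHTS.items()
    )


def rank_authors(pages: list[AuthorPage]) -> dict[str, int]:
    """Map each author name to a rank, where 1 is the highest score."""
    ordered = sorted(pages, key=score, reverse=True)
    return {page.name: position for position, page in enumerate(ordered, start=1)}


# Made-up feature values, not real Wikipedia statistics.
pages = [
    AuthorPage("Author A", 60000, 4000, 3000.0, 2),
    AuthorPage("Author B", 1500, 900, 4.0, 400),
]
ranking = rank_authors(pages)
```

With these invented inputs, the author whose article is longer, older, more viewed, and more recently edited comes out ahead. Filtering `pages` by year of death before calling `rank_authors` would give a per-year list of the kind described above.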

This discussion has been archived. No new comments can be posted.
The Fine Print: The following comments are owned by whoever posted them. We are not responsible for them in any way.
  • (Score: 2) by stormwyrm on Tuesday November 18 2014, @09:17AM

    by stormwyrm (717) Subscriber Badge on Tuesday November 18 2014, @09:17AM (#117165) Journal

    Alas, this only applies in those countries where the 'life + 70 years' rule applies. This of course doesn't include the United States, where works published between 1923 and 1978 are protected for 95 years from the date of publication. The Life + 70 rule only applies to works published after 1978. That means "Steamboat Willie" should enter the Public Domain in 2018, but I'm pretty damn sure that Disney and their compatriots are even today hard at work lobbying for yet another copyright term extension law that will maintain the 1923 event horizon for another few decades. Eldred v. Ashcroft [wikipedia.org] failed to stop this nonsense, sadly.

    --
    Numquam ponenda est pluralitas sine necessitate.
    • (Score: 0) by Anonymous Coward on Tuesday November 18 2014, @10:39PM

      by Anonymous Coward on Tuesday November 18 2014, @10:39PM (#117433)

      The Sonny Bono Copyright Act doesn't extend copyright forever, it extends it for a finite time. I don't doubt that Disney will press for yet another extension, but for copyright not to last for "a limited time" would be plainly unconstitutional. But you know, 10,000 years is "a limited time".

      So what we actually have is that US works do enter the public domain; however, they are older works than those which enter the public domain in other parts of the world.

      • (Score: 0) by Anonymous Coward on Wednesday November 19 2014, @12:14AM

        by Anonymous Coward on Wednesday November 19 2014, @12:14AM (#117460)

        Nothing newer than 1923 has entered the public domain in the US, thanks to Sonny Bono. And if Disney et al. have anything to say about it, they'll just keep paying Congress to create laws that extend this further and further. Before 2018 I'm willing to bet that we'll see yet another copyright term extension act at the behest of these bastards. A copyright term that lasted 10,000 years might be argued as being practically unlimited though: 10,000 years is almost twice as long as all of recorded history! A copyright term of 100 years is nearly half the age of the United States as a country, and ought to be untenable on those grounds as well, if the government weren't so beholden to these vested interests.

  • (Score: 2) by wonkey_monkey on Tuesday November 18 2014, @09:29AM

    by wonkey_monkey (279) on Tuesday November 18 2014, @09:29AM (#117170) Homepage

    Machine-Learning Algorithm

    Which part of the algorithm involves "machine-learning"?

    --
    systemd is Roko's Basilisk
  • (Score: 0) by Anonymous Coward on Tuesday November 18 2014, @09:31AM

    by Anonymous Coward on Tuesday November 18 2014, @09:31AM (#117172)

    When I read the dept. line, I did LOL.
    Any other old timers from the other site have a reaction to that?

    -- gewg_

  • (Score: 0) by Anonymous Coward on Tuesday November 18 2014, @03:17PM

    by Anonymous Coward on Tuesday November 18 2014, @03:17PM (#117243)

    Is there any reason given as to why one should care about this algorithm? Everybody and their brother can put together ranking algorithms for just about anything. What is so special about this that one should give it any more weight over any other?

  • (Score: 0) by Anonymous Coward on Tuesday November 18 2014, @03:20PM

    by Anonymous Coward on Tuesday November 18 2014, @03:20PM (#117244)

    from TFA

    Riddell’s approach is to look at what kind of public domain content the world has focused on in the past and then use this as a guide to find content that people are likely to focus on in the future.

    Many of the works that the "world focuses on" are likely to be books frequently assigned in literature and history classes in high school and college. The people doing the "focusing" are kids that don't want to shell out to the local B&N or Amazon.com for the books.

    • (Score: 2) by mcgrew on Tuesday November 18 2014, @03:43PM

      by mcgrew (701) <publish@mcgrewbooks.com> on Tuesday November 18 2014, @03:43PM (#117262) Homepage Journal

      Also, works "the world focuses on" include best-selling hacks like James Patterson.

      --
      Free Martian whores! [mcgrewbooks.com]
      • (Score: 2) by Thexalon on Tuesday November 18 2014, @04:18PM

        by Thexalon (636) on Tuesday November 18 2014, @04:18PM (#117280)

        It said quite clearly "The most notable", not "The best" (which is even harder to figure out). I wouldn't be surprised if it turned out that more people today read Stephenie Meyer or Tom Clancy than have even touched anything written by Goethe or Proust or Jonathan Swift.

        --
        The inverse of "I told you so" is "Nobody could have predicted"
  • (Score: 0) by Anonymous Coward on Tuesday November 18 2014, @10:54PM

    by Anonymous Coward on Tuesday November 18 2014, @10:54PM (#117435)

    It's not hard at all to find dirt-cheap dead tree editions of notable books, reproductions of artwork, musical scores or recordings of notable musical compositions. To the extent they are hard to purchase they are easy to find in libraries.

    There are, for example, the Penguin Classics; Dover has a lot of classic literature; and attractively hardbound volumes of The Complete Works of William Shakespeare can be had for a reasonable price.

    Now try to find plays written by any of Shakespeare's contemporaries. I read Faust in high school, but I've never so much as heard of any other Elizabethan playwrights than Shakespeare and whoever wrote Faust - yet theater was wildly popular back then.

    Yes, more people want to read Shakespeare or listen to Mozart or Beethoven, but their work isn't likely ever to be lost to history. There is an argument for preserving that which might otherwise be totally lost.