SoylentNews is people

posted by takyon on Thursday May 07 2015, @04:53AM
from the infinite-monkeys dept.

In 1941, Jorge Luis Borges wrote The Library of Babel, a story which described an almost infinite library containing every possible combination of letters in a vast collection of 410-page books.

Jonathan Basile has spent six months learning how to make a virtual version that can generate every possible page of 3,200 characters:

The Library currently allows users to choose from about 10^4677 potential books. The site also features a search tool, which allows users to retrieve the location in the library of any known page of text. Any individual page of Hamlet or the Bible can be found in the library, but the possibility of finding any other page from the same work in the same volume is vanishingly small.
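
A rough sanity check on the size of the collection, as a sketch only. It assumes the 29-character alphabet described in the comments (a-z, space, comma, full stop), 3,200 characters per page and 410 pages per book. Dividing the page count by 410 is an assumption about how the books are counted, not something the article states.

```python
# Assumed parameters: 29 symbols, 3,200 characters per page, 410 pages per book.
CHARSET_SIZE = 29
CHARS_PER_PAGE = 3200
PAGES_PER_BOOK = 410

# Every distinct page is one string of 3,200 symbols.
pages = CHARSET_SIZE ** CHARS_PER_PAGE

# Assumption: each page appears once, so books = pages / pages-per-book.
books = pages // PAGES_PER_BOOK

# Order of magnitude = number of decimal digits minus one.
exponent = len(str(books)) - 1
print(f"about 10^{exponent} books")
```

Python's arbitrary-precision integers make this exact; no floating-point logarithms are needed.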

While the library contains every possible page, it does not yet hold every possible combination of those pages. If this restriction were lifted, Basile explains on the site, the library would house "every book that ever has been written, and every book that ever could be – including every play, every song, every scientific paper, every legal decision, every constitution, every piece of scripture, and so on".

Basile evokes the comprehensive nature of the library's "blind volumes", saying: "To take a recent example, the confidential documents leaked by Edward Snowden... will be there somewhere. It's only a matter of knowing where to look for them."

  • (Score: 5, Insightful) by Anonymous Coward on Thursday May 07 2015, @05:12AM

    by Anonymous Coward on Thursday May 07 2015, @05:12AM (#179773)

    By the same token, the library would also contain, say, documents purporting to be from Edward Snowden that say something other than what he said. You'd also have infinitely many corrupt copies of the Bible that say everything and anything. And every scientific paper, including every possible falsified variation. And many, many more that simply contain nonsense. It would be like trying to verify whether the decryption of a one-time pad is actually true or not.

    • (Score: 2) by Jeremiah Cornelius on Thursday May 07 2015, @03:03PM

      by Jeremiah Cornelius (2785) on Thursday May 07 2015, @03:03PM (#179939) Journal

      I suggest reading the Borges [wikipedia.org]. His work seems to be the only sub-topic of the story that has gone without comment in our Soylent thread.

      He's amazing and odd. To the unfamiliar, I would describe him as a bit like Lovecraft without monsters, or Philip K. Dick without technologies. Much of this is metafiction: nested stories about stories, implying recursions and variations. I can hardly imagine that a movie like Pi was conceived without familiarity with the later Borges stories.

      I suggest at least Labyrinths [wikipedia.org] as a collection for Soylentils. It contains the marvelous Tlon, Uqbar, Orbis Tertius [wikipedia.org], among others. A topical summary of the stories contained is available here, [wsu.edu] but doesn't really do justice and may be "spoilers", if such are possible.

      --
      You're betting on the pantomime horse...
  • (Score: 3, Insightful) by Anonymous Coward on Thursday May 07 2015, @05:28AM

    by Anonymous Coward on Thursday May 07 2015, @05:28AM (#179776)

    I don't see any page in a script other than latin.
    So much for "any known page of text".

    • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @07:42AM

      by Anonymous Coward on Thursday May 07 2015, @07:42AM (#179796)
      Doesn't even include capital letters, digits or special characters. So my passwords aren't even there :p.

      What to do next is to search for:
      "aaaaaaaaaaaaa.....aaaaaaaaaaaaaaaba"
      "aaaaaaaaaaaaa.....aaaaaaaaaaaaaaabb"
      ...
      "aaaaaaaaaaaaa.....aaaaaaaaaaaaaaabz"
      "aaaaaaaaaaaaa.....aaaaaaaaaaaaaaab,"
      "aaaaaaaaaaaaa.....aaaaaaaaaaaaaaab."
      Then search for
      "aaaaaaaaaaaaa.....aaaaaaaaaaaaaaab"

      And see if the second search turns up an exact match page in the first set. ;)

      Coz I wonder if the guy is not just having a program make BS up on the fly, insert the search key in and then memorize the search key + salt/seed to make sure that repeated searches produce the same BS pages.

      If he isn't, then yes I can see why he took 6 months to do it. Otherwise...
      • (Score: 3, Funny) by maxwell demon on Thursday May 07 2015, @08:27AM

        by maxwell demon (1608) on Thursday May 07 2015, @08:27AM (#179805) Journal

        I can do better than him: I can write you a software that gives you any possible text of the world by just entering its index. To save space, I decided to encode the index not as digit string, but as a sequence of Unicode UTF-8 characters. And for simplicity I decided to use the Unicode UTF-8 encoding of that text as index.

        So basically, my program takes UTF-8 as input, and gives the same UTF-8 back. Voila, the infinite library.

        --
        The Tao of math: The numbers you can count are not the real numbers.
        • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @11:15AM

          by Anonymous Coward on Thursday May 07 2015, @11:15AM (#179846)
          I think the real supposed trick is his library is searchable. It can supposedly give you the book and page that contains the string you search for.
          • (Score: 2) by TK on Thursday May 07 2015, @04:51PM

            by TK (2760) on Thursday May 07 2015, @04:51PM (#179980)

            How many pages are the rough equivalent of:

            alas, poor yorick. i knew him, horatio, a fellow of infinite jest, of most excellent fancy.aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

            So I have to search for the entire 3200 character string I'm looking for in order to find the one page that contains that exact combination. The character set is a-z, full stop (.), comma (,), and space ( ). 29 characters in all.

            If I searched for pages of Hamlet that contain those two sentences, I would have to sort through 29^(3200-91) pages to find the page that has the entire rest of the text after it.
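
The arithmetic above can be checked with a short sketch. It assumes the quote sits at the very start of the page and that every remaining position is free to take any of the 29 characters; the variable names are illustrative only.

```python
QUOTE = ("alas, poor yorick. i knew him, horatio, a fellow of infinite jest, "
         "of most excellent fancy.")
CHARSET_SIZE = 29        # a-z, full stop, comma, space
CHARS_PER_PAGE = 3200

# Positions left over once the quote is fixed at the start of the page.
free_positions = CHARS_PER_PAGE - len(QUOTE)

# Each free position can hold any of the 29 characters independently.
matching_pages = CHARSET_SIZE ** free_positions

print(len(QUOTE), free_positions, len(str(matching_pages)))
```

The last number printed is the count of decimal digits in 29^(3200-91), which gives a feel for how many candidate pages share those two sentences.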

            --
            The fleas have smaller fleas, upon their backs to bite them, and those fleas have lesser fleas, and so ad infinitum
  • (Score: 2, Funny) by Anonymous Coward on Thursday May 07 2015, @05:30AM

    by Anonymous Coward on Thursday May 07 2015, @05:30AM (#179777)

    Why, the infinite library is positively filthy. The smut is overpowering. Finding any useful information would be like finding a specific drop of cum in a pile of faggots. Ban it. All of it.

    • (Score: 2) by TK on Thursday May 07 2015, @04:57PM

      by TK (2760) on Thursday May 07 2015, @04:57PM (#179981)

      There are definitely some pages that are exact matches of existing ASCII porn. Probably not any of the good stuff, since special characters aren't included, but enough to get you through your rough spot.

      --
      The fleas have smaller fleas, upon their backs to bite them, and those fleas have lesser fleas, and so ad infinitum
    • (Score: 2) by FatPhil on Thursday May 07 2015, @06:49PM

      by FatPhil (863) <pc-soylentNO@SPAMasdf.fi> on Thursday May 07 2015, @06:49PM (#180021) Homepage
      There's trashy kiddy porn furry fiction in the library too. (Including some which, even more disgustingly, is Star Wars themed.)

      Burn it down!
      --
      Great minds discuss ideas; average minds discuss events; small minds discuss people; the smallest discuss themselves
  • (Score: 3, Interesting) by jasassin on Thursday May 07 2015, @06:01AM

    by jasassin (3566) <jasassin@gmail.com> on Thursday May 07 2015, @06:01AM (#179780) Homepage Journal

    Even the library containing all of the Brunnen G history was eventually destroyed.

    There's even an episode where the last of some Earth-like civilization is saying what he has to say to the universe, before the Lexx plows him to smithereens in the middle of space. That simple two seconds of video made me laugh at how pitiful we really are on a cosmic scale.

    If you've never seen Lexx, check it out. First two seasons are the best.

    --
    jasassin@gmail.com GPG Key ID: 0xE6462C68A9A3DB5A
    • (Score: 2) by aristarchus on Thursday May 07 2015, @06:47AM

      by aristarchus (2645) on Thursday May 07 2015, @06:47AM (#179785) Journal

      Brunnen G? I always thought it was the Bruenen Gee. But point is well taken. And if I may add another equally or even more irrelevant Sci-Fi show: one episode of Andromeda featured the "collectors", who seemed to be archiving everything that had happened in the past. The problem with veridical copies, however, is that once into them, they are reality? So we can only hope that the past remains dead, a desiccated corpse over which we theorize. Else, we go there and all vote for Marco Rubio! Noooooo!

  • (Score: 1, Funny) by Anonymous Coward on Thursday May 07 2015, @06:06AM

    by Anonymous Coward on Thursday May 07 2015, @06:06AM (#179781)

    Slander, libel, hate speech, kiddy porn, state secrets...

    This guy must be locked up and the key thrown away!

    • (Score: 3, Touché) by sigma on Thursday May 07 2015, @06:38AM

      by sigma (1225) on Thursday May 07 2015, @06:38AM (#179784)

      The location of the key is in there too.

    • (Score: 2) by DeathMonkey on Thursday May 07 2015, @06:09PM

      by DeathMonkey (1380) on Thursday May 07 2015, @06:09PM (#180007) Journal

      Slander, libel, hate speech, kiddy porn, state secrets...
       
      T'is but foolish childishness of no significance. The true evil lies buried within: COPYRIGHT INFRINGEMENT!!
       
      Burn the witch! I mean, intellectual property MARAUDERS!

  • (Score: 3, Interesting) by Geotti on Thursday May 07 2015, @07:27AM

    by Geotti (1146) on Thursday May 07 2015, @07:27AM (#179793) Journal

    If this restriction were lifted

    Why... Just put it in "the cloud" (tm) the space there is infinitely scalable! Oh, and btw, could someone find my thesis in there, so I don't have to write it?

    By the way, what would the paradox be called if someone would actually, by accident, stumble over their thesis, the unfulfilled destiny paradox?

    • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @02:44PM

      by Anonymous Coward on Thursday May 07 2015, @02:44PM (#179927)

      So I guess now everything is copyright infringement since everything anyone can type or think of is already in there and we're just infringing on it. Even this post is infringement too. I hope I don't get sued. The good news is that everything Hollywood puts out is infringement and now they can be sued.

  • (Score: 1, Insightful) by Anonymous Coward on Thursday May 07 2015, @08:12AM

    by Anonymous Coward on Thursday May 07 2015, @08:12AM (#179801)

    Nice

  • (Score: 3, Interesting) by greenfruitsalad on Thursday May 07 2015, @08:41AM

    by greenfruitsalad (342) on Thursday May 07 2015, @08:41AM (#179810)

    This is similar to the infinite storage device in GNU/Linux. There's a device called /dev/random and it contains everything ever created if you have the patience to read from it for long enough.

    • (Score: 2) by aristarchus on Thursday May 07 2015, @08:57AM

      by aristarchus (2645) on Thursday May 07 2015, @08:57AM (#179813) Journal

      if you have the patience to read from it for long enough.

      Some times , .. . . . k .. . . the time spent . .. ;// exceeds . k. .. .. . time. .. . . . endMessage. Optimum Omega.

    • (Score: 2, Informative) by Anonymous Coward on Thursday May 07 2015, @09:05AM

      by Anonymous Coward on Thursday May 07 2015, @09:05AM (#179817)

      That device is provided by the Linux kernel and therefore is independent of any GNU stuff. Therefore it is a device in Linux, not just in GNU/Linux.

    • (Score: 2) by kaganar on Thursday May 07 2015, @02:18PM

      by kaganar (605) on Thursday May 07 2015, @02:18PM (#179920)
      Similar to /dev/random, but with some important properties:
      • You can seek to wherever you like. (/dev/random doesn't support seeking at all)
      • The pages are indexed to allow for fast searching. (Try searching /dev/random for a long string. You'll be waiting for a while.)
      • The number of pages is finite. (Not so with /dev/random depending on how it works under the hood.)

      You may be wondering how the virtual Library of Babel can allow indexing and searching so efficiently where /dev/random does not. I'll be honest, I'm not sure exactly how the virtual library was implemented, but it seems probable that it works similarly to the following:

      • The index of each page is reversibly hashed into its contents. This makes its contents appear random. This also guarantees each page is unique.
      • If you want to find the index for a full page of text, just reverse the hash of it to obtain the index of it.
      • This last property is put to good use by playing on our perception of what a "search engine" does. Suppose I search for a small word, much less than 3200 characters long. To find the index of pages that do contain the word, all I need to do is put whatever I want before and/or after the word, and then reverse that hash to reveal the index of the page. I can do this as many times as I like with whatever padding I want. For example, I could pad it with English words.

      Is it really a search engine? Well, yes, it did actually find some virtual pages in virtual books on virtual shelves in virtual rooms in the virtual library. :) It's just not a particularly useful search engine.
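
The scheme guessed at above can be sketched as follows. This is not the site's actual implementation, which is unknown here; it swaps in a simple invertible affine map modulo 29^3200 as the "reversible hash". The constants `MULTIPLIER` and `OFFSET` and all function names are hypothetical.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz ,."   # 29 symbols, as described in the thread
BASE = len(ALPHABET)
PAGE_LEN = 3200
MODULUS = BASE ** PAGE_LEN                   # number of distinct pages

# Hypothetical constants. 31 is a prime other than 29, so any power of it is
# coprime to the modulus and the map below is a bijection.
MULTIPLIER = 31 ** 2000
OFFSET = 12345678901234567890
MULTIPLIER_INV = pow(MULTIPLIER, -1, MODULUS)  # modular inverse (Python 3.8+)

def text_to_int(page):
    """Read a full page as one big base-29 number."""
    n = 0
    for ch in page:
        n = n * BASE + ALPHABET.index(ch)
    return n

def int_to_text(n):
    """Write a number below MODULUS as a page of exactly PAGE_LEN symbols."""
    chars = []
    for _ in range(PAGE_LEN):
        n, digit = divmod(n, BASE)
        chars.append(ALPHABET[digit])
    return "".join(reversed(chars))

def page_at(index):
    """Index -> page contents (the forward direction)."""
    return int_to_text((index * MULTIPLIER + OFFSET) % MODULUS)

def index_of(page):
    """Full page -> index, by undoing the affine map."""
    return ((text_to_int(page) - OFFSET) * MULTIPLIER_INV) % MODULUS

def search(query, rng=random):
    """'Search' by padding the query with random filler and inverting."""
    pad = PAGE_LEN - len(query)
    left = rng.randrange(pad + 1)
    filler = "".join(rng.choice(ALPHABET) for _ in range(pad))
    return index_of(filler[:left] + query + filler[left:])

hit = search("alas, poor yorick")
print("alas, poor yorick" in page_at(hit))
```

An affine map mixes poorly (the low-order symbols of a page depend only on the low-order digits of its index), so a real implementation would presumably use something stronger. The point of the sketch is only that "searching" never scans anything: it constructs a page and computes where that page lives.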

  • (Score: 2) by KritonK on Thursday May 07 2015, @08:56AM

    by KritonK (465) on Thursday May 07 2015, @08:56AM (#179812)

    I was going to mention that the library even contains a page with this article, but then I noticed that the books do not contain numbers, of which there are three in the article. Punctuation marks and formatting are, of course, out of the question, and the article has those as well.

    Thus, the library's copy of this article is the following mangled version:

    fleg writesin , jorge luis borge wrote the library of babel, a story which descr
    ibed an almost infinite library containing every possible combination of letters
      in a vast collection of page books.jonathon basile has spent six months learnin
    g how to make a virtual version that can generate every possible page of , chara
    cters the library currently allows users to choose from about potential book
    s. the site also features a search tool, which allows users to retrieve the loca
    tion in the library of any known page of text. any individual page of hamlet or
    the bible can be found in the library, but the possibility of finding any other
    page from the same work in the same volume is vanishingly small. while the li
    brary contains every possible page, it does not yet hold every possible combinat
    ion of those pages. if this restriction were lifted, basile explains on the site
    , the library would house every book that ever has been written, and every book
    that ever could be including every play, every song, every scientific paper, ev
    ery legal decision, every constitution, every piece of scripture, and so on.
    basile evokes the comprehensive nature of the librarys blind volumes, saying to
    take a recent example, the confidential documents leaked by edward snowden... wi
    ll be there somewhere. its only a matter of knowing where to look for them.
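
A minimal sketch of the mangling described above, assuming the library's alphabet is lowercase a-z plus space, comma and full stop, and that everything else (digits, quotes, dashes, line breaks) is simply dropped. The function name `mangle` is made up for illustration.

```python
# Assumed alphabet: 26 lowercase letters, space, comma, full stop.
ALPHABET = set("abcdefghijklmnopqrstuvwxyz ,.")

def mangle(text):
    """Lowercase the text and drop every character the library cannot hold."""
    return "".join(ch for ch in text.lower() if ch in ALPHABET)

print(mangle("In 1941, Jorge Luis Borge wrote The Library of Babel"))
```

Dropping characters rather than replacing them is why the year vanishes but its trailing comma survives.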

  • (Score: 3, Touché) by WizardFusion on Thursday May 07 2015, @09:11AM

    by WizardFusion (498) Subscriber Badge on Thursday May 07 2015, @09:11AM (#179819) Journal

    Just wait until the MPAA hear of this, he'll be sued into oblivion for having copies of movie scripts.

    • (Score: 2) by bart9h on Thursday May 07 2015, @11:16AM

      by bart9h (767) on Thursday May 07 2015, @11:16AM (#179847)

      He has a copy of nothing; he just generates random stuff on the fly.

      • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @12:16PM

        by Anonymous Coward on Thursday May 07 2015, @12:16PM (#179863)

        whoooooooooooooooooosh.

      • (Score: 1, Touché) by Anonymous Coward on Thursday May 07 2015, @02:58PM

        by Anonymous Coward on Thursday May 07 2015, @02:58PM (#179935)

        You think that will stop the MPAA?

  • (Score: 2) by jimshatt on Thursday May 07 2015, @11:14AM

    by jimshatt (978) on Thursday May 07 2015, @11:14AM (#179845) Journal
    Why the hell did he spend 6 months learning how to do something that I could do in 6 minutes? It's trivial!
    Even more trivial is to just use a transcendent number like Pi or SQR(2) and pick a random index from there. It contains everything (with arbitrary (but finite?) length).
    • (Score: 0) by Anonymous Coward on Thursday May 07 2015, @03:12PM

      by Anonymous Coward on Thursday May 07 2015, @03:12PM (#179941)

      Transcendental numbers are not necessarily normal. Pi and the square root of two are not proven to be normal.

      9.1101001000100001000001000000...

      is transcendental but not normal for instance
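
A small sketch of why a digit pattern like the one above cannot be normal: it only ever uses the digits 0 and 1, and the 1s get rarer the further out you go. It generates just the fractional digits (blocks of a 1 followed by a growing run of 0s) and makes no claim about transcendence, which is a separate matter.

```python
from collections import Counter

def pattern_digits(blocks):
    """Concatenate '1', '10', '100', ... for the given number of blocks."""
    return "".join("1" + "0" * k for k in range(blocks))

sample = pattern_digits(200)
counts = Counter(sample)

print(sample[:28])
print(dict(counts))
print("share of 1s:", counts["1"] / len(sample))
```

In a normal number every digit 0-9 would show up about a tenth of the time, and every finite digit string would occur somewhere; here "2" never occurs at all.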

    • (Score: 2, Informative) by Anonymous Coward on Thursday May 07 2015, @03:30PM

      by Anonymous Coward on Thursday May 07 2015, @03:30PM (#179953)

      Of course generating random text is simple.

      The interesting accomplishment is that he made it searchable by using a reversible PRNG (effectively a form of (bad) encryption).

      (Also, because I like deflating ego: sqrt(2) is not a transcendental number, transcendental numbers are not necessarily random, and not necessarily uniformly random)

  • (Score: 2, Interesting) by WillAdams on Thursday May 07 2015, @12:08PM

    by WillAdams (1424) on Thursday May 07 2015, @12:08PM (#179859)

    The Freefall webcomic looked at this in a strip --- described it as a ``collection of tweets'' each containing 6 unique words, and the total storage space needed for it was projected to be the size of a small moon.

    Can't find the specific strip; it's somewhere on http://freefall.purrsia.com/ [purrsia.com], but I couldn't locate it using the webcomic transcription at http://www.ohnorobot.com [ohnorobot.com] (so obviously I must re-read Freefall and transcribe every panel into that).

  • (Score: 3, Interesting) by Rivenaleem on Thursday May 07 2015, @01:12PM

    by Rivenaleem (3400) on Thursday May 07 2015, @01:12PM (#179877)

    The answer to the ultimate question of life the universe and everything was easy to find. It must be on one of the pages in the library. The tough part is knowing what the correct page is, and more importantly, given how "42" will be on billions of pages, and for it to be the answer, it would be the last 2 characters of the page, you would have to find a computer that could correctly identify what all the correct letters leading up to that are. It makes sense that it would require a vastly more powerful computer to work out the question than the answer.

  • (Score: 2) by The Archon V2.0 on Thursday May 07 2015, @03:03PM

    by The Archon V2.0 (3887) on Thursday May 07 2015, @03:03PM (#179940)

    ... and I found the Denny's Kids Menu: http://brunching.com/randommonkeys.html [brunching.com]

  • (Score: 1) by OrugTor on Thursday May 07 2015, @04:57PM

    by OrugTor (5147) Subscriber Badge on Thursday May 07 2015, @04:57PM (#179982)

    There exists a number N where
    "almost infinite" > infinite - N

    • (Score: 0) by Anonymous Coward on Friday May 08 2015, @08:59AM

      by Anonymous Coward on Friday May 08 2015, @08:59AM (#180245)

      It's only infinitesimally smaller than infinity.

  • (Score: 2) by Bot on Thursday May 07 2015, @05:22PM

    by Bot (3902) on Thursday May 07 2015, @05:22PM (#179992) Journal

    In related news, let me present you the π-compression algorithm: it compresses data by indexing it in π, which can be recomputed at arbitrary precision to retrieve the data.

    Only minor problem, the index is orders of magnitude bigger than the data. Anyway it makes as much sense as the topic.

    --
    Account abandoned.
  • (Score: 2) by Freeman on Thursday May 07 2015, @05:42PM

    by Freeman (732) on Thursday May 07 2015, @05:42PM (#179995) Journal

    A Library at its core is a repository for Information that can be used, but is not sold. What good is a book of actual gibberish? While this is an interesting exercise, it's a fairly useless resource.

    --
    Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"
    • (Score: 2) by Alfred on Thursday May 07 2015, @06:53PM

      by Alfred (4006) on Thursday May 07 2015, @06:53PM (#180023) Journal
      They should have it drop any pages with spelling errors, which would be most pages. Much more manageable that way. That should strip out most of the gibberish. Grammar check the rest and now you are approaching useful.

      Of course you lose copies of most tweets and comments posted on the internet.
  • (Score: 1) by gishzida on Thursday May 07 2015, @05:57PM

    by gishzida (2870) on Thursday May 07 2015, @05:57PM (#180000) Journal

    Because that is how useless this is.

    A more useful version of such a "library" would use a dictionary and Markov chains to generate more useful pages

    As an example see http://markovswisdom.blogspot.com/ [blogspot.com] for some examples of this kind of random generation
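
A minimal sketch of the word-level Markov chain idea, for illustration only. The tiny corpus, the function names and the choice of a first-order chain are all assumptions; the linked blog may work quite differently.

```python
import random
from collections import defaultdict

def build_chain(words, order=1):
    """Map each run of `order` words to the words seen following it."""
    chain = defaultdict(list)
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return chain

def generate(chain, length, rng):
    """Walk the chain, restarting from a random state at dead ends."""
    key = rng.choice(list(chain))
    out = list(key)
    while len(out) < length:
        followers = chain.get(key)
        if not followers:
            key = rng.choice(list(chain))
            out.extend(key)
            continue
        nxt = rng.choice(followers)
        out.append(nxt)
        key = key[1:] + (nxt,)
    return " ".join(out[:length])

# Hypothetical toy corpus; a real one would be a dictionary or a large text.
corpus = ("the library contains every possible page but it does not yet hold "
          "every possible combination of those pages").split()
chain = build_chain(corpus)
print(generate(chain, 20, random.Random(0)))
```

Every generated word comes from the corpus and every adjacent pair was seen in it, so the output is far closer to readable text than uniformly random characters, at the cost of no longer covering every possible page.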

  • (Score: 1) by sharkx on Thursday May 07 2015, @06:41PM

    by sharkx (4299) on Thursday May 07 2015, @06:41PM (#180015)

    Is there some reason this could not be copyrighted? If so, then there can be a claim on any future work. This also describes all future patents as well.

  • (Score: 0) by Anonymous Coward on Friday May 08 2015, @04:10AM

    by Anonymous Coward on Friday May 08 2015, @04:10AM (#180186)

    So every possible Name of God must be written in those books.

    Hey! Aren't those the stars starting to wink out?

  • (Score: 1) by dingus on Saturday May 09 2015, @07:23PM

    by dingus (5224) on Saturday May 09 2015, @07:23PM (#180839)