
SoylentNews is people

posted by Fnord666 on Wednesday May 01 2019, @10:21PM
from the it's-all-greek-to-me dept.

Submitted via IRC for Bytram

OCR4all: Modern tool for old texts

Historians and other humanities scholars often have to deal with difficult research objects: centuries-old printed works that are hard to decipher and often in an unsatisfactory state of conservation. Many of these documents have now been digitized (usually photographed or scanned) and are available online worldwide. For research purposes, this is already a step forward.

However, there is still a challenge to overcome: using text recognition software to bring the digitized old fonts into a modern form that is readable for non-specialists as well as for computers. Scientists at the Center for Philology and Digitality at Julius-Maximilians-Universität Würzburg (JMU) in Bavaria, Germany, have made a significant contribution to further development in this field.

With OCR4all, the JMU research team is making a new tool available to the scientific community. It converts digitized historical prints into computer-readable texts with an error rate of less than one percent. And it offers a graphical user interface that requires no IT expertise. With previous tools of this kind, user-friendliness was not always a given, as the users mostly had to work with programming commands.

[...] In developing OCR4all, computer scientists have collaborated with the humanities at JMU—including German and Romance studies and literature studies in the project "Narragonien digital." The aim was to digitize the "Narrenschiff," a moral satire by Sebastian Brant, a bestseller of the 15th century that was translated into many languages. Furthermore, OCR4all has been frequently used in the JMU's Kolleg "Medieval and Early Modern Times."

OCR4all is freely available to the public on the GitHub platform (with instructions and examples): https://github.com/OCR4all



 
  • (Score: 2) by driverless on Thursday May 02 2019, @02:23AM (1 child)

    by driverless (4770) on Thursday May 02 2019, @02:23AM (#837646)

    Scientists at the Center for Philology and Digitality at Julius-Maximilians-Universität Würzburg (JMU) in Bavaria, Germany

    Let's face it, the only reason they did this was so they could introduce the world to OCR'd writings like "kinnts es deppn ned a boarisch ren wia jeda nuaamale mensch, es grattler?".

  • (Score: 2) by Freeman on Thursday May 02 2019, @04:01PM

    by Freeman (732) on Thursday May 02 2019, @04:01PM (#837937) Journal

    Google Translate doesn't parse that statement.

    --
    Joshua 1:9 "Be strong and of a good courage; be not afraid, neither be thou dismayed: for the Lord thy God is with thee"