In document analysis, an essential field of the digital humanities, text-to-image linking has proven to be a fundamental step in providing experts with convenient working material. Linking the words of a manuscript image with their transcription in the electronic text allows for more efficient, clearer and more easily shared analysis.
In this fast-growing field, reliable computer automation is of crucial importance. When done manually, text-image alignment requires enclosing each word in a polygon, and potentially each character for finer granularity, before linking it to the electronic transcript. This heavily time-consuming task has motivated many innovations aimed at automation.
Layout analysis is the primary form of manuscript analysis, and it is paramount to split the document image into writing and other content (e.g. ornamentation) before any text-image alignment. To address this task, software needs to be trained on specific layouts whose segmentation has been established by human experts. Such layouts are called ground truth, since they provide the samples with known labels that are needed as training data. Unfortunately, ground truth for historical documents is rather rare. To address this problem, the DIVA group (University of Fribourg, Switzerland) developed a web-based interface, DIVADIAWI, to semi-automatically generate ground truth for large numbers of historical documents. The system automatically draws polygons while prompting the user, in parallel or sequentially, for manual editing. Compared to the state of the art, this interface is characterized by greater automation and easy access through any web browser. It is reported to work well on the IAM Historical Document Database (IAM-HistDB).
Efforts to achieve text-image alignment itself have already yielded promising results. Among them is the method of Yann Leydier and coworkers, which automatically links images of digitized manuscripts with electronic texts at the granularity of page, column, line, word, and character. Using handwritten text recognition based on hidden Markov models, line transcripts are assigned to the line images and used to train a word recognizer. Alignment at the word level achieved 91.3% average accuracy on historical documents.
Another significant advance is the TILT web service developed by Desmond Schmidt, which implements a novel method. Unlike classic optical character recognition programs, TILT identifies word-shapes rather than actual characters. To do so, a general line-recognition algorithm first extracts lines from the document before fitting the inter-word gaps within each line to the transcript. At this stage, lines are split into word-shapes that may contain more than one word. To achieve linking, TILT then matches several word-shapes in the image to one word in the transcript, or several words from the transcript to one word-shape. The focus of this method is hence on word width rather than on characters. It should be noted that the effectiveness of the line-extraction step is currently limited by the variability of spacing in manuscripts.
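The idea of matching word-shapes to transcript words by width can be illustrated with a small sketch. This is not TILT's actual implementation: the function name `align_by_width`, the `max_group` parameter and the assumption that a word's expected width is proportional to its character count are all hypothetical simplifications chosen for illustration. The sketch uses dynamic programming to allow several word-shapes to be linked to one word, or one word-shape to several words.

```python
from typing import List, Tuple

def align_by_width(
    shape_widths: List[float],
    words: List[str],
    max_group: int = 3,
) -> List[Tuple[List[int], List[int]]]:
    """Link word-shapes to transcript words using widths only.

    Returns a list of (shape_indices, word_indices) pairs.
    Assumption (not from TILT): expected word width is proportional
    to the number of characters in the word.
    """
    total_chars = sum(len(w) for w in words)
    if not shape_widths or total_chars == 0:
        return []
    px_per_char = sum(shape_widths) / total_chars
    expected = [len(w) * px_per_char for w in words]

    n, m = len(shape_widths), len(words)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    back = [[None] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0

    # Allowed moves as (shapes consumed, words consumed):
    # one-to-one, several shapes to one word, one shape to several words.
    moves = [(1, 1)]
    moves += [(k, 1) for k in range(2, max_group + 1)]
    moves += [(1, k) for k in range(2, max_group + 1)]

    for i in range(n + 1):
        for j in range(m + 1):
            if cost[i][j] == inf:
                continue
            for ds, dw in moves:
                ni, nj = i + ds, j + dw
                if ni > n or nj > m:
                    continue
                # Penalty: mismatch between observed and expected width.
                step = abs(sum(shape_widths[i:ni]) - sum(expected[j:nj]))
                if cost[i][j] + step < cost[ni][nj]:
                    cost[ni][nj] = cost[i][j] + step
                    back[ni][nj] = (i, j)

    if cost[n][m] == inf:
        raise ValueError("no alignment possible with the allowed group sizes")

    # Walk the back-pointers from the end to recover the links.
    links = []
    i, j = n, m
    while (i, j) != (0, 0):
        pi, pj = back[i][j]
        links.append((list(range(pi, i)), list(range(pj, j))))
        i, j = pi, pj
    links.reverse()
    return links
```

For instance, with shape widths `[30, 80, 15, 15]` and the transcript words `["the", "quickest", "fox"]`, the last two shapes are linked together to the word "fox", as would happen when a gap inside a word is mistaken for an inter-word gap.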
The two highlighted methods pursue the same goal using somewhat different techniques, one focusing on character discrimination and the other on word-shape and width. Both are viable and call for refinement. Although accuracy favors the first, the variability of handwriting motivates the development of alternative methods. Furthermore, damaged documents with illegible characters but preserved inter-word gaps could be compared against the transcripts of other documents using the second method. In this way, a manuscript could be identified even without any decipherable letter.
These methods are substantial advances toward what is now a noteworthy trend in the field: reliable computer automation of text-image linking.
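As a rough illustration of how a damaged line might be compared against candidate transcripts using only its inter-word gaps, the sketch below compares relative word-shape widths with relative word lengths. The function names, the proportional-width assumption and the requirement that word counts match are hypothetical simplifications, not part of either published method.

```python
from typing import Dict, List, Optional

def width_profile(values: List[float]) -> List[float]:
    """Normalize widths (or word lengths) to fractions of the line total."""
    total = sum(values)
    return [v / total for v in values] if total else []

def best_matching_transcript(
    shape_widths: List[float],
    candidates: Dict[str, str],
) -> Optional[str]:
    """Return the name of the candidate line whose word-length profile
    is closest to the observed word-shape width profile.

    Assumption: only candidates with the same number of words as there
    are word-shapes are considered; others are skipped.
    """
    observed = width_profile(shape_widths)
    best_name, best_score = None, float("inf")
    for name, line in candidates.items():
        expected = width_profile([float(len(w)) for w in line.split()])
        if not expected or len(expected) != len(observed):
            continue
        # Sum of absolute differences between the two profiles.
        score = sum(abs(o - e) for o, e in zip(observed, expected))
        if score < best_score:
            best_name, best_score = name, score
    return best_name
```

A real system would need to tolerate merged or split word-shapes and variable letter widths; the sketch only shows that a sequence of widths can discriminate between transcripts without reading any character.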
“DIVADIAWI – A Web-based Interface for Semi-automatic Labeling of Historical Document Images”. Hao Wei, Kai Chen, Mathias Seuret, Marcel Würsch, Marcus Liwicki, Rolf Ingold; University of Fribourg, Switzerland.
“From Text and Image to Historical Resource: Text-Image Alignment for Digital Humanists”. Dominique Stutzmann, Théodore Bluche, Alexei Lavrentev, Yann Leydier, LIRIS Laboratoire d’Informatique en Image et Systèmes d’information (INSA de Lyon – UMR 5205); Christopher Kermorvant, Teklia, France.
“TILT 2: Text to Image Linking Tool”. Desmond Schmidt; Queensland University of Technology, Australia.