Glyph Extraction from Historic Document Images

L. Meyer-Lerbs, A. Schuldt, and B. Gottfried

Centre for Computing and Communication Technologies (TZI)
University of Bremen, Am Fallturm 1, D-28359 Bremen

Abstract

This paper is about the reproduction of ancient texts with vectorised fonts. While only recognition rates count for OCR, a reproduction process does not necessarily require the recognition of characters. Our system aims at extracting all characters from printed historic documents without using any knowledge of the language, font, or writing system. It searches for the best prototypes and creates a document-specific font from these glyphs. To reach this goal, many common OCR preprocessing steps are no longer adequate. We describe the changes this required in our system, which deals particularly with documents typeset in Fraktur. On the one hand, we describe algorithms that extract glyphs accurately for the purpose of precise reproduction. On the other hand, we present classification results for extracted Fraktur glyphs using different shape descriptors.

Reference

Meyer-Lerbs, L., Schuldt, A., and Gottfried, B. (2010). Glyph Extraction from Historic Document Images. In Antonacopoulos, A., Gormish, M. J., and Ingold, R. (eds.): 10th ACM Symposium on Document Engineering (DocEng 2010). Manchester, UK, September 21-24, 2010. ACM Press, pp. 227-230.

Copyright

© ACM, 2010. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in DocEng 2010, Manchester, UK, September 21-24, 2010.
http://doi.acm.org/10.1145/1860559.1860609