MultiHTR

MultiHTR - Multilingual Handwritten Text Recognition

Specialist allocation

Project category

Project period

June 1, 2020 to May 31, 2024

Project start: June 1, 2020. Project end: May 31, 2024.

Short description of the project

The MultiHTR team is building on the successful first project phase (June 1, 2020 to May 31, 2022) in order to expand the language portfolio in the second phase (June 1, 2022 to May 31, 2024) and to make the latest advances in AI-based handwritten text recognition (HTR) usable for the public and academia. The overall project focuses on the (further) development of shorthand models for German and of models for Yiddish written in the Hebrew alphabet, Ukrainian, Russian, Serbian and Ottoman. The automated transliteration and transcription models are intended to give the public and researchers access to previously inaccessible handwritten materials.

Project content

The MultiHTR team is building on the results of the successful first project phase (June 1, 2020 to May 31, 2022) in order to expand the language portfolio in the second project phase (June 1, 2022 to May 31, 2024) and to make the latest advances in handwritten text recognition (HTR) available to the public and academia. In this continuation, artificial intelligence (AI) is used to develop advanced handwriting recognition models for languages and scripts not previously covered. The aim is to open up complex handwritten materials that were previously inaccessible to most users.
The second phase focuses on the (further) development of shorthand models for German. In addition, a model for Yiddish documents written in the Hebrew alphabet will be developed to make them accessible to descendants and the public. A further component is dedicated to an HTR model for Ukrainian, which is intended to make the indexing of Ukrainian-language archive holdings more efficient. At the same time, the Ottoman Turkish and Russian models are being developed further. In the first project phase, the project published models for Serbian and Russian.
The overarching goal of the project is to systematically advance AI-based handwriting recognition and to use the resulting technologies for the benefit of the public. In particular, the project develops handwriting recognition models for German and for relevant migration languages in Germany and Baden-Württemberg. These AI-based models are to be trained to automatically decode archive materials, ego documents and correspondence. On the one hand, the automatically decoded texts serve as a basis for humanities research, in particular for micro-historical, discourse-analytical and sociolinguistic analyses. On the other hand, the public benefits directly because complex, multilingual documents become accessible without paleographic knowledge.
The project is funded by the Baden-Württemberg Ministry of Science, Research and the Arts as part of the state's digital@bw digitization strategy.

Sponsor

Baden-Württemberg Ministry of Science, Research and the Arts as part of the state's digital@bw digitization strategy

Contact

achim.rabus@slavistik.uni-freiburg.de
multihtr@slavistik.uni-freiburg.de

Find out more at
www.multihtr.uni-freiburg.de

Project team members

Prof. Dr. Achim Rabus, Milanka Matić-Chalkitis, Aleksej Tikhonov, Lesley Loew, Martin Meindl

