MultiHTR

MultiHTR - Multilingual Handwritten Text Recognition

Specialist allocation

Project category

Project period

01/06/2020 – 31/05/2024

Project start: 01/06/2020 - Project end: 31/05/2024

Short description of the project

The MultiHTR team is building on the successful first project phase (June 1, 2020 to May 31, 2022). In the second phase (June 1, 2022 to May 31, 2024), it is expanding the language portfolio and using artificial intelligence (AI) to make the latest advances in handwritten text recognition (HTR) available to the public and academia. The overall project focuses on the (further) development of shorthand models for German and of HTR models for Yiddish written in the Hebrew alphabet, Ukrainian, Russian, Serbian and Ottoman Turkish. The automated transliteration and transcription models are intended to give the public and researchers access to previously inaccessible handwritten materials.

Project content

The MultiHTR team is building on the results of the successful first project phase (June 1, 2020 to May 31, 2022). In the second phase (June 1, 2022 to May 31, 2024), it is expanding the language portfolio and making the latest advances in handwritten text recognition (HTR) available to the public and academia. Artificial intelligence (AI) is used to develop advanced recognition models for languages and scripts not previously covered. The aim is to open up complex handwritten materials that were previously inaccessible to most users.
The second phase focuses on the (further) development of shorthand models for German. In addition, a model for Yiddish documents written in Hebrew script is being developed to make them accessible to descendants and the public. A further component is an HTR model for Ukrainian, intended to make the indexing of Ukrainian-language archive holdings more efficient. At the same time, the Ottoman Turkish and Russian models are being developed further. In the first project phase, the project published models for Serbian and Russian.
The overarching goal is to systematically advance AI-based handwriting recognition and to use the resulting technologies for the benefit of the public. In particular, the project develops handwriting recognition models for German and for migration languages relevant in Germany and Baden-Württemberg. The models are trained to automatically decipher archive materials, ego documents and correspondence. The automatically deciphered texts serve two purposes. They provide a basis for humanities research, in particular for micro-historical, discourse-analytical and sociolinguistic analyses. They also benefit the public directly by making complex, multilingual documents readable without paleographic knowledge.
The project is funded by the Baden-Württemberg Ministry of Science, Research and the Arts as part of the state's digital@bw digitization strategy.

Sponsor

Baden-Württemberg Ministry of Science, Research and the Arts as part of the state's digital@bw digitization strategy

Contact

achim.rabus@slavistik.uni-freiburg.de
multihtr@slavistik.uni-freiburg.de

Find out more at
www.multihtr.uni-freiburg.de

Project team members

Prof. Dr. Achim Rabus, Milanka Matić-Chalkitis, Aleksej Tikhonov, Lesley Loew, Martin Meindl
