FOLK

Forschungs- und Lehrkorpus Gesprochenes Deutsch (FOLK) / Research and Teaching Corpus for Spoken German

Specialist allocation

Project category

Corpus

Project start: 01/01/2008 - Project end: ongoing

Short description of the project

The Research and Teaching Corpus of Spoken German (FOLK) has been continuously built at the Leibniz Institute for the German Language since 2008. The corpus contains audio and video recordings of natural everyday interactions in German from various areas of social life (work, leisure, education, public life, services, etc.). The aim of FOLK is to provide a large and broadly diversified database for analysing spoken German in natural interaction. The data is transcribed, further annotated and documented according to modern corpus annotation standards, and made available to the scientific community via the Database for Spoken German (DGD).

Project content

While various well-stratified corpora of written German are now available to the scientific community, there is still no comparable collection of spoken German. The IDS has therefore been continuously building the Research and Teaching Corpus of Spoken German, a large corpus of audio and video recordings of everyday interactions in German from various areas of social life (e.g. work, leisure, education, public life, services). The data is transcribed, further annotated and documented according to modern corpus annotation standards, and made available to the scientific community via the Database for Spoken German (DGD).

The corpus allows researchers to investigate a variety of research questions using data that is available to everyone in the scientific community in the same way. Many studies in Interactional Linguistics, Conversation Analysis and Corpus Linguistics already use the corpus, either as the sole basis for their analyses or as a point of comparison. FOLK also gives cultural and media studies insight into a wide range of social interaction as it actually takes place in Germany. In addition, data from FOLK can serve as teaching material in German studies and German as a second language at universities.

The FOLK database consists of audio and/or video recordings of authentic interactions. The interactions are transcribed with the transcription editor FOLKER as minimal transcripts in literary transcription, following the cGAT conventions (conventions for computer-assisted transcription based on GAT2, the system for transcribing talk-in-interaction). The transcripts are aligned with the recordings, so that for each passage of a transcript the corresponding audio excerpt is immediately available in the DGD. Orthographic normalisation, lemmatisation and part-of-speech tagging are added as further annotation layers to improve searchability. For each conversation, comprehensive contextual data on the circumstances of the interaction and socio-demographic data on the speakers involved are documented as metadata. All recordings, transcripts and metadata can be viewed and systematically searched via the DGD. In addition to full-text search, the DGD offers a structure-sensitive token search on the transcripts, whose results can be filtered by metadata.
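
To illustrate why the extra annotation layers and the metadata filter matter for searching, here is a minimal, hypothetical Python sketch. It does not reflect the DGD's actual data model or query interface: the classes `Token` and `Transcript`, the function `token_search`, the transcript IDs, the metadata key `domain` and the STTS-style tag values are all assumptions chosen for illustration. Each token carries its transcribed form alongside its normalised form, lemma, POS tag and time alignment, and each transcript carries its own metadata.

```python
from dataclasses import dataclass, field

# Hypothetical data model for illustration only; not the DGD's actual schema.


@dataclass(frozen=True)
class Token:
    transcribed: str  # form as written in the minimal transcript
    normalised: str   # orthographic normalisation layer
    lemma: str        # lemmatisation layer
    pos: str          # part-of-speech layer (STTS-style tag, illustrative)
    start: float      # alignment with the recording, in seconds (assumed unit)
    end: float


@dataclass
class Transcript:
    transcript_id: str                                       # hypothetical ID
    metadata: dict[str, str] = field(default_factory=dict)  # e.g. {"domain": "work"}
    tokens: list[Token] = field(default_factory=list)


def token_search(transcripts, lemma=None, pos=None, **metadata_filters):
    """Return (transcript_id, token) pairs whose annotations match the
    given lemma and/or POS tag, restricted to transcripts whose metadata
    match every keyword filter exactly."""
    hits = []
    for t in transcripts:
        # Metadata filter: skip transcripts that fail any criterion.
        if any(t.metadata.get(k) != v for k, v in metadata_filters.items()):
            continue
        for tok in t.tokens:
            if lemma is not None and tok.lemma != lemma:
                continue
            if pos is not None and tok.pos != pos:
                continue
            hits.append((t.transcript_id, tok))
    return hits


corpus = [
    Transcript("T1", {"domain": "work"}, [
        Token("isch", "ich", "ich", "PPER", 0.00, 0.21),
        Token("hab", "habe", "haben", "VAFIN", 0.21, 0.40),
    ]),
    Transcript("T2", {"domain": "leisure"}, [
        Token("hab", "habe", "haben", "VAFIN", 1.10, 1.32),
    ]),
]

# The lemma layer finds the colloquial transcribed form "hab" via "haben";
# the metadata filter keeps only hits from work-related interactions.
hits = token_search(corpus, lemma="haben", domain="work")
for tid, tok in hits:
    print(tid, tok.transcribed, tok.start, tok.end)  # T1 hab 0.21 0.4
```

Searching on the lemma rather than on the transcribed form is what makes the normalisation and lemmatisation layers useful in this sketch: a literal search for "habe" would miss the transcribed "hab", while the time stamps stand in for the alignment that links each hit back to its audio excerpt.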

The corpus is continuously being expanded and additions to the corpus are regularly made available via the DGD.

FOLK is part of the Archive for Spoken German (AGD) in the ‘Oral Corpora’ programme area of the Pragmatics Department at the Leibniz Institute for the German Language (IDS) in Mannheim. The IDS is the central non-university institution for researching and documenting the German language. As a member of the Leibniz Association, the IDS is funded by the federal government and the federal states (especially the state of Baden-Württemberg).

Sponsor

Federal government and federal states (especially the state of Baden-Württemberg) / Leibniz Association

Contact

Dr. Silke Reineke
folk@ids-mannheim.de

Find out more at
agd.ids-mannheim.de/folk.shtml

Project team members

Dr. Silke Reineke, Dr. Mia Schürmann, Evi Schedl, M.A., Jürgen Immerz, Prof. Dr. Arnulf Deppermann