
Language Corpora and Databases: Design, Data Collection, Annotation, Management, and Applications


Contemporary linguistic research increasingly relies on corpus-based and corpus-driven methodologies. Beyond written language, modern linguistic corpora encompass spoken language and multimodal communication. They not only serve as test beds for hypothesis testing but also inspire further experimental and field research. Although the field has established its methodological and technical standards, it remains dynamic and continues to evolve as it meets new challenges. These range from the intricacies of working with informants and speakers to the technical complexities of storing and processing big data.

This session aims to foster a comprehensive understanding of the current landscape of linguistic corpus research, addressing both theoretical and practical aspects.


We invite contributions dedicated to the following key topics:

  • Methodological, technical, legal and ethical issues and challenges in language data collection, processing and storage,
  • Annotation of language corpora: dimensions, levels, units, tagsets, tools and procedures,
  • Corpus processing and corpus-based or corpus-driven language modelling,
  • Applications of corpora in speech technology, speech therapy and diagnostics, industry, and media,
  • Corpora as repositories of cultural heritage.









Institute of Applied Linguistics

Al. Niepodległości 4
61-874 Poznań

Phone: +48 61 829 2925

