Junior Researcher in Speech-to-Text Technologies for Minority Languages - Bolzano - Eurac Research
Description
Institute for Applied Linguistics
The Language Technologies (LT) research group at the Institute for Applied Linguistics is seeking a junior computational linguist to contribute to the DIGI-RLF project (Interreg Italia-Svizzera). DIGI-RLF addresses the challenge of preserving and enhancing the Rhaeto‑Romance minority languages Ladin in South Tyrol and Romansh in Grisons, aiming to overcome linguistic barriers that hinder the efficiency of public administrations in cross‑border regions.
The goal is to transform the currently fragmented landscape of digital linguistic resources into a coordinated ecosystem, enabling administrations to provide services in minority languages with greater efficiency and quality.
Key outputs include a joint digitisation strategy, integration into international standards (Unicode CLDR) and optimised AI models for automatic speech‑to‑text (STT) transcription for Ladin and Romansh.
The position focusses on the acquisition, refinement and processing of Ladin and Romansh spoken data for the development and evaluation of STT models.
The role offers opportunities to build expertise in NLP for low‑resource languages, while working in a collaborative, interdisciplinary research environment.
As the position involves working with other researchers and practitioners on locally collected language data, experience with linguistic research (especially speech data) as well as basic knowledge of Ladin and Romansh is considered an advantage.
We are looking for a cooperative, proactive colleague who thrives in an interdisciplinary and application‑oriented research environment.
Tasks
Contribute to the creation, processing, documentation and maintenance of spoken corpora
Support digitisation, data cleaning and annotation, quality control and metadata management in line with good research data management practices
Analyse mono‑ and multilingual data using quantitative and computational methods
Implement, adapt and evaluate language technology workflows (e.g. NLP pipelines, data processing, evaluation setups)
Support research dissemination through scientific and transfer‑oriented publications, presentations and internal knowledge sharing
Although dedicated to the project, the candidate will join the LT group and wider Institute meetings and initiatives
Requirements
Degree (MA/MSc or BSc) in a relevant field, such as Computational Linguistics, Data Science, Computer Science or similar (linguistics degrees will be considered if the candidate can also demonstrate technical skills)
Awareness of (and/or interest in acquiring good practice in) research data management, including all steps required for the collection and creation of data and metadata that comply with FAIR principles
Awareness of reproducible research practices or strong interest in learning and applying them in practice
Strong programming skills in Python, PyTorch, Transformers and relevant libraries
Knowledge of typical text and data processing pipelines and current NLP toolkits (e.g. spaCy, Stanza, quanteda) or strong interest in learning and applying them in practice
Solid knowledge of (large) language models and their application to common NLP tasks
Familiarity with git, Jupyter notebooks and command‑line interfaces (CLI)
Willingness to move to South Tyrol or to its vicinity in order to work on‑site
Strong command of English and Italian
Basic knowledge of German or the willingness to acquire it as a working language
Social, organisational and communication skills, including careful scheduling and task management
Ability and willingness to collaborate with researchers from different disciplinary backgrounds and research paradigms
Additional advantageous skills
Experience with DevOps, data integration workflows or high‑performance computing environments
Experience with digitisation technologies such as digital audio recording and editing
Experience with linguistic research, especially speech data, as well as basic knowledge of Ladin and Romansh
We offer
A full‑time position for 18 months. If the selected candidate is interested, a part‑time contract of at least 70% could also be considered.
A supportive, international, and interdisciplinary research environment
Professional development opportunities
Flexible working arrangements with regular on‑site presence to ensure exchange and collaboration
Benefits (e.g. family‑friendly benefits, lunch bonus, supplementary health insurance, etc.)
Access to numerous scientific and cultural facilities and events
Eurac Research actively supports equal opportunities and diversity and encourages applications from candidates of all backgrounds.
Interested candidates should submit their application (CV and cover letter) by