Junior Researcher in Speech-to-Text Technologies for Minority Languages - Bolzano - Eurac Research

    Eurac Research Bolzano

    2 days ago

    Description
    Institute for Applied Linguistics



    The Language Technologies (LT) research group at the Institute for Applied Linguistics is seeking a junior computational linguist to contribute to the DIGI-RLF project (Interreg Italia-Svizzera 2…). DIGI-RLF addresses the challenge of preserving and enhancing the Rhaeto‑Romance minority languages Ladin in South Tyrol and Romansh in Grisons, aiming to overcome linguistic barriers that hinder the efficiency of public administrations in cross‑border regions.

    The goal is to transform the currently fragmented landscape of digital linguistic resources into a coordinated ecosystem, enabling administrations to provide services in minority languages with greater efficiency and quality.

    Key outputs include a joint digitisation strategy, integration into international standards (Unicode CLDR) and optimised AI models for automatic speech‑to‑text (STT) transcription for Ladin and Romansh.

    The position focusses on the acquisition, refinement and processing of Ladin and Romansh spoken data for the development and evaluation of STT models.

    The role offers opportunities to build expertise in NLP for low‑resource languages, while working in a collaborative, interdisciplinary research environment.

    As the position involves working with other researchers and practitioners on locally collected language data, experience with linguistic research (especially speech data) as well as basic knowledge of Ladin and Romansh is considered an advantage.

    We are looking for a cooperative, proactive colleague who thrives in an interdisciplinary and application‑oriented research environment.
    Tasks

    Contribute to the creation, processing, documentation and maintenance of spoken corpora
    Support digitisation, data cleaning and annotation, quality control and metadata management in line with good research data management practices
    Analyse mono‑ and multilingual data using quantitative and computational methods
    Implement, adapt and evaluate language technology workflows (e.g. NLP pipelines, data processing, evaluation setups)
    Support research dissemination through scientific and transfer‑oriented publications, presentations and internal knowledge sharing
    Although dedicated to the project, the candidate will join the LT group and take part in wider Institute meetings and initiatives

    Requirements

    Degree (MA/MSc or BSc) in a relevant field, such as Computational Linguistics, Data Science, Computer Science or similar (linguistics degrees will be considered if technical skills are also demonstrated)
    Awareness of (and/or interest in acquiring good practice in) research data management, including all steps required for the collection and creation of data and metadata that comply with FAIR principles
    Awareness of reproducible research practices or strong interest in learning and applying them in practice
    Strong programming skills in Python, PyTorch, Transformers and relevant libraries
    Knowledge of typical text and data processing pipelines and current NLP toolkits (e.g. spaCy, Stanza, quanteda) or strong interest in learning and applying them in practice
    Solid knowledge of (large) language models and their application to common NLP tasks
    Familiarity with git, Jupyter notebooks and command‑line interfaces (CLI)
    Willingness to move to South Tyrol or to its vicinity in order to work on‑site
    Strong command of English and Italian
    Basic knowledge of German or the willingness to acquire it as a working language
    Social, organisational and communication skills, including careful scheduling and task management
    Ability and willingness to collaborate with researchers from different disciplinary backgrounds and research paradigms

    Additional advantageous skills

    Experience with DevOps, data integration workflows or high‑performance computing environments
    Experience with digitisation technologies such as digital audio recording and editing
    Experience with linguistic research, especially speech data, as well as basic knowledge of Ladin and Romansh

    We offer

    A full‑time position for 18 months. If the selected candidate is interested, a part‑time contract of at least 70% could also be considered.
    A supportive, international, and interdisciplinary research environment
    Professional development opportunities
    Flexible working arrangements with regular on‑site presence to ensure exchange and collaboration
    Benefits (e.g. family‑friendly benefits, lunch bonus, supplementary health insurance, etc.)
    Access to numerous scientific and cultural facilities and events

    Eurac Research actively supports equal opportunities and diversity and encourages applications from candidates of all backgrounds.
    Interested candidates should submit their application (CV and cover letter) by
