Technical component

Machine Translation System

Domain-specific Machine Translation

The Machine Translation system is built by fine-tuning general-purpose machine translation models on domain-specific scientific data, with the aim of producing accurate and contextually appropriate translations of scientific text.

SciLake provides open, domain-adapted machine translation models to improve the translation of scientific text, including specialised terminology and complex sentence structures. Three models were developed for French→English, Spanish→English, and Portuguese→English (fine-tuned from OPUS‑MT and specialised for the project pilot domains).

The models are open-source and can be downloaded from the Hugging Face platform:

Publications:

  • S. Kotitsas, P. Kounoudis, E. Koutli, H. Papageorgiou (2024). Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). URL: https://aclanthology.org/2024.eacl-long.156

Functionalities

Domain-adapted translation for scientific terminology

Coverage of FR/ES/PT → EN language pairs

Integration-ready for workflows that process multilingual scholarly content (e.g., titles/abstracts)
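
As an illustration of how the FR/ES/PT → EN models might be plugged into a metadata workflow, the sketch below routes each record to a translator by its source language and adds English versions of the title and abstract. It is a minimal sketch under stated assumptions, not the project's own integration code: the record layout (`lang`, `title`, `abstract`), the `*_en` output fields, the function names, and the placeholder model identifiers are all hypothetical, and `load_translator` assumes the published checkpoints can be loaded with the Hugging Face `transformers` translation pipeline, as standard OPUS-MT checkpoints can.

```python
from typing import Callable, Dict, Iterable, List

# A translator takes a batch of source-language strings and returns English strings.
Translator = Callable[[List[str]], List[str]]

# Hypothetical placeholders: substitute the model identifiers actually
# published on Hugging Face for each language pair.
MODEL_IDS: Dict[str, str] = {
    "fr": "<fr-en-model-id>",
    "es": "<es-en-model-id>",
    "pt": "<pt-en-model-id>",
}


def load_translator(model_id: str) -> Translator:
    """Wrap a translation checkpoint as a batch translator.

    Assumption: the checkpoint is loadable by the transformers
    "translation" pipeline (true for standard OPUS-MT models).
    """
    # Imported lazily so the routing logic can be used and tested
    # without transformers installed.
    from transformers import pipeline

    pipe = pipeline("translation", model=model_id)

    def translate(texts: List[str]) -> List[str]:
        return [out["translation_text"] for out in pipe(texts)]

    return translate


def translate_records(
    records: Iterable[dict], translators: Dict[str, Translator]
) -> List[dict]:
    """Return copies of the records with 'title_en' / 'abstract_en' added.

    Records whose language has no translator, or whose fields are
    empty or missing, are passed through unchanged.
    """
    result = []
    for rec in records:
        new_rec = dict(rec)  # never mutate the caller's record
        translate = translators.get(rec.get("lang"))
        if translate is not None:
            fields = [f for f in ("title", "abstract") if rec.get(f)]
            if fields:
                outputs = translate([rec[f] for f in fields])
                for field, text in zip(fields, outputs):
                    new_rec[field + "_en"] = text
        result.append(new_rec)
    return result
```

With real model identifiers filled in, the translators could be built once with `{lang: load_translator(mid) for lang, mid in MODEL_IDS.items()}` and reused across batches. Keeping the translator as a plain callable also makes it easy to swap in a different inference backend without touching the routing logic.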

For

Research Communities

Provided by

Contacts

Sokratis Sofianopoulos

Related Articles

Machine Translation for the Scientific Domain

11 July 2024
SciLake's partners from Athena RC present advancements in Machine Translation at the 25th Annual Conference of The European Association for Machine Translation.

Domain-Specific Machine Translation for SciLake

10 January 2024
Sokratis Sofianopoulos and Dimitris Roussis from Athena RC present their cutting-edge Machine Translation system, which will be integrated into the Scientific Lake Service.