Technical component
Machine Translation System
Domain-specific Machine Translation
The Machine Translation system aims to produce accurate, contextually appropriate translations of scientific text. It does so by fine-tuning general-purpose machine translation models on domain-specific scientific data.
SciLake provides open, domain-adapted machine translation models to improve the translation of scientific text, including specialised terminology and complex sentence structures. Three models were developed for French→English, Spanish→English, and Portuguese→English (fine-tuned from OPUS‑MT and specialised for the project pilot domains).
The models are open-source and can be downloaded from the Hugging Face platform:
- French-English: https://huggingface.co/ilsp/opus-mt-big-fr-en_ct2_ft-SciLake
- Portuguese-English: https://huggingface.co/ilsp/opus-mt-pt-en_ct2_ft-SciLake
- Spanish-English: https://huggingface.co/ilsp/opus-mt-big-es-en_ct2_ft-SciLake
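As a rough sketch of how one of these models might be fetched and used from Python: the repository IDs below are taken from the links above, while everything else is an assumption. In particular, the `_ct2` suffix suggests the repositories hold CTranslate2 models, and the sketch assumes the repository root is itself a CTranslate2 model directory. The tokenizer files are not described here, so tokenisation is left to caller-supplied `encode` and `decode` functions. The helper names (`repo_for`, `load_translator`, `translate`) are hypothetical.

```python
from typing import Callable, Dict, List, Sequence

# Repository IDs copied from the model links listed above.
SCILAKE_MT_REPOS: Dict[str, str] = {
    "fr": "ilsp/opus-mt-big-fr-en_ct2_ft-SciLake",
    "pt": "ilsp/opus-mt-pt-en_ct2_ft-SciLake",
    "es": "ilsp/opus-mt-big-es-en_ct2_ft-SciLake",
}


def repo_for(source_lang: str) -> str:
    """Return the Hugging Face repo ID for a source language code (hypothetical helper)."""
    code = source_lang.strip().lower()
    try:
        return SCILAKE_MT_REPOS[code]
    except KeyError:
        raise ValueError(f"No SciLake model for source language {source_lang!r}") from None


def load_translator(source_lang: str, device: str = "cpu"):
    """Download a model snapshot and open it with CTranslate2.

    Assumption: the repository root is a CTranslate2 model directory.
    """
    # Imported lazily so the helpers above work without these packages installed.
    import ctranslate2
    from huggingface_hub import snapshot_download

    model_dir = snapshot_download(repo_id=repo_for(source_lang))
    return ctranslate2.Translator(model_dir, device=device)


def translate(
    translator,
    sentences: Sequence[str],
    encode: Callable[[str], List[str]],
    decode: Callable[[List[str]], str],
) -> List[str]:
    """Translate sentences; `encode`/`decode` map text to and from subword tokens."""
    results = translator.translate_batch([encode(s) for s in sentences])
    # Each result holds one or more hypotheses; keep the best-scoring one.
    return [decode(r.hypotheses[0]) for r in results]
```

In practice `encode` and `decode` would wrap whichever subword tokenizer ships with the chosen model; check the model card on Hugging Face for the exact files and recommended usage.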
Publications:
- S. Kotitsas, P. Kounoudis, E. Koutli, H. Papageorgiou (2024). Leveraging fine-tuned Large Language Models with LoRA for Effective Claim, Claimer, and Claim Object Detection. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers). URL: https://aclanthology.org/2024.eacl-long.156
Functionalities
- Domain-adapted translation for scientific terminology
- Coverage of FR/ES/PT → EN language pairs
- Integration-ready for workflows that process multilingual scholarly content (e.g., titles/abstracts)
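To illustrate what integration into a metadata workflow could look like, here is a minimal sketch that routes records to a per-language translation function and leaves records in unsupported languages (including English) untouched. The record layout (`lang`, `title`, `abstract`), the `_en` suffix for output fields, and the function name `translate_records` are all hypothetical and do not describe the SciLake pipeline itself.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Record = Dict[str, str]
# A translation function takes a batch of source texts and returns English texts.
TranslateFn = Callable[[List[str]], List[str]]


def translate_records(
    records: Iterable[Record],
    translators: Dict[str, TranslateFn],
    fields: Sequence[str] = ("title", "abstract"),
) -> List[Record]:
    """Return copies of records with '<field>_en' added where a translator exists.

    `translators` maps a lowercase language code (e.g. "fr") to a function;
    records whose language has no entry are copied unchanged.
    """
    output: List[Record] = []
    for record in records:
        enriched = dict(record)  # never mutate the caller's data
        translate_fn = translators.get(record.get("lang", "").lower())
        if translate_fn is not None:
            # Only translate fields that are present and non-empty.
            present = [f for f in fields if record.get(f)]
            translated = translate_fn([record[f] for f in present])
            for field, text in zip(present, translated):
                enriched[f"{field}_en"] = text
        output.append(enriched)
    return output
```

Each entry in `translators` could wrap one of the three models, for example `{"fr": french_model_fn, "es": spanish_model_fn, "pt": portuguese_model_fn}`, where the function names are placeholders. A production workflow would likely batch across records rather than per record.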
Related Articles
Machine Translation for the Scientific Domain
11 July 2024
SciLake's partners from Athena RC present advancements in Machine Translation at the 25th Annual Conference of The European Association for Machine Translation.
Domain-Specific Machine Translation for SciLake
10 January 2024
Sokratis Sofianopoulos and Dimitris Roussis from Athena RC present their cutting-edge Machine Translation system, which will be integrated into the Scientific Lake Service.