A tool that locates, downloads, and extracts machine translation corpora
Updated Sep 18, 2025 - Python
Large-scale, distributed, sparse linear algebra in Julia.
PhoMT: A High-Quality and Large-Scale Benchmark Dataset for Vietnamese-English Machine Translation (EMNLP 2021)
Code and data for the EMNLP 2020 paper: "Detecting Fine-Grained Cross-Lingual Semantic Divergences without Supervision by Learning to Rank"
Wikipedia-Vikidia Corpus (WiViCo) - A general-purpose parallel sentence simplification dataset for French
A Telegram Bot for Amharic Speech Data Collection
This repository focuses on distributed and parallel computing with PyTorch, covering model parallelism, data parallelism, and advanced optimization techniques. It provides resources for scaling AI training and inference efficiently across multiple devices.