Europarl

Submitted on Jan 15 2015
By: Philipp Koehn
From: University of Edinburgh
Resource Type:
Data
Data Format:
XML

Description

A corpus of parallel text in 21 languages from the proceedings of the European Parliament.
markh43357
2015-01-17 16:54:04

Also, if you use this resource in your work, please cite: Philipp Koehn, "Europarl: A Parallel Corpus for Statistical Machine Translation," MT Summit 2005.

markh43357
2015-01-17 16:51:57

Since its original release covering 11 languages, the Europarl corpus has been expanded with 10 additional European languages, including Czech, Latvian, and Slovak. It was compiled by Philipp Koehn as a benchmark training and testing suite for machine translation. The recommended practice is to reserve the Q4/2000 portion of the data (October to December 2000, i.e. 2000-10 to 2000-12) for testing.
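A minimal sketch of that held-out split, assuming each parliamentary session is stored in its own file named `ep-YY-MM-DD.txt` (possibly with a numeric suffix such as `-001`). The file-name pattern and the helpers `is_q4_2000` and `split_sessions` are hypothetical, so check the naming against the release you actually download before relying on it.

```python
import re
from typing import Iterable, List, Tuple

# ASSUMPTION: session files are named "ep-YY-MM-DD.txt", optionally with a
# numeric suffix such as "-001" before ".txt". Adjust if your release differs.
SESSION_RE = re.compile(r"^ep-(\d{2})-(\d{2})-(\d{2})(?:-\d+)?\.txt$")


def is_q4_2000(filename: str) -> bool:
    """Return True if the session date falls between October and December 2000."""
    m = SESSION_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename}")
    yy, mm = int(m.group(1)), int(m.group(2))
    # Two-digit year "00" stands for 2000 in this assumed naming scheme.
    return yy == 0 and 10 <= mm <= 12


def split_sessions(filenames: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Partition session files into (train, test), holding out Q4/2000 for testing."""
    train: List[str] = []
    test: List[str] = []
    for name in sorted(filenames):
        if is_q4_2000(name):
            test.append(name)
        else:
            train.append(name)
    return train, test


files = ["ep-00-09-18.txt", "ep-00-10-02.txt", "ep-00-12-14-001.txt", "ep-01-01-15.txt"]
train, test = split_sessions(files)
print(test)   # ['ep-00-10-02.txt', 'ep-00-12-14-001.txt']
print(train)  # ['ep-00-09-18.txt', 'ep-01-01-15.txt']
```

Splitting by session date rather than by random sentence keeps whole debates out of the training data, so test sentences do not share context with training sentences.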