Also, if you use this resource in your work, please cite the following: Europarl: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, MT Summit 2005.
The Europarl corpus originally covered 11 languages and has since been expanded with 10 additional European languages, including Czech, Latvian, and Slovak. Philipp Koehn compiled it to provide a benchmark training and test suite for machine translation. It is recommended that the Q4/2000 portion of the data (2000-10 to 2000-12) be reserved for testing.
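A minimal Python sketch of the recommended held-out split follows. It assumes that each parliamentary session is stored as a separate file whose name encodes the session date as ep-YY-MM-DD (for example ep-00-10-02.txt). That naming pattern and the helper names session_date and split_sessions are illustrative assumptions, not part of the corpus documentation, so adapt the regular expression to the layout of the release you actually have.

```python
import re
from datetime import date

# Held-out window recommended above: Q4/2000 (2000-10 through 2000-12).
TEST_START = date(2000, 10, 1)
TEST_END = date(2000, 12, 31)

# ASSUMPTION: session files are named like "ep-00-10-02.txt"
# (two-digit year, month, day). Adjust to match your copy of the corpus.
SESSION_NAME = re.compile(r"ep-(\d{2})-(\d{2})-(\d{2})")


def session_date(filename: str) -> date:
    """Parse the session date from a (hypothetically named) session file."""
    m = SESSION_NAME.search(filename)
    if m is None:
        raise ValueError(f"unrecognized session file name: {filename!r}")
    yy, mm, dd = (int(g) for g in m.groups())
    # ASSUMPTION: two-digit years >= 96 are 1990s sessions, the rest are 2000s.
    year = 1900 + yy if yy >= 96 else 2000 + yy
    return date(year, mm, dd)


def split_sessions(filenames):
    """Partition session files into (train, test), holding out Q4/2000."""
    train, test = [], []
    for name in filenames:
        d = session_date(name)
        (test if TEST_START <= d <= TEST_END else train).append(name)
    return train, test
```

For example, split_sessions(["ep-00-09-05.txt", "ep-00-10-02.txt", "ep-01-01-15.txt"]) puts ep-00-10-02.txt in the test list and the other two files in the training list. Splitting by session date rather than by random sentence sampling keeps every sentence from a held-out session out of training.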