[CWB] using the EuroParl aligned corpora

Stefan Evert stefanML at collocations.de
Fri Aug 28 13:17:50 CEST 2015


I think this hasn't been answered yet, so briefly:

> Hi, I'm keen to use CWB in a course this Fall. I'm particularly interested in showing off EuroParl3 corpora that are posted on the sourceforge page.
> 	1. is there a document indicating who we should credit? what the full set of annotations are? I didn't see anything in the info file

It's the same version of Europarl that's available in the public CQP demo Web interface.  You can find some information here:

	http://corpora.linguistik.uni-erlangen.de/demos/CQP/Europarl/index.html

If you go to the "CQP Mode" query page and select "Help Page" at the top, you should find links to tagset descriptions for the languages included in this version of the corpus.

Note that the OPUS page offers a different version of the Europarl corpus (both its Europarl 7 and Europarl 3 differ from the version at cwb.sf.net), with a slightly different Web interface:

	http://diates.lingfil.uu.se/cwb/Europarl3/
	http://diates.lingfil.uu.se/cwb/Europarl7/

> 	2. is there any way to take advantage of the "aligned" aspect of these corpora? i.e. to ask how you say "We don't want you to raise tariffs" (e.g.) in Spanish, Dutch, Italian,...

The most obvious thing to do is search for the expression to be translated and then browse through the aligned sentences to see what strategies translators chose.

You can do this nicely in the Web interfaces above: run a suitable CQP search and then select the relevant target languages for display.

You can also exclude a particular translation from the search results (e.g. a "standard" translation if you're looking for alternative expressions, or a translation for a different sense of the original phrase) with an aligned query.  Unfortunately, there's no documentation in the official CQP tutorial yet – please ask again if you need help.
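To make the idea of excluding a translation concrete, here is a minimal Python sketch that only simulates the logic of such an aligned query. It does not use CQP or any CWB API. The sentence pairs are invented for illustration (not taken from Europarl), and the function name `aligned_query` is hypothetical. It keeps source sentences containing the search terms and drops those whose aligned target sentence contains a "standard" translation. Unlike a real CQP query on the lemma attribute, it matches plain word forms, so "raising" does not match "raise".

```python
# Hypothetical toy stand-in for a sentence-aligned parallel corpus:
# each entry pairs an English source sentence with its aligned Spanish sentence.
# These sentences are invented for illustration, not taken from Europarl.
ALIGNED = [
    ("We do not want you to raise tariffs .", "No queremos que suban los aranceles ."),
    ("They plan to raise tariffs again .", "Tienen previsto aumentar de nuevo los derechos de aduana ."),
    ("Raising tariffs is not the answer .", "Subir los aranceles no es la respuesta ."),
    ("We discussed fishing quotas .", "Hablamos de las cuotas de pesca ."),
]


def aligned_query(pairs, source_terms, excluded_target_terms):
    """Return (source, target) pairs where the source sentence contains every
    term in source_terms and the aligned target sentence contains none of
    excluded_target_terms. Matching is case-insensitive on whitespace tokens
    (word forms only, no lemmatization)."""
    hits = []
    for src, tgt in pairs:
        src_tokens = set(src.lower().split())
        tgt_tokens = set(tgt.lower().split())
        if all(t in src_tokens for t in source_terms) and not any(
            t in tgt_tokens for t in excluded_target_terms
        ):
            hits.append((src, tgt))
    return hits


if __name__ == "__main__":
    # Look for "raise ... tariffs", excluding the standard rendering "aranceles".
    for src, tgt in aligned_query(ALIGNED, ["raise", "tariffs"], ["aranceles"]):
        print(src, "=>", tgt)
```

Run as a script, this prints only the second pair, where the translator chose "derechos de aduana" instead of the excluded "aranceles". In CQP itself the same filtering is done by attaching an alignment constraint on the target-language corpus to the query.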

Regards,
Stefan

