[CWB] help with indexing JRC-Acquis

brice brice at crl.univ-paris-diderot.fr
Thu Nov 5 09:43:29 CET 2015


Hello,

We are trying to index the JRC-Acquis corpora (EN-FR for the moment) on 
the Europarl web interface. We have managed to index the EN and FR 
sub-corpora with CWB, and we are now trying to set up the alignment 
between them.

We have downloaded the aligned AC corpus (aligned with the Vanilla 
aligner and HunAlign) from the 
https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis website. 
Is there a script that would let us use this file to align the two 
sub-corpora, i.e. one that converts the .xml file to a format that 
cwb-align-encode can read?
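
To make the question concrete, here is a minimal Python sketch of the 
first half of such a conversion. It only reads the links out of the 
.xml file. It assumes that the alignments are stored as <link> elements 
with "type" and "xtargets" attributes inside <linkGrp> elements, which 
is how we read the file; the function name and the sample data are 
made up for illustration.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip an XML namespace prefix such as '{ns}link' down to 'link'."""
    return tag.rsplit("}", 1)[-1]


def links_to_rows(xml_text):
    """Return one tuple per <link>: (src_doc, tgt_doc, src_ids, tgt_ids, type).

    Assumption: each <linkGrp> names the two documents in its 'xtargets'
    attribute as 'src;tgt', and each <link> lists the aligned sentence or
    paragraph numbers the same way, e.g. xtargets="2 3;2" for a 2-1 link.
    """
    rows = []
    root = ET.fromstring(xml_text)
    for grp in root.iter():
        if local_name(grp.tag) != "linkGrp":
            continue
        src_doc, _, tgt_doc = grp.get("xtargets", "").partition(";")
        for link in grp.iter():
            if local_name(link.tag) != "link":
                continue
            src, _, tgt = link.get("xtargets", "").partition(";")
            rows.append(
                (src_doc, tgt_doc, src.split(), tgt.split(), link.get("type"))
            )
    return rows


# Hypothetical sample in the assumed layout, not copied from the real file.
SAMPLE = """
<body>
  <linkGrp xtargets="docA-en;docA-fr">
    <link type="1-1" xtargets="1;1"/>
    <link type="2-1" xtargets="2 3;2"/>
  </linkGrp>
</body>
"""

if __name__ == "__main__":
    for row in links_to_rows(SAMPLE):
        print(row)
```

The part we do not know how to do is the second half: as far as we 
understand, cwb-align-encode works with corpus positions, so each 
document and sentence number above would still have to be mapped to 
the matching region in the encoded EN and FR corpora.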

Thank you very much.


Alexandra Volanschi
(Assistant Professor, University Paris Diderot)
& Brice Bricaud
software developer


