[CWB] help with indexing JRC-Acquis
brice
brice at crl.univ-paris-diderot.fr
Thu Nov 5 09:43:29 CET 2015
Hello,
We are trying to index the JRC-Acquis corpora (EN-FR for the moment) on
the Europarl web interface. We have managed to index the EN and FR
sub-corpora in CWB and are now working on the alignment.
We have downloaded the AC aligned corpus (produced with the Vanilla
aligner and HunAlign) from the
https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis website. We
were wondering whether there is a script that would allow us to use
this file to align the two sub-corpora and to convert the .xml
file to a format that cwb-align-encode can use?
Thank you very much.
Alexandra Volanschi
(Assistant Professor, University Paris Diderot)
& Brice Bricaud
software developer