[CWB] help with indexing JRC-Acquis
Stefan Evert
stefanML at collocations.de
Thu Nov 5 11:30:59 CET 2015
> On 5 Nov 2015, at 09:43, brice <brice at crl.univ-paris-diderot.fr> wrote:
>
> We have downloaded the AC corpus, aligned with the Vanilla aligner and HunAlign, from the https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis website. Is there a script that would let us use this file to align the two sub-corpora and convert the .xml file to a format cwb-align-encode could use?
There's a tool
cwb-align-import
in the CWB/Perl modules which does exactly what you need. After installing CWB/Perl, consult
perldoc cwb-align-import
for detailed usage and format information.
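If the alignment XML needs to be flattened into a simple table first, a rough sketch of that step might look like the following. It assumes each alignment bead is stored as a <link> element whose xtargets attribute holds the source and target IDs separated by a semicolon, with multiple IDs on one side separated by spaces; that is an assumption about the downloaded files and should be checked against them. The function names and the tab-separated output are purely illustrative. The input layout cwb-align-import actually expects is the one described in its perldoc page, not the one shown here.

```python
import xml.etree.ElementTree as ET


def link_pairs(xml_text):
    """Yield (source_ids, target_ids) for every <link> element.

    Assumption: each <link> has an xtargets attribute of the form
    "src1 src2;tgt1", i.e. two space-separated ID lists joined by ';'.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        # Compare local names only, so a namespaced <link> is found too.
        if elem.tag.rsplit("}", 1)[-1] != "link":
            continue
        xtargets = elem.get("xtargets")
        if xtargets is None or ";" not in xtargets:
            continue  # skip links that do not follow the assumed layout
        src, tgt = xtargets.split(";", 1)
        yield src.split(), tgt.split()


def to_tsv(xml_text):
    """Return one line per bead: source IDs, a tab, then target IDs."""
    lines = []
    for src, tgt in link_pairs(xml_text):
        lines.append(" ".join(src) + "\t" + " ".join(tgt))
    return "\n".join(lines)


# Hypothetical sample in the assumed layout, not taken from the corpus.
SAMPLE = """<linkGrp>
  <link type="1-1" xtargets="1;1"/>
  <link type="2-1" xtargets="2 3;2"/>
</linkGrp>"""

if __name__ == "__main__":
    print(to_tsv(SAMPLE))
```

The resulting table would still have to be brought into whatever layout perldoc cwb-align-import specifies before it can be imported.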
Best,
Stefan