[CWB] help with indexing JRC-Acquis

Stefan Evert stefanML at collocations.de
Thu Nov 5 11:30:59 CET 2015


> On 5 Nov 2015, at 09:43, brice <brice at crl.univ-paris-diderot.fr> wrote:
> 
> We have downloaded the AC aligned corpus (aligned with the Vanilla aligner and HunAlign) from the https://ec.europa.eu/jrc/en/language-technologies/jrc-acquis website. Is there a script that would allow us to use this file to align the two sub-corpora, i.e. to convert the .xml file into a format that cwb-align-encode can read?

There's a tool

	cwb-align-import

in the CWB/Perl modules which does exactly what you need.  After installing CWB/Perl, consult

	perldoc cwb-align-import

for detailed usage and format information.
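cwb-align-import is what actually does the conversion. The Python sketch below only shows what reading the JRC-Acquis alignment XML involves. It assumes a layout where each document pair is one <linkGrp> and each alignment bead is one <link>, with xtargets="source ids;target ids". The sample excerpt and the read_beads() helper are hypothetical. Mapping these IDs onto CWB corpus positions is not shown; perldoc cwb-align-import documents the format that tool expects.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt, ASSUMING the JRC-Acquis alignment layout:
# one <linkGrp> per document pair, one <link> per alignment bead,
# xtargets="source ids;target ids" (ids separated by spaces).
SAMPLE = """<TEI.2><text><body>
<div type="body" select="en fr">
<linkGrp targType="head p" id="jrc31958A0115(01)-en-fr" type="n-n"
         xtargets="jrc31958A0115(01)-en;jrc31958A0115(01)-fr">
  <link type="1-1" xtargets="1;1"/>
  <link type="2-1" xtargets="2 3;2"/>
  <link type="0-1" xtargets=";3"/>
</linkGrp>
</div></body></text></TEI.2>"""


def read_beads(xml_text):
    """Return (doc_pair, src_ids, tgt_ids, link_type) for every <link>."""
    root = ET.fromstring(xml_text)
    beads = []
    for grp in root.iter("linkGrp"):
        # Document-level xtargets name the source and target documents.
        src_doc, tgt_doc = grp.get("xtargets").split(";")
        for link in grp.iter("link"):
            src, tgt = link.get("xtargets").split(";")
            # An empty side (e.g. a 0-1 bead) becomes an empty list.
            beads.append(((src_doc, tgt_doc), src.split(), tgt.split(),
                          link.get("type")))
    return beads


if __name__ == "__main__":
    for pair, src_ids, tgt_ids, kind in read_beads(SAMPLE):
        print(pair[0], src_ids, "<->", pair[1], tgt_ids, kind)
```

The one-to-many and empty beads are kept as lists rather than flattened to 1:1 pairs. That leaves it to the import step to decide how such beads map onto CWB alignment regions.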

Best,
Stefan

