[CWB] sentence-Aligned parallel corpus in CWB

Stefan Evert stefanML at collocations.de
Thu Feb 18 15:40:01 CET 2016


> On 18 Feb 2016, at 14:25, Marc Reznicek <mreznice at ucm.es> wrote:
> 
> I am trying to convert a sentence-aligned parallel corpus of novels to CWB. I have already compiled the single-language corpora, but I am having trouble finding information about the input format, conversion, and querying of parallel corpora in standard CQP.

Unfortunately, those topics belong to the sections still marked TODO in the CQP Query Language Tutorial and the Corpus Encoding Tutorial, so they are not documented yet. :-}

Contributions to the documentation would be very welcome …


For indexing a pre-aligned corpus, I very much recommend the cwb-align-import script included in the CWB/Perl API. "perldoc cwb-align-import" will give you a brief description of the required input formats and the import procedure.
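
As a rough illustration of the preparation step, here is a minimal Python sketch that dumps sentence-level alignment beads to a tab-separated table of region IDs. Everything in it is an assumption made for illustration: the function name write_alignment_pairs, the sentence IDs, the one-bead-per-line layout, and the choice to skip unaligned sentences. It does not reproduce the format that cwb-align-import actually expects (header line, column order, ID conventions); take that from the perldoc page and adapt the output accordingly.

```python
import io


def write_alignment_pairs(beads, out):
    """Write one tab-separated line per alignment bead.

    `beads` is a list of (source_ids, target_ids) tuples, each holding the
    sentence IDs on one side of the alignment. The layout is hypothetical:
    check "perldoc cwb-align-import" for the format the script requires.
    Returns the number of lines written.
    """
    written = 0
    for source_ids, target_ids in beads:
        # Assumption: sentences without a counterpart are simply left out.
        if not source_ids or not target_ids:
            continue
        out.write(" ".join(source_ids) + "\t" + " ".join(target_ids) + "\n")
        written += 1
    return written


# Made-up sentence IDs: a 1:1 bead, a 2:1 bead, and an unaligned sentence.
beads = [
    (["en_s1"], ["de_s1"]),
    (["en_s2", "en_s3"], ["de_s2"]),
    (["en_s4"], []),
]

buf = io.StringIO()
count = write_alignment_pairs(beads, buf)
print(buf.getvalue(), end="")
```

Keeping one bead per line means 1:n and n:1 alignments stay visible in the intermediate file, which should make it easier to reshape into whatever the import script wants.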


Best,
Stefan


