[CWB] Aligning parallel corpora

Graham Ranger -- UAPV graham.ranger at univ-avignon.fr
Tue May 7 14:57:00 CEST 2019


Hello to all,
I have set up a parallel corpus on CQPweb using s-attributes for the 
visualisation of translations but I would like to be able to do the same 
thing more cleanly, using alignment attributes. However, try as I might, 
I cannot seem to follow the instructions in the encoding tutorial. I 
have not been able to find the English and German Holmes files used in 
Stefan Evert's tutorial for illustration. Now, what I would like to know 
is: what exactly is the required input format for the cwb-align command? 
If I have .vrt files created in two languages with TreeTagger, and if I 
have pre-aligned these in such a way that the first sentence of one file 
corresponds to the first sentence of the other, the second sentence to 
the second, etc., then is that enough? Or should my files also include 
numerical information, with all sentences numbered? I suspect this is a 
very naive question, but it's one that I do not seem to be able to find 
my way around without help!
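
To make the set-up concrete, here is a minimal Python sketch of the two 
variants described above: checking that two .vrt files contain the same 
number of <s> regions (the one-to-one pre-alignment), and adding a running 
number to each <s> tag. The helper names, the id attribute and the sample 
tokens are all hypothetical, invented for illustration; whether cwb-align 
actually needs such numbering is exactly the open question here.

```python
import re

# Matches an opening <s> tag on its own line, with or without attributes
# (e.g. <s> or <s id="3">), but not other tags such as <seg> or </s>.
S_OPEN = re.compile(r"<s(\s[^>]*)?>\s*$")


def count_sentences(vrt_lines):
    """Count opening <s> tags in an iterable of .vrt lines."""
    return sum(1 for line in vrt_lines if S_OPEN.match(line.strip()))


def number_sentences(vrt_lines):
    """Yield the lines with each opening <s> tag rewritten as <s id="N">.

    Assumption: any attributes already on the <s> tag may be discarded.
    """
    n = 0
    for line in vrt_lines:
        if S_OPEN.match(line.strip()):
            n += 1
            yield f'<s id="{n}">'
        else:
            yield line.rstrip("\n")


# Invented sample data: one token per line (word, tag, lemma separated by
# tabs), with sentence boundaries marked by <s> ... </s> on their own lines.
english = ["<s>", "Hello\tUH\thello", "</s>",
           "<s>", "Goodbye\tUH\tgoodbye", "</s>"]
german = ["<s>", "Hallo\tITJ\thallo", "</s>",
          "<s>", "Auf\tAPPR\tauf", "Wiedersehen\tNN\tWiedersehen", "</s>"]

# One-to-one pre-alignment requires at least equal sentence counts.
same_count = count_sentences(english) == count_sentences(german)
numbered = list(number_sentences(english))
```

With real files, the lists would be replaced by open file handles, e.g. 
count_sentences(open("en.vrt", encoding="utf-8")), where en.vrt is a 
placeholder file name.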
Best,
Graham.