[CWB] Aligning parallel corpora
Graham Ranger -- UAPV
graham.ranger at univ-avignon.fr
Tue May 7 14:57:00 CEST 2019
Hello to all,
I have set up a parallel corpus on CQPweb using s-attributes for the
visualisation of translations but I would like to be able to do the same
thing more cleanly, using alignment attributes. However, try as I might,
I cannot seem to follow the instructions in the encoding tutorial. I
have not been able to find the English and German Holmes files used in
Stefan Evert's tutorial for illustration. Now, what I would like to know
is: what exactly is the required input format for the cwb-align command?
If I have .vrt files created in two languages with TreeTagger, and if I
have pre-aligned these in such a way that the first sentence of one file
corresponds to the first sentence of the other, the second sentence to
the second, etc., is that enough? Or should my files also include
numerical information, with all sentences numbered? I suspect this is a
very naive question, but it's one that I do not seem to be able to find
my way around without help!
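
To make the second option concrete, here is a minimal sketch of what I
mean by "all sentences numbered". It assumes that each sentence in the
.vrt file opens with a bare <s> tag on its own line, and the attribute
name "id" and the helper name number_sentences are purely my own
illustration. I do not know whether cwb-align actually needs anything
like this, which is exactly my question.

```python
import re


def number_sentences(vrt_lines):
    """Give every bare <s> tag a running id attribute (hypothetical helper).

    Assumes one tag or token per line, as in the usual .vrt layout;
    the attribute name "id" is an assumption, not a CWB requirement.
    """
    counter = 0
    out = []
    for line in vrt_lines:
        # Only bare opening sentence tags are rewritten; tokens and
        # closing tags pass through unchanged.
        if re.fullmatch(r"<s>\s*", line):
            counter += 1
            line = f'<s id="{counter}">\n'
        out.append(line)
    return out


# Tiny made-up sample: two one-word sentences in word/tag/lemma columns.
sample = [
    "<s>\n", "Hello\tUH\thello\n", "</s>\n",
    "<s>\n", "World\tNN\tworld\n", "</s>\n",
]
numbered = number_sentences(sample)
print("".join(numbered), end="")
```

Running the same numbering over both language files would give the n-th
sentence of each file the same id, if the files are already aligned one
to one as described above.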
Best,
Graham.