[CWB] Short sentences inconsistent alignment

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Dec 27 13:58:02 CET 2018

The .align file is read as described in man cwb-align.

In brief, cols 1-4 are two pairs of cpos, where the first cpos pair = region in source and the second cpos pair = aligned region in target: so what I'm asking is, are the example sentences you sent with id=73 correctly represented by a line of cpos pairs in the a-attribute?

(You can also use cwb-align-decode to check that what is encoded is the same as what is in your .align file.)
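For anyone who wants to do that check by script rather than by eye, here is a rough sketch of reading the four cpos columns described above and looking up the alignment line that covers a given source cpos. It is only an illustration, not part of CWB: the function names are made up, the sample lines are invented data, and the handling of a header line and of any columns after the fourth is an assumption (they are simply skipped or ignored here). See man cwb-align for the authoritative format.

```python
def parse_align_lines(lines):
    """Yield (src_start, src_end, tgt_start, tgt_end) from .align-style lines.

    Only the first four whitespace-separated columns are used; any further
    columns are ignored (assumption). Lines whose first four fields are not
    all integers, such as a header line, are skipped (assumption).
    """
    for raw in lines:
        fields = raw.split()
        if len(fields) < 4:
            continue
        try:
            src_start, src_end, tgt_start, tgt_end = (int(f) for f in fields[:4])
        except ValueError:
            continue
        yield src_start, src_end, tgt_start, tgt_end


def find_alignment(pairs, cpos):
    """Return the first alignment whose source region contains cpos, else None."""
    for src_start, src_end, tgt_start, tgt_end in pairs:
        if src_start <= cpos <= src_end:
            return (src_start, src_end, tgt_start, tgt_end)
    return None


# Invented sample data, for illustration only.
sample = [
    "SRC\ts\tTGT\ts",        # hypothetical header line, skipped
    "0\t2\t0\t2\t1:1\t10",   # source cpos 0-2 aligned with target cpos 0-2
    "3\t5\t3\t6\t1:1\t8",    # source cpos 3-5 aligned with target cpos 3-6
]
pairs = list(parse_align_lines(sample))
print(find_alignment(pairs, 4))
```

The idea would be to take the cpos of any token in the source sentence in question, find the line whose first pair contains it, and then check whether the second pair really covers the target sentence you expect.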

If the cpos pairs are not correct for that sentence alignment, then the problem is in the generation of the .align file. One point to note is that if you used cwb-align to generate the alignments (??), errors are to be expected for language pairs which share little or no vocab.



From: "Andrés Chandía" <andres at chandia.net>
Sent: 27 December 2018 11:47
To: Hardie, Andrew <a.hardie at lancaster.ac.uk>
Cc: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: RE: [CWB] Short sentences inconsistent alignment

Thanks for the answer, but how do I check that these s elements are really aligned with one another in the underlying a-attribute?

If you mean checking the align files, how should they be read? Anyway, here they are (just in case):
[File listing not preserved in the archive: four entries, each dated 2018-12-27 12:32]

But I don't think it is an alignment problem, because two sentences further on the alignment display is the right one:

pichi wentru nien / pequeño hombre tengo

rume küme wentru / muy buen hombre

pichi wentru nien / I have a small man

rume küme wentru / a very good man


            andrés chandía
<http://www.chandia.net> | <https://twitter.com/chandianet>
Dungupeyem<http://chandia.net/content/dungupeyem> | IECMap<http://chandia.net/content/iecmap> | ISECMap<http://chandia.net/content/isecmap> | NMT<http://chandia.net/content/nmt> | Corlexim<http://corlexim.cl>

administrator of:
Parles.upf<http://parles.upf.edu> | IWCH<https://iwch.upf.edu> | Amind terapia<http://amindterapia.com> | ONG Mapuche koyaktu<http://koyaktumapuche.net> | Nocando<http://parles.upf.edu/llocs/nocando> | IAC<https://iac.upf.edu> | CddZ<https://iac.upf.edu/cddz> | ISAC<https://iac.upf.edu/isac> | CatCg<http://catcg.upf.edu>
Do not print unnecessarily. Protect the environment!