[CWB] Error cwb-align-import

Hardie, Andrew a.hardie at lancaster.ac.uk
Wed Jun 24 09:20:41 CEST 2015


On the subject of contributions to the documentation - which, as Stefan points out, are always very welcome - the easiest way is to write up what you want to add as plain-text bullet points or paragraphs and send it to the list in an email. Stefan or I will then roll the contents into the LaTeX source of the relevant documents.

Patches against the LaTeX source as in the repo are also welcome if that's more your bag.

NB: The website version of the tutorial has not been updated since 2010. The source version in the repo has some initial notes I began to collect back in May 2013 towards better documentation of the alignment system, but - since I quickly ran out of time to work on this area - right now it's all in pieces, probably not readable at all. See https://sourceforge.net/p/cwb/code/HEAD/tree/doc/tutorials/encoding_tutorial/ 

(and alignment.tex is the file containing my notes so far)

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Stefan Evert
Sent: 24 June 2015 08:00
To: CWBdev Mailing List
Subject: Re: [CWB] Error cwb-align-import

> I've managed to import the alignment of two corpora at sentence level. I wouldn't mind documenting the process for the encoding tutorial.

Thanks, that would be really neat.

> You can find attached a test data set to reproduce the issue. My question is, is there a way to overcome this error?

Unfortunately not because ...

> This alignment is basically some kind of "word alignment". Sometimes, depending on the source text unit, the translation is a non-contiguous rendering.

… CWB's alignment attributes are designed for sentence-level alignment (e.g. in a translation memory) and thus ...

> So the tokens involved in the alignment have to be contiguous (not the structural elements). In the example given, this is trivial (one token more or less...), but I have other cases where the elements are much further apart and I don't want to include all the tokens in between.

… alignment beads can only link a contiguous range of tokens in the source corpus to another contiguous range in the target corpus. That's already a big improvement over early versions of CWB, which didn't even allow gaps between different beads or crossing alignments. 
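
To make the contiguity constraint concrete, here is a minimal Python sketch. It is not CWB code and does not reflect the cwb-align-import input format; the function names and the representation of an alignment as two lists of token positions are assumptions made purely for illustration. It checks whether a proposed alignment could be expressed as a single bead, i.e. one unbroken token range on each side.

```python
def is_contiguous(positions):
    """Return True if the token positions form one unbroken range."""
    ordered = sorted(set(positions))
    return bool(ordered) and ordered[-1] - ordered[0] + 1 == len(ordered)


def to_bead(source_positions, target_positions):
    """Hypothetical helper: turn two sets of token positions into a bead.

    Returns ((src_start, src_end), (tgt_start, tgt_end)) when both sides
    are contiguous, or None when either side contains a gap.
    """
    if not (is_contiguous(source_positions) and is_contiguous(target_positions)):
        return None
    return (
        (min(source_positions), max(source_positions)),
        (min(target_positions), max(target_positions)),
    )


# A contiguous alignment can be expressed as one bead.
print(to_bead([3, 4, 5], [10, 11]))
# A target side with a gap (token 11 is skipped) cannot.
print(to_bead([3, 4, 5], [10, 12]))
```

In a sketch like this, a non-contiguous rendering would either have to be widened to cover the tokens in between or left out of the alignment, which is exactly the trade-off described in the question.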

> Although my case is a bit special, I don't think this is an infrequent scenario; see Amoia et al. 2011 (http://www.aclweb.org/anthology/W11-4302).

Certainly, but these are applications that (alignment in) CWB hasn't been designed for.

CWB4, when it eventually arrives, will allow for much more flexible types of alignment.

Best,
Stefan