[CWB] GIZA++ and CQPweb

Ruprecht von Waldenfels ruprecht.waldenfels at gmx.net
Mon Dec 12 13:54:59 CET 2016


Hi,
We use Uplug for word alignment in a small Polish-German corpus we have
built; see http://www.parasolcorpus.org/KrakowMW/

We encode the alignment candidates as a set-valued positional
attribute, e.g.:

this ART    12.23    |der:n=12.23|die:n=12.27|das:n=12.35|

where 'der', 'die' and 'das' are the tokens that were word-aligned to
'this'. We use client-side XSLT to work out, from the token numbers,
which target words are translation candidates for the word forms that
were searched for.
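The encoding and the lookup described above could be sketched roughly as follows. This is a minimal Python sketch under assumptions of its own, not Uplug's or the KrakowMW corpus's actual tooling: token IDs are taken to be "<sentence>.<token>" strings as in the example line, alignment links are 1-based (source, target) position pairs within one sentence pair, and the vertical-file columns are tab-separated word, POS, ID and candidate set. The set is written in CWB's feature-set notation (`|a|b|`, with `|` for the empty set). In the setup above the lookup runs as client-side XSLT; it is shown in Python only for illustration, and all helper names are hypothetical.

```python
# Minimal sketch (hypothetical helpers, not Uplug's actual output format):
# encode word-alignment candidates as a CWB feature-set p-attribute and
# recover the aligned target token IDs for a set of query hits.


def token_id(sent_no: int, tok_no: int) -> str:
    # Assumed ID scheme "<sentence>.<token>", mirroring "12.23" in the example.
    return f"{sent_no}.{tok_no}"


def encode_candidates(candidates: list[tuple[str, str]]) -> str:
    """Render (target_word, target_id) pairs as a CWB feature set |a|b|."""
    if not candidates:
        return "|"  # CWB notation for the empty set
    # Assumes target words contain no "|" characters.
    return "|" + "|".join(f"{w}:n={tid}" for w, tid in candidates) + "|"


def decode_candidates(value: str) -> list[tuple[str, str]]:
    """Inverse of encode_candidates: '|der:n=12.23|' -> [('der', '12.23')]."""
    pairs = []
    for item in value.strip("|").split("|"):
        if item:
            word, _, tid = item.rpartition(":n=")
            pairs.append((word, tid))
    return pairs


def vertical_lines(sent_no: int,
                   src_tokens: list[tuple[str, str]],
                   tgt_tokens: list[str],
                   links: set[tuple[int, int]]) -> list[str]:
    """One tab-separated line per source token: word, POS, ID, candidate set.

    links holds 1-based (source_position, target_position) pairs.
    """
    lines = []
    for i, (word, pos) in enumerate(src_tokens, start=1):
        cands = [(tgt_tokens[j - 1], token_id(sent_no, j))
                 for si, j in sorted(links) if si == i]
        lines.append("\t".join([word, pos, token_id(sent_no, i),
                                encode_candidates(cands)]))
    return lines


def candidate_ids(hit_values: list[str]) -> set[str]:
    """Collect the target token IDs aligned to the matched source tokens,
    i.e. the tokens to highlight in the aligned target sentence."""
    ids: set[str] = set()
    for value in hit_values:
        ids.update(tid for _, tid in decode_candidates(value))
    return ids


if __name__ == "__main__":
    src = [("this", "ART"), ("house", "N")]
    tgt = ["das", "Haus"]
    for line in vertical_lines(12, src, tgt, {(1, 1), (2, 2)}):
        print(line)
    print(candidate_ids(["|der:n=12.23|die:n=12.27|", "|die:n=12.27|"]))
```

Because every candidate carries the target token's ID, the display side only needs the hits' attribute values to decide which target tokens to highlight; no separate alignment lookup is required.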

Best,
Ruprecht

On 12.12.2016 at 11:17, Stefan Evert wrote:
>> On 24 Nov 2016, at 16:55, Annarita Felici <Annarita.Felici at unige.ch> wrote:
>>
>> I am planning to build a bidirectional parallel corpus of German-Italian/Italian-German legal texts. For the alignment I was thinking of using GIZA++, but before embarking on this, I would like to know whether I will later be able to import a GIZA++ alignment into CQPweb.
> CWB alignment attributes are designed for sentence-level alignment, not for word alignment.  Depending on how complex your word alignment is, the CWB mechanism could in theory be abused to store the alignment information, but its use in CWB would be severely limited and CQPweb would not be able to display it properly.
>
> Are you sure that the automatic word alignment will be good enough for use in corpus searches?  If you're satisfied with a simple 1-to-1 alignment, you could possibly encode just the aligned words in a p-attribute and display it as a "gloss" in CQPweb.
>
> Best,
> Stefan
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
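Stefan's fallback, keeping only simple 1-to-1 links and storing the aligned word in an ordinary p-attribute for display as a gloss, could be sketched as below. Assumptions not taken from the thread: the GIZA++ alignment has already been converted to the widely used `i-j` link format with zero-based indices (for instance by a symmetrization step such as the one in the Moses training pipeline), and `_` marks tokens without a 1-to-1 link. `parse_links` and `one_to_one_gloss` are hypothetical names.

```python
# Minimal sketch of a 1-to-1 "gloss" p-attribute (hypothetical helpers).
# Assumes the GIZA++ alignment has already been converted to "i-j" link
# lines with zero-based indices, one line per sentence pair.
from collections import Counter


def parse_links(line: str) -> list[tuple[int, int]]:
    """Parse a link line such as '0-0 1-2 2-1' into (source, target) pairs."""
    links = []
    for pair in line.split():
        i, j = pair.split("-")
        links.append((int(i), int(j)))
    return links


def one_to_one_gloss(src_tokens: list[str], tgt_tokens: list[str],
                     links: list[tuple[int, int]],
                     placeholder: str = "_") -> list[str]:
    """Return one gloss per source token: the aligned target word if the link
    is strictly 1-to-1, otherwise the placeholder (unaligned or 1-to-n/n-to-1)."""
    src_degree = Counter(i for i, _ in links)
    tgt_degree = Counter(j for _, j in links)
    gloss = [placeholder] * len(src_tokens)
    for i, j in links:
        if src_degree[i] == 1 and tgt_degree[j] == 1:
            gloss[i] = tgt_tokens[j]
    return gloss


if __name__ == "__main__":
    src = ["the", "old", "house"]
    tgt = ["das", "alte", "Haus"]
    gloss = one_to_one_gloss(src, tgt, parse_links("0-0 0-1 2-2"))
    # Vertical format: word <TAB> gloss
    for word, g in zip(src, gloss):
        print(f"{word}\t{g}")
```

Dropping the 1-to-n and n-to-1 links is what makes the column usable as a plain p-attribute: each token gets exactly one value, at the cost of losing the multi-word correspondences that the set-valued encoding above keeps.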



