[CWB] Giza ++ and CQPWeb

Annarita Felici Annarita.Felici at unige.ch
Thu Dec 15 10:09:47 CET 2016


Hi Stefan, 

thanks for the info about the CWB alignment. 
I am not sure whether an automatic word alignment will be good enough for corpus searches; I had simply assumed it would be. Ideally, I'd like to be able to search for translation equivalents in their usage context and also run linguistic queries as in CQPweb. I have tried several sentence-level aligners (Vanilla, LF Aligner, YouAlign), but the German-Italian output is not great. I also ran a test with ParaConc, which supports translation searches and has sentence alignment, but it does not identify candidate translations correctly. The main problems are separable verbs (trennbare Verben), participial clauses (Partizipialsätze) and embedded constituents (eingebettete Satzglieder). On the other hand, my texts have identical sentence breaks, so apart from individual segments within sentences, a simple 1-to-1 alignment is not too bad. I might give Uplug a try, as Ruprecht also mentioned in his message.
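Because the German and Italian texts share identical sentence breaks, a sentence-level alignment can in principle be derived from position alone. The following is a minimal sketch, assuming each text is already tokenized and split into a list of sentences. The function name `one_to_one_regions` is hypothetical and not a CWB tool. It pairs the k-th sentences and records each pair as a range of token positions, which is conceptually the kind of region pair that a sentence-level alignment links.

```python
# Hypothetical helper (not part of CWB): derive a 1:1 sentence alignment
# by position, assuming both texts have identical sentence breaks.

def one_to_one_regions(src_sents, tgt_sents):
    """Pair the k-th source sentence with the k-th target sentence.

    Each argument is a list of sentences; each sentence is a list of tokens.
    Returns (src_start, src_end, tgt_start, tgt_end) tuples of 0-based
    token positions; end positions are inclusive.
    """
    if len(src_sents) != len(tgt_sents):
        raise ValueError(
            f"sentence counts differ: {len(src_sents)} vs {len(tgt_sents)}"
        )
    regions = []
    src_pos = tgt_pos = 0
    for k, (src, tgt) in enumerate(zip(src_sents, tgt_sents)):
        if not src or not tgt:
            raise ValueError(f"empty sentence at index {k}")
        regions.append(
            (src_pos, src_pos + len(src) - 1, tgt_pos, tgt_pos + len(tgt) - 1)
        )
        src_pos += len(src)
        tgt_pos += len(tgt)
    return regions


if __name__ == "__main__":
    # Invented two-sentence example.
    de = [["Er", "reist", "morgen", "ab", "."],
          ["Das", "Gesetz", "tritt", "in", "Kraft", "."]]
    it = [["Parte", "domani", "."],
          ["La", "legge", "entra", "in", "vigore", "."]]
    print(one_to_one_regions(de, it))  # [(0, 4, 0, 2), (5, 10, 3, 8)]
```

The count check matters. If a single sentence is split differently in one language, every following pair would be shifted, so the sketch fails loudly rather than producing a silently misaligned corpus.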
Best,

Annarita

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Stefan Evert
Sent: Monday, 12 December 2016 11:18
To: CWBdev Mailing List <cwb at sslmit.unibo.it>
Subject: Re: [CWB] Giza ++ and CQPWeb


> On 24 Nov 2016, at 16:55, Annarita Felici <Annarita.Felici at unige.ch> wrote:
> 
> I am planning to build a bidirectional German-Italian/Italian-German parallel corpus of legal texts. For the alignment I was thinking of using GIZA++, but before embarking on this, I would like to know whether I can later import a GIZA++ alignment into CQPweb.

CWB alignment attributes are designed for sentence-level alignment, not for word alignment.  Depending on how complex your word alignment is, the CWB mechanism could in theory be abused to store the alignment information, but its use in CWB would be severely limited and CQPweb would not be able to display it properly.

Are you sure that the automatic word alignment will be good enough for use in corpus searches?  If you're satisfied with a simple 1-to-1 alignment, you could possibly encode just the aligned words in a p-attribute and display it as a "gloss" in CQPweb.
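To make the "gloss" idea concrete, here is a minimal sketch. It assumes the word alignment has already been exported in the widely used `i-j` pair format (0-based source index, hyphen, 0-based target index), as produced for example by fast_align or by the Moses symmetrization step, rather than in GIZA++'s native output. All helper names are hypothetical. For each source token, the sketch looks up the linked target word(s) and writes a two-column vertical file. The second column could then be declared as an extra p-attribute (e.g. `cwb-encode ... -P gloss`) and displayed as a gloss in CQPweb. The sample sentence pair is invented for illustration.

```python
# Hypothetical sketch (not a CWB or CQPweb tool): turn a word alignment in
# "i-j" pair format into a gloss column for a CWB vertical file.

def parse_alignment(line):
    """Parse e.g. '1-0 2-1' into [(1, 0), (2, 1)] (0-based src, tgt)."""
    pairs = []
    for item in line.split():
        i, j = item.split("-")
        pairs.append((int(i), int(j)))
    return pairs


def gloss_column(src_tokens, tgt_tokens, pairs, empty="_"):
    """Return one gloss per source token: its linked target word(s).

    Several links are joined with '|' in target order; unaligned tokens
    get the placeholder `empty`.
    """
    linked = [[] for _ in src_tokens]
    for i, j in sorted(pairs, key=lambda p: p[1]):
        linked[i].append(tgt_tokens[j])
    return ["|".join(words) if words else empty for words in linked]


def to_vertical(src_tokens, glosses):
    """One token per line: word<TAB>gloss (word = first p-attribute)."""
    return "\n".join(f"{w}\t{g}" for w, g in zip(src_tokens, glosses))


if __name__ == "__main__":
    # Invented example: separable verb "reist ... ab" and pro-drop subject.
    de = ["Er", "reist", "morgen", "ab"]
    it = ["Parte", "domani"]
    glosses = gloss_column(de, it, parse_alignment("1-0 2-1 3-0"))
    print(to_vertical(de, glosses))
```

With this invented pair, `gloss_column` yields `_`, `Parte`, `domani`, `Parte`. Both halves of the separable verb receive the same Italian gloss, and the unexpressed Italian subject leaves "Er" unaligned. Joining multiple links with `|` is just one possible convention. A strict 1-to-1 setup would keep a single link per token.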

Best,
Stefan
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb

