[CWB] Giza ++ and CQPWeb

Annarita Felici Annarita.Felici at unige.ch
Thu Dec 15 10:36:37 CET 2016


Dear Vladimir,

Thanks for the link and for sharing the interesting project.

I did not know about KonText and NoSketch Engine. I will look into them further and perhaps run a test on a small pilot corpus with word alignment. I will do the same with Uplug at sentence level and see which one works best in my case, including in terms of time.

Annarita


From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Vladimír Benko
Sent: Monday, 12 December 2016 20:41
To: cwb at sslmit.unibo.it
Cc: Michal Křen <michal.kren at ff.cuni.c>; Alexandr Rosen <alexandr.rosen at gmail.com>
Subject: Re: [CWB] Giza ++ and CQPWeb

Dear Annarita,



On 24 Nov 2016, at 16:55, Annarita Felici <Annarita.Felici at unige.ch> wrote:



I am planning to build a bidirectional parallel corpus of legal texts (German-Italian and Italian-German). For the alignment I was thinking of using GIZA++, but before embarking on this, I would like to know whether I can later import a GIZA++ alignment into CQPweb.



CWB alignment attributes are designed for sentence-level alignment, not for word alignment.  Depending on how complex your word alignment is, the CWB mechanism could in theory be abused to store the alignment information, but its use in CWB would be severely limited and CQPweb would not be able to display it properly.



Are you sure that the automatic word alignment will be good enough for use in corpus searches?  If you're satisfied with a simple 1-to-1 alignment, you could possibly encode just the aligned words in a p-attribute and display it as a "gloss" in CQPweb.
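
As a rough illustration of the p-attribute idea, the sketch below builds two-column vertical text (word, then gloss) from one aligned sentence pair. It assumes the word alignment has already been converted to space-separated "i-j" index pairs (source index, target index, both 0-based); the function names and the "_" placeholder for unaligned words are hypothetical choices for this example, not part of CWB or GIZA++.

```python
from collections import Counter


def parse_one_to_one(alignment):
    """Parse 'i-j' pairs and keep only strict 1-to-1 links.

    A link is kept only if its source index and its target index
    each occur in exactly one pair.
    """
    links = [tuple(int(x) for x in pair.split("-")) for pair in alignment.split()]
    src_counts = Counter(i for i, _ in links)
    tgt_counts = Counter(j for _, j in links)
    return {i: j for i, j in links if src_counts[i] == 1 and tgt_counts[j] == 1}


def to_vertical(src_tokens, tgt_tokens, alignment, empty="_"):
    """Return one 'word<TAB>gloss' line per source token.

    Tokens without a 1-to-1 link get the placeholder `empty`
    (an assumption made for this sketch).
    """
    links = parse_one_to_one(alignment)
    lines = []
    for i, token in enumerate(src_tokens):
        j = links.get(i)
        gloss = tgt_tokens[j] if j is not None else empty
        lines.append(f"{token}\t{gloss}")
    return lines


if __name__ == "__main__":
    src = ["Der", "Vertrag", "ist", "nichtig"]
    tgt = ["Il", "contratto", "è", "nullo"]
    # "ist" is deliberately left unaligned in this toy example.
    print("\n".join(to_vertical(src, tgt, "0-0 1-1 3-3")))
```

Output of this kind could then be encoded with the second column declared as an additional positional attribute (for instance via the -P option of cwb-encode); how CQPweb displays that attribute as a gloss depends on the corpus settings there.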



Best,

Stefan

You may want to have a look at Treq, which has been developed at the Institute of the Czech National Corpus within the framework of their InterCorp project:

https://treq.korpus.cz/

Treq works with parallel corpora aligned at word level. (The backend here, however, is not CQP, but rather KonText, a fork of NoSketch Engine.)

Best,

Vlado B, 20:35
--
Vladimír Benko

Université Comenius de Bratislava
Chaire UNESCO de communication
plurilingue et multiculturelle

Šafárikovo námestie 6, SK-81499 Bratislava