[CWB] Giza ++ and CQPWeb
Ruprecht von Waldenfels
ruprecht.waldenfels at gmx.net
Mon Dec 12 13:54:59 CET 2016
Hi,
We use Uplug for word alignment in a small Polish-German corpus we've
built; see http://www.parasolcorpus.org/KrakowMW/
We encode the alignment candidates as a set in a positional attribute,
for example:
this ART 12.23 |der:n=12.23|die:n=12:27|das:n=12:35|
Here 'der', 'die' and 'das' are the tokens that were word-aligned to
'this'. We use client-side XSLT to work out, based on the token numbers,
which target words are translation candidates of the word forms that
were searched for.
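To make the set-valued attribute a bit more concrete, here is a rough sketch of the lookup step. It is written in Python rather than the XSLT we actually use, and the function names (parse_alignment_set, mark_candidates) are made up for illustration. It assumes every entry has the shape word:n=ID, as in the example line, and treats the IDs as opaque strings, since the example shows both '.' and ':' inside them.

```python
def parse_alignment_set(value):
    """Split a set value like '|der:n=12.23|die:n=12:27|' into (word, token_id) pairs.

    Assumption: each entry looks like 'word:n=ID'; anything else is skipped.
    """
    pairs = []
    for entry in value.strip("|").split("|"):
        if not entry:
            continue  # empty set, e.g. '|'
        word, sep, token_id = entry.partition(":n=")
        if not sep:
            continue  # entry does not follow the assumed shape
        pairs.append((word, token_id))
    return pairs


def mark_candidates(target_tokens, candidate_ids):
    """Wrap target tokens whose ID is among the candidates in square brackets.

    target_tokens is a list of (token_id, word) pairs for the aligned region.
    """
    return [
        f"[{word}]" if token_id in candidate_ids else word
        for token_id, word in target_tokens
    ]


# Attribute value of the matched source token, copied from the example line.
value = "|der:n=12.23|die:n=12:27|das:n=12:35|"
candidates = parse_alignment_set(value)
candidate_ids = {token_id for _, token_id in candidates}

# Hypothetical target-side tokens, only here to show the lookup.
target = [("12.23", "der"), ("12.24", "Hund")]
marked = mark_candidates(target, candidate_ids)
```

The point of keeping the token IDs next to the word forms is that the display side only needs a set-membership test per target token, with no second query against the corpus.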
Best,
Ruprecht
On 12.12.2016 at 11:17, Stefan Evert wrote:
>> On 24 Nov 2016, at 16:55, Annarita Felici <Annarita.Felici at unige.ch> wrote:
>>
>> I am planning to build a bidirectional parallel corpus of legal texts German-Italian/Italian-German. For the alignment I was thinking of using Giza ++, but before embarking on this, I would like to know whether I can later import a Giza alignment into CQPWeb.
> CWB alignment attributes are designed for sentence-level alignment, not for word alignment. Depending on how complex your word alignment is, the CWB mechanism could in theory be abused to store the alignment information, but its use in CWB would be severely limited and CQPweb would not be able to display it properly.
>
> Are you sure that the automatic word alignment will be good enough for use in corpus searches? If you're satisfied with a simple 1-to-1 alignment, you could possibly encode just the aligned words in a p-attribute and display it as a "gloss" in CQPweb.
>
> Best,
> Stefan
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb