[CWB] Re: Output from cwb-align

Gabriele Brandolini gabriele.brandolini at gmail.com
Mon Oct 11 10:56:11 CEST 2010


Dear Stefan,
Thanks for your suggestions about the output of the alignment with
CWB, which I was able to follow.

Sorry to bother you again. When you have a bit of time, could you
give me a step-by-step guide to running CQP queries through an
interface like the one used in your demo, but locally? I have
installed Apache2 and have already used it with another program
(NATools). I have really tried to find a tutorial on the Internet, but
without success. As you know, it is not always easy for me to get a
fast and continuous Internet connection, so my search may have been
imperfect.
Thank you very much again.
Gabriel

2010/9/28, Stefan Evert <stefanML a collocations.de>:
>
> On 24 Sep 2010, at 11:16, Gabriele Brandolini wrote:
>
>> I’ve tried to align bilingual corpora (Latin-Swahili) by using cwb-align
>> and some very useful and clear instructions given by Stefan. Thanks to
>> him!
>>
>> I got some good output, and I’m now checking it.
>>
>>
>> Could someone please tell me whether it's possible, and how, to save
>> the aligned file in a readable format (say, txt), with each line
>> containing <source_aligned_sentence(s)>TAB (or some other
>> separator)<target_aligned_sentence(s)>?
>
> I'm afraid there isn't a ready-made tool that does exactly what you need.
> It would be relatively straightforward to write such a program in Perl, if
> you've got the CWB::CL Perl module installed.
>
> Another option is to encode the alignment for use with CQP and then do
> something like the following in CQP -- e.g. for EUROPARL-EN aligned with
> EUROPARL-FR:
>
> EUROPARL-EN;
> Sents = <s> [] :EUROPARL-FR [];
> show -cpos +europarl-fr;
> set ld "";
> set rd "";
> set Context europarl-fr;
> cat Sents > "alignment-pairs.txt";
>
> This will print alignment pairs on two consecutive lines (instead of
> being separated by a TAB), where the second line is always marked with
> "-->europarl-fr:".  It should be easy to convert this file into the format
> you need.  However, there will be some duplicates whenever multiple
> sentences in the source language form a single alignment block (there's no
> way around this at the moment, I'm afraid).
>
> Hope this helps a bit,
> Stefan
>
>
> _______________________________________________
> CWB mailing list
> CWB a sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>

-- 
Sent from my mobile device
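Stefan's reply notes that the two-line output of the CQP "cat" command should be easy to convert into one TAB-separated pair per line. A minimal sketch of that conversion follows, written in Python rather than Perl because it only post-processes the text file and does not use CWB::CL. It assumes that each source block fits on one line, that it is immediately followed by exactly one target line beginning with the marker "-->europarl-fr:" (taken from Stefan's EUROPARL example), and that the file is UTF-8. The script name pairs_to_tsv.py and the function pairs_to_tsv are hypothetical.

```python
import sys

# ASSUMPTION: the marker CQP prints before each aligned target line.
# Stefan's message describes it for EUROPARL-FR; change it to match
# whatever your own target corpus produces.
TARGET_PREFIX = "-->europarl-fr:"


def pairs_to_tsv(lines, prefix=TARGET_PREFIX):
    """Convert alternating source/target lines into 'source<TAB>target' rows.

    Assumes each source block occupies one line and is immediately followed
    by one target line starting with `prefix`. Consecutive identical pairs
    (the duplicates produced when several source sentences form a single
    alignment block) are collapsed into one row.
    """
    rows = []
    source = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(prefix):
            if source is None:
                continue  # target line with no preceding source line
            target = line[len(prefix):].strip()
            row = source + "\t" + target
            if not rows or rows[-1] != row:
                rows.append(row)  # keep only the first of a run of duplicates
            source = None
        else:
            source = line  # remember the source text for the next target line
    return rows


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python pairs_to_tsv.py alignment-pairs.txt > pairs.tsv
    # ASSUMPTION: the CQP output file is UTF-8 encoded.
    with open(sys.argv[1], encoding="utf-8") as f:
        for row in pairs_to_tsv(f):
            print(row)
```

Only consecutive identical pairs are dropped, so a sentence pair that legitimately recurs elsewhere in the corpus is kept; this relies on the assumption that the duplicates Stefan mentions are printed next to each other. For a Latin-Swahili corpus, TARGET_PREFIX would have to be set to whatever marker CQP actually prints for the Swahili side.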

