[CWB] Querying parallel corpora

Ruprecht von Waldenfels ruprecht.waldenfels at gmx.net
Thu Nov 12 18:21:12 CET 2015


Hi,
I just wanted to point you to ParaVoz:
https://bitbucket.org/rvwfels/paravoz2

It is a web interface to parallel corpora that should be easy to use
and configure. Have a look!
Best,
Ruprecht

On 12.11.2015 at 05:22, Stefan Evert wrote:
>> On 12 Nov 2015, at 12:23, José Manuel Martínez Martínez <chozelinek at gmail.com> wrote:
>>
>> So, if I want to see the aligned sentences corresponding to the matches I just type this:
>>
>> show +tdc-tt-fl
>>
>> And then my query:
>>
>> [word="catch-the-eye"];
>>
>> Is there a way to tabulate or save the aligned regions corresponding to the matches? If so, how?
> Well, you can always redirect the "cat" output to a file:
>
> 	cat > "output_with_alignment.txt";
>
> If you want more control with "tabulate", and you're using a recent beta version of CQP (v3.4.7 or newer), you can also "translate" the query result into the target language. Note that this is an experimental feature, so no guarantees …
>
> Let me give you an example based on the Europarl corpus:
>
> [no corpus]> EUROPARL-EN
> EUROPARL-EN> Law = "German" "law";
> EUROPARL-EN> cat Law 0 2;
>    1317230:  It is a funny saga of the mishaps and adventures of this group of men , who live beyond the margins of German society in the shadowy areas outside <German law> .
>    2366610:  And the <German law> on energy saving is clearly supported by the proposal for a directive .
>    4145616:  An example would be the <German law> on non-medical practitioners .
>
> # now we use the new from … to … command to "translate" the query results to the aligned regions
> EUROPARL-EN> Gesetz = from Law to EUROPARL-DE;
> EUROPARL-EN> tabulate EUROPARL-DE:Gesetz 0 2 match .. matchend word;
> Es ist eine humorvolle Erzählung von Mißgeschicken und Abenteuern , die eine Gruppe von Männern erlebt , die am Rande der deutschen Gesellschaft im Graubereich außerhalb der deutschen Gesetze leben .
> Auch das deutsche Stromeinspeisegesetz wird mit dem Richtlinienvorschlag klar unterstützt .
> Ein Beispiel ist das deutsche Heilpraktikergesetz .
>
> # one problem is that matches without alignment to the target language are silently dropped; the same happens for multiple matches within the same alignment bead;
> # notice that the translated query result has only 35 lines rather than 38
> EUROPARL-EN> show named;
> Named Query Results:
>     m-*  EUROPARL-DE:Gesetz [35]
>     m-*  EUROPARL-EN:Law [38]
>
> # if you want corresponding regions for both languages (which you probably do), you can translate back into English;
> # of course, the actual matches are no longer marked, and there is no easy workaround for this
> EUROPARL-EN> Law2 = from EUROPARL-DE:Gesetz to EUROPARL-EN;
> EUROPARL-EN> show named;
> Named Query Results:
>     m-*  EUROPARL-EN:Law2 [35]
>     m-*  EUROPARL-DE:Gesetz [35]
>     m-*  EUROPARL-EN:Law [38]
>
> Hope this helps,
> Stefan
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
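Stefan notes that matches without an alignment are silently dropped, and that the same happens to multiple matches inside one alignment bead. That is why Law has 38 matches but Gesetz only 35. The Python sketch below models this effect under stated assumptions. Alignment beads are given as hypothetical (src_start, src_end, tgt_start, tgt_end) tuples of corpus positions, sorted and non-overlapping, and each match is assigned to the bead containing its first token. It illustrates the observed behaviour only. It is not CQP's actual implementation.

```python
from bisect import bisect_right
from typing import List, Optional, Tuple

# Hypothetical data model: corpus positions are ints, regions are inclusive.
Region = Tuple[int, int]            # (start, end) of a match
Bead = Tuple[int, int, int, int]    # (src_start, src_end, tgt_start, tgt_end)


def find_bead(beads: List[Bead], starts: List[int], cpos: int) -> Optional[Bead]:
    """Return the bead whose source side contains cpos, or None if unaligned.

    `beads` must be sorted by src_start and non-overlapping; `starts` holds
    their src_start values in the same order.
    """
    i = bisect_right(starts, cpos) - 1
    if i >= 0 and beads[i][0] <= cpos <= beads[i][1]:
        return beads[i]
    return None


def translate(matches: List[Region], beads: List[Bead]) -> List[Region]:
    """Map matches to the target side of their beads (assumed semantics).

    A match is assigned to the bead containing its first token. Matches
    outside every bead are dropped, and a bead that already produced a
    target region is not emitted again.
    """
    starts = [b[0] for b in beads]
    seen = set()
    result: List[Region] = []
    for start, _end in matches:
        bead = find_bead(beads, starts, start)
        if bead is None or bead in seen:
            continue  # unaligned, or same bead as an earlier match
        seen.add(bead)
        result.append((bead[2], bead[3]))
    return result


# Toy example: five matches, one unaligned and two sharing a bead.
beads = [(0, 9, 0, 11), (10, 19, 12, 20), (30, 39, 25, 33)]
matches = [(3, 4), (7, 8), (15, 16), (22, 23), (31, 32)]
print(translate(matches, beads))  # [(0, 11), (12, 20), (25, 33)]
```

In the toy data, five matches shrink to three target regions. The shrinkage has the same two causes as the drop from 38 to 35 in the session above. The bisect lookup keeps each match at logarithmic cost in the number of beads.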

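Law2 and Gesetz both contain 35 regions. One way to see the English and German regions side by side is to tabulate each result to a file and join the two files line by line. The output could presumably be redirected with the same `>` syntax shown for cat above. The sketch below assumes (unverified) that both results list the beads in the same corpus order, so that row i of one file corresponds to row i of the other. The file names are hypothetical, and the encoding should be set to whatever the corpus uses.

```python
from typing import Iterable, List


def read_rows(lines: Iterable[str]) -> List[str]:
    """Return one tabulate output row per input line, without the newline."""
    return [line.rstrip("\n") for line in lines]


def pair_regions(en: List[str], de: List[str]) -> List[str]:
    """Join the i-th English row and the i-th German row with a tab.

    Assumes both lists come from results of equal size listed in the same
    bead order (e.g. EUROPARL-EN:Law2 and EUROPARL-DE:Gesetz).
    """
    if len(en) != len(de):
        raise ValueError(f"row counts differ: {len(en)} vs {len(de)}")
    return [f"{e}\t{d}" for e, d in zip(en, de)]


# Hypothetical usage with two files exported from CQP; set the encoding
# to whatever the corpus uses.
# with open("law2_en.txt", encoding="utf-8") as f_en, \
#      open("gesetz_de.txt", encoding="utf-8") as f_de:
#     for row in pair_regions(read_rows(f_en), read_rows(f_de)):
#         print(row)
```

The row-count check is a cheap guard. If the two results ever differ in size (as Law and Gesetz do), positional pairing would silently misalign, so the function refuses to pair them.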

