[CWB] Querying parallel corpora
Stefan Evert
stefanML at collocations.de
Thu Nov 12 14:22:22 CET 2015
> On 12 Nov 2015, at 12:23, José Manuel Martínez Martínez <chozelinek at gmail.com> wrote:
>
> So, if I want to see the aligned sentences corresponding to the matches I just type this:
>
> show +tdc-tt-fl
>
> And then my query:
>
> [word="catch-the-eye"];
>
> Can one tabulate or save the alignments somehow corresponding to the matches? If yes, how?
Well, you can always redirect the "cat" output to a file:
cat > "output_with_alignment.txt";
If you want finer control over the output via "tabulate", and you're running a recent beta version of CQP (v3.4.7 or newer), you can also "translate" the query result into the target language. Note that this is an experimental feature, so there are no guarantees.
Let me give you an example based on the Europarl corpus:
[no corpus]> EUROPARL-EN
EUROPARL-EN> Law = "German" "law";
EUROPARL-EN> cat Law 0 2;
1317230: It is a funny saga of the mishaps and adventures of this group of men , who live beyond the margins of German society in the shadowy areas outside <German law> .
2366610: And the <German law> on energy saving is clearly supported by the proposal for a directive .
4145616: An example would be the <German law> on non-medical practitioners .
# now we use the new from … to … command to "translate" the query results to the aligned regions
EUROPARL-EN> Gesetz = from Law to EUROPARL-DE;
EUROPARL-EN> tabulate EUROPARL-DE:Gesetz 0 2 match .. matchend word;
Es ist eine humorvolle Erzählung von Mißgeschicken und Abenteuern , die eine Gruppe von Männern erlebt , die am Rande der deutschen Gesellschaft im Graubereich außerhalb der deutschen Gesetze leben .
Auch das deutsche Stromeinspeisegesetz wird mit dem Richtlinienvorschlag klar unterstützt .
Ein Beispiel ist das deutsche Heilpraktikergesetz .
# one problem is that matches without alignment to the target language are silently dropped; likewise, when several matches fall within the same alignment bead, the extra ones are dropped;
# notice that the translated query result has only 35 lines rather than 38
EUROPARL-EN> show named;
Named Query Results:
m-* EUROPARL-DE:Gesetz [35]
m-* EUROPARL-EN:Law [38]
# if you want corresponding regions for both languages (which you probably do), you can translate back into English;
# of course, the actual matches are no longer marked, and there is no easy workaround for this
EUROPARL-EN> Law2 = from EUROPARL-DE:Gesetz to EUROPARL-EN;
EUROPARL-EN> show named;
Named Query Results:
m-* EUROPARL-EN:Law2 [35]
m-* EUROPARL-DE:Gesetz [35]
m-* EUROPARL-EN:Law [38]
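Since Law2 and Gesetz now have the same number of regions, you could paste the two "tabulate" outputs together outside CQP. Here is a minimal Python sketch of that step. It is not part of CQP; the function names pair_aligned_lines and format_tsv are made up for illustration. It assumes that each output has one region per line and that the n-th line of one corresponds to the n-th line of the other, which should hold if the alignment is monotonic but is worth checking on your own data:

```python
from itertools import zip_longest


def pair_aligned_lines(source_lines, target_lines):
    """Pair the n-th source region with the n-th target region.

    Assumption: both inputs hold one region per line, in corresponding order.
    Raises ValueError if the two inputs differ in length.
    """
    pairs = []
    for src, tgt in zip_longest(source_lines, target_lines):
        if src is None or tgt is None:
            raise ValueError("the two tabulate outputs differ in length")
        pairs.append((src.rstrip("\n"), tgt.rstrip("\n")))
    return pairs


def format_tsv(pairs):
    """Render the pairs as tab-separated text, one aligned pair per line."""
    return "\n".join(f"{src}\t{tgt}" for src, tgt in pairs)


# Small in-memory demo using one region pair from the example above.
english = ["An example would be the German law on non-medical practitioners .\n"]
german = ["Ein Beispiel ist das deutsche Heilpraktikergesetz .\n"]
print(format_tsv(pair_aligned_lines(english, german)))
```

In practice you would pass two open file objects (the saved tabulate output for Law2 and for Gesetz) instead of the in-memory lists. The length check is there on purpose: if the two outputs ever differ in line count, pairing by position would silently misalign everything after the first gap.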
Hope this helps,
Stefan