[CWB] Structural attributes in aligned languages

Stefan Evert stefanML at collocations.de
Tue Mar 24 13:54:04 CET 2015


> On 24 Mar 2015, at 12:13, Ruprecht von Waldenfels <ruprecht.waldenfels at gmx.net> wrote:
> 
>  I don't think it's worth making an effort to implement this; as you say, these things can be extracted in other ways.

If you compile the latest version of CWB 3.4.8 from the SVN repository, you can try a new experimental command in CQP:

	EUROPARL-DE;
	Zeit = [lemma = "Zeit"];
	Time = from Zeit to EUROPARL-EN;

This will generate a new named query result

	EUROPARL-EN:Time

containing the full aligned regions for each match of Zeit.  Multiple matches in the same alignment bead are translated into duplicate ranges in the target corpus, but unaligned matches are silently dropped.
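
To make these semantics concrete, here is a minimal Python sketch of the mapping just described. It is not the CQP implementation: the function name `translate_matches`, the representation of alignment beads as pairs of (start, end) corpus position ranges, and the choice to locate a match by its start position are all assumptions made for illustration.

```python
from bisect import bisect_right

def translate_matches(matches, beads):
    """Map source matches to the target regions of their alignment beads.

    matches: list of (start, end) corpus positions in the source corpus.
    beads:   list of ((src_start, src_end), (tgt_start, tgt_end)) pairs,
             sorted by src_start, with non-overlapping source regions.
    (Hypothetical data layout, chosen only for this sketch.)
    """
    src_starts = [src[0] for src, _tgt in beads]
    result = []
    for start, _end in matches:
        # Find the last bead whose source region begins at or before the match.
        i = bisect_right(src_starts, start) - 1
        if i >= 0 and beads[i][0][0] <= start <= beads[i][0][1]:
            # One target range per match, so two matches in the same
            # bead produce the same target range twice.
            result.append(beads[i][1])
        # Otherwise the match lies outside every bead and is dropped silently.
    return result

beads = [((0, 9), (0, 7)), ((10, 19), (8, 20)), ((30, 39), (21, 30))]
matches = [(3, 3), (7, 7), (12, 12), (25, 25)]
print(translate_matches(matches, beads))
```

In this toy data the first two matches fall into the same bead and therefore yield the same target range twice, while the match at position 25 lies between beads and does not appear in the result.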

You can then, for example, tabulate the new named query result to obtain metadata:

	tabulate EUROPARL-EN:Time match text_date;

but don't directly cat it without activating the target corpus first!

As it turns out, there is a long-standing bug in CQP which messes up the context descriptor (printed by "show cd") if you cat a named query result from a corpus that isn't currently activated.  There are currently no plans to fix this bug.  For the same reason, the new command cannot be used without assignment to a named result, i.e.

	from Zeit to EUROPARL-EN;

will just raise a syntax error.

TIA for testing and hope you can make use of the new command!
Stefan


