[CWB] agreement checks

Stefan Evert stefan.evert at uos.de
Thu May 29 10:01:35 CEST 2008



Hi Gertrud!

Have you been using TIGERSearch or a reasoning system recently? If I  
understand correctly what you're trying to do with those macros, it's  
a strategy that might work in these tools but not in CQP.

> The Southern Bantu language Sepedi makes use of noun classes, and
> we mark these classes by a number that is made part of a positional
> attribute (usually part of speech), so a noun of class 1 is tagged
> N01, etc. (no separate feature-set attribute is encoded).
>
> In order to identify e.g. a noun phrase correctly, these numbers  
> have to be compared first, e.g.
>
> Monna/N01     yo/CDEM01
> noun                demonstrative concord
> man                 this
> -> this man.
>
> I've been writing rather simple macros for a while now, and know
> how to read in a known value (using $0, $1, etc.), so at first I
> thought I would have to write a little Perl script looping one macro
> over all possible noun classes. However, it would be nicer if I could
> utilize something like the following macro (which does not work; the
> error seems to be in the first line):
>
> MACRO np($0 =  "[0-9]" | "10" | "1[45]" )
>               np_ = [pos = "N.$0] [pos="CDEM.$0];
>               cat np_;
>

Macros don't allow default arguments. In the definition, you can only
assign descriptive names to the arguments; these names are not used
anywhere in the macro body or invocation and are only displayed for
information by the command-line completion function.

> I did try the following as well, no luck:
> MACRO np($0 =  "[0-9]" | $0 = "10" | $0 = "1[45]" )
> ..
>
> If one of you knows any way to encode this macro, please help.  
> THANKS in advance,

Sorry, what you're trying to do is simply not possible (and it
wouldn't work in any corpus query system I can think of). CQP macros
perform simple string replacement, so whatever you do would just
insert the same regular expression pattern in both places without
ensuring that the numbers are actually the same for N and CDEM.
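A toy Python model makes the problem concrete. It is not CQP's implementation: it only mimics textual substitution of $0 and CQP's rule that a regular expression must match the whole attribute value, and the class pattern 0[1-9]|1[0-5] is a hypothetical stand-in. Both tokens receive the same pattern, but nothing ties the digits matched by one token to the digits matched by the other:

```python
import re

# Toy model: a macro body with a $0 placeholder, expanded by plain
# string substitution (CQP macros work by textual replacement).
BODY = '[pos="N$0"] [pos="CDEM$0"]'
CLASS_PATTERN = "0[1-9]|1[0-5]"  # hypothetical: any two-digit class 01..15

def expand(body: str, arg: str) -> str:
    """Insert the same argument text at every $0 placeholder."""
    return body.replace("$0", "(" + arg + ")")

def token_matches(value: str, prefix: str, pattern: str) -> bool:
    # CQP regular expressions must match the entire attribute value,
    # hence re.fullmatch rather than re.search.
    return re.fullmatch(prefix + "(" + pattern + ")", value) is not None

def pair_matches(noun_pos: str, dem_pos: str) -> bool:
    # Each token is tested on its own against the same pattern;
    # nothing compares the two class numbers with each other.
    return (token_matches(noun_pos, "N", CLASS_PATTERN)
            and token_matches(dem_pos, "CDEM", CLASS_PATTERN))

print(expand(BODY, CLASS_PATTERN))
print(pair_matches("N01", "CDEM01"))  # agreeing classes: True
print(pair_matches("N01", "CDEM02"))  # disagreeing classes: also True
```

The last call shows the failure mode: a class 1 noun followed by a class 2 demonstrative passes just as well as an agreeing pair.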

The CQP way :-) is to encode the class feature in a separate
attribute (rule of thumb: any bit of information that you want to
test, and especially compare, individually has to go into an
attribute of its own), say class, and to leave only the bare tag
(N, CDEM, ...) in pos. For tokens to which noun class doesn't apply,
the attribute would be undefined; it's convenient to assign a
placeholder string, e.g. "--", so they don't show up as __UNDEF__
in CQP's display.
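If the corpus is already tagged with combined tags like N01, one way to get there is to split them before running cwb-encode. Below is a minimal Python sketch under two assumptions that are not from the original post: the input is one token per line as word<TAB>tag (optionally with further columns), and the noun class is always the run of trailing digits on the tag. The helper names (split_tag, convert_line, convert_stream) are hypothetical:

```python
import re
from typing import TextIO

# Assumption: the noun class is the run of trailing digits on the tag,
# e.g. N01 -> pos "N", class "01"; CDEM01 -> pos "CDEM", class "01".
TAG_RE = re.compile(r"^(.*?)(\d+)$")
NO_CLASS = "--"  # placeholder so CQP doesn't display __UNDEF__

def split_tag(tag: str) -> tuple[str, str]:
    """Split a combined tag into (pos, class); tags without a class get '--'."""
    m = TAG_RE.match(tag)
    if m is None or m.group(1) == "":
        return tag, NO_CLASS
    return m.group(1), m.group(2)

def convert_line(line: str) -> str:
    """Turn 'word<TAB>tag[<TAB>...]' into 'word<TAB>pos<TAB>class[<TAB>...]'."""
    # Structural markup lines such as <s>, and blank lines, pass through unchanged.
    if line.startswith("<") or not line.strip():
        return line
    fields = line.split("\t")
    pos, cls = split_tag(fields[1])
    return "\t".join([fields[0], pos, cls] + fields[2:])

def convert_stream(inp: TextIO, out: TextIO) -> None:
    """Convert a whole vertical file, line by line."""
    for raw in inp:
        out.write(convert_line(raw.rstrip("\n")) + "\n")
```

Called as convert_stream(sys.stdin, sys.stdout), this turns Monna<TAB>N01 into Monna<TAB>N<TAB>01; the second and third columns could then be declared to cwb-encode as positional attributes with -P pos -P class.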

Then it's easy to write your query:

	a:[pos="N" & class != "--"] b:[pos = "CDEM"] :: a.class = b.class;
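To check the logic outside CQP, here is a rough Python equivalent of what the query above computes over a list of (word, pos, class) tokens. It is only an illustration: it handles adjacent noun + demonstrative pairs, it is not how CQP evaluates queries internally, and the second pair in the sample data is an artificial mismatch:

```python
from typing import NamedTuple

class Token(NamedTuple):
    word: str
    pos: str
    cls: str  # noun class, or "--" where it doesn't apply

def agreeing_nps(tokens: list[Token]) -> list[tuple[int, int]]:
    """Return (start, end) positions of adjacent N + CDEM pairs whose
    class values are identical, roughly what
    a:[pos="N" & class != "--"] b:[pos="CDEM"] :: a.class = b.class
    finds."""
    hits = []
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        if (a.pos == "N" and a.cls != "--"
                and b.pos == "CDEM"
                and a.cls == b.cls):  # the global constraint a.class = b.class
            hits.append((i, i + 1))
    return hits

tokens = [
    Token("Monna", "N", "01"), Token("yo", "CDEM", "01"),
    Token("Monna", "N", "01"), Token("yo", "CDEM", "02"),  # artificial mismatch
]
print(agreeing_nps(tokens))  # [(0, 1)]
```

Unlike the macro-substitution approach, the comparison here is between the two tokens' actual class values, which is exactly what the label constraint provides in CQP.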

If you want to embed this in a reusable macro (which you might invoke
multiple times in a query), it's convenient to use the $$
pseudo-argument to generate a unique label for each invocation:

MACRO np(0)
	np$$: [pos="N"] [pos="CDEM" & class = np$$.class]
;

Best,
Stefan

