[CWB] agreement checks
Stefan Evert
stefan.evert at uos.de
Thu May 29 10:01:35 CEST 2008
Hi Gertrud!

Have you been using TIGERSearch or a reasoning system recently? If I
understand correctly what you're trying to do with those macros, it's
a strategy that might work in these tools but not in CQP.

> The Southern Bantu language Sepedi makes use of noun classes, and
> we mark these classes by making a number part of a positional
> attribute (usually part of speech), so a noun of class 1 is tagged
> N01, etc. (no separate feature set attribute is encoded).
>
> In order to identify e.g. a noun phrase correctly, these numbers
> have to be compared first, e.g.
>
> Monna/N01   yo/CDEM01
> noun        demonstrative concord
> man         this
> -> this man.
>
> I've been writing rather simple macros for a while now, and know
> how to read in a known value (using $0, $1, etc.), so at first I
> thought I would have to write a little Perl script looping one macro
> over all possible noun classes. However, it would be nicer if I could
> utilize something like the following macro (which does not work; the
> error seems to be in the first line):
>
> MACRO np($0 = "[0-9]" | "10" | "1[45]" )
> np_ = [pos = "N.$0] [pos="CDEM.$0];
> cat np_;
>

Macros don't allow default arguments. In the definition, you can only
assign descriptive names to the arguments. These names are not used
anywhere in the macro body or invocation; they're only displayed for
information by the command-line completion function.

> I did try the following as well, no luck:
> MACRO np($0 = "[0-9]" | $0 = "10" | $0 = "1[45]" )
> ..
>
> If one of you knows any way to encode this macro, please help.
> THANKS in advance,

Sorry, what you're trying to do is simply not possible (and it
wouldn't work in any corpus query system I can think of). CQP macros
perform simple string replacement, so whatever you do would just
insert the same regular expression pattern in both places without
ensuring that the numbers are actually the same for N and CDEM.
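
To make the string-replacement point concrete, here is a minimal Python sketch (not CQP itself). The function expand_macro is a hypothetical stand-in for macro expansion, and the class pattern is an assumed numbering chosen only for illustration. After expansion, each token is tested against its own copy of the pattern, so a clashing pair is accepted just like an agreeing one.

```python
import re

# Assumed class numbering, for illustration only.
CLASS_PATTERN = "(0[1-9]|1[045])"

def expand_macro(body, arg):
    """Hypothetical stand-in for macro expansion: plain text substitution."""
    return body.replace("$0", arg)

expanded = expand_macro('[pos = "N$0"] [pos = "CDEM$0"]', CLASS_PATTERN)

# After expansion each token carries its own independent copy of the pattern.
noun_re = re.compile("N" + CLASS_PATTERN)
cdem_re = re.compile("CDEM" + CLASS_PATTERN)

def pair_matches(noun_tag, cdem_tag):
    """True if both tags match their pattern; the numbers are never compared."""
    return bool(noun_re.fullmatch(noun_tag) and cdem_re.fullmatch(cdem_tag))

print(expanded)
print(pair_matches("N01", "CDEM01"))  # agreeing classes
print(pair_matches("N01", "CDEM02"))  # clashing classes are accepted too
```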

The CQP way :-) is to encode the class feature in a separate
attribute (rule of thumb: any bit of information that you want to
test - and especially compare - individually has to go into an
attribute of its own), say class. For tokens to which noun class
doesn't apply, the attribute will be undefined; it's convenient to
assign a string value, e.g. "--", so they don't show up as __UNDEF__
in CQP's display.
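
As a rough sketch of the preprocessing this implies, the Python snippet below splits combined tags such as N01 into a bare part of speech and a class value before the corpus is encoded. The helper names, the tag shape (letters followed by digits) and the extra tag V are assumptions made for illustration; the output is one token per line with tab-separated columns, which would then be declared as separate positional attributes.

```python
import re

# Assumed tag shape: uppercase letters followed by the class number, e.g. N01.
TAG_RE = re.compile(r"([A-Z]+)(\d+)")

def split_tag(tag):
    """Split 'N01' into ('N', '01'); tags without a number get class '--'."""
    m = TAG_RE.fullmatch(tag)
    if m:
        return m.group(1), m.group(2)
    return tag, "--"

def to_vertical(tokens):
    """Render (word, tag) pairs as tab-separated lines: word, pos, class."""
    lines = []
    for word, tag in tokens:
        pos, noun_class = split_tag(tag)
        lines.append(f"{word}\t{pos}\t{noun_class}")
    return "\n".join(lines)

# 'w3'/'V' is a made-up token without noun class, for illustration only.
print(to_vertical([("Monna", "N01"), ("yo", "CDEM01"), ("w3", "V")]))
```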

Then it's easy to write your query:

a:[pos="N" & class != "--"] b:[pos = "CDEM"] :: a.class = b.class;
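
For readers less familiar with labels and global constraints, the logic of this query can be emulated in a few lines of Python. This is only a sketch of what the query tests, not of how CQP evaluates it; the token dictionaries and the placeholder words w1 and w2 are made up for illustration.

```python
def agreeing_pairs(tokens):
    """Return (noun, demonstrative) word pairs whose class values agree.

    tokens: list of dicts with the keys 'word', 'pos' and 'class'.
    """
    hits = []
    # Look at every pair of adjacent tokens (a, b), like a:[...] b:[...].
    for a, b in zip(tokens, tokens[1:]):
        if (a["pos"] == "N" and a["class"] != "--"
                and b["pos"] == "CDEM"
                and a["class"] == b["class"]):  # the a.class = b.class test
            hits.append((a["word"], b["word"]))
    return hits

corpus = [
    {"word": "Monna", "pos": "N", "class": "01"},
    {"word": "yo", "pos": "CDEM", "class": "01"},
    {"word": "w1", "pos": "N", "class": "01"},     # placeholder token
    {"word": "w2", "pos": "CDEM", "class": "02"},  # placeholder, class clash
]
print(agreeing_pairs(corpus))
```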

If you want to embed this in a reusable macro (that you might invoke
multiple times in a query), it's convenient to use the $$
pseudo-argument to generate a unique label:

MACRO np(0)
np$$: [pos="N"] [pos="CDEM" & class = np$$.class]
;

Best,
Stefan