[CWB] agreement checks

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu May 29 10:29:53 CEST 2008


Gertrud,

Although Stefan's suggestion about having class as a separate attribute
is clearly the most elegant and efficient solution, a stop-gap fix
that will work with what you've got now might be the following:

MACRO np(0)
         ( [pos = "N01"] [pos = "CDEM01"] 
         | [pos = "N02"] [pos = "CDEM02"] 
         | [pos = "N03"] [pos = "CDEM03"] 
         --etc etc etc.-- 
         )
;

or, alternatively with a second macro to simplify matters:

MACRO npx(1)
	[pos = "N$0"] [pos = "CDEM$0"]
;
MACRO np(0)
	( /npx["01"] | /npx["02"] | /npx["03"] | --etc etc etc.--  )
;

usage: 
>myquery = /np[];
>cat myquery;

You end up with a very long query in either case but it should do the
job.
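
If writing out every alternative by hand is too tedious, the macro
definitions could be generated by a short script instead. The following
is a minimal Python sketch and not part of CWB; the function name
build_np_macros is made up for illustration, and the three class codes
in the example are just the ones shown above, so you would pass in the
full set of codes from your own tagset:

```python
def build_np_macros(classes):
    """Return CQP macro definitions covering the given noun class codes.

    `classes` is a list of class code strings as they appear in the
    pos tags, e.g. "01" for N01 / CDEM01 (supplied by the caller).
    """
    if not classes:
        raise ValueError("need at least one noun class code")
    # One /npx[...] invocation per class, joined as alternatives.
    alternatives = " | ".join(f'/npx["{code}"]' for code in classes)
    return (
        "MACRO npx(1)\n"
        '\t[pos = "N$0"] [pos = "CDEM$0"]\n'
        ";\n"
        "MACRO np(0)\n"
        f"\t( {alternatives} )\n"
        ";\n"
    )


if __name__ == "__main__":
    # Example with only the three class codes shown in this message.
    print(build_np_macros(["01", "02", "03"]), end="")
```

The output can be saved to a macro definition file and loaded into CQP
in the usual way.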

best

Andrew.



 

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it]
On Behalf Of Stefan Evert
Sent: 29 May 2008 09:02
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] agreement checks


Hi Gertrud!

Have you been using TIGERSearch or a reasoning system recently? If I  
understand correctly what you're trying to do with those macros, it's  
a strategy that might work in these tools but not in CQP.

> The Southern Bantu language Sepedi makes use of noun classes, and  
> we mark these classes in making them a part of a positional  
> attribute (usually part of speech)  by a number, so a noun of class  
> 1 is called N01, etc (no separate feature set attribute is encoded).
>
> In order to identify e.g. a noun phrase correctly, these numbers  
> have to be compared first, e.g.
>
> Monna/N01     yo/CDEM01
> noun          demonstrative concord
> man           this
> -> this man.
>
> I've been writing rather simple macros for a while now, and know  
> how to read in a known value (using $0, $1, etc.), so firstly I  
> thought I have to write a little perl looping one macro over all  
> possible noun classes. However, it would be nicer if I could utilize  
> something like the following macro (which does not work, the error  
> is in the first line, as it seems)
>
> MACRO np($0 =  "[0-9]" | "10" | "1[45]" )
>               np_ = [pos = "N.$0] [pos="CDEM.$0];
>               cat np_;
>

Macros don't allow default arguments. In the definition, you can only  
assign descriptive names, which are not used anywhere in the macro  
body and invocation; they're only displayed for information by the  
command-line completion function.

> I did try the following as well, no luck:
> MACRO np($0 =  "[0-9]" | $0 = "10" | $0 = "1[45]" )
> ..
>
> If one of you knows any way to encode this macro, please help.  
> THANKS in advance,

Sorry, what you're trying to do is simply not possible (and it  
wouldn't work in any corpus query system I can think of). CQP macros  
perform simple string replacement, so whatever you do would just  
insert the same regular expression pattern in both places without  
ensuring that the numbers are actually the same for N and CDEM.
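
To see why plain string replacement cannot enforce agreement, here is a
small Python analogy. It is not CQP, only an illustration using
ordinary regular expressions over a space-separated tag sequence; the
names expand and CLASS_PATTERN, and the pattern itself, are assumptions
made up for this sketch:

```python
import re

# Hypothetical class pattern that would be passed as the macro argument.
CLASS_PATTERN = r"[0-9]+"


def expand(template, arg):
    """Mimic macro expansion: replace every $0 by the argument string."""
    return template.replace("$0", arg)


# The same pattern text is inserted in both places ...
pattern = re.compile(expand(r"N$0 CDEM$0", CLASS_PATTERN))

# ... so each copy matches independently of the other.
matches_agreeing = bool(pattern.fullmatch("N01 CDEM01"))
matches_clashing = bool(pattern.fullmatch("N01 CDEM02"))

if __name__ == "__main__":
    print(pattern.pattern)
    print(matches_agreeing, matches_clashing)
```

Both the agreeing and the clashing sequence are accepted, because the
expanded pattern contains two independent copies of the same expression
rather than one value shared between N and CDEM.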

The CQP way :-) is to encode the class feature in a separate
attribute (rule of thumb: any bit of information that you want to
test - and especially compare - individually has to go into an
attribute of its own), say class. For tokens to which noun class
doesn't apply the attribute will be undefined; it's convenient to
assign a string value, e.g. "--", so they don't show up as __UNDEF__
in CQP's display.
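
As a rough illustration of that re-encoding step, the sketch below
splits a combined tag such as N01 into a part-of-speech value and a
separate noun class value before the corpus is encoded. It is a Python
sketch under several assumptions: the input has one token per line with
word and tag separated by a tab, every class-bearing tag consists of
letters followed by digits, and the helper names split_tag and
convert_line are hypothetical:

```python
import re

# Assumed tag shape: letters followed by a numeric class code, e.g. N01.
TAG_WITH_CLASS = re.compile(r"^([A-Za-z]+)(\d+)$")


def split_tag(tag, undefined="--"):
    """Split e.g. 'N01' into ('N', '01'); other tags get the filler value."""
    match = TAG_WITH_CLASS.match(tag)
    if match:
        return match.group(1), match.group(2)
    return tag, undefined


def convert_line(line):
    """Turn 'word<TAB>tag' into 'word<TAB>pos<TAB>class'.

    Lines without a tab (e.g. structural markup) are returned unchanged.
    """
    body = line.rstrip("\n")
    if "\t" not in body:
        return line
    word, tag = body.split("\t", 1)
    pos, noun_class = split_tag(tag)
    return f"{word}\t{pos}\t{noun_class}\n"


if __name__ == "__main__":
    for sample in ["Monna\tN01\n", "yo\tCDEM01\n"]:
        print(convert_line(sample), end="")
```

The third column would then be declared as an additional positional
attribute when the corpus is encoded.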

Then it's easy to write your query:

	a:[pos="N" & class != "--"] b:[pos = "CDEM"] :: a.class = b.class;
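
For readers who want to see what this query checks, here is the same
logic spelled out in Python over a list of tokens. It is only a sketch
of the matching condition, not of how CQP evaluates queries; the
function name find_agreeing_nps and the dictionary layout of the tokens
are assumptions, and the second pair of tokens in the example is
invented to show a mismatch:

```python
def find_agreeing_nps(tokens):
    """Return (i, i+1) index pairs where a noun is followed by a
    demonstrative concord carrying the same class value."""
    hits = []
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        if (
            a["pos"] == "N"
            and a["class"] != "--"         # noun must have a class
            and b["pos"] == "CDEM"
            and a["class"] == b["class"]   # the a.class = b.class test
        ):
            hits.append((i, i + 1))
    return hits


if __name__ == "__main__":
    tokens = [
        {"word": "Monna", "pos": "N", "class": "01"},
        {"word": "yo", "pos": "CDEM", "class": "01"},
        # Invented placeholder tokens with clashing classes:
        {"word": "w1", "pos": "N", "class": "03"},
        {"word": "w2", "pos": "CDEM", "class": "01"},
    ]
    print(find_agreeing_nps(tokens))
```

Only the first pair is reported, since the class values differ in the
second one.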

If you want to embed this in a reusable macro (that you might invoke  
multiple times in a query), it's convenient to use the $$ pseudo arg  
to generate a unique label:

MACRO np(0)
	np$$: [pos="N"] [pos="CDEM" & class = np$$.class]
;

Best,
Stefan
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb

