[CWB] CQP macro libraries?

Stefan Evert stefan.evert at uos.de
Wed Aug 29 17:43:57 CEST 2007


Hi Eckart and everyone,

that's a great idea!  Perhaps we could open a section on the CWB wiki  
(btw, can everyone edit pages there? should we set up user accounts  
to avoid spamming and vandalism?) where we collect useful macros for  
various languages?  This could also include a few paragraphs on "how  
to design your macro library".

> I was wondering if there are CQP macro libraries already available  
> or if somebody here would be willing to share CQP macros they have  
> done.

Ok, just as a proof of concept ;-), I'll attach my personal  
"standard" CQP macros below.

> I have just started a little CQP macro library for our project.  
> It's tag-set dependent (the Penn-TreeBank tagset variant used by  
> Tree-Tagger) and only contains two macros so far for finding  
> passive forms of verbs. If people are interested in exchanging
> macro definitions or talking about best practices in implementing  
> CQP macros... well, I'd be curious to hear about it ;)

Two comments on this:

- Best practice for macros that implement linguistic queries is to
think of the macro library as a non-recursive context-free grammar
(each macro is a rule of the grammar), and to "overload" macros (same
name, but a different number of parameters) to implement useful
default values for parameters (cf. the codist macros built into CQP).

- Tagset parametrisation can most easily be achieved by defining  
"word lists" (define $var ... in CQP) for word classes, which contain  
the tags belonging to the respective word class (this makes it easy
to implement a hierarchy of word classes by merging such word lists).  
Then base all macros on these word list variables rather than regexps  
for POS tags. This should allow users to substitute a different tag  
set by changing the word list definitions (as long as the tagsets are  
roughly equivalent). NB: such word lists can easily be read in by a  
macro, so you might simply call /use_brown[]; or /use_bnc[]; before  
running your queries.  An alternative would be to implement word  
classes as macros themselves.
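As a rough sketch of both suggestions (not taken from any existing library), a tagset definition macro and a small "grammar" of overloaded query macros might look like this. The macro names, variable names and tag lists are hypothetical; the tags assume TreeTagger's Penn-style English tagset on a `pos` attribute (VB* for forms of "be", VH* for "have", VV* for lexical verbs).

```cqp
# hypothetical tagset definition: one word list of POS tags per word class
# ($verb_tags is not used below; it only shows how a broader class
#  can be built by extending a list with +=)
MACRO use_penn()
   define $be_tags = "VB VBD VBG VBN VBP VBZ";
   define $adverb_tags = "RB RBR RBS";
   define $participle_tags = "VVN";
   define $verb_tags = "VV VVD VVG VVN VVP VVZ";
   define $verb_tags += "VB VBD VBG VBN VBP VBZ VH VHD VHG VHN VHP VHZ";
;

# grammar rule: any form of "be"
MACRO be_form()
   [pos = $be_tags]
;

# grammar rule: between 0 and $0 adverbs
MACRO adverbs($0=Max)
   [pos = $adverb_tags]{0,$0}
;

# grammar rule: "be" + optional adverbs + past participle
MACRO passive($0=MaxAdv)
   /be_form[] /adverbs[$0] [pos = $participle_tags]
;

# overloaded version supplying a default of at most 2 adverbs
MACRO passive()
   /passive[2]
;
```

After /use_penn[]; a query such as /passive[]; (at most two adverbs between "be" and the participle) or /passive[0]; (none) depends only on the word lists, so a hypothetical /use_bnc[] defining the same variables with C5 tags could be swapped in without touching the query macros.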

Best to all of you! I'll be working on an updated version of the CWB/Perl interface next.
Stefan


-----Stefan's standard macros-------

# sort query results by match, left context, or right context
#   /sort[<nqr>, <att>]  /sort_left[<nqr>, <att>]  /sort_right[<nqr>, <att>]
#     ... sort named query result <nqr> on attribute <att>
#   /sort[<nqr>]         /sort_left[<nqr>]         /sort_right[<nqr>]
#     ... defaults to "word" attribute
#   /sort[]              /sort_left[]              /sort_right[]
#     ... defaults to "Last" query result
# sort option "descending" can be specified after the sort macro; reverse sorts on left context
# and match can be performed with special macros (but option "descending" is not supported):
#   /sort_rev[...]            /sort_left_rev[...]
MACRO sort($0=NQR $1=Att)
   sort $0 by $1 %cd on match .. matchend
;
MACRO sort_left($0=NQR $1=Att)
   sort $0 by $1 %cd on match[-1] .. match[-42]
;
MACRO sort_right($0=NQR $1=Att)
   sort $0 by $1 %cd on matchend[1] .. matchend[42]
;
MACRO sort($0=NQR)
   /sort[$0, word]
;
MACRO sort_left($0=NQR)
   /sort_left[$0, word]
;
MACRO sort_right($0=NQR)
   /sort_right[$0, word]
;
MACRO sort()
   /sort[Last]
;
MACRO sort_left()
   /sort_left[Last]
;
MACRO sort_right()
   /sort_right[Last]
;

MACRO sort_rev($0=NQR $1=Att)
   sort $0 by $1 %cd on match .. matchend reverse
;
MACRO sort_left_rev($0=NQR $1=Att)
   sort $0 by $1 %cd on match[-42] .. match[-1] reverse
;
MACRO sort_rev($0=NQR)
   /sort_rev[$0, word]
;
MACRO sort_left_rev($0=NQR)
   /sort_left_rev[$0, word]
;
MACRO sort_rev()
   /sort_rev[Last]
;
MACRO sort_left_rev()
   /sort_left_rev[Last]
;
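A short session with the sort macros might look like this (a sketch, assuming the macros above have been loaded and the corpus has a `lemma` attribute; "house" is just an example):

```cqp
# find all occurrences of the lemma "house" (stored in Last)
[lemma = "house"];
# sort Last on the left context, using the default "word" attribute
/sort_left[];
cat;
# sort the same result on the right context, by lemma
/sort_right[Last, lemma];
cat;
# sort on the match itself, in descending order
/sort[] descending;
cat;
# reverse sort on the left context (compares strings from their ends)
/sort_left_rev[];
cat;
```

Here /sort_left[] expands step by step to /sort_left[Last], then /sort_left[Last, word], and finally to sort Last by word %cd on match[-1] .. match[-42];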


# compute frequencies of individual n-grams (n = 1 .. 4)
#   /ngram[word, "black"]; /ngram[word, "black", "box"];
#   /ngram[pos, "DT", "JJ.*", "NN.*"];
#   /ngram_flags[word, c, "a", "tale", "of"]; /ngram_flags[word, l, "."];
# (note that quotes may not be necessary for "simple" words that are not reserved keywords)
MACRO ngram($0=Att $1=Word1)
   _NGram = [$0 = "$1"];
   size _NGram;
   discard _NGram;
;

MACRO ngram($0=Att $1=Word1 $2=Word2)
   _NGram = MU(meet [$0 = "$1"] [$0 = "$2"] 1 1);
   size _NGram;
   discard _NGram;
;

MACRO ngram($0=Att $1=Word1 $2=Word2 $3=Word3)
   _NGram = MU(meet (meet [$0 = "$1"] [$0 = "$2"] 1 1) [$0 = "$3"] 2 2);
   size _NGram;
   discard _NGram;
;

MACRO ngram($0=Att $1=Word1 $2=Word2 $3=Word3 $4=Word4)
   _NGram = MU(meet (meet (meet [$0 = "$1"] [$0 = "$2"] 1 1) [$0 = "$3"] 2 2) [$0 = "$4"] 3 3);
   size _NGram;
   discard _NGram;
;

MACRO ngram_flags($0=Att $1=Flags $2=Word1)
   _NGram = [$0 = "$2" %$1];
   size _NGram;
   discard _NGram;
;

MACRO ngram_flags($0=Att $1=Flags $2=Word1 $3=Word2)
   _NGram = MU(meet [$0 = "$2" %$1] [$0 = "$3" %$1] 1 1);
   size _NGram;
   discard _NGram;
;

MACRO ngram_flags($0=Att $1=Flags $2=Word1 $3=Word2 $4=Word3)
   _NGram = MU(meet (meet [$0 = "$2" %$1] [$0 = "$3" %$1] 1 1) [$0 = "$4" %$1] 2 2);
   size _NGram;
   discard _NGram;
;

MACRO ngram_flags($0=Att $1=Flags $2=Word1 $3=Word2 $4=Word3 $5=Word4)
   _NGram = MU(meet (meet (meet [$0 = "$2" %$1] [$0 = "$3" %$1] 1 1) [$0 = "$4" %$1] 2 2) [$0 = "$5" %$1] 3 3);
   size _NGram;
   discard _NGram;
;
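Because MU(meet A B 1 1) keeps those matches of A that are immediately followed by B, the bigram count reported by /ngram should equal the size of an ordinary two-token query. A minimal check, assuming the definitions above were saved to a (hypothetically named) file and loaded with define macro <, might be:

```cqp
# load the macro definitions (file name is hypothetical)
define macro < "standard_macros.txt";
# frequency of the bigram "black box" via the MU-based macro
/ngram[word, "black", "box"];
# the same count with a plain two-token sequence query
_Seq = [word = "black"] [word = "box"];
size _Seq;
discard _Seq;
```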


