[CWB] Multi-word units

Stefan Evert stefanML at collocations.de
Fri Feb 15 18:02:37 CET 2013


On 14 Feb 2013, at 23:12, "Hardie, Andrew" <a.hardie at lancaster.ac.uk> wrote:

> I was thinking of this kind of arrangement:
> 
> apressurada	apressuradamientre
> mientre	{some kind of ditto mark or just __NULL__}
> 
> .... so that subsequent tokens on the two attributes stay in sync.

That's neat, but it doesn't work in (naive) queries, especially if users are not aware which words are multi-word tokens.  They'd have to write something like

  [pos = "adverb"] [word = "__NULL__"]? [ ... ] [word = "__NULL__"]? ...
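
To illustrate how quickly this padding gets unwieldy, here is a minimal Python sketch that assembles such a query. The helper `pad_with_null` and the constant `NULL_SLOT` are hypothetical names, not part of CQP or CWB. The sketch only builds a string and assumes every list element is already a complete CQP token expression.

```python
# Hypothetical helper, not part of CWB/CQP: it only assembles a query string.
NULL_SLOT = '[word = "__NULL__"]?'

def pad_with_null(token_exprs, slot=NULL_SLOT):
    """Return a query string with an optional placeholder slot after each token expression."""
    padded = []
    for expr in token_exprs:
        padded.append(expr)   # the token the user actually cares about
        padded.append(slot)   # optional placeholder that may follow it
    return " ".join(padded)

# A two-token query already doubles in length.
print(pad_with_null(['[pos = "adverb"]', '[pos = "verb"]']))
```

Every token position needs its own optional slot, so users would in practice have to apply this kind of rewriting to every query they type.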

Would an option to automatically ignore certain tokens (e.g. __NULL__ tokens, or all punctuation marks) in CQP queries be useful for the wishlist, and worth trying at the hackathon?
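
A rough sketch of what such an option might mean, written in Python rather than in CQP itself: drop the ignored tokens, match on what remains, then map the match back to the original corpus positions. This is an illustration only and says nothing about how CQP is or would be implemented. The function `find_ignoring` and its parameters are made up for the sketch, and a "pattern" here is simply a list of per-token predicates.

```python
# Illustration only: names are hypothetical and this is not CQP's matching engine.
def find_ignoring(tokens, pattern, ignore=("__NULL__",)):
    """Match `pattern` (a list of predicates) against `tokens`, skipping ignored tokens.

    Returns a list of (start, end) positions, inclusive, in the ORIGINAL sequence.
    """
    if not pattern:
        return []
    # Remember the original position of every token that is not ignored.
    kept = [(i, tok) for i, tok in enumerate(tokens) if tok not in ignore]
    matches = []
    for start in range(len(kept) - len(pattern) + 1):
        window = kept[start:start + len(pattern)]
        if all(pred(tok) for pred, (_, tok) in zip(pattern, window)):
            # Report the span in original positions, so it covers skipped tokens.
            matches.append((window[0][0], window[-1][0]))
    return matches

tokens = ["a", "__NULL__", "b", "c"]
pattern = [lambda t: t == "a", lambda t: t == "b"]
print(find_ignoring(tokens, pattern))
```

In this sketch the reported span runs from the first to the last matched token and includes any ignored tokens in between, but not a placeholder that trails the final token. Whether that would be the right behaviour is an open design question.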

Cheers,
Stefan


