[CWB] Specific regular expression

Stefan Evert stefanML at collocations.de
Thu Jun 12 11:26:46 CEST 2014


> Thanks again. I have another similar question related to the regular expression [lem = "(davon|weg)?brausen"%c]

But do keep in mind that this will only find instances where the verb is actually written as a single word, e.g.

	Das Auto ist davongebraust.

but not where the particle is separated, as in

	Das Auto braust davon.

To identify the latter reliably, you'll need a fairly good syntactic parser. I'm not sure whether any publicly available software does this and also recombines the particle with the verb to form the correct lemma. I know that Hannah Kermes' YAC chunker can do it, but I don't think it's easily available.
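To see what the lemma query from the question does and does not match: CQP compares a regular expression against the entire attribute value, and the %c flag makes the comparison case-insensitive. The Python sketch below imitates this with re.fullmatch and re.IGNORECASE. It is only an illustration, not part of CWB, and the helper name cqp_match is hypothetical. Because the prefix is optional, the pattern also matches the plain lemma "brausen", so with a typical lemmatizer a separated use such as "Das Auto braust davon." shows up only as "brausen" and cannot be told apart from the simple verb.

```python
import re


def cqp_match(pattern: str, value: str, ignore_case: bool = False) -> bool:
    """Hypothetical helper: mimic how CQP tests an attribute value.

    CQP regexes must match the whole value (implicit anchoring),
    which corresponds to re.fullmatch; %c corresponds to IGNORECASE.
    """
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(pattern, value, flags) is not None


pattern = "(davon|weg)?brausen"
for lemma in ["davonbrausen", "Wegbrausen", "brausen", "davon", "davonbrausend"]:
    print(lemma, cqp_match(pattern, lemma, ignore_case=True))
```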


As a rough approximation, you can search for the verb followed by a suitable particle, e.g.

	[lem = "brausen"] [pos != "V.*|AP.*"]* [pos = "PTKVZ"]

assuming there is a "pos" attribute with STTS tags. However, in my experience taggers tend to have difficulty recognizing PTKVZ correctly and may mistag separated particles as prepositions.
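The logic of this approximation can be sketched outside CQP on a toy sentence stored as (word, pos, lemma) triples. This is a minimal Python emulation under stated assumptions: the function name find_particle_verbs and the hand-tagged example sentences are hypothetical, and unlike CQP it reports only the first PTKVZ after each verb. Starting from a token with the given lemma, it skips any tokens whose tag does not match V.*|AP.*, stops at the first PTKVZ, and glues the particle back onto the verb lemma.

```python
import re
from typing import List, Tuple

# A token is (word, pos, lemma); the pos values are STTS tags.
Token = Tuple[str, str, str]

# Same pattern as the CQP constraint [pos != "V.*|AP.*"]: verbs and adpositions.
BLOCKING = re.compile(r"V.*|AP.*")


def find_particle_verbs(tokens: List[Token], verb_lemma: str) -> List[Tuple[int, int, str]]:
    """Return (verb_index, particle_index, recombined_lemma) for each hit.

    Approximates [lem = verb_lemma] [pos != "V.*|AP.*"]* [pos = "PTKVZ"]
    by taking the first PTKVZ token after the verb (a simplification).
    """
    hits = []
    for i, (_, _, lemma) in enumerate(tokens):
        if lemma != verb_lemma:
            continue
        for j in range(i + 1, len(tokens)):
            _, pos_j, lemma_j = tokens[j]
            if pos_j == "PTKVZ":
                # Recombine the separated particle with the verb lemma.
                hits.append((i, j, lemma_j.lower() + verb_lemma))
                break
            if BLOCKING.fullmatch(pos_j):
                # Another verb or an adposition closes the search window.
                break
    return hits


separated = [("Das", "ART", "die"), ("Auto", "NN", "Auto"),
             ("braust", "VVFIN", "brausen"), ("schnell", "ADJD", "schnell"),
             ("davon", "PTKVZ", "davon"), (".", "$.", ".")]
print(find_particle_verbs(separated, "brausen"))  # [(2, 4, 'davonbrausen')]
```

If the tagger labels the particle APPR instead of PTKVZ, the BLOCKING check ends the search and the hit is silently lost, which is exactly the tagging problem mentioned above.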

— Stefan



Kermes, Hannah (2003). Off-line (and On-line) Text Analysis for Computational Lexicography. Ph.D. thesis, IMS, University of Stuttgart. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS), volume 9, number 3.



