[CWB] Specific regular expression
Stefan Evert
stefanML at collocations.de
Thu Jun 12 11:26:46 CEST 2014
> Thanks again. I have another similar question related to the regular expression [lem = "(davon|weg)?brausen"%c]
But do keep in mind that this will only find instances where the verb is actually written as a single word, e.g.
Das Auto ist davongebraust.
but not where the particle is separated, as in
Das Auto braust davon.
To identify the latter reliably, you'll need a fairly good syntactic parser. I'm not sure whether any publicly available software does this and also recombines the particle to form the correct lemma. I know that Hannah Kermes' YAC chunker can do it, but I don't think it's easily available.
As a rough approximation, you can search for the verb followed by a suitable particle, e.g.
[lem = "brausen"] [pos != "V.*|AP.*"]* [pos = "PTKVZ"]
assuming there is a "pos" attribute with STTS tags. However, in my experience taggers tend to have difficulty recognizing PTKVZ correctly and may mistag them as prepositions.
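To make the behaviour of this approximation concrete, here is a small Python sketch that emulates the query on hand-tagged sentences. It is only an illustration, not how CQP evaluates queries internally: the function name find_separated_particles, the (word, lemma, pos) tuple layout and the example taggings are all assumptions made up for this sketch, and it looks at one sentence at a time (roughly what adding "within s" would do, if the corpus has s regions).

```python
import re

# Tags that end the search: any verb (V.*) or adposition (AP.*),
# mirroring the [pos != "V.*|AP.*"]* part of the query.
# CQP regexes must match the whole attribute value, hence fullmatch().
BLOCKING_POS = re.compile(r"V.*|AP.*")


def find_separated_particles(sentence, verb_lemma):
    """Return (verb_index, particle_index) pairs for one tagged sentence.

    `sentence` is a list of (word, lemma, pos) tuples (hypothetical layout).
    """
    hits = []
    for i, (_, lemma, _) in enumerate(sentence):
        if lemma != verb_lemma:
            continue
        # Scan rightwards from the verb for the nearest PTKVZ token.
        for j in range(i + 1, len(sentence)):
            pos = sentence[j][2]
            if pos == "PTKVZ":
                hits.append((i, j))
                break
            if BLOCKING_POS.fullmatch(pos):
                break  # another verb or an adposition intervenes
    return hits


# Hand-made taggings of the two example sentences (assumed, not tagger output).
separated = [("Das", "die", "ART"), ("Auto", "Auto", "NN"),
             ("braust", "brausen", "VVFIN"), ("davon", "davon", "PTKVZ"),
             (".", ".", "$.")]
joined = [("Das", "die", "ART"), ("Auto", "Auto", "NN"),
          ("ist", "sein", "VAFIN"),
          ("davongebraust", "davonbrausen", "VVPP"), (".", ".", "$.")]
# Same sentence as `separated`, but with the particle mistagged as a preposition.
mistagged = [tok if tok[0] != "davon" else ("davon", "davon", "APPR")
             for tok in separated]

print(find_separated_particles(separated, "brausen"))  # [(2, 3)]
print(find_separated_particles(joined, "brausen"))     # []
print(find_separated_particles(mistagged, "brausen"))  # []
```

The third case shows the tagging problem mentioned above: once the particle is tagged APPR, it is excluded by the AP.* restriction and the instance is missed.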
— Stefan
Kermes, Hannah (2003). Off-line (and On-line) Text Analysis for Computational Lexicography. Ph.D. thesis, IMS, University of Stuttgart. Arbeitspapiere des Instituts für Maschinelle Sprachverarbeitung (AIMS), volume 9, number 3.