[CWB] s-attributes

Stefan Evert stefanML at collocations.de
Thu Apr 11 17:03:07 CEST 2013


On 11 Apr 2013, at 16:18, "Hardie, Andrew" <a.hardie at lancaster.ac.uk> wrote:

> Subcorpus = <p_monthstudy="[1-7]">[]*</p_monthstudy>;
> Subcorpus;

For technical reasons, it's better to use this form:

	Subcorpus = <p_monthstudy="[1-7]">[] expand to p_monthstudy;
	Subcorpus;

Otherwise you'll lose all longer paragraphs (those containing more than 100 tokens), because the []* repetition is capped at that length. On a large corpus, the "expand to" form will also be substantially faster.

If you don't mind a loss of efficiency, you can run the query on the full corpus and post-filter your results with a global constraint.  If you're not confident about writing a regular expression that handles both single- and double-digit months correctly, you can use numeric comparisons in this second version.  Run it without activating a subcorpus:

	... your query ... :: int(match.p_monthstudy) >= 1 & int(match.p_monthstudy) <= 7;

You should perhaps add a "within" clause to the query to make sure that the entire match is within a single paragraph, otherwise it's not very sensible to filter on the p_monthstudy attribute.
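
To illustrate why the numeric comparison is the safer choice once double-digit months come into play, here is a rough sketch in Python (not CQP). It assumes only that a CQP regular expression has to match the whole attribute value, which re.fullmatch imitates. The sample values and helper names are made up for the example, and Python's int() merely stands in for the numeric conversion, so treat it as an analogy rather than a description of CQP internals.

```python
import re


def matches_whole_value(pattern: str, value: str) -> bool:
    """Imitate a regex test that must match the entire attribute value."""
    return re.fullmatch(pattern, value) is not None


def in_month_range(value: str, low: int = 1, high: int = 7) -> bool:
    """Numeric test: convert the value to an integer and compare the bounds."""
    try:
        month = int(value)
    except ValueError:
        return False  # non-numeric annotation, never in range
    return low <= month <= high


# Hypothetical p_monthstudy values, including a zero-padded one.
values = ["1", "7", "07", "8", "10", "12"]

# The single-digit character class only accepts the bare digits 1 to 7.
regex_hits = [v for v in values if matches_whole_value("[1-7]", v)]

# The numeric test also accepts the zero-padded "07".
numeric_hits = [v for v in values if in_month_range(v)]

# A tempting but wrong pattern for months 1 to 12: "[1-12]" is a character
# class containing just the digits 1 and 2, so it never matches "10" or "12".
naive_hits = [v for v in values if matches_whole_value("[1-12]", v)]

# A correct regex for 1 to 12 needs an explicit alternation.
careful_hits = [v for v in values if matches_whole_value("[1-9]|1[0-2]", v)]

# The numeric version of the same range is just a change of bounds.
numeric_1_to_12 = [v for v in values if in_month_range(v, 1, 12)]

print(regex_hits, numeric_hits, naive_hits, careful_hits, numeric_1_to_12)
```

The point is just that the range check stays the same shape whatever the bounds are, whereas the regular expression has to be rethought each time.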

Hope this helps,
Stefan




