[CWB] Structural Attributes

Stefan Evert stefanML at collocations.de
Tue Nov 18 17:36:20 CET 2014


On 18 Nov 2014, at 14:21, Maarten Janssen <maartenpt at gmail.com> wrote:

> (1) When using a (potentially) complex CQL it would be useful to be able to restrict that whole query to texts of a specific year, say
> 
> [word=".*ion"] []{3,5} [pos="V.*"] :: text_year="1900"

Ruprecht already gave the answer to this issue.  I'd just like to add that it's often much faster to restrict the search to a subcorpus first than to filter the full set of results afterwards:

	Sub = <text_year = "1900"> [] expand to text;
	Sub;
	[word=".*ion"] []{3,5} [pos="V.*"];

> (2) Is there any way to sort results on structural attributes? None of the things I tried works, the most obvious being:
> 	
> 	Matches = [word="in.*"];
> 	sort Matches by text_year;
> 
> Maarten

Unfortunately, that's not possible for technical reasons (s-attributes and p-attributes use a different representation; I've adapted the "group" command to allow counts on s-attributes, but the sort/count code is more complicated).

If you really, really want to do it, add a p-attribute that specifies the year of publication for every token (so it'll be quite repetitive, but compression should keep the extra disk space required manageable). 
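
A minimal sketch of that preprocessing step in Python, assuming the corpus source is in the usual one-token-per-line (VRT) format and that the year is stored as a year="..." attribute on the <text> tag. The helper name add_year_column and the "__UNDEF__" placeholder are made up for illustration; the extra column would then have to be declared as an additional p-attribute when the corpus is encoded.

```python
import re

# Assumption: each text starts with a tag like <text id="a" year="1900">.
YEAR_RE = re.compile(r'<text\b[^>]*\byear="([^"]*)"')


def add_year_column(lines, missing="__UNDEF__"):
    """Append the year of the enclosing <text> to every token line."""
    year = missing
    out = []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("<"):
            # Assumption: every line starting with "<" is an XML tag,
            # i.e. literal "<" tokens are escaped in the input.
            m = YEAR_RE.match(line)
            if m:
                year = m.group(1)
            elif line.startswith("</text>"):
                year = missing
            out.append(line)
        elif line.strip():
            # Token line: add the year as one more tab-separated column.
            out.append(line + "\t" + year)
        else:
            out.append(line)
    return out
```

Tags and empty lines are passed through unchanged, so the output can be encoded the same way as the original file, with one more positional column.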

Also keep in mind that the sort is always in CQP's internal lexicographic ordering, so uncertain years like [1900] won't be in the proper numeric sequence.
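
To illustrate the ordering problem, here is a small Python sketch. Plain string comparison is used only as a stand-in for a lexicographic sort (CQP's internal ordering may differ in detail), and the year values are invented examples.

```python
import re

# Invented sample values, including an uncertain year in brackets.
years = ["1900", "[1900]", "1899", "950"]

# Lexicographic (string) order: compares character by character,
# so "950" lands after "1900" and "[1900]" is pushed to the end.
lexicographic = sorted(years)


def numeric_key(value):
    """Strip non-digits and compare as a number (hypothetical helper)."""
    digits = re.sub(r"\D", "", value)
    return int(digits) if digits else -1


# Numeric order: what one would normally expect for years.
numeric = sorted(years, key=numeric_key)
```

If numeric order matters, one option is to normalise the values before encoding (e.g. a zero-padded year without brackets), so that the lexicographic order coincides with the numeric one.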

Cheers,
Stefan


