[CWB] Structural attributes as feature sets?

Stefan Evert stefanML at collocations.de
Sun Apr 21 13:39:45 CEST 2013


> I'm reading the corpus encoding tutorial, and in section 5 I found interesting material about feature sets for positional attributes. I am wondering whether the same feature could be used with structural attributes.

Yes.

> Say that in my corpus I've collected information about the speakers, and some of them can speak more than one foreign language. I would like to have a structural attribute like
> 
> foreign_languages="ES|PT|IT"
> 
> for each text produced by that particular speaker.

Simply encode them in feature set format as you would for positional attributes.  In your case, you need to add leading and trailing "|" separators as specified in the tutorial, e.g.

	<speaker foreign_languages="|ES|PT|IT|">
	...
	</speaker>

and declare the foreign_languages XML attribute to be set-valued (cf. "cwb-encode -h"):

	cwb-encode .... -S speaker:0+foreign_languages/

(The trailing slash marks foreign_languages as a set-valued attribute.)  cwb-encode will validate the set notation of the attribute values and re-order the set elements alphabetically.  Keep in mind that sets are unordered, so you cannot distinguish a first, second and third foreign language in this way.
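
If the speaker metadata is available as plain lists of language codes, the set notation described above can be generated before the input is passed to cwb-encode. The following is a minimal sketch, not part of CWB: the helper names to_feature_set and speaker_open_tag are hypothetical, and Python's default string ordering is assumed to be close enough to the alphabetical order cwb-encode applies (cwb-encode re-orders the elements itself in any case).

```python
from xml.sax.saxutils import quoteattr


def to_feature_set(values):
    """Format an iterable of strings in feature set notation.

    Example: ["ES", "PT", "IT"] -> "|ES|IT|PT|".
    Duplicates are dropped and elements are sorted (choices of this sketch).
    """
    cleaned = {v.strip() for v in values if v.strip()}
    for v in cleaned:
        # "|" is the separator, so it cannot occur inside an element.
        if "|" in v:
            raise ValueError(f"set element must not contain '|': {v!r}")
    if not cleaned:
        # Assumption: a single "|" denotes the empty set.
        return "|"
    return "|" + "|".join(sorted(cleaned)) + "|"


def speaker_open_tag(languages):
    """Build the opening <speaker> tag with a set-valued XML attribute."""
    return f"<speaker foreign_languages={quoteattr(to_feature_set(languages))}>"


print(speaker_open_tag(["ES", "PT", "IT"]))
# prints: <speaker foreign_languages="|ES|IT|PT|">
```

Sorting in the helper is optional; it only makes the generated input easier to compare with what the encoded corpus will contain.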

You will then be able to restrict searches to speakers who know Portuguese, e.g. with a global constraint such as

	... query ... :: match.speaker_foreign_languages contains "PT";
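
To make the meaning of the constraint above concrete, here is a small sketch of the test it performs on a single attribute value. This is an illustration only, not CQP's implementation: it assumes that "contains" succeeds when at least one set element matches the given regular expression in full, and the function names are hypothetical.

```python
import re


def parse_feature_set(value):
    """Split a value in feature set notation ("|ES|IT|PT|") into a Python set."""
    if not (value.startswith("|") and value.endswith("|")):
        raise ValueError(f"not in feature set notation: {value!r}")
    # The empty set "|" yields no elements.
    return {element for element in value.strip("|").split("|") if element}


def contains(value, pattern):
    """True if at least one set element matches the regular expression in full."""
    return any(re.fullmatch(pattern, e) for e in parse_feature_set(value))


print(contains("|ES|IT|PT|", "PT"))  # prints: True
print(contains("|ES|IT|", "PT"))     # prints: False
```

Because the test is applied per element, a speaker annotated with "|ES|IT|PT|" is found regardless of where PT appears in the set.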

Hope this helps,
Stefan



