[CWB] Question about metadata

Josep M. Fontana josepm.fontana at upf.edu
Thu Feb 12 14:14:18 CET 2015


Hi,

We are trying to make it easy for users to add new texts to a corpus 
used in language courses. The corpus would be more useful if each text 
carried keywords describing its content type. These keywords could then 
be used to select texts for a search, or for any of the other operations 
that metadata fields support (e.g. comparing the frequency of a certain 
expression in texts of type X vs. texts of type Y).

The problem is that a text is hard to classify with a single label, so 
restricting a field to one category is rather limiting. Ideally, the 
person adding the text could enter several keywords in one field, 
separated by commas, as in the 'ct' (Content Type) field below:


<doc title="Nice Title" id="C-03" century="20" ch="2'-1" ct="culture, politics, racial conflicts, US" >

<doc title="Another Nice Title" id="C-04" century="20" ch="2'-1" ct="politics, UK, society" >

Would that be a problem for CQP? Can CQP match on part of the value 
stored in a field? For instance, if this kind of metadata were included 
in the document headers, would a query constraint like the following be 
possible?

::match.doc_ct="racial conflicts";
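
To make the intended behaviour concrete, here is a minimal sketch in plain 
Python (not CQP) of the whole-keyword matching we have in mind. The function 
names and the `docs` dictionary are made up for illustration, and the sketch 
says nothing about how CQP itself would treat such a field.

```python
def keywords(ct_value):
    """Split a comma-separated 'ct' value into a set of trimmed keywords."""
    return {kw.strip() for kw in ct_value.split(",") if kw.strip()}


def has_keyword(ct_value, keyword):
    """True if `keyword` is one of the whole keywords in `ct_value`."""
    return keyword in keywords(ct_value)


# Hypothetical stand-in for the 'ct' values of the two <doc> headers above.
docs = {
    "C-03": "culture, politics, racial conflicts, US",
    "C-04": "politics, UK, society",
}

# Select the documents whose 'ct' field contains the keyword.
selected = [doc_id for doc_id, ct in docs.items()
            if has_keyword(ct, "racial conflicts")]
print(selected)
```

Matching whole keywords rather than substrings is deliberate: a search for 
"US" should not select a text just because another keyword happens to 
contain those letters.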


Thanks in advance.

Josep M.

