[CWB] Question about metadata
Josep M. Fontana
josepm.fontana at upf.edu
Thu Feb 12 14:14:18 CET 2015
Hi,
We are trying to make it easy for users to add new texts to a corpus
used in language courses. One thing that would make the corpus more
useful is content-type keywords that could be used to select the texts
to search, or to perform any of the other operations that are possible
with specific metadata fields (e.g. comparing the frequency of a certain
expression in texts of type X vs. texts of type Y).
The problem is that it is hard to classify a text with a single label,
so restricting a particular field to only one category is rather
limiting. Ideally, the person adding the text would be able to enter
several keywords separated by commas in a single field, as in the field
'ct' (for Content Type) below:
<doc title="Nice Title" id="C-03" century="20" ch="2'-1" ct="culture, politics, racial conflicts, US" >
<doc title="Another Nice Title" id="C-04" century="20" ch="2'-1" ct="politics, UK, society" >
Would that be a problem for CQP? Could CQP match against just part of
the information contained in a field? For instance, if this kind of
metadata were included in the document headers, would a query like the
following be possible?
::match.doc_ct="racial conflicts";
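To make the intended behaviour concrete, here is a minimal Python sketch of the matching we have in mind. It is not CQP code and says nothing about how CQP actually handles such fields; the function names `parse_keywords` and `has_keyword` are hypothetical, and the `docs` dictionary simply reuses the two 'ct' values from the examples above.

```python
def parse_keywords(ct_value):
    """Split a comma-separated 'ct' value into a set of trimmed keywords."""
    # Collapse any internal whitespace (including line breaks) in each keyword
    # and skip empty entries such as those produced by a trailing comma.
    return {" ".join(part.split()) for part in ct_value.split(",") if part.strip()}


def has_keyword(ct_value, keyword):
    """Return True if `keyword` is one of the keywords listed in `ct_value`."""
    # Whole-keyword comparison, not substring search: "polit" does not match "politics".
    return keyword in parse_keywords(ct_value)


# The 'ct' values from the two example <doc> tags, keyed by document id.
docs = {
    "C-03": "culture, politics, racial conflicts, US",
    "C-04": "politics, UK, society",
}

# Select the documents tagged with a given keyword.
matching = sorted(doc_id for doc_id, ct in docs.items()
                  if has_keyword(ct, "racial conflicts"))
print(matching)
```

In other words, a text should be selected whenever the requested keyword is one of the items in its 'ct' list, regardless of the other keywords or of their order.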
Thanks in advance.
Josep M.