[CWB] Adding metadata to corpus via CQPWeb
Martí Quixal
marti.quixal at gmail.com
Tue Oct 16 03:43:04 CEST 2012
Hi all,
I just installed my first corpus, but I did not manage to associate
metadata with it. The instructions say I should prepare a separate file for
the metadata in which the first column contains the text_id (one line per
text). My corpus has several texts, each with a different text id.
Currently I am only using two types of metadata (I am
still playing around):
text has the attribute id, which holds an id like
AF002, etc.
lang has the attribute code, which currently can only be en (but I
foresee that it could be en, es, fr...)
What should my metadata file look like? Like this? (I write \tab because
I cannot use tabs)
AF002 \tab en
AF003 \tab en
AF004 \tab en
AF006 \tab en
...
That sounds a bit weird.
The other thing is I don't quite understand the terminology used in
the form to add metadata in the CQPWeb interface:
- handle? lang or lang_code?
- description (a free description, or the way it will appear in the
search/query interface?)
- classification or free text is clear (but where do I declare my
classifications?)
- how should I decide which is the primary field? (I would say it is
text_id, which apparently is the default)
Just for info, the corpus I am testing the process with looks like this:
<text id="AF002">
buenas bueno ADJ
tardes tarde NC
estamos estar VEfin
aquí aquí ADV
con con PREP
X X NC
gracias gracia NC
por por PREP
hacer hacer VLinf
esta este DM
entrevista entrevistar VLfin
laura laura NC
cuándo cuándo ADV
y y CC
dónde dónde ADV
naciste nacer VLfin
<lang code="en">
ok ok VV
um um RB
</lang>
nací nacer VLfin
en en PREP
1988 @card@ CARD
este este DM
nací nacer VLfin
aquí aquí ADV
en en PREP
el el ART
paso paso NC
<lang code="en">
texas texas NN
</lang>
en en PREP
octubre octubre NMON
(...)
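
To make the question concrete, this is roughly how I would generate the metadata file from a corpus in the format above. It is only a sketch under my own assumptions: that the metadata file is plain tab-separated text with the text id in the first column, and that every text gets the same language code for now. The function name metadata_lines and the default "en" are mine, not anything taken from CQPWeb.

```python
import re

# Matches an opening tag such as <text id="AF002"> in the vertical file.
TEXT_TAG = re.compile(r'<text\s+id="([^"]+)"\s*>')


def metadata_lines(vertical, lang="en"):
    """Return one 'text_id<TAB>lang' line per <text> element, in corpus order.

    Assumption: every text receives the same language code (lang).
    """
    lines = []
    seen = set()
    for raw in vertical.splitlines():
        match = TEXT_TAG.match(raw.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))  # skip ids that were already emitted
            lines.append(f"{match.group(1)}\t{lang}")
    return lines


# Tiny made-up sample in the same vertical format as the corpus above.
sample = '''<text id="AF002">
buenas\tbueno\tADJ
<lang code="en">
ok\tok\tVV
</lang>
</text>
<text id="AF003">
tardes\ttarde\tNC
</text>'''

print("\n".join(metadata_lines(sample)))
```

Writing the returned lines to a file, one per line, would give the two-column layout I describe above, with real tabs instead of \tab.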
Best regards,
Martí
--
Martí Quixal
Computational Linguist & Educational Technologist
http://www.iqubo.org/quixal