[CWB] Adding metadata to corpus via CQPWeb

Martí Quixal marti.quixal at gmail.com
Tue Oct 16 03:43:04 CEST 2012


Hi all,

I just installed my first corpus, but I did not manage to associate
metadata with it. The instructions say I should prepare a separate file
for the metadata in which the first column contains the text_id (one
line per text). My corpus has several texts, each with a different text ID.
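For what it's worth, the first column can be generated rather than typed by hand. Below is a minimal Python sketch. It assumes the corpus is a vertical (VRT) file in which each text opens with a <text id="..."> line, as in the sample further down. It lists every text ID once, in order of appearance. The function name collect_text_ids is hypothetical.

```python
import re
from typing import Iterable, List

# Matches an opening <text ...> tag and captures the value of its id attribute.
TEXT_OPEN = re.compile(r'<text\b[^>]*\bid="([^"]*)"')


def collect_text_ids(lines: Iterable[str]) -> List[str]:
    """Return each text ID once, in the order the texts appear."""
    seen = set()
    ids: List[str] = []
    for line in lines:
        match = TEXT_OPEN.match(line.strip())
        if match and match.group(1) not in seen:
            seen.add(match.group(1))
            ids.append(match.group(1))
    return ids


if __name__ == "__main__":
    # Tiny stand-in for a VRT file, with real tabs between the columns.
    sample = [
        '<text id="AF002">',
        'buenas\tbueno\tADJ',
        '</text>',
        '<text id="AF003">',
        'gracias\tgracia\tNC',
        '</text>',
    ]
    print(collect_text_ids(sample))  # ['AF002', 'AF003']
```

Each ID in the result then only needs its metadata columns appended to form one line of the metadata file.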

Currently I am using only two different types of metadata (I am
still playing around):

text has the attribute id, which holds an ID such as
AF002 or AF003.

lang has the attribute code, which can currently only be en (but I
foresee that it may also take values such as es, fr...)

What should my metadata file look like? Like this? (I write \tab because
I cannot type tabs here.)

AF002 \tab en
AF003 \tab en
AF004 \tab en
AF006 \tab en
...
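Typing literal tabs is awkward, but generating the file avoids the problem. Below is a minimal Python sketch. It assumes the two-column layout above (text ID, then language code) is what the import form expects. It writes one line per text with a genuine tab character between the columns. The function name write_metadata is hypothetical.

```python
import csv
import sys
from typing import Mapping, TextIO


def write_metadata(out: TextIO, rows: Mapping[str, str]) -> None:
    """Write one line per text: text ID, a real tab character, then the language code."""
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for text_id, lang in rows.items():
        writer.writerow([text_id, lang])


if __name__ == "__main__":
    # IDs and codes mirror the example above; redirect stdout to save them to a file.
    write_metadata(sys.stdout, {"AF002": "en", "AF003": "en", "AF004": "en", "AF006": "en"})
```

Redirecting the output to a file gives a metadata file with real tabs. Adding a language later only means changing the values in the mapping.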

That sounds a bit weird.

The other thing is that I don't quite understand the terminology used in
the CQPWeb form for adding metadata:

- handle: should it be lang or lang_code?
- description: is this a free-form description, or the label that will
appear in the search/query interface?
- classification vs. free text: this is clear, but where do I declare my
classifications?
- primary field: how should I decide which one it is? (I would say
text_id, which apparently is the default.)

For reference, the corpus I am testing the process with looks like this:

<text id="AF002">
buenas  bueno   ADJ
tardes  tarde   NC
estamos estar   VEfin
aquí    aquí    ADV
con     con     PREP
X    X   NC
gracias gracia  NC
por     por     PREP
hacer   hacer   VLinf
esta    este    DM
entrevista      entrevistar     VLfin
laura   laura   NC
cuándo  cuándo  ADV
y       y       CC
dónde   dónde   ADV
naciste nacer   VLfin
<lang code="en">
ok      ok      VV
um      um      RB
</lang>
nací    nacer   VLfin
en      en      PREP
1988    @card@  CARD
este    este    DM
nací    nacer   VLfin
aquí    aquí    ADV
en      en      PREP
el      el      ART
paso    paso    NC
<lang code="en">
texas   texas   NN
</lang>
en      en      PREP
octubre octubre NMON
(...)
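The sample also shows that <lang> marks short stretches inside a text rather than a whole text. The minimal Python sketch below makes this visible by counting tokens per lang code within each text. It assumes non-nested <lang> regions and the tag layout shown above. The function name lang_counts and the label "unmarked" for tokens outside any <lang> region are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Dict, Iterable, Optional

# Opening tags for texts and language regions, capturing the attribute value.
TEXT_OPEN = re.compile(r'<text\b[^>]*\bid="([^"]*)"')
LANG_OPEN = re.compile(r'<lang\b[^>]*\bcode="([^"]*)"')
UNMARKED = "unmarked"  # hypothetical label for tokens outside any <lang> region


def lang_counts(lines: Iterable[str]) -> Dict[str, Counter]:
    """Count tokens per <lang code="..."> value within each <text id="...">."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    text_id: Optional[str] = None
    lang = UNMARKED
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        text_match = TEXT_OPEN.match(line)
        lang_match = LANG_OPEN.match(line)
        if text_match:
            text_id, lang = text_match.group(1), UNMARKED
        elif line.startswith("</text"):
            text_id = None
        elif lang_match:
            lang = lang_match.group(1)
        elif line.startswith("</lang"):
            lang = UNMARKED
        elif line.startswith("<"):
            continue  # other structural tags are not tokens
        elif text_id is not None:
            counts[text_id][lang] += 1  # one token line = one token
    return dict(counts)


if __name__ == "__main__":
    # A shortened version of the fragment above, with real tabs between columns.
    sample = [
        '<text id="AF002">',
        'buenas\tbueno\tADJ',
        'tardes\ttarde\tNC',
        '<lang code="en">',
        'ok\tok\tVV',
        'um\tum\tRB',
        '</lang>',
        'nací\tnacer\tVLfin',
        '</text>',
    ]
    for text_id, counter in lang_counts(sample).items():
        print(text_id, dict(counter))  # AF002 {'unmarked': 3, 'en': 2}
```

A text that shows both unmarked and en tokens, as here, has a language code that varies within the text. A text that showed a single code would be fully described by one value.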

Best regards,
Martí


--
Martí Quixal
Computational Linguist & Educational Technologist
http://www.iqubo.org/quixal

