[CWB] Test corpus indexing

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Oct 18 17:06:54 CEST 2012


Hi Andrés,

The error message indicates that your mistake was, on the corpus-install page, leaving the details under "P-attributes" set to "Use default setup for P-attributes", when you should have switched it to "Use custom setup" and described your second column, i.e. "pos", in the table opposite.

You need to delete the corpus and start again, I'm afraid.
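
Before reinstalling, it may help to confirm how many columns the input file really has, since each column after the first needs its own row in the custom P-attribute table. The following is only a rough Python sketch and not part of CQPweb or CWB: the helper name `column_counts` is hypothetical, and it assumes the columns are tab-separated (the usual vertical-format convention) even though the archived message shows them with spaces.

```python
from collections import Counter


def column_counts(lines):
    """Count tab-separated columns on the token lines of a vertical-format file.

    Blank lines and lines that look like XML tags (e.g. <text id="prueba">)
    are skipped. Assumption: columns are separated by a single tab.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("<"):
            continue
        counts[len(line.split("\t"))] += 1
    return counts


# A few lines in the shape of the test corpus from the original message.
sample = [
    '<text id="prueba">\n',
    "texto\tN\n",
    "crudo\tAj\n",
    "para\tP\n",
    "</text>\n",
]

for n_columns, n_lines in sorted(column_counts(sample).items()):
    print(f"{n_lines} token line(s) with {n_columns} column(s)")
```

To check a real file, the same function can be given an open file object, for example `column_counts(open("prueba2.txt", encoding="utf-8"))`. If every token line reports two columns, then the custom setup needs exactly one P-attribute ("pos") in addition to the word form.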

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of "Andrés Chandía"
Sent: 18 October 2012 15:01
To: Open source development of the Corpus WorkBench
Subject: [CWB] Test corpus indexing

Hi there, I'm trying to index a small test corpus, but I always get an error.

Corpus: (file prueba2.txt)

<text id="prueba">
texto   N
crudo   Aj
para    P
la      D
prueba  N
de      P
la      D
interfaz        N
web     N
de      P
CQP     N
</text>

It has been uploaded to the upload area.

Then, under "Install new corpus":
Specify the MySQL name of the corpus you wish to create = prueba
Specify the CWB name of the corpus you wish to create = prueba
Enter the full name of the corpus = prueba

I click on "Install corpus with settings above" and...

CQPweb encountered an error and could not continue.

cwb-huffcode reported an error! Corpus indexing aborted.

/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA 2>&1

=== Makeall: processing corpus PRUEBA ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE pos
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE hw
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE semtag
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE class
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE lemma
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
========================================

... in file .../cqp/lib/admin-install.inc.php line 467.
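
The attribute names reported in this output can be set against the columns actually present in the input file. Below is a minimal Python sketch of that comparison; it is not CQPweb code, the function names are hypothetical, and it assumes that the first column is the word form and that each further column feeds one attribute in the order the attributes are reported.

```python
import re


def declared_attributes(makeall_output):
    """Return the attribute names in the order they appear in the output."""
    return re.findall(r"ATTRIBUTE\s+(\S+)", makeall_output)


def attributes_without_column(declared, columns_in_file):
    """Return declared attributes for which the input file has no column.

    Assumption: column 1 is 'word' and every further column corresponds to
    the next declared attribute, in order.
    """
    return declared[columns_in_file:]


# Abbreviated stand-in for the output above; only the ATTRIBUTE markers matter.
log = (
    "ATTRIBUTE word  + creating LEXSRT ... OK "
    "ATTRIBUTE pos  + creating LEXSRT ... OK "
    "ATTRIBUTE hw  + creating LEXSRT ... OK "
    "ATTRIBUTE semtag  + creating LEXSRT ... OK "
    "ATTRIBUTE class  + creating LEXSRT ... OK "
    "ATTRIBUTE lemma  + creating LEXSRT ... OK"
)

declared = declared_attributes(log)
print("declared:", declared)
# The test file has two columns per token line (word form and tag).
print("no column for:", attributes_without_column(declared, 2))
```

With the two-column test file, this lists four declared attributes that have no matching column, which is consistent with the corpus having been installed under the default P-attribute setup rather than a custom one.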


In the database, the following entries were created:

annotation_metadata

corpus  handle  description         tagset                  external_url
prueba  lemma   Tagged lemma        Lemma/OST               http://www.natcorp.ox.ac.uk/XMLedition/URG/codes.h...
prueba  class   Simple tag          Oxford Simplified Tags  http://www.natcorp.ox.ac.uk/XMLedition/URG/codes.h...
prueba  hw      Lemma               Lemma
prueba  semtag  Semantic tag        USAS Tagset             http://ucrel.lancs.ac.uk/usas/
prueba  pos     Part-of-speech tag  CLAWS7 Tagset           http://ucrel.lancs.ac.uk/claws7tags.html


corpus_metadata_fixed

corpus                           prueba
visible                          1
primary_classification_field     NULL
primary_annotation               pos
secondary_annotation             hw
tertiary_annotation              class
tertiary_annotation_tablehandle  oxford_simplified_tags
combo_annotation                 lemma
external_url                     NULL
public_freqlist_desc             NULL
corpus_cat                       1
cwb_external                     0



Finally, the corpus name does appear on the start page of the web interface, but when I click on it I get this:

The page ".../cqp/prueba/" can not be located

What should I check, or what am I doing wrong?


_______________________
            andrés chandía