[CWB] Test corpus indexing
Hardie, Andrew
a.hardie at lancaster.ac.uk
Thu Oct 18 17:06:54 CEST 2012
Hi Andrés,
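The error message indicates that the mistake was made on the corpus-install page: you left the details under "P-attributes" set to "Use default setup for P-attributes". The default setup declares five annotations (pos, hw, semtag, class, lemma), as the cwb-makeall output below shows, but your file has only one annotation column. You should have switched to "Use custom setup" and described your second column, i.e. "pos", in the table opposite.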
You need to delete the corpus and start again, I'm afraid.
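As a rough illustration (this is not part of CQPweb or CWB), a short Python sketch along the following lines could count the columns in a vertical-format file before installation, so that you know how many P-attributes to describe in addition to "word". It assumes that columns are separated by tabs and that any line starting with "<" is an XML tag; the function name count_columns is made up for this example.

```python
from collections import Counter


def count_columns(lines):
    """Map each column count to the number of token lines that have it.

    Assumptions (not taken from CQPweb itself): columns are tab-separated,
    and lines starting with "<" are XML tags rather than tokens.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if line.lstrip().startswith("<"):
            continue  # skip XML tag lines such as <text id="..."> and </text>
        counts[len(line.split("\t"))] += 1
    return counts


# The first few lines of the test corpus, with tabs between the columns.
sample = [
    '<text id="prueba">',
    "texto\tN",
    "crudo\tAj",
    "para\tP",
    "</text>",
]

counts = count_columns(sample)
print(counts)
```

If every token line reports two columns, the file holds the word form plus one annotation, so only one P-attribute ("pos") needs to be described in the custom setup. More than one distinct column count would suggest an inconsistent file.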
best
Andrew.
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of "Andrés Chandía"
Sent: 18 October 2012 15:01
To: Open source development of the Corpus WorkBench
Subject: [CWB] Test corpus indexing
Hi there, I'm trying to index a small test corpus, but I always get an error:
Corpus: (file prueba2.txt)
<text id="prueba">
texto N
crudo Aj
para P
la D
prueba N
de P
la D
interfaz N
web N
de P
CQP N
</text>
It has been uploaded to the upload area.
Then, in "Install new corpus":
Specify the MySQL name of the corpus you wish to create = prueba
Specify the CWB name of the corpus you wish to create = prueba
Enter the full name of the corpus = prueba
I click on "Install corpus with settings above" and...
CQPweb encountered an error and could not continue.
cwb-huffcode reported an error! Corpus indexing aborted.
/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA 2>&1
=== Makeall: processing corpus PRUEBA ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word
 + creating LEXSRT ... OK
 - lexicon OK
 + creating FREQS ... OK
 - frequencies OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index OK
ATTRIBUTE pos
 + creating LEXSRT ... OK
 - lexicon OK
 + creating FREQS ... OK
 - frequencies OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index OK
ATTRIBUTE hw
 + creating LEXSRT ... OK
 - lexicon OK
 + creating FREQS ... OK
 - frequencies OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index OK
ATTRIBUTE semtag
 + creating LEXSRT ... OK
 - lexicon OK
 + creating FREQS ... OK
 - frequencies OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index OK
ATTRIBUTE class
 + creating LEXSRT ... OK
 - lexicon OK
 + creating FREQS ... OK
 - frequencies OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index OK
ATTRIBUTE lemma
 + creating LEXSRT ... OK
 - lexicon OK
 + creating FREQS ... OK
 - frequencies OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index OK
========================================
... in file .../cqp/lib/admin-install.inc.php line 467.
In the database, the following entries were created:
Table annotation_metadata:
corpus | handle | description        | tagset                 | external_url
prueba | lemma  | Tagged lemma       | Lemma/OST              | http://www.natcorp.ox.ac.uk/XMLedition/URG/codes.h...
prueba | class  | Simple tag         | Oxford Simplified Tags | http://www.natcorp.ox.ac.uk/XMLedition/URG/codes.h...
prueba | hw     | Lemma              | Lemma                  |
prueba | semtag | Semantic tag       | USAS Tagset            | http://ucrel.lancs.ac.uk/usas/
prueba | pos    | Part-of-speech tag | CLAWS7 Tagset          | http://ucrel.lancs.ac.uk/claws7tags.html
Table corpus_metadata_fixed (one row):
corpus: prueba
visible: 1
primary_classification_field: NULL
primary_annotation: pos
secondary_annotation: hw
tertiary_annotation: class
tertiary_annotation_tablehandle: oxford_simplified_tags
combo_annotation: lemma
external_url: NULL
public_freqlist_desc: NULL
corpus_cat: 1
cwb_external: 0
Finally, the corpus name does appear on the start page of the web interface, but when I click it I get this:
The page ".../cqp/prueba/" can not be located
What should I check, or what am I doing wrong?
_______________________
andrés chandía