[CWB] Problem indexing even tiny corpus

Andres Chandia andres at chandia.net
Fri May 17 15:38:18 CEST 2013



Hi there,

I put the file prueba.txt (attached to this mail) in ../cqp/uploads/ and ran:

chown www-data:bncweb ../cqp/uploads/prueba.txt

Then, from the web interface: Install new corpus >

Specify the MySQL name of the corpus you wish to create: prueba
Specify the CWB name of the corpus you wish to create: prueba
Enter the full name of the corpus: Prueba
Include? x prueba.txt
Use default setup for S-attributes (only <s>)
Use default setup for P-attributes (pos, hw, semtag, class, lemma)

> Install corpus with settings above <

After that I get the error below. I'm a little lost: what should I look for, and what needs fixing?


CQPweb encountered an error and could not continue.
cwb-huffcode reported an error! Corpus indexing aborted. 

/usr/local/bin/cwb-encode -xsB -c utf8 -d /B_NFS_P/diposit/corpora/cwb/data/prueba -f /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt -R /B_NFS_P/diposit/corpora/cwb/registry/prueba -P pos -P hw -P semtag -P class -P lemma -S s -S text:0+id 2>&1
Undeclared element attribute  ignored (file /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt, line #1, warning issued only once).
Undeclared element attribute  ignored (file /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt, line #1, warning issued only once).

/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA 2>&1
=== Makeall: processing corpus PRUEBA ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE pos
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE hw
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE semtag
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE class
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE lemma
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
========================================

/usr/local/bin/cwb-huffcode -r /B_NFS_P/diposit/corpora/cwb/registry -A PRUEBA 2>&1
Problem: No output generated -- no items?
/B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd failed. Aborted.
COMPRESSING TOKEN STREAM of PRUEBA.word
- writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.hcd
- writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf
- writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf.syn
VALIDATING PRUEBA.word
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.hcd
- reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf
- reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf.syn
!! You can delete the file  now.
COMPRESSING TOKEN STREAM of PRUEBA.pos
- writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.hcd
- writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf
- writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf.syn
VALIDATING PRUEBA.pos
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.hcd
- reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf
- reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf.syn
!! You can delete the file  now.
COMPRESSING TOKEN STREAM of PRUEBA.hw
- writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.hcd
- writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf
- writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf.syn
VALIDATING PRUEBA.hw
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.hcd
- reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf
- reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf.syn
!! You can delete the file  now.
COMPRESSING TOKEN STREAM of PRUEBA.semtag
VALIDATING PRUEBA.semtag
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd

... in file /srv/web/llocs/cqp/lib/admin-install.inc.php line 467.
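
One check that may help narrow this down (a minimal sketch, not a confirmed diagnosis): the cwb-encode call above declares five P-attributes in addition to word, so it reads up to six tab-separated columns per token line. Assuming prueba.txt is in the usual one-token-per-line vertical format, the Python snippet below counts how many columns the token lines actually contain. The function name column_counts and the sample lines are hypothetical and are not taken from the attached file.

```python
from collections import Counter


def column_counts(lines):
    """Count token lines by their number of tab-separated columns.

    Blank lines and lines starting with "<" (assumed to be XML tags
    such as <s> or <text id="...">) are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("<"):
            continue
        counts[len(line.split("\t"))] += 1
    return counts


# Hypothetical sample: one token line with 3 columns, one with only the word.
sample = [
    "<s>\n",
    "Hola\tNP\thola\n",
    "mundo\n",
    "</s>\n",
]
print(column_counts(sample))
```

To run it on the real file, pass an open file handle instead of the sample, for example column_counts(open("prueba.txt", encoding="utf-8")). If most token lines have fewer than six columns, attributes such as semtag receive no real annotation, which may be related to the cwb-huffcode message about "no items".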

Thanks......

_______________________
andrés chandía

Do not print unnecessarily.
Take care of the environment!
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: prueba.txt
URL: <http://devel.sslmit.unibo.it/pipermail/cwb/attachments/20130517/cf925915/attachment.txt>

