[CWB] Problem indexing even tiny corpus
Hardie, Andrew
a.hardie at lancaster.ac.uk
Fri May 17 16:04:14 CEST 2013
Well, if you choose the option "Use the default setup for P-attributes", and you don't *have* the default p-attributes, then you are likely to run into problems.
Specify the p-attributes you actually have in the indexing screen.
best
Andrew.
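You can check Andrew's point before installing. CWB input is in "vertical" format: one token per line, with the word form and each p-attribute in TAB-separated columns, plus XML-style tag lines such as <s> for s-attributes. The Python sketch below is a hypothetical helper, not part of CQPweb or CWB. It counts the columns on each token line and compares them with the attribute list you intend to declare. It assumes prueba.txt is UTF-8, and it treats any line starting with "<" as a tag line, which is only a heuristic.

```python
from collections import Counter

# Attribute list from the CQPweb default setup quoted in this thread.
# "word" is always the first column in CWB's vertical format, so it is not listed.
DEFAULT_P_ATTRIBUTES = ["pos", "hw", "semtag", "class", "lemma"]


def column_counts(lines):
    """Count how many TAB-separated columns each token line has.

    Empty lines and lines starting with "<" (assumed to be XML tags such as
    <s> or </text>, i.e. s-attributes) are skipped.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("<"):
            continue
        counts[len(line.split("\t"))] += 1
    return counts


def check_vertical(lines, p_attributes):
    """Return human-readable problems; an empty list means all token lines match."""
    expected = 1 + len(p_attributes)  # word column + one column per p-attribute
    problems = []
    for n, count in sorted(column_counts(lines).items()):
        if n != expected:
            problems.append(
                f"{count} token line(s) have {n} column(s), expected {expected} "
                f"(word + {', '.join(p_attributes)})"
            )
    return problems


if __name__ == "__main__":
    import sys

    # Only runs when a file name is given on the command line.
    if len(sys.argv) > 1:
        with open(sys.argv[1], encoding="utf-8") as fh:
            for problem in check_vertical(fh, DEFAULT_P_ATTRIBUTES):
                print(problem)
```

For example, `python check_vertical.py prueba.txt` (the script name is arbitrary) might report that every token line has three columns. In that case, specify only two p-attributes in the indexing screen (for instance pos and lemma) rather than the five defaults.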
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Andres Chandia
Sent: 17 May 2013 14:38
To: cwb at sslmit.unibo.it
Subject: [CWB] Problem indexing even tiny corpus
Hi there,
I put the file prueba.txt (attached to this mail) in ../cqp/uploads/ and ran:
chown www-data:bncweb ../cqp/uploads/prueba.txt
Then, from the web interface: Install new corpus >
Specify the MySQL name of the corpus you wish to create: prueba
Specify the CWB name of the corpus you wish to create: prueba
Enter the full name of the corpus: Prueba
Include? x prueba.txt
Use default setup for S-attributes (only <s>)
Use default setup for P-attributes (pos, hw, semtag, class, lemma)
> Install corpus with settings above <
and then I get the following. I'm a little lost: what should I look for, and what should I fix?
CQPweb encountered an error and could not continue.
cwb-huffcode reported an error! Corpus indexing aborted.
/usr/local/bin/cwb-encode -xsB -c utf8 -d /B_NFS_P/diposit/corpora/cwb/data/prueba -f /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt -R /B_NFS_P/diposit/corpora/cwb/registry/prueba -P pos -P hw -P semtag -P class -P lemma -S s -S text:0+id 2>&1
Undeclared element attribute ignored (file /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt, line #1, warning issued only once).
Undeclared element attribute ignored (file /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt, line #1, warning issued only once).
/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA 2>&1
=== Makeall: processing corpus PRUEBA ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word + creating LEXSRT ... OK - lexicon OK + creating FREQS ... OK - frequencies OK - token stream OK + creating REVCIDX ... OK + creating REVCORP ... OK ? validating REVCORP ... OK - index OK
ATTRIBUTE pos + creating LEXSRT ... OK - lexicon OK + creating FREQS ... OK - frequencies OK - token stream OK + creating REVCIDX ... OK + creating REVCORP ... OK ? validating REVCORP ... OK - index OK
ATTRIBUTE hw + creating LEXSRT ... OK - lexicon OK + creating FREQS ... OK - frequencies OK - token stream OK + creating REVCIDX ... OK + creating REVCORP ... OK ? validating REVCORP ... OK - index OK
ATTRIBUTE semtag + creating LEXSRT ... OK - lexicon OK + creating FREQS ... OK - frequencies OK - token stream OK + creating REVCIDX ... OK + creating REVCORP ... OK ? validating REVCORP ... OK - index OK
ATTRIBUTE class + creating LEXSRT ... OK - lexicon OK + creating FREQS ... OK - frequencies OK - token stream OK + creating REVCIDX ... OK + creating REVCORP ... OK ? validating REVCORP ... OK - index OK
ATTRIBUTE lemma + creating LEXSRT ... OK - lexicon OK + creating FREQS ... OK - frequencies OK - token stream OK + creating REVCIDX ... OK + creating REVCORP ... OK ? validating REVCORP ... OK - index OK
========================================
/usr/local/bin/cwb-huffcode -r /B_NFS_P/diposit/corpora/cwb/registry -A PRUEBA 2>&1
Problem: No output generated -- no items?
/B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd failed. Aborted.
COMPRESSING TOKEN STREAM of PRUEBA.word
 - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.hcd
 - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf
 - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf.syn
VALIDATING PRUEBA.word
 - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.hcd
 - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf
 - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf.syn
!! You can delete the file now.
COMPRESSING TOKEN STREAM of PRUEBA.pos
 - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.hcd
 - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf
 - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf.syn
VALIDATING PRUEBA.pos
 - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.hcd
 - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf
 - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf.syn
!! You can delete the file now.
COMPRESSING TOKEN STREAM of PRUEBA.hw
 - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.hcd
 - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf
 - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf.syn
VALIDATING PRUEBA.hw
 - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.hcd
 - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf
 - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf.syn
!! You can delete the file now.
COMPRESSING TOKEN STREAM of PRUEBA.semtag
VALIDATING PRUEBA.semtag
 - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd
... in file /srv/web/llocs/cqp/lib/admin-install.inc.php line 467.
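The cwb-encode call in the log is where the form's attribute list ends up. The word form is the first column, and each -P flag declares the next TAB-separated column, in order, so the list has to match the file. As a hedged illustration, the Python sketch below rebuilds an argument list of the same shape. The flags and paths are copied from the log. The function name build_cwb_encode_args and the shortened attribute list in the example are hypothetical.

```python
# Hypothetical helper: builds (but does not run) a cwb-encode command line
# shaped like the one CQPweb logged for the "prueba" corpus.
def build_cwb_encode_args(data_dir, input_file, registry_file, p_attributes,
                          s_attributes=("s", "text:0+id")):
    """Return the cwb-encode arguments as a list of strings.

    The word form is always the first column; each -P adds one more
    TAB-separated column, in the order given, so p_attributes must match
    the columns actually present in input_file.
    """
    args = ["/usr/local/bin/cwb-encode", "-xsB", "-c", "utf8",
            "-d", data_dir, "-f", input_file, "-R", registry_file]
    for name in p_attributes:
        args += ["-P", name]
    for name in s_attributes:
        args += ["-S", name]
    return args


if __name__ == "__main__":
    # Example: a file with only word, pos and lemma columns (an assumption).
    example = build_cwb_encode_args(
        "/B_NFS_P/diposit/corpora/cwb/data/prueba",
        "/B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt",
        "/B_NFS_P/diposit/corpora/cwb/registry/prueba",
        ["pos", "lemma"],
    )
    print(" ".join(example))
```

On a machine with CWB installed, such a list could be passed to subprocess.run(). In CQPweb itself, though, you set the attributes in the indexing screen, as the reply at the top explains.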
Thanks.
_______________________
andrés chandía
chandia.net <http://www.chandia.net>