[CWB] Problem indexing even tiny corpus

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri May 17 16:04:14 CEST 2013


Well, if you choose the option "Use the default setup for P-attributes" (pos, hw, semtag, class, lemma), and your file doesn't actually *have* those p-attributes, then you are likely to run into problems.

Instead, in the indexing screen, specify the p-attributes your file actually has.
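One way to check this before installing is to compare the number of tab-separated columns in the input file with the p-attributes you intend to declare. The following is a minimal Python sketch, not part of CWB or CQPweb. It assumes a one-token-per-line file in which column 1 is the word form, each further column is one p-attribute in declaration order (matching the order of the -P flags), and lines beginning with `<` are XML tags. The script name `check_vrt.py` and all function names are hypothetical.

```python
import sys

# The p-attributes behind CQPweb's "Use default setup for P-attributes" option,
# as listed on the install screen.
DEFAULT_P_ATTRIBUTES = ["pos", "hw", "semtag", "class", "lemma"]


def column_counts(lines):
    """Return the set of distinct tab-separated column counts on token lines.

    Blank lines and lines starting with '<' (XML tags such as <s> or <text>)
    are skipped.
    """
    counts = set()
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip() or line.startswith("<"):
            continue
        counts.add(len(line.split("\t")))
    return counts


def check_p_attributes(lines, p_attributes):
    """Return human-readable mismatches; an empty list means the file fits."""
    expected = 1 + len(p_attributes)  # word form + one column per p-attribute
    problems = []
    for n in sorted(column_counts(lines)):
        if n < expected:
            missing = p_attributes[n - 1:]  # declared names with no column
            problems.append(f"{n} column(s) found; no data for: {', '.join(missing)}")
        elif n > expected:
            problems.append(f"{n} column(s) found, but only {expected} declared")
    return problems


def p_attribute_flags(p_attributes):
    """Build -P arguments in the same form as the cwb-encode call in the log."""
    flags = []
    for name in p_attributes:
        flags += ["-P", name]
    return flags


# Hypothetical usage: python check_vrt.py prueba.txt pos lemma
# (with no attribute names given, the CQPweb default set is checked)
if __name__ == "__main__" and len(sys.argv) > 1:
    declared = sys.argv[2:] or DEFAULT_P_ATTRIBUTES
    with open(sys.argv[1], encoding="utf-8") as f:
        problems = check_p_attributes(f, declared)
    for problem in problems:
        print(problem)
    if not problems:
        print("matching cwb-encode flags:", " ".join(p_attribute_flags(declared)))
```

For a file whose token lines look like `Hola<TAB>NP<TAB>hola`, checking against the default set reports that semtag, class and lemma have no column, while checking against `pos lemma` reports no mismatch. The check is purely positional, so the names you enter in the indexing screen must follow the column order of the file.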

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Andres Chandia
Sent: 17 May 2013 14:38
To: cwb at sslmit.unibo.it
Subject: [CWB] Problem indexing even tiny corpus

Hi there,
I put the file prueba.txt (attached to this mail) in ../cqp/uploads/ and ran:
chown www-data:bncweb ../cqp/uploads/prueba.txt

Then, from the web interface: Install new corpus >
Specify the MySQL name of the corpus you wish to create: prueba
Specify the CWB name of the corpus you wish to create: prueba
Enter the full name of the corpus: Prueba
Include? x prueba.txt
Use default setup for S-attributes (only <s>)
Use default setup for P-attributes (pos, hw, semtag, class, lemma)
> Install corpus with settings above <

and then I get the following error. I'm a little lost: what should I look for, and what should I fix?

CQPweb encountered an error and could not continue.

cwb-huffcode reported an error! Corpus indexing aborted.

/usr/local/bin/cwb-encode -xsB -c utf8 -d /B_NFS_P/diposit/corpora/cwb/data/prueba -f /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt -R /B_NFS_P/diposit/corpora/cwb/registry/prueba -P pos -P hw -P semtag -P class -P lemma -S s -S text:0+id 2>&1
Undeclared element attribute  ignored (file /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt, line #1, warning issued only once).
Undeclared element attribute  ignored (file /B_NFS_P/diposit/corpora/cqp/uploads/prueba.txt, line #1, warning issued only once).
/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA 2>&1
=== Makeall: processing corpus PRUEBA ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE pos
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE hw
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE semtag
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE class
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE lemma
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
========================================
/usr/local/bin/cwb-huffcode -r /B_NFS_P/diposit/corpora/cwb/registry -A PRUEBA 2>&1
Problem: No output generated -- no items?
/B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd failed. Aborted.
COMPRESSING TOKEN STREAM of PRUEBA.word
  - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.hcd
  - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf
  - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf.syn
VALIDATING PRUEBA.word
  - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.hcd
  - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf
  - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/word.huf.syn
!! You can delete the file  now.
COMPRESSING TOKEN STREAM of PRUEBA.pos
  - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.hcd
  - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf
  - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf.syn
VALIDATING PRUEBA.pos
  - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.hcd
  - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf
  - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/pos.huf.syn
!! You can delete the file  now.
COMPRESSING TOKEN STREAM of PRUEBA.hw
  - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.hcd
  - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf
  - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf.syn
VALIDATING PRUEBA.hw
  - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.hcd
  - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf
  - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba/hw.huf.syn
!! You can delete the file  now.
COMPRESSING TOKEN STREAM of PRUEBA.semtag
VALIDATING PRUEBA.semtag
  - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba/semtag.hcd

... in file /srv/web/llocs/cqp/lib/admin-install.inc.php line 467.

Thanks.

_______________________
            andrés chandía
chandia.net <http://www.chandia.net>
Please don't print unnecessarily. Take care of the environment!