[CWB] Test corpus indexing

Hardie, Andrew a.hardie at lancaster.ac.uk
Sat Oct 20 01:53:48 CEST 2012


No, I mean send me *the actual file*, as an attachment. From how it appears when pasted into an email, I can't tell whether it is formatted correctly in the file itself.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of "Andrés Chandía"
Sent: 20 October 2012 00:52
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Test corpus indexing

Here it is:

File name: "prueba.txt"

<text id="prueba">
texto   N
crudo   Aj
para    P
la      D
prueba  N
de      P
la      D
interfaz        N
web     N
de      P
CQP     N
</text>
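Whether the columns above are separated by TABs or by spaces cannot be seen from a paste, so one way to rule the question out is to regenerate the file with an explicit separator. This is a minimal Python sketch, assuming the usual cwb-encode vertical format (one token per line, the word and pos columns separated by a single TAB, structural tags such as <text id="..."> on their own lines). The helper name build_vertical is hypothetical; the tokens, tags and text id are copied from the message.

```python
# Sketch: write the test corpus from the message above as a CWB vertical file.
# Assumption: one token per line, word and pos separated by a single TAB,
# structural tags such as <text id="..."> on their own lines.

# Tokens, tags and the text id are copied from the message.
TOKENS = [
    ("texto", "N"), ("crudo", "Aj"), ("para", "P"), ("la", "D"),
    ("prueba", "N"), ("de", "P"), ("la", "D"), ("interfaz", "N"),
    ("web", "N"), ("de", "P"), ("CQP", "N"),
]


def build_vertical(tokens, text_id):
    """Return the file contents, joining the two columns with an explicit TAB."""
    lines = [f'<text id="{text_id}">']
    lines += [f"{word}\t{pos}" for word, pos in tokens]
    lines.append("</text>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # newline="\n" keeps Unix line endings even when run on Windows.
    with open("prueba.txt", "w", encoding="utf-8", newline="\n") as fh:
        fh.write(build_vertical(TOKENS, "prueba"))
```

Running it as a script writes prueba.txt in the current directory, overwriting any existing copy.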

On Sat, 20 October 2012, 1:45, Hardie, Andrew wrote:
Yes, that's the correct update method. Now I can see the error.
It's this bit (error lines reordered for clarity):
COMPRESSING TOKEN STREAM of PRUEBA6.pos
Problem: No output generated -- no items?
VALIDATING PRUEBA6.pos
/B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd failed. Aborted.
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd
Apparently, there is no data in the encoded attribute "pos", so attempting to compress it fails.
The reason is that the pos attribute was not created correctly by cwb-encode. One possible cause is that your input file is not properly formatted. Could you send it to me (off list) so I can check the format?
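A local check of the input format might look like the following minimal Python sketch. It assumes the same TAB-separated vertical format (lines starting with "<" treated as structural tags, UTF-8 input) and flags token lines whose column count is wrong, which is what space-separated columns would produce. The CRLF check is an extra assumption not raised in the thread. check_vertical is a hypothetical helper, not a CWB or CQPweb tool.

```python
# Sketch: report token lines in a CWB vertical file whose TAB-separated
# column count is wrong. Assumptions: columns separated by a single TAB,
# lines starting with "<" are structural tags, input is UTF-8.
# check_vertical is a hypothetical helper, not part of CWB or CQPweb.
import sys


def check_vertical(lines, n_columns):
    """Return (line_number, message) pairs for every suspicious line."""
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.endswith("\r"):
            # Extra check: a stray CR may end up inside the last column.
            problems.append((lineno, "Windows (CRLF) line ending"))
            line = line[:-1]
        if not line.strip() or line.lstrip().startswith("<"):
            continue  # blank lines and structural tags are not checked here
        fields = line.split("\t")
        if len(fields) != n_columns:
            hint = " (spaces instead of a TAB?)" if " " in line else ""
            problems.append(
                (lineno, f"{len(fields)} column(s), expected {n_columns}{hint}: {line!r}")
            )
    return problems


if __name__ == "__main__":
    # Usage: python check_vertical.py prueba.txt
    for path in sys.argv[1:]:
        # newline="" keeps any "\r" so that CRLF endings can be detected.
        with open(path, encoding="utf-8", newline="") as fh:
            for lineno, message in check_vertical(fh, n_columns=2):
                print(f"{path}:{lineno}: {message}")
```

For a two-column file such as prueba.txt, no output means every token line has exactly one TAB.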
best
Andrew.
From: "Andrés Chandía" [mailto:andres at chandia.net]
Sent: 19 October 2012 14:17
To: Hardie, Andrew
Cc: Open source development of the Corpus WorkBench
Subject: RE: [CWB] Test corpus indexing
OK, I did it with the SVN version (it says Revision 337).
I guess I only have to update the CQPweb files. I did it this way:

"Normally, you can update your CQPweb to a more recent version simply by copying over the files in the directory with the new versions. If you used Subversion to get the CQPweb code, this can typically be done with the svn update command."
So I'm probably doing something wrong, because I get this message:

CQPweb encountered an error and could not continue.

cwb-huffcode reported an error! Corpus indexing aborted.

/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA6 2>&1
=== Makeall: processing corpus PRUEBA6 ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
ATTRIBUTE pos
  + creating LEXSRT ... OK
  - lexicon      OK
  + creating FREQS ... OK
  - frequencies  OK
  - token stream OK
  + creating REVCIDX ... OK
  + creating REVCORP ... OK
  ? validating REVCORP ... OK
  - index        OK
========================================
/usr/local/bin/cwb-huffcode -r /B_NFS_P/diposit/corpora/cwb/registry -A PRUEBA6 2>&1
Problem: No output generated -- no items?
/B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd failed. Aborted.
COMPRESSING TOKEN STREAM of PRUEBA6.word
  - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.hcd
  - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf
  - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf.syn
VALIDATING PRUEBA6.word
  - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.hcd
  - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf
  - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf.syn
!! You can delete the file now.
COMPRESSING TOKEN STREAM of PRUEBA6.pos
VALIDATING PRUEBA6.pos
  - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd

... in file /srv/web/llocs/cqp/lib/admin-install.inc.php line 467.
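Andrew's reply above picks the relevant error lines out of this output by hand. A minimal Python sketch of that triage follows, assuming the output is available as text with one message per line (as the CWB tools print it, before email wrapping). The marker strings are copied from this run; triage is a hypothetical helper, and other CWB versions may word their messages differently.

```python
# Sketch: keep only the step headers and error messages from the output of
# cwb-makeall / cwb-huffcode. The marker strings are copied from the run
# quoted in this thread; other CWB versions may word them differently.

STEP_MARKERS = ("COMPRESSING TOKEN STREAM of", "VALIDATING")
ERROR_MARKERS = ("ERROR:", "Problem:", "No such file or directory")


def triage(log_text):
    """Return stripped lines that start a step or contain an error marker."""
    kept = []
    for line in log_text.splitlines():
        line = line.strip()
        if line.startswith(STEP_MARKERS) or any(m in line for m in ERROR_MARKERS):
            kept.append(line)
    return kept
```

Calling print("\n".join(triage(log_text))) on the output above keeps the pos errors plus the COMPRESSING/VALIDATING headers. Because the commands merge stdout and stderr (2>&1), error lines can appear before the step they belong to, as probably happened here with pos; the sketch keeps the original order rather than guessing a reordering.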




_______________________
            andrés chandía
chandia.net <http://www.chandia.net>
Please don't print unnecessarily. Take care of the environment!

