[CWB] Test corpus indexing

"Andrés Chandía" andres at chandia.net
Sat Oct 20 01:52:24 CEST 2012



Here it is.

File name: "prueba.txt"

<text
id="prueba">
texto   N
crudo   Aj
para    P
la      D
prueba  N
de      P
la      D
interfaz        N
web     N
de      P
CQP     N
</text>

On Sat, 20 October 2012, 1:45, Hardie, Andrew wrote:


Yes, that's the correct update method. Now I can see the error.

It's this bit (error lines reordered for clarity):


COMPRESSING TOKEN STREAM of PRUEBA6.pos
Problem: No output generated -- no items?
VALIDATING PRUEBA6.pos
/B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd failed. Aborted.
 - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd


Apparently, there is no data in the encoded attribute "pos", so attempting to compress it fails.
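
One way to confirm this is to look at which files exist for the attribute in the corpus data directory, and at their sizes. A minimal sketch in Python, assuming only that an attribute's files share its name as a prefix (as pos.hcd and word.huf do in the log); the function name attribute_file_sizes is hypothetical and not part of CWB:

```python
from pathlib import Path


def attribute_file_sizes(data_dir, attribute):
    """Return {file name: size in bytes} for files named '<attribute>.*'.

    Assumption: all files belonging to an attribute start with its name
    followed by a dot, e.g. 'pos.hcd' for the attribute 'pos'.
    """
    sizes = {}
    for path in sorted(Path(data_dir).glob(attribute + ".*")):
        if path.is_file():
            sizes[path.name] = path.stat().st_size
    return sizes
```

Calling attribute_file_sizes("/B_NFS_P/diposit/corpora/cwb/data/prueba6", "pos") would list whatever pos.* files are present there; files of size 0 would be consistent with an attribute that contains no data.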


The reason is that the pos attribute was not created correctly by cwb-encode. One possible cause is that your input file is not properly formatted. Could you send it to me (off list) so I can check the format?
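
A first check of the format can be sketched as follows. It assumes the input is meant to be in CWB's verticalised format: one token per line, columns separated by TAB characters, and each XML tag complete on a line of its own. The function name check_vertical_file and the default of two columns (word and pos) are assumptions made for illustration, and treating every line that starts with "<" as a tag is a simplification.

```python
def check_vertical_file(lines, expected_columns=2):
    """Return a list of (line_number, message) for suspicious lines.

    Assumptions: token lines carry TAB-separated columns, and any line
    starting with '<' is an XML tag that should end with '>' on that line.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if not line.strip():
            continue  # blank lines are skipped, not reported
        if line.lstrip().startswith("<"):
            if not line.rstrip().endswith(">"):
                problems.append(
                    (number, "XML tag does not close on the same line")
                )
            continue
        columns = line.split("\t")
        if len(columns) != expected_columns:
            problems.append(
                (
                    number,
                    f"expected {expected_columns} tab-separated columns, "
                    f"found {len(columns)}",
                )
            )
    return problems
```

It can be run on a file with something like check_vertical_file(open("prueba.txt", encoding="utf-8")). A line whose columns are separated by spaces instead of TABs is reported as having a single column.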


best


Andrew.



From: "Andrés Chandía" [mailto:andres at chandia.net]
Sent: 19 October 2012 14:17
To: Hardie, Andrew
Cc: Open source development of the Corpus WorkBench
Subject: RE: [CWB] Test corpus indexing

 
OK, I did it with the svn version (it says Revision 337).
I guess I only have to update the CQPweb files. I did it this way:
"Normally, you can update your CQPweb to a more recent version simply by copying over the files in the directory with the new versions. If you used Subversion to get the CQPweb code, this can typically be done with the svn update command."
So I am probably doing something wrong, because I get this message:
CQPweb encountered an error and could not continue.
cwb-huffcode reported an error! Corpus indexing aborted.
/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA6 2>&1
=== Makeall: processing corpus PRUEBA6 ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word
 + creating LEXSRT ... OK
 - lexicon      OK
 + creating FREQS ... OK
 - frequencies  OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index        OK
ATTRIBUTE pos
 + creating LEXSRT ... OK
 - lexicon      OK
 + creating FREQS ... OK
 - frequencies  OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index        OK
========================================
/usr/local/bin/cwb-huffcode -r /B_NFS_P/diposit/corpora/cwb/registry -A PRUEBA6 2>&1
Problem: No output generated -- no items?
/B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd failed. Aborted.
COMPRESSING TOKEN STREAM of PRUEBA6.word
 - writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.hcd
 - writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf
 - writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf.syn
VALIDATING PRUEBA6.word
 - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.hcd
 - reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf
 - reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf.syn
!! You can delete the file  now.
COMPRESSING TOKEN STREAM of PRUEBA6.pos
VALIDATING PRUEBA6.pos
 - reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd
... in file /srv/web/llocs/cqp/lib/admin-install.inc.php line 467.

 
 


 


_______________________
            andrés chandía

Do not print unnecessarily. Take care of the environment!