[CWB] Test corpus indexing

"Andrés Chandía" andres at chandia.net
Sat Oct 20 01:57:14 CEST 2012



Ok, here you have it....


On Sat, 20 October 2012, 1:53, Hardie, Andrew wrote:





No, I mean send me *the actual file* - as an attachment - I can't tell from
how it appears when pasted into an email whether or not it is formatted
correctly in the actual file.


best


Andrew.




From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of "Andrés Chandía"
Sent: 20 October 2012 00:52
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Test corpus indexing



 
Here you have it
 
 file name "prueba.txt"
 
 
 texto   N
 crudo   Aj
 para    P
 la      D
 prueba  N
 de      P
 la      D
 interfaz        N
 web     N
 de      P
 CQP     N
 
 
On Sat, 20 October 2012, 1:45, Hardie, Andrew wrote:




Yes, that's the correct update method. Now I can see the error. It's this
bit (error lines reordered for clarity):

COMPRESSING TOKEN STREAM of PRUEBA6.pos
Problem: No output generated -- no items?
VALIDATING PRUEBA6.pos
/B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd failed. Aborted.
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd

Apparently, there is no data in the encoded attribute "pos", so attempting
to compress it fails.

The reason is that the pos attribute was not created correctly by
cwb-encode. One possible reason for this is that your input file is not
properly formatted. Could you send me it (off list) so I can check the
format?

best

Andrew.
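
On the format check mentioned above: cwb-encode reads verticalized text, one
token per line, with the attributes separated by TAB characters. As a rough
sketch only (it is not part of CWB or CQPweb), the Python function below flags
lines that do not split into the expected number of tab-separated fields. The
name check_vertical, the default of two columns (word and pos), and the choice
to skip blank lines and lines starting with "<" are assumptions made for
illustration.

```python
def check_vertical(lines, expected_columns=2):
    """Return (line_number, message) pairs for suspicious lines.

    Hypothetical helper: a line is suspicious if it does not split into
    exactly `expected_columns` TAB-separated, non-empty fields.
    """
    problems = []
    for number, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")  # strip the line ending only, keep tabs
        if not line.strip():
            continue  # assumption: blank lines are ignored
        if line.startswith("<"):
            continue  # assumption: XML-style tag line, not a token line
        fields = line.split("\t")
        if len(fields) != expected_columns:
            problems.append(
                (number, f"expected {expected_columns} tab-separated fields, "
                         f"found {len(fields)}: {line!r}")
            )
        elif any(field == "" for field in fields):
            problems.append((number, f"empty field: {line!r}"))
    return problems


# Illustrative data only: two well-formed lines, then a token split over two
# lines, then a line that uses spaces instead of a TAB.
sample = ["texto\tN\n", "crudo\tAj\n", "la   \n", "  D\n", "prueba  N\n"]
for number, message in check_vertical(sample):
    print(f"line {number}: {message}")
```

To check a real file, the same function can be given an open file handle, for
example check_vertical(open("prueba.txt", encoding="utf-8")). The encoding here
is also an assumption.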


From: "Andrés Chandía" [mailto:andres at chandia.net]
Sent: 19 October 2012 14:17
To: Hardie, Andrew
Cc: Open source development of the Corpus WorkBench
Subject: RE: [CWB] Test corpus indexing



OK, I did it with the svn version (it says Revision 337).
I guess I only have to update the CQPweb files. I did it this way:

"Normally, you can update your CQPweb to a more recent version simply by
copying over the files in the directory with the new versions. If you used
Subversion to get the CQPweb code, this can typically be done with the svn
update command."

So I am probably doing something wrong, because I get this message:

CQPweb encountered an error and could not continue.
cwb-huffcode reported an error! Corpus indexing aborted.

/usr/local/bin/cwb-makeall -r /B_NFS_P/diposit/corpora/cwb/registry -V PRUEBA6 2>&1
=== Makeall: processing corpus PRUEBA6 ===
Registry directory: /B_NFS_P/diposit/corpora/cwb/registry
ATTRIBUTE word
 + creating LEXSRT ... OK
 - lexicon      OK
 + creating FREQS ... OK
 - frequencies  OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index        OK
ATTRIBUTE pos
 + creating LEXSRT ... OK
 - lexicon      OK
 + creating FREQS ... OK
 - frequencies  OK
 - token stream OK
 + creating REVCIDX ... OK
 + creating REVCORP ... OK
 ? validating REVCORP ... OK
 - index        OK
========================================
/usr/local/bin/cwb-huffcode -r /B_NFS_P/diposit/corpora/cwb/registry -A PRUEBA6 2>&1
Problem: No output generated -- no items?
/B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd: No such file or directory
ERROR: reading /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd failed. Aborted.
COMPRESSING TOKEN STREAM of PRUEBA6.word
- writing code descriptor block to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.hcd
- writing compressed item sequence to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf
- writing sync (every 128 tokens) to /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf.syn
VALIDATING PRUEBA6.word
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.hcd
- reading compressed item sequence from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf
- reading sync (mod 128) from /B_NFS_P/diposit/corpora/cwb/data/prueba6/word.huf.syn
!! You can delete the file now.
COMPRESSING TOKEN STREAM of PRUEBA6.pos
VALIDATING PRUEBA6.pos
- reading code descriptor block from /B_NFS_P/diposit/corpora/cwb/data/prueba6/pos.hcd

... in file /srv/web/llocs/cqp/lib/admin-install.inc.php line 467.
 






 
 
_______________________
andrés chandía

Do not print unnecessarily. Protect the environment!




 


-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://devel.sslmit.unibo.it/pipermail/cwb/attachments/20121020/ae18d9e6/attachment-0001.html>
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: prueba.txt
URL: <http://devel.sslmit.unibo.it/pipermail/cwb/attachments/20121020/ae18d9e6/attachment-0001.txt>

