[CWB] Huffman code error

BOFÍAS ALBERCH, EVA eva.bofias at upf.edu
Wed Oct 10 13:18:55 CEST 2012


Hi,

I have an error that I am not able to solve. I'm trying to build a Latin
corpus, but I get this error:

Error: Huffman codes too long (32 bits, current maximum is 31 bits).
       Please contact the CWB development team for assistance.

I got this error when trying to build a 40-word corpus (I cut it down to
isolate the error; with 39 words I do not get the error):
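
For context on what the message measures: a Huffman code gives frequent
values short bit strings and rare values long ones, so the longest code
depends on how skewed the frequency counts are. Below is a minimal Python
sketch of that idea. It is not CWB's actual implementation;
`huffman_code_lengths` is a hypothetical helper, and the counts in `toy`
are made up for illustration, not taken from my corpus.

```python
import heapq
from itertools import count


def huffman_code_lengths(freqs):
    """Return {symbol: code length in bits} for a Huffman code over freqs."""
    if len(freqs) == 1:
        # A single symbol still needs one bit to be written out.
        return {sym: 1 for sym in freqs}
    tie = count()  # tie-breaker so the heap never has to compare lists
    heap = [(f, next(tie), [sym]) for sym, f in freqs.items()]
    heapq.heapify(heap)
    lengths = dict.fromkeys(freqs, 0)
    while len(heap) > 1:
        # Merge the two least frequent subtrees; every symbol inside
        # them moves one level deeper, i.e. gains one bit.
        f1, _, syms1 = heapq.heappop(heap)
        f2, _, syms2 = heapq.heappop(heap)
        for sym in syms1 + syms2:
            lengths[sym] += 1
        heapq.heappush(heap, (f1 + f2, next(tie), syms1 + syms2))
    return lengths


# Made-up counts (assumption for illustration only).
toy = {"a": 5, "b": 2, "c": 1, "d": 1}
lengths = huffman_code_lengths(toy)
print(lengths, "longest code:", max(lengths.values()), "bits")
```

Running something like this on the real frequency counts of each attribute
would show which one, if any, yields unusually long codes.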

-----------
<doc type="CHRISTIAN_LATIN" title="Abelard">
<s>
PETRUS    Petrus    N:nom
ABAELARDUS    UNKNOUN    ADJ
(    (    PUN
1079-1142    card    ADJ:NUM
)    )    PUN
ABAELARDI    UNKNOUN   N:voc
AD    UNKNOUN    N:abl
AMICUM    amicus    ADJ
SUUM    sus    N:gen
CONSOLATORIA    consolatorius    ADJ
Sepe    sepes    N:dat
humanos    humanus    ADJ
affectus    affectus    N:nom
aut    aut    CC
provocant    provoco    V:IND
aut    aut    CC
mittigant   mi    V:IND
amplius    ample    ADV
exempla    exemplum    N:nom
quam    qui    REL
verba    verbum    N:nom
.    .    SENT
</s>
<s>
Unde    unde    ADV
post    post    PREP
nonnullam    nonnullus    ADJ
sermonis    sermo    N:gen
ad    ad    PREP
habiti  habeo    V:PTC
consolationem    consolatio    N:acc
,    ,    PUN
de    de    PREP
ipsis    ipse    DET
calamitatum    calamitas    N:gen
mearum    meus    POSS
experimentis    experimentum    N:abl
</s>
</doc>

-----------------
These are the attributes I use to describe the corpus:

cat $SOURCEFILE | /usr/local/cwb-3.4.1/bin/cwb-encode -c utf8 -d $DATADIR \
    -R $REGDIR/$CORPUSNAME -xsB -P lema -P pos -V s -S doc:0+type+title \
    -S not:0+text

Thanks

Eva Bofias