[CWB] Huffman code error
Stefan Evert
stefanML at collocations.de
Wed Oct 10 15:35:56 CEST 2012
> I have the feeling this bug has come up before
It has, but AFAIR this was in the context of very large corpora (> 1.5 billion words) and has to do with a deficiency in the CWB binary file format, so it cannot be fixed in a backward-compatible way.
> - can you check your version? (cqp -v)
The path indicates CWB 3.4.1, which seems to be rather ancient and will contain a lot of bugs that have been fixed in the meantime.
For what it's worth, I tried the sample input file included in the e-mail with CWB 3.4.3 and 3.4.5 on my Mac and wasn't able to reproduce the error.
Two observations, though:
1) The sample file in the e-mail has only 35 tokens, not 40 tokens as claimed. So perhaps this is a cut-down version that doesn't trigger the error?
2) When copying & pasting from the e-mail, I end up with 4 blanks as column separators rather than the required TABs, which I corrected before encoding. If the blanks are left in place of TABs, cwb-huffcode will of course fail, because the attributes "lema" and "pos" are then empty. However, this produces a different error message from the one reported.
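For anyone who wants to repeat both checks on a pasted sample, here is a rough Python sketch. It is not part of CWB: the function names are made up for illustration, and it assumes that separators arrive as runs of at least 4 blanks, that no attribute value itself contains such a run, and that lines starting with "<" are XML tags rather than tokens.

```python
import re

def restore_tabs(text, min_blanks=4):
    """Replace each run of at least `min_blanks` spaces with a single TAB."""
    pattern = re.compile(r" {%d,}" % min_blanks)
    lines = (pattern.sub("\t", line.strip(" ")) for line in text.splitlines())
    return "\n".join(lines)

def is_token_line(line):
    """Assumption: non-empty lines that do not start with '<' are tokens."""
    return bool(line.strip()) and not line.lstrip().startswith("<")

def count_tokens(text):
    """Count token lines, e.g. to compare against a claimed corpus size."""
    return sum(1 for line in text.splitlines() if is_token_line(line))

def bad_lines(text, expected_columns):
    """Return (line number, column count) for token lines with the wrong
    number of TAB-separated columns."""
    problems = []
    for number, line in enumerate(text.splitlines(), start=1):
        if is_token_line(line):
            columns = len(line.split("\t"))
            if columns != expected_columns:
                problems.append((number, columns))
    return problems
```

Running count_tokens() on the repaired text shows whether the sample really has the claimed number of tokens. An empty result from bad_lines(text, 3) suggests that every token line carries a word form plus the two attributes.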
Best
Stefan