[CWB] INVALID_CTRL marking \n wrongly? WAS: Failing to Compile CWB Perl Module

Alberto Simões ambs at di.uminho.pt
Wed Jan 5 16:12:31 CET 2011


Hello,

As far as I can tell, cl_string_validate_encoding is being called with
a string that ends with a newline (in fact, the first line of the file
being processed), and INVALID_CTRL marks that newline as an invalid
character.
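A minimal C sketch of the suspected mechanism follows. It assumes, without checking the CWB sources, that the control-character check rejects every C0 byte below 0x20. That range includes the '\n' left at the end of a line read with fgets(). The names hypothetical_has_no_ctrl() and strip_line_terminator() are invented for this illustration. Stripping the terminator before validating is one possible workaround, not necessarily how CWB handles it.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a control-character check; this is NOT the
 * CWB implementation of cl_string_validate_encoding(). Returns 1 if the
 * string contains no C0 control byte (0x01-0x1F) and no DEL (0x7F),
 * otherwise 0. Note that '\n' (0x0A) and '\t' (0x09) are rejected too. */
int hypothetical_has_no_ctrl(const char *s)
{
    const unsigned char *p;
    for (p = (const unsigned char *)s; *p != '\0'; p++) {
        if (*p < 0x20 || *p == 0x7F)
            return 0;
    }
    return 1;
}

/* Remove a trailing "\n", "\r\n" or lone "\r" in place, so that the
 * line terminator never reaches the validation step. */
void strip_line_terminator(char *s)
{
    size_t n = strlen(s);
    if (n > 0 && s[n - 1] == '\n')
        s[--n] = '\0';
    if (n > 0 && s[n - 1] == '\r')
        s[--n] = '\0';
}
```

Take a made-up input line such as "<text id=\"1\">\n". hypothetical_has_no_ctrl() returns 0 for it before strip_line_terminator() is applied and 1 afterwards.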

I am wondering whether this is a recent change or, if not, why it is
happening on this machine.

Thanks



On 05/01/2011 14:22, Alberto Simões wrote:
>
> Found out that encode is failing:
>
> [ambs at search CWB]$ /share/apps/amalandro/bin/cwb-encode -s -x -U '' \
>     -R tmp/registry/vss -d tmp/vss -f data/vrt/VeryShortStories.vrt \
>     -p - -P word -P pos -P lemma -0 collection \
>     -S 'story:0+num+title+author+year' -S 'chapter:0+num' -S 'p:0' -S 's:0'
> Encoding error: an invalid byte or byte sequence for charset "latin1" was encountered.
>
> And VeryShortStories.vrt does not contain any characters outside Latin-1.
>
> So perhaps CWB is not being compiled correctly?
>
> Thanks
>
>
> On 04/01/2011 22:22, Alberto Simões wrote:
>> Hello
>>
>> I am trying to install CWB on a cluster, and when running make test I
>> get a lot of errors (below). This is CWB and the CWB Perl modules from
>> svn HEAD. Let me know if you have any idea what is going on.
>>
>> Thanks
>>
>> [ambs at search CWB]$ make test
>> PERL_DL_NONLAZY=1 /share/apps/amalandro/perls/perl-5.12.2/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>> t/00_load.t ............ ok
>> t/10_cwb_tools.t ....... ok
>> t/11_cwb_file.t ........ ok
>> t/12_cwb_tempfile.t .... ok
>> t/13_cwb_shell.t ....... ok
>> t/14_cwb_registry.t .... ok
>> t/20_encode_vss.t ...... 1/6
>> # Failed test 'corpus encoding and indexing'
>> # at t/20_encode_vss.t line 42.
>> # VSS corpus encoded in 0.1 seconds
>> # data file 'story_num.avs' is corrput
>> # failed to create data file 'word.huf.syn'
>> # failed to create data file 'word.hcd'
>> # data file 'lemma.lexicon' is corrput
>> # data file 'story_author.avs' is corrput
>> # failed to create data file 'word.crc'
>> # data file 'story_year.avx' is corrput
>> # data file 'story_num.rng' is corrput
>> # failed to create data file 'pos.huf.syn'
>> # data file 'story.rng' is corrput
>> # data file 'story_year.avs' is corrput
>> # data file 'chapter_num.avx' is corrput
>> # data file 'pos.lexicon.idx' is corrput
>> # data file 'story_num.avx' is corrput
>> # data file 'chapter.rng' is corrput
>> # data file 'story_year.rng' is corrput
>> # failed to create data file 'lemma.corpus.cnt'
>> # data file 'chapter_num.rng' is corrput
>> # data file 'lemma.lexicon.idx' is corrput
>> # data file 'story_title.rng' is corrput
>> # failed to create data file 'word.huf'
>> # failed to create data file 'lemma.crx'
>> # failed to create data file 'pos.corpus.cnt'
>> # failed to create data file 'pos.lexicon.srt'
>>
>> # Failed test 'validation of created data files'
>> # at t/20_encode_vss.t line 68.
>>
>> # Failed test 'validation of generated registry entry'
>> # at t/20_encode_vss.t line 80.
>> Use of uninitialized value $mode in bitwise and (&) at t/20_encode_vss.t line 85.
>>
>> # Failed test 'correct file access permissions (word.huf)'
>> # at t/20_encode_vss.t line 85.
>> # got: '0000'
>> # expected: '0640'
>> CWB::OpenFile: Can't open file/pipe 'tmp/vss/.info' in mode '<': No such file or directory at t/20_encode_vss.t line 87
>> # Looks like you planned 6 tests but ran 5.
>> # Looks like you failed 4 tests of 5 run.
>> # Looks like your test exited with 2 just after 5.
>> t/20_encode_vss.t ...... Dubious, test returned 2 (wstat 512, 0x200)
>> Failed 5/6 subtests
>> t/30_cqp_basic.t ....... 1/17 # TODO: write many, many more tests for CWB::CQP
>> t/30_cqp_basic.t ....... ok
>>
>
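The quoted message claims that VeryShortStories.vrt has no characters outside Latin-1. A small scanner like the hypothetical C sketch below could help double-check that. ISO-8859-1 assigns a code point to every byte value, so a "latin1" validity check can presumably only object to control codes (0x00-0x1F and 0x7F-0x9F). The sketch treats tab, LF and CR as harmless, which is an assumption. It reports every other control byte with its line and column. Neither function is part of CWB.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical diagnostic helper (not part of CWB): returns 1 if byte b
 * is a C0 or C1 control code under ISO-8859-1, treating tab, LF and CR
 * as acceptable; returns 0 for every printable Latin-1 byte. */
int is_suspicious_latin1_byte(unsigned char b)
{
    if (b == '\t' || b == '\n' || b == '\r')
        return 0;
    return b < 0x20 || (b >= 0x7F && b <= 0x9F);
}

/* Read the stream byte by byte, print the 1-based line and column of
 * every suspicious byte, and return how many were found. */
unsigned long report_suspicious_bytes(FILE *fp)
{
    unsigned long count = 0, line = 1, col = 0;
    int c;
    while ((c = getc(fp)) != EOF) {
        col++;
        if (is_suspicious_latin1_byte((unsigned char)c)) {
            count++;
            printf("line %lu, column %lu: byte 0x%02X\n",
                   line, col, (unsigned)c);
        }
        if (c == '\n') {  /* start counting columns afresh */
            line++;
            col = 0;
        }
    }
    return count;
}
```

Suppose report_suspicious_bytes() returns 0 for a stream opened with fopen("data/vrt/VeryShortStories.vrt", "rb"). That would suggest the data file itself is clean. The error would then more likely come from how lines are passed to the validation, such as a trailing newline.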

-- 
Alberto Simões

