[CWB] INVALID_CTRL marking \n wrongly? WAS: Failing to Compile CWBPerl Module

Alberto Simões ambs at di.uminho.pt
Wed Jan 5 16:26:57 CET 2011


From what I can see, buffer is passed to the validation directly after
fgets, without any kind of preprocessing. Therefore, the trailing newline
is still there when INVALID_CTRL marks it as invalid.
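A minimal sketch of the problem and the obvious workaround (strip the line
terminator before validating). It assumes, hypothetically, that INVALID_CTRL
rejects C0 control characters other than TAB, plus DEL. The real macro's
definition is not shown in this thread. The names has_invalid_ctrl, chomp
and bad_lines are made up for illustration. It is written in Python, whose
line iteration, like fgets, keeps the "\n" at the end of each line:

```python
import io


def has_invalid_ctrl(line: str) -> bool:
    """Hypothetical stand-in for INVALID_CTRL: flag C0 control characters
    (U+0000-U+001F) other than TAB, plus DEL (U+007F)."""
    return any((ord(ch) < 0x20 and ch != "\t") or ord(ch) == 0x7F for ch in line)


def chomp(line: str) -> str:
    """Strip a trailing LF or CRLF, like the one fgets leaves in the buffer."""
    if line.endswith("\n"):
        line = line[:-1]
    if line.endswith("\r"):
        line = line[:-1]
    return line


def bad_lines(stream, strip_newline: bool = True) -> list:
    """Return the 1-based numbers of the lines that fail the check."""
    bad = []
    for lineno, line in enumerate(stream, start=1):
        if strip_newline:
            line = chomp(line)  # remove the terminator before validating
        if has_invalid_ctrl(line):
            bad.append(lineno)
    return bad


# A tiny VRT-like sample: tab-separated token lines plus XML-style tags.
sample = '<story num="1">\nword\tpos\tlemma\n</story>\n'
print(bad_lines(io.StringIO(sample), strip_newline=False))  # [1, 2, 3]
print(bad_lines(io.StringIO(sample)))                       # []
```

Without the chomp step every line is rejected because of its own newline,
which matches the symptom described here. With it, only real control
characters inside the line would be reported.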

Stephan or Andrew?

Thanks
Hug
Alberto

On 05/01/2011 15:12, Alberto Simões wrote:
> Hello,
>
> As far as I can tell, cl_string_validate_encoding is being called with a
> string that ends with a newline (in fact, the first line of the file
> being processed), and INVALID_CTRL marks the newline as an invalid
> character.
>
> I wonder whether this is a recent change or, if not, why it is happening
> on this machine.
>
> Thanks
>
>
>
> On 05/01/2011 14:22, Alberto Simões wrote:
>>
>> I found out that cwb-encode is failing:
>>
>> [ambs at search CWB]$ /share/apps/amalandro/bin/cwb-encode -s -x -U '' -R
>> tmp/registry/vss -d tmp/vss -f data/vrt/VeryShortStories.vrt -p - -P
>> word -P pos -P lemma -0 collection -S 'story:0+num+title+author+year' -S
>> 'chapter:0+num' -S 'p:0' -S 's:0'
>> Encoding error: an invalid byte or byte sequence for charset "latin1"
>> was encountered.
>>
>> And VeryShortStories.vrt does not contain any characters outside latin1.
>>
>> So perhaps CWB is not being compiled correctly?
>>
>> Thanks
>>
>>
>> On 04/01/2011 22:22, Alberto Simões wrote:
>>> Hello
>>>
>>> I am trying to install CWB on a cluster, and when running make test, I
>>> get a lot of errors (below). This is CWB and the CWB Perl modules from
>>> SVN head. Let me know if you have any idea what is going on.
>>>
>>> Thanks
>>>
>>> [ambs at search CWB]$ make test
>>> PERL_DL_NONLAZY=1 /share/apps/amalandro/perls/perl-5.12.2/bin/perl
>>> "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>>> t/00_load.t ............ ok
>>> t/10_cwb_tools.t ....... ok
>>> t/11_cwb_file.t ........ ok
>>> t/12_cwb_tempfile.t .... ok
>>> t/13_cwb_shell.t ....... ok
>>> t/14_cwb_registry.t .... ok
>>> t/20_encode_vss.t ...... 1/6
>>> # Failed test 'corpus encoding and indexing'
>>> # at t/20_encode_vss.t line 42.
>>> # VSS corpus encoded in 0.1 seconds
>>> # data file 'story_num.avs' is corrput
>>> # failed to create data file 'word.huf.syn'
>>> # failed to create data file 'word.hcd'
>>> # data file 'lemma.lexicon' is corrput
>>> # data file 'story_author.avs' is corrput
>>> # failed to create data file 'word.crc'
>>> # data file 'story_year.avx' is corrput
>>> # data file 'story_num.rng' is corrput
>>> # failed to create data file 'pos.huf.syn'
>>> # data file 'story.rng' is corrput
>>> # data file 'story_year.avs' is corrput
>>> # data file 'chapter_num.avx' is corrput
>>> # data file 'pos.lexicon.idx' is corrput
>>> # data file 'story_num.avx' is corrput
>>> # data file 'chapter.rng' is corrput
>>> # data file 'story_year.rng' is corrput
>>> # failed to create data file 'lemma.corpus.cnt'
>>> # data file 'chapter_num.rng' is corrput
>>> # data file 'lemma.lexicon.idx' is corrput
>>> # data file 'story_title.rng' is corrput
>>> # failed to create data file 'word.huf'
>>> # failed to create data file 'lemma.crx'
>>> # failed to create data file 'pos.corpus.cnt'
>>> # failed to create data file 'pos.lexicon.srt'
>>>
>>> # Failed test 'validation of created data files'
>>> # at t/20_encode_vss.t line 68.
>>>
>>> # Failed test 'validation of generated registry entry'
>>> # at t/20_encode_vss.t line 80.
>>> Use of uninitialized value $mode in bitwise and (&) at t/20_encode_vss.t
>>> line 85.
>>>
>>> # Failed test 'correct file access permissions (word.huf)'
>>> # at t/20_encode_vss.t line 85.
>>> # got: '0000'
>>> # expected: '0640'
>>> CWB::OpenFile: Can't open file/pipe 'tmp/vss/.info' in mode '<': No such
>>> file or directory at t/20_encode_vss.t line 87
>>> # Looks like you planned 6 tests but ran 5.
>>> # Looks like you failed 4 tests of 5 run.
>>> # Looks like your test exited with 2 just after 5.
>>> t/20_encode_vss.t ...... Dubious, test returned 2 (wstat 512, 0x200)
>>> Failed 5/6 subtests
>>> t/30_cqp_basic.t ....... 1/17 # TODO: write many, many more tests for
>>> CWB::CQP
>>> t/30_cqp_basic.t ....... ok
>>>
>>
>

-- 
Alberto Simões

