[CWB] INVALID_CTRL marking \n wrongly? (schtepf)

Alberto Simões ambs at di.uminho.pt
Wed Jan 5 16:35:52 CET 2011



On 05/01/2011 15:26, Alberto Simões wrote:
>
> From what I can see, the buffer is being validated directly after fgets,
> without any kind of pre-processing. Therefore, the newline stays in the
> buffer until INVALID_CTRL marks it as invalid.
>
> Stephan or Andrew?

svn blame blames schtepf for that code.
Thanks
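
To make the suspected problem concrete, here is a minimal sketch of it. fgets() keeps the trailing newline in the buffer, so a validator that rejects control characters will flag every line unless the line ending is stripped first. The names is_invalid_ctrl, line_is_valid and strip_line_ending are hypothetical and only for illustration; they are not CWB's actual INVALID_CTRL macro or cl_string_validate_encoding, whose definitions I have not reproduced here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a control-character check: every byte below
 * 0x20 except TAB counts as invalid. This is an assumption made for
 * illustration, not CWB's actual INVALID_CTRL definition. */
static int is_invalid_ctrl(unsigned char c) {
  return c < 0x20 && c != '\t';
}

/* Hypothetical validator: returns 1 if no byte of s is flagged, else 0. */
static int line_is_valid(const char *s) {
  assert(s != NULL);
  for (; *s != '\0'; s++) {
    if (is_invalid_ctrl((unsigned char)*s))
      return 0;
  }
  return 1;
}

/* Remove any trailing '\n' and '\r' characters that fgets() left
 * at the end of the buffer. */
static void strip_line_ending(char *s) {
  size_t len;
  assert(s != NULL);
  len = strlen(s);
  while (len > 0 && (s[len - 1] == '\n' || s[len - 1] == '\r'))
    s[--len] = '\0';
}
```

In a read loop, the idea would be to call strip_line_ending(buf) right after fgets(buf, sizeof buf, fp) and before any validation, so that only the line content is checked.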
>
> Thanks
> Hug
> Alberto
>
> On 05/01/2011 15:12, Alberto Simões wrote:
>> Hello,
>>
>> As far as I can tell, cl_string_validate_encoding is being called with a
>> string that ends with a newline (in fact, the first line of the file
>> being processed), and INVALID_CTRL marks the newline as an invalid
>> character.
>>
>> I am wondering whether this is a recent change or, if not, why it is
>> happening on this machine.
>>
>> Thanks
>>
>>
>>
>> On 05/01/2011 14:22, Alberto Simões wrote:
>>>
>>> Found out that encode is failing:
>>>
>>> [ambs at search CWB]$ /share/apps/amalandro/bin/cwb-encode -s -x -U '' -R
>>> tmp/registry/vss -d tmp/vss -f data/vrt/VeryShortStories.vrt -p - -P
>>> word -P pos -P lemma -0 collection -S 'story:0+num+title+author+year' -S
>>> 'chapter:0+num' -S 'p:0' -S 's:0'
>>> Encoding error: an invalid byte or byte sequence for charset "latin1"
>>> was encountered.
>>>
>>> And VeryShortStories.vrt does not include any characters outside latin1.
>>>
>>> So perhaps CWB is not being compiled correctly?
>>>
>>> Thanks
>>>
>>>
>>> On 04/01/2011 22:22, Alberto Simões wrote:
>>>> Hello
>>>>
>>>> I am trying to install CWB on a cluster, and when running make test, I
>>>> get a lot of errors (below). This is CWB and Perl CWB from svn head.
>>>> Let me know if you have any idea of what is going on.
>>>>
>>>> Thanks
>>>>
>>>> [ambs at search CWB]$ make test
>>>> PERL_DL_NONLAZY=1 /share/apps/amalandro/perls/perl-5.12.2/bin/perl "-MExtUtils::Command::MM" "-e" "test_harness(0, 'blib/lib', 'blib/arch')" t/*.t
>>>> t/00_load.t ............ ok
>>>> t/10_cwb_tools.t ....... ok
>>>> t/11_cwb_file.t ........ ok
>>>> t/12_cwb_tempfile.t .... ok
>>>> t/13_cwb_shell.t ....... ok
>>>> t/14_cwb_registry.t .... ok
>>>> t/20_encode_vss.t ...... 1/6
>>>> # Failed test 'corpus encoding and indexing'
>>>> # at t/20_encode_vss.t line 42.
>>>> # VSS corpus encoded in 0.1 seconds
>>>> # data file 'story_num.avs' is corrput
>>>> # failed to create data file 'word.huf.syn'
>>>> # failed to create data file 'word.hcd'
>>>> # data file 'lemma.lexicon' is corrput
>>>> # data file 'story_author.avs' is corrput
>>>> # failed to create data file 'word.crc'
>>>> # data file 'story_year.avx' is corrput
>>>> # data file 'story_num.rng' is corrput
>>>> # failed to create data file 'pos.huf.syn'
>>>> # data file 'story.rng' is corrput
>>>> # data file 'story_year.avs' is corrput
>>>> # data file 'chapter_num.avx' is corrput
>>>> # data file 'pos.lexicon.idx' is corrput
>>>> # data file 'story_num.avx' is corrput
>>>> # data file 'chapter.rng' is corrput
>>>> # data file 'story_year.rng' is corrput
>>>> # failed to create data file 'lemma.corpus.cnt'
>>>> # data file 'chapter_num.rng' is corrput
>>>> # data file 'lemma.lexicon.idx' is corrput
>>>> # data file 'story_title.rng' is corrput
>>>> # failed to create data file 'word.huf'
>>>> # failed to create data file 'lemma.crx'
>>>> # failed to create data file 'pos.corpus.cnt'
>>>> # failed to create data file 'pos.lexicon.srt'
>>>>
>>>> # Failed test 'validation of created data files'
>>>> # at t/20_encode_vss.t line 68.
>>>>
>>>> # Failed test 'validation of generated registry entry'
>>>> # at t/20_encode_vss.t line 80.
>>>> Use of uninitialized value $mode in bitwise and (&) at t/20_encode_vss.t line 85.
>>>>
>>>> # Failed test 'correct file access permissions (word.huf)'
>>>> # at t/20_encode_vss.t line 85.
>>>> # got: '0000'
>>>> # expected: '0640'
>>>> CWB::OpenFile: Can't open file/pipe 'tmp/vss/.info' in mode '<': No such file or directory at t/20_encode_vss.t line 87
>>>> # Looks like you planned 6 tests but ran 5.
>>>> # Looks like you failed 4 tests of 5 run.
>>>> # Looks like your test exited with 2 just after 5.
>>>> t/20_encode_vss.t ...... Dubious, test returned 2 (wstat 512, 0x200)
>>>> Failed 5/6 subtests
>>>> t/30_cqp_basic.t ....... 1/17 # TODO: write many, many more tests for CWB::CQP
>>>> t/30_cqp_basic.t ....... ok
>>>>
>>>
>>
>

-- 
Alberto Simões

