[CWB] Indexing problems

Eros Zanchetta eros at sslmit.unibo.it
Thu Jul 22 16:14:02 CEST 2010


Hi Stefan!

Thanks for the feedback, and sorry for not replying earlier (I'm not at my 
computer today).

The <text> lines are very long but somehow that doesn't seem to be a 
problem in another similar corpus that has the very same metadata but 
different data (long story, we're writing a paper on it... ;-) )

I'll do some more detective work, test the CWB version you recommend and 
let you know how it goes.
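
For the detective work, a rough Python sketch along these lines might help narrow things down. It scans the input for tag lines that are either very long or contain an odd number of double quotes. The function name, the regular expression and above all the 4096-byte limit are assumptions made for illustration: the actual size of cwb-encode's line input buffer is not stated in this thread, and an odd quote count only catches some stray quotes.

```python
import re

# Hypothetical limit: the real size of cwb-encode's line input buffer
# is not given in this thread, so adjust this value as needed.
ASSUMED_BUFFER_BYTES = 4096

# A line counts as a tag line if it starts with <name or </name.
TAG_LINE = re.compile(r"^</?[A-Za-z_][\w-]*")


def suspicious_lines(lines, limit=ASSUMED_BUFFER_BYTES):
    """Yield (line_number, reason) for tag lines that look problematic."""
    for number, line in enumerate(lines, start=1):
        text = line.rstrip("\n")
        if not TAG_LINE.match(text):
            continue  # ordinary token line, not an XML-style tag
        if len(text.encode("utf-8")) > limit:
            yield number, "longer than assumed buffer"
        if text.count('"') % 2 != 0:
            yield number, "odd number of double quotes"
```

Passing an open file handle for the corpus file to suspicious_lines() and printing each pair gives line numbers that do not depend on cwb-encode's own line counting.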

Cheers,
Eros

On 22/07/2010 15:46, Stefan Evert wrote:
>    
>> Removing them solved most of my problems, I still get the syntax error
>> messages but they are probably caused by stray double quotes in the
>> attributes (line numbers are still not very helpful in identifying the
>> problem though...)
>>      
> They should be, though. :-/
>
> If you can spare a little time, could you test with cwb-encode from the 3.0.0 release, please?  The only possible explanation I can think of is that buffer overflows from very long lines throw off cwb-encode's line counting.
>
> I seem to remember that the bug fix back then was triggered by spurious error messages like the one you got: if the <text ...> line is very long, it doesn't fit into the line input buffer, and the incomplete tag is reported as a syntax error by the attribute parser.  If there's a real bug hiding behind your problems, though, I'd like to investigate and fix it.
>
> Cheers,
> Stefan
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>
>    



