[CWB] cwb testing

lars nygaard lars.nygaard at iln.uio.no
Thu Jul 27 02:42:30 CEST 2006


Marco Baroni wrote:
> Hi Lars!
>
> Thanks for your reply and offer of help!
>
>> I'll be happy to write a perl script that does the actual testing; I 
>> guess each query should be run directly through cqp (through a shell 
>> command) and through the perl modules (since there are different 
>> things that might go wrong).
>
> How about indexing (which is one of the most likely things to go 
> wrong, in my past experience)? Should that also be run via a shell 
> command?
Sure.
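A rough sketch of what one such shell-level test case could look like, whether it wraps a query or an indexing step. It is written in Python purely for illustration (the actual script is planned in Perl), and `run_case` and the demo command are made-up placeholders, not real cqp or indexing invocations:

```python
import subprocess
import sys

def run_case(command, expected_stdout):
    """Run one external command and compare its stdout with the expected text.

    Returns (ok, stdout, stderr); ok is True only if the command exited
    with status 0 and printed exactly the expected output.
    """
    result = subprocess.run(command, capture_output=True, text=True)
    ok = result.returncode == 0 and result.stdout == expected_stdout
    return ok, result.stdout, result.stderr

# Placeholder command (assumption): stands in for a real cqp or indexing call.
demo = [sys.executable, "-c", "print('hit')"]
ok, out, err = run_case(demo, "hit\n")
print("PASS" if ok else "FAIL")
```

Checking the exit status as well as the output means a crash during indexing shows up as a failure even when nothing is printed.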
>
>> As for the testing corpus, I suggest that we use Dickens or German 
>> Law. If we want to automatically generate larger corpora, we could 
>> just duplicate the text in the smaller corpus. 
>
> I thought about that, but it would give us a weird, non-Zipfian 
> distribution unlike any real corpus, wouldn't it?
I guess so; but that does not worry me so much. At least, I don't think 
it should be on top of the list of our priorities.
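To make the effect of duplication concrete, here is a minimal sketch assuming a plain whitespace-tokenised text. The helper names are illustrative only and have nothing to do with the CWB tools:

```python
from collections import Counter

def duplicate_corpus(tokens, copies):
    """Build a larger corpus by repeating the token list `copies` times."""
    return tokens * copies

def token_and_type_counts(tokens):
    """Return (number of tokens, number of distinct word types)."""
    return len(tokens), len(Counter(tokens))

sample = "it was the best of times it was the worst of times".split()
big = duplicate_corpus(sample, 10)

print(token_and_type_counts(sample))  # (12, 7)
print(token_and_type_counts(big))     # (120, 7)
```

The token count grows with the number of copies, but the vocabulary stays fixed and every frequency is simply multiplied by the same factor. That is the unrealistic distribution Marco points out, though it is probably still good enough for size and stress tests.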
> Test queries: is it OK if I provide them to you in September? 
That's fine. And other people on the list should feel free to contribute 
also. Though we might want to decide on a first testing corpus. 
"dickens" seems as good as anything, but I don't have access to a copy.

-lars


