[CWB] cwb testing
lars nygaard
lars.nygaard at iln.uio.no
Thu Jul 27 02:42:30 CEST 2006
Marco Baroni wrote:
> Hi Lars!
>
> Thanks for your reply and offer of help!
>
>> I'll be happy to write a perl script that does the actual testing; I
>> guess each query should be run directly through cqp (through a shell
>> command) and through the perl modules (since there are different
>> things that might go wrong).
>
> How about indexing (which is one of the most likely things to go
> wrong, in my past experience)? Should that also be run via a shell
> command?
Sure.
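
To make the idea concrete, here is a rough sketch of the comparison step: run the same query through two or more backends as shell commands and check that they agree. It is written in Python purely for illustration (the real script would be Perl, as offered above). The function names and the stand-in commands are hypothetical; in practice the command lists would be a cqp invocation and a small script built on the perl modules, with whatever options the local installation needs. Indexing could be checked the same way, by running the indexing command and testing its exit status.

```python
import subprocess
import sys


def run_backend(command, query):
    """Run one backend command, feeding the query on stdin.

    Returns (exit code, list of stdout lines).
    """
    proc = subprocess.run(
        command, input=query, capture_output=True, text=True, timeout=60
    )
    return proc.returncode, proc.stdout.splitlines()


def compare_backends(backends, query):
    """Run the same query through every backend and compare the outputs.

    `backends` maps a label to a command list (hypothetical: the caller
    decides what each command is).
    """
    results = {name: run_backend(cmd, query) for name, cmd in backends.items()}
    outputs = {name: out for name, (_, out) in results.items()}
    # Backends that exited with a non-zero status.
    failed = sorted(name for name, (code, _) in results.items() if code != 0)
    # All backends agree if every output equals the first one.
    reference = next(iter(outputs.values()))
    agree = all(out == reference for out in outputs.values())
    return {"failed": failed, "agree": agree, "outputs": outputs}


if __name__ == "__main__":
    # Stand-in command that just echoes its input; replace with real backends.
    echo = [sys.executable, "-c",
            "import sys; sys.stdout.write(sys.stdin.read())"]
    report = compare_backends({"shell": echo, "modules": echo}, "test query\n")
    print(report["agree"], report["failed"])
```

Keeping the exit status separate from the output comparison means a crash in one backend is reported as a failure rather than as a mere disagreement.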
>
>> As for the testing corpus, I suggest that we use Dickens or German
>> Law. If we want to automatically generate larger corpora, we could
>> just duplicate the text in the smaller corpus.
>
> I thought about that, but it would give us a weird, non-Zipfian
> distribution unlike any real corpus, wouldn't it?
I guess so, but that does not worry me much. At least, I don't think
it should be at the top of our list of priorities.
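
For what it's worth, the effect of duplication is easy to see on a toy example. The sketch below (Python, for illustration only; the helper names and the one-line sample are made up, not taken from any test corpus) counts how many word types occur at each frequency. Copying the text k times multiplies every frequency by k, so the number of types stays fixed while the token count grows, and words that occur only once disappear entirely.

```python
from collections import Counter


def frequency_spectrum(tokens):
    """Map each frequency value to the number of word types with that frequency."""
    type_freqs = Counter(tokens)
    return Counter(type_freqs.values())


def duplicate(tokens, copies):
    """Build a larger 'corpus' by repeating the token sequence."""
    return list(tokens) * copies


if __name__ == "__main__":
    # Illustrative sample only.
    sample = "it was the best of times it was the worst of times".split()
    bigger = duplicate(sample, 3)

    print(len(sample), len(set(sample)), dict(frequency_spectrum(sample)))
    print(len(bigger), len(set(bigger)), dict(frequency_spectrum(bigger)))
```

In a real corpus the vocabulary keeps growing with corpus size and a large share of types occur only once, which is exactly what the duplicated version lacks.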
> Test queries: is it ok if I provide them to you in September?
That's fine. And other people on the list should feel free to contribute
also. Though we might want to decide on a first testing corpus.
"dickens" seems as good as anything, but I don't have access to a copy.
-lars