[CWB] cwb testing

Sun Jul 30 08:39:20 CEST 2006

On 29 Jul 2006, at 22:04, lars.nygaard at iln.uio.no wrote:

>
>>
>> I attach the Perl source code of a CQP test suite ...
>
> Well, a lot of it was in German, and my mastery of German is rather weak,
> so that will limit the usefulness. I think I'll just start over (in fact,
> I just did).
>

Very probably a good idea.  In my opinion the whole system of "test
records" - each of which consists of a query, an optional subset
expression, followed by optional "group" and "sort" commands, which
would be executed in various combinations - is far too complicated to
be useful in the long term.

A large part of the code dealt with running CQP and collecting
results (which could now be done through the CQP/Perl interface,
since CQP knows how to act as a well-behaved backend), and figuring
out whether CQP had crashed or was hanging.  This shouldn't be necessary
any more, since almost all crashes are out-of-memory problems (which
print a clear message on STDERR) and there's only one known case
where CQP will actually get stuck (of course, it would be quite nice
if the test suite didn't hang there and wait indefinitely for an
answer from CQP ...).
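For the one known hanging case, the harness could simply give up after a time limit. Below is a minimal sketch in Python, standing in for Perl. It assumes CQP runs as a child process that reads one command per line on stdin. The exact CQP command line (`argv`) is left open, and `run_commands`, `RunResult` and `FAKE_CQP` are hypothetical names, not part of CWB:

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import List


@dataclass
class RunResult:
    status: str  # "ok", "error" (non-zero exit, e.g. out of memory) or "timeout"
    stdout: str
    stderr: str


def run_commands(argv: List[str], commands: List[str], timeout: float = 30.0) -> RunResult:
    """Feed commands to a child process (assumed to be CQP); never wait forever."""
    script = "".join(cmd + "\n" for cmd in commands)  # one command per line
    try:
        proc = subprocess.run(argv, input=script, capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # subprocess.run() has already killed the child; partial output is dropped.
        return RunResult("timeout", "", "")
    status = "ok" if proc.returncode == 0 else "error"
    return RunResult(status, proc.stdout, proc.stderr)


# Hypothetical stand-in for CQP, so the sketch can be tried without a corpus:
# it just echoes its stdin back.
FAKE_CQP = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read())"]
```

With a real CQP binary in place of `FAKE_CQP`, a crash would show up as status "error" with the message kept in `stderr`, and a hang would show up as "timeout" instead of blocking the whole suite.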

> What would be useful for me would be a list of functions/features that
> need to be supported.

I think the basic feature is: run a CQP command - or any set of CQP
commands - and compare the output to a "gold standard".  The old test
suite implemented some basic reformatting of the output so that pure
white-space differences could be ignored automatically.  It should be
possible to create a "gold standard" automatically from the output of
a given version of CQP, rather than having to code/verify everything
manually. In this way, we could at least make sure that new CQP
versions don't introduce any new bugs, which is one of the most
important uses of the test suite.  Ideally, it should also be
possible to specify Perl code (or external programs?) that generates
items in the gold standard. This makes sense in particular for
algorithmically generated corpora, but in principle one could also
take, say, DICKENS in one-word-per-line format and write a (slow)
Perl script that identifies the same tokens or token sequences as a
CQP query (the assumption being that any errors in the Perl script
would be independent of those in CQP and would thus produce a
different output for the gold standard).
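The white-space normalisation and the automatic creation of a gold standard from a given CQP version might look like the following Python sketch. The normalisation rule used here is an assumption, not necessarily what the old test suite did: collapse runs of white space and ignore blank lines. The function names are hypothetical:

```python
import difflib
from typing import List


def normalize(output: str) -> List[str]:
    """Collapse runs of white space within each line and drop blank lines."""
    lines = (" ".join(line.split()) for line in output.splitlines())
    return [line for line in lines if line]


def make_gold(output: str) -> str:
    """Turn the output of a trusted CQP version into a gold-standard file body."""
    return "\n".join(normalize(output)) + "\n"


def diff_against_gold(actual: str, gold: str) -> List[str]:
    """Unified diff between gold standard and actual output; empty if they agree."""
    return list(difflib.unified_diff(normalize(gold), normalize(actual),
                                     fromfile="gold", tofile="actual", lineterm=""))
```

A new CQP version passes a test when `diff_against_gold()` returns an empty list. Otherwise the diff shows exactly which lines changed, ignoring pure white-space differences.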

A lot of testing can be done incrementally, re-using results from  
previous tests.  While this would make the test suite much faster, in  
the interest of a simple and robust implementation it might be most  
sensible to repeat all preparatory commands for each test (perhaps  
aided by something like macros or a "setup" function in the test  
scripts?).  Perhaps you can come up with a clever solution here ...
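One simple way to repeat the preparatory commands for every test, without copying them into each test script, is a table of named setups. These are expanded in front of each test's own commands. Below is a Python sketch: `SETUPS`, `QueryTest` and `full_script` are hypothetical names, and the CQP commands are only examples:

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Hypothetical named blocks of preparatory CQP commands (examples only).
SETUPS: Dict[str, List[str]] = {
    "dickens": ["DICKENS"],
    "time-query": ['A = [word = "time"]'],
}


@dataclass
class QueryTest:
    name: str
    commands: List[str]
    setup: List[str] = field(default_factory=list)  # names of SETUPS entries


def full_script(test: QueryTest, setups: Dict[str, List[str]] = SETUPS) -> List[str]:
    """Expand the test's setups, in order, in front of its own commands."""
    script: List[str] = []
    for name in test.setup:
        script.extend(setups[name])
    script.extend(test.commands)
    return script


size_test = QueryTest("size of A", ["size A"], setup=["dickens", "time-query"])
```

If each expanded script is run in a fresh CQP process, every test stays independent of the others. This is slower than re-using earlier results but much simpler and more robust.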

>
>> The test suite also includes a number of test queries and expected
>> output (if you have the right corpus ...).
>
> I could not find any ...
>

Oops, sorry, what I had sent you there is a cleaned-up version of the  
test suite from when I was planning to start a partial rewrite ...  
I've found the old version with the examples in the meantime. Let me  
know if you want to look at them, but I don't think these queries are  
particularly useful after all.  A better strategy would be to use the  
examples from the CQP tutorial as a first test suite.

>
> It is indeed feasible, and I think we need it for queries that are not
> supported by the Perl interface (unless I can figure out a way to send
> several commands), for example dump/undump.
>

You can just concatenate the commands and send them as a single  
string (separated by semicolons), but this behaviour is not supported  
officially.  Why don't you just send multiple commands and either  
accumulate the output or just look at the output of the last command?
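Sending the commands one at a time and then either accumulating all output or keeping only the last command's output could be wrapped as below. The `send` callable is a hypothetical stand-in for whatever the CQP/Perl interface (or a Python equivalent) provides to execute a single command and return its output lines:

```python
from typing import Callable, List


def run_sequence(send: Callable[[str], List[str]], commands: List[str],
                 keep: str = "all") -> List[str]:
    """Send each command separately; return all output lines or only the last command's."""
    if keep not in ("all", "last"):
        raise ValueError("keep must be 'all' or 'last'")
    outputs = [send(cmd) for cmd in commands]  # one round trip per command
    if keep == "last":
        return outputs[-1] if outputs else []
    return [line for out in outputs for line in out]
```

For a dump/undump test, for instance, the earlier commands only prepare the data, so `keep="last"` might be enough to compare the final listing.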

Let me know if you need some help with, or insights into the CWB/Perl  
interface!

> Even simpler: If the encoding/indexing is incorrect, the test queries will
> reveal it, so we can give the encode/decode procedure lower priority.

But only if we have a manually coded gold standard (or one that was
created by Perl scripts or so).  Still, the test suite for
encoding/decoding is very simple, so I'd rather you focussed on the CQP
test suite.
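A gold standard created by an independent script, as suggested above for DICKENS, could look like the following Python sketch, standing in for the Perl script. It assumes the one-word-per-line format with the word form in the first tab-separated column and XML tags on lines of their own. Blank lines are skipped, which may not match how a particular corpus was encoded. Positions are 0-based corpus positions with an inclusive end, intended to mirror CQP's match/matchend. The function name is hypothetical:

```python
from typing import Iterable, List, Sequence, Tuple


def find_sequence(vrt_lines: Iterable[str], words: Sequence[str],
                  ignore_case: bool = False) -> List[Tuple[int, int]]:
    """Return (match, matchend) corpus positions of every occurrence of `words`."""
    fold = (lambda w: w.lower()) if ignore_case else (lambda w: w)
    tokens: List[str] = []
    for line in vrt_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("<"):  # blank line or XML tag: not a token
            continue
        tokens.append(fold(line.split("\t")[0]))  # first column = word form
    target = [fold(w) for w in words]
    n = len(target)
    if n == 0:
        return []
    # Slow but obviously correct: compare the target at every start position.
    return [(start, start + n - 1)
            for start in range(len(tokens) - n + 1)
            if tokens[start:start + n] == target]
```

The resulting pairs can be printed one per line and compared with the output of the corresponding CQP query (here something like `"the" "old"`) once both are in the same format.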

All the best & thanks again for working on the test suite,
Stefan