[CWB] cwb-describe-corpus not finding corpora

Stefan Evert stefanML at collocations.de
Tue Feb 9 20:12:34 CET 2010


> I found that the problem is with corpus names in uppercase.
> This should probably be documented and a warning raised, or else
> the code should be made to support file names with uppercase characters.
>
> Should I prepare a patch along those lines? O:)

No, that would break too many things, not least CQP.

The (mandatory!) convention is that corpus IDs are all uppercase in  
CQP, and all lowercase in registry files (including the filename of  
the registry file).  When opening a corpus, the CL library  
automatically converts the corpus ID to lowercase in order to locate  
the registry file.
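For illustration only, here is a minimal sketch of that lookup rule, assuming a single registry directory. The function name `registry_path` is hypothetical and this is not the CL library's actual (C) code. It just shows why a registry file named e.g. `Dickens` is never found: the ID is lowercased before the filename is built, and on a case-sensitive filesystem `dickens` and `Dickens` are different files.

```python
import os


def registry_path(corpus_id: str, registry_dir: str) -> str:
    """Hypothetical helper: map a CQP corpus ID to its registry file path.

    Follows the convention described above: corpus IDs are uppercase in CQP,
    and the registry filename is the same ID in lowercase.
    """
    return os.path.join(registry_dir, corpus_id.lower())


# The uppercase ID used in CQP resolves to a lowercase registry filename.
print(registry_path("DICKENS", "registry"))
```

With this rule a file created as `registry/Dickens` is simply never looked at, which matches the "corpus not found" symptom reported above.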

It would be good for cwb-encode to abort with an error message if the  
filename specified with -R isn't all lowercase (except for the  
directory part, of course); preferably before starting to encode the  
entire corpus.  Patches are highly welcome. :-)
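A rough sketch of the check being asked for, written in Python purely for illustration (cwb-encode itself is written in C). The function name `check_registry_filename` is hypothetical. Only the last path component is tested, so the directory part may contain uppercase letters. In cwb-encode such a check would have to run right after option parsing, before any input is read.

```python
import os


def check_registry_filename(path: str) -> None:
    """Hypothetical pre-flight check for the registry file given with -R.

    Raises ValueError if the filename (the corpus ID) is not all lowercase.
    The directory part of the path is deliberately not checked.
    """
    name = os.path.basename(path)
    if not name:
        raise ValueError(f"-R {path!r} must name a registry file, not a directory")
    if name != name.lower():
        raise ValueError(
            f"registry filename {name!r} must be all lowercase "
            f"(try {name.lower()!r}); CQP looks up the lowercased corpus ID"
        )
```

Raising an exception rather than printing a warning keeps the check easy to call early and to turn into an abort with an error message, which is the behaviour suggested above.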

Best,
Stefan



