[CWB] cwb-describe-corpus not finding corpora
Stefan Evert
stefanML at collocations.de
Tue Feb 9 20:12:34 CET 2010
> Found that the problem is with corpora names in uppercase.
> Probably this should be documented and a warning be raised, or then,
> make the code support file names with uppercase characters.
>
> Should I prepare a patch, and if so, in which way? O:)
No, supporting uppercase file names would break too many things, not
least CQP.
The (mandatory!) convention is that corpus IDs are all uppercase in
CQP, and all lowercase in registry files (including the filename of
the registry file). When opening a corpus, the CL library
automatically converts the corpus ID to lowercase in order to locate
the registry file.
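
To illustrate the lookup convention, here is a minimal Python sketch. It is
not the CL library's actual code (which is written in C); the function name
registry_path and the example registry directory are assumptions made up
for this illustration.

```python
import os

def registry_path(registry_dir: str, corpus_id: str) -> str:
    """Return the path where the registry file for corpus_id is expected.

    Hypothetical helper: it only mirrors the convention that the corpus ID
    is lowercased before the registry file is looked up.
    """
    return os.path.join(registry_dir, corpus_id.lower())

# CQP refers to the corpus as DICKENS; the registry file must be "dickens".
# The directory below is just an example location.
print(registry_path("/usr/local/share/cwb/registry", "DICKENS"))
```

A registry file saved as "Dickens" or "DICKENS" would therefore never be
found on a case-sensitive file system, because only the lowercase name is
ever looked up.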
It would be good for cwb-encode to abort with an error message if the
filename specified with -R isn't all lowercase (except for the
directory part, of course); preferably before starting to encode the
entire corpus. Patches are highly welcome. :-)
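
The check itself could look roughly like the following Python sketch. A real
patch would have to be written in C inside cwb-encode; the function names
and the error message here are hypothetical and only show the logic: look at
the filename part of the -R argument, ignore the directory part, and fail
before any encoding work starts.

```python
import os

def registry_name_is_lowercase(path: str) -> bool:
    """True if the filename part of path contains no uppercase characters.

    The directory part is deliberately ignored, since only the registry
    filename has to follow the lowercase convention.
    """
    name = os.path.basename(path)
    return name == name.lower()

def check_registry_argument(path: str) -> None:
    """Raise ValueError if the -R argument would produce an unusable registry file."""
    if not registry_name_is_lowercase(path):
        raise ValueError(
            "registry filename '%s' must be all lowercase"
            % os.path.basename(path)
        )

# Accepted: uppercase letters occur only in the directory part.
check_registry_argument("/Corpora/Registry/dickens")

# Rejected: the filename itself contains an uppercase letter.
try:
    check_registry_argument("/corpora/registry/Dickens")
except ValueError as err:
    print("Error:", err)
```

Comparing the name with its lowercased form, rather than calling
str.islower(), keeps names without any letters (e.g. purely numeric ones)
valid.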
Best,
Stefan