[CWB] [PATCH] cwb-encode checking registry directory
Stefan Evert
stefanML at collocations.de
Wed Feb 10 00:30:24 CET 2010
Hello!
Thanks for the patch. I had to / wanted to change a couple of things
this time.
> - moved the code that checks the corpus directory earlier (the place
> where I had added it was too late in some cases)
>
> - added code to check if the registry directory exists (if not,
> complain and abort)
That's a misunderstanding: -R specifies the full path (directory +
filename) of the registry entry to be created, not just the registry
directory. So registry_file must _not_ be a directory, and in normal
usage it will often not exist. I've moved the test up to the filename
validity check, where I temporarily shorten the string to the
directory part only.
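A minimal sketch of such a check in C, assuming a POSIX system with stat(). The helper name registry_path_ok is hypothetical; this is an illustration of the idea, not the code that was committed to cwb-encode. It rejects a -R path that is itself a directory, then cuts the string at the last '/' to test that the directory part exists, and restores the '/' afterwards:

```c
#include <string.h>
#include <sys/stat.h>

/* Hypothetical helper (not the actual cwb-encode code).
 * Returns 1 if the -R path is acceptable, 0 otherwise:
 *  - the path itself must not be an existing directory
 *    (the registry file may or may not exist yet);
 *  - the directory part of the path must exist.
 * The string is shortened in place and restored before returning. */
static int registry_path_ok(char *registry_file)
{
    struct stat st;
    char *slash = strrchr(registry_file, '/');
    int dir_ok = 1;

    /* -R names a file, so an existing directory at this path is an error */
    if (stat(registry_file, &st) == 0 && S_ISDIR(st.st_mode))
        return 0;

    if (slash != NULL && slash != registry_file) {
        *slash = '\0';  /* temporarily shorten to the directory part */
        dir_ok = (stat(registry_file, &st) == 0 && S_ISDIR(st.st_mode));
        *slash = '/';   /* restore the full path */
    }
    /* No '/' means the current directory; "/name" means the root
     * directory. Both are assumed to exist. */
    return dir_ok;
}
```

Cutting the string in place avoids allocating a copy, at the price of requiring a writable buffer.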
> - added code to check if the registry directory (the last portion,
> that is) only includes lowercase letters, digits or underscores (let
> me know if this set should be enlarged)
I've made the check less strict, as the CWB traditionally allowed
almost everything (except uppercase letters) in the registry filenames
and I don't want to break backward compatibility. cwb-encode only
aborts if it detects an uppercase letter (or certain other characters
known to be problematic), but it will issue a warning if the filename
is not in canonical format (only a-z, 0-9, _ and -).
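The three-way policy (abort / warn / accept) could be sketched in C roughly as follows. The names check_registry_name and name_check are hypothetical, and since the exact set of "other problematic characters" is not listed here, the sketch uses whitespace as an assumed stand-in for it:

```c
#include <ctype.h>

typedef enum { NAME_OK, NAME_WARN, NAME_ABORT } name_check;

/* Hypothetical sketch (not the actual cwb-encode code).
 * `name` is the filename part of the -R path, i.e. the registry name. */
static name_check check_registry_name(const char *name)
{
    name_check result = NAME_OK;
    for (const char *p = name; *p != '\0'; p++) {
        unsigned char c = (unsigned char) *p;
        if (isupper(c))
            return NAME_ABORT;   /* uppercase letters are never allowed */
        if (isspace(c))
            return NAME_ABORT;   /* ASSUMED stand-in for other problematic characters */
        if (!(islower(c) || isdigit(c) || c == '_' || c == '-'))
            result = NAME_WARN;  /* accepted, but not in canonical format */
    }
    return result;
}
```

With this sketch, `dickens` is accepted silently, `dickens.v2` only produces a warning, and `Dickens` aborts.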
> - added code to check if the registry directory was supplied (if not,
> complain and abort)
I've removed this check, also for backward compatibility. Some users
may have build systems that generate the registry entry beforehand and
then run cwb-encode without -R. This certainly makes sense if you
want to add information that cwb-encode doesn't generate, such as
charset (until recently), alignment attributes, etc.
The patch, with my changes, has been committed.
Cheers,
Stefan