[CWB] Accents and codification
Stefan Evert
stefanML at collocations.de
Thu Dec 17 08:25:21 CET 2015
> On 16 Dec 2015, at 21:34, Daniel Renau <alphak87 at gmail.com> wrote:
>
> Now my doubts are...
> 1- Better modify the script to call the encoder with "-c utf8"?
Don't use the script from the command line; write a small Perl script using the CWB::Encoder module instead. The command-line script you're running does basically the same thing: it sets some parameters from command-line flags and hard-codes the others as fixed default values.
With your own Perl script, you can then use the ->charset() method to encode a UTF-8 corpus. If you know a little Perl, it would also be easy to change the command-line script so that it accepts a new flag for setting the charset.
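A minimal sketch of such a script might look like the following. The corpus ID, directory paths, attribute names and input file are placeholders invented for illustration, not taken from your setup, so please check them and the method names against the CWB::Encoder documentation for your installed version:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CWB::Encoder;

# Hypothetical corpus ID; replace with your own.
my $enc = CWB::Encoder->new("MYCORPUS");

$enc->registry("/corpora/registry");      # assumed registry directory
$enc->dir("/corpora/data/mycorpus");      # assumed data directory
$enc->charset("utf8");                    # encode the corpus as UTF-8
$enc->p_attributes("word");               # token column
$enc->p_attributes(qw(pos lemma));        # assumed annotation columns
$enc->s_attributes("s");                  # assumed structural attribute

# Hypothetical input file in one-token-per-line format.
$enc->encode("mycorpus.vrt");
```

The only line that matters for the charset question is the ->charset("utf8") call; everything else should mirror whatever the command-line script was doing for you.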
Best,
Stefan