[CWB] Accents and codification

Stefan Evert stefanML at collocations.de
Thu Dec 17 08:25:21 CET 2015


> On 16 Dec 2015, at 21:34, Daniel Renau <alphak87 at gmail.com> wrote:
> 
> Now my doubts are...
> 1- Better modify the script to call the encoder with "-c utf8"?

Rather than running the script from the command line, write a small Perl script of your own that uses the CWB::Encoder module.  The command-line script you're running does basically the same thing: it sets some parameters from command-line flags and hard-codes fixed defaults for the rest.

With your own Perl script, you can then use the ->charset() method to encode a UTF-8 corpus.  If you know a little Perl, it would also be easy to change the command-line script so that it accepts a new flag for setting the charset.
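
Something along these lines should do as a starting point.  This is only a rough sketch: the corpus ID, the directory paths and the attribute declarations are made up for illustration, and apart from ->charset() the method names are given as I remember them from the CWB::Encoder documentation, so please check them against "perldoc CWB::Encoder" for your installed version.

```perl
#!/usr/bin/perl
# Minimal sketch: encode a UTF-8 corpus with CWB::Encoder.
# All names and paths below are hypothetical placeholders.
use strict;
use warnings;
use CWB::Encoder;

# Corpus ID (assumed name, adjust to your corpus)
my $enc = CWB::Encoder->new("MYCORPUS");

$enc->longname("My UTF-8 corpus");     # descriptive name (placeholder)
$enc->charset("utf8");                 # declare the corpus as UTF-8
$enc->dir("/corpora/data/mycorpus");   # data directory (assumed path)
$enc->registry("/corpora/registry");   # registry directory (assumed path)

# Attribute declarations depend on your input format (assumed here:
# one token per line, with sentences marked by <s> ... </s> tags)
$enc->p_attributes("word");
$enc->s_attributes("s");

$enc->overwrite(1);                    # allow replacing an existing corpus
$enc->verbose(1);                      # print progress messages

# Input files are taken from the command line
$enc->encode(@ARGV);
```

You would then run it as "perl encode_utf8.pl corpus.vrt" (both file names are again just examples).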

Best,
Stefan

