[CWB] change charset to latin1
Stefan Evert
stefanML at collocations.de
Tue Mar 9 19:08:29 CET 2010
To complement Andrew's explanation:
> I want to change it by uncommenting the line:
> :: charset = "latin1"
> or
> charset = "latin1"
Neither of these is valid registry file syntax.
> I have a corpus and the diacritic argument (%d) doesn't work. I think
> that my charset is UTF8 because I see the commented-out line in the
> registry:
> ##:: charset = "latin1" # character encoding of corpus data
Actually, although it looks like one, this is not a comment but rather
a "corpus property", i.e. a key-value pair that specifies corpus
metadata. The registry file parser recognises ##:: as a special token
that starts a corpus property definition.
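To make the distinction concrete, here is a minimal Python sketch. It assumes a simplified line format of the form ##:: key = "value" with an optional trailing # comment. The function name corpus_properties and the regular expression are hypothetical illustrations, not the actual CWB registry parser. The sketch only shows that a ##:: line yields a key-value pair, while an ordinary # comment (or a bare charset = "latin1" line) yields nothing.

```python
import re

# Hypothetical, simplified pattern for a corpus property line such as
#   ##:: charset = "latin1"   # character encoding of corpus data
# This is NOT the real CWB registry parser; it only illustrates the idea.
PROPERTY_RE = re.compile(r'^##::\s*(\w+)\s*=\s*"([^"]*)"')


def corpus_properties(registry_text):
    """Collect corpus properties (##:: key = "value") from registry text."""
    props = {}
    for line in registry_text.splitlines():
        match = PROPERTY_RE.match(line.strip())
        if match:
            # "##::" starts a key-value corpus property, not a comment;
            # a trailing "# ..." after the quoted value is simply not matched.
            props[match.group(1)] = match.group(2)
        # Every other line (ordinary "#" comments, declarations such as
        # NAME or ID, or a bare 'charset = "latin1"') is skipped here.
    return props


# Made-up registry fragment for illustration only.
registry = '''
NAME "Example corpus"
ID example
## an ordinary comment
##:: charset = "latin1"  # character encoding of corpus data
'''
print(corpus_properties(registry))  # {'charset': 'latin1'}
```

Applied to the line quoted above, the sketch reports charset as latin1: the corpus is declared as Latin-1, not UTF-8. A bare charset = "latin1" line is simply skipped by this sketch, whereas the real registry parser treats it as invalid syntax.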
The reason for this peculiar format is backwards compatibility with
earlier CWB versions, which was a big issue when we were working on
CQP at the IMS (because most people would use a stable release and we
had to make sure everything still worked for them).
Best,
Stefan