[CWB] [ cwb-Feature Requests-2806335 ] CQPweb: non-UTF-8 (ie Latin-1 and other 8-bit codepages)

SourceForge.net noreply at sourceforge.net
Sat Nov 28 09:03:39 CET 2009


Feature Requests item #2806335, was opened at 2009-06-14 23:48
Message generated for change (Comment added) made by andrewhardie
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722306&aid=2806335&group_id=131809

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: CQPweb
Group: None
>Status: Closed
Priority: 5
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Andrew Hardie (andrewhardie)
Summary: CQPweb: non-UTF-8 (ie Latin-1 and other 8-bit codepages)

Initial Comment:
This is best accomplished by an optional filter at the CQP interface level. The filter converts UTF-8 input from the web scripts to ISO-8859 input for CQP (or leaves it untouched when the corpus is already UTF-8), and then applies the reverse translation to strings returned from CQP.

This would need to be governed by a per-corpus setting that is passed to the CQP class when its __construct method is called and becomes part of that object's setup. The setting would then be checked by the CQP::execute() method (and by any other method that passes text back and forth).

Every script's call to CQP::__construct() would have to be modified for this.
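
The filter idea can be illustrated with a minimal sketch. It is written in Python for brevity and is not the CQPweb PHP code: the class name CharsetFilter, the send callback, and the corpus_charset parameter are all hypothetical stand-ins for the CQP class, its connection to the cqp process, and the per-corpus setting described above.

```python
import codecs
from typing import Callable


class CharsetFilter:
    """Hypothetical stand-in for the CQP interface class.

    `send` is an assumed callback that passes raw bytes to CQP and
    returns CQP's raw reply; `corpus_charset` plays the role of the
    per-corpus setting handed over at construction time.
    """

    def __init__(self, send: Callable[[bytes], bytes],
                 corpus_charset: str = "utf-8") -> None:
        self._send = send
        # Normalise aliases such as "UTF8" or "latin1" to canonical names.
        self._charset = codecs.lookup(corpus_charset).name

    def execute(self, utf8_command: bytes) -> bytes:
        """Take UTF-8 from the web scripts, return UTF-8 to them."""
        if self._charset == "utf-8":
            # UTF-8 corpus: the filter is switched off.
            return self._send(utf8_command)
        text = utf8_command.decode("utf-8")
        # Characters missing from the 8-bit codepage become "?".
        raw = text.encode(self._charset, errors="replace")
        reply = self._send(raw)
        # Reverse translation for whatever CQP sends back.
        return reply.decode(self._charset).encode("utf-8")
```

Because the charset is fixed at construction time, every caller has to supply it once, which mirrors the point that each script's constructor call would need changing.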

----------------------------------------------------------------------

>Comment By: Andrew Hardie (andrewhardie)
Date: 2009-11-28 08:03

Message:
Done in v2.08.

----------------------------------------------------------------------

Comment By: Stefan Evert (schtepf)
Date: 2009-10-25 17:19

Message:
If a corpus is properly encoded with the ##::charset property set, then the
CQP interface might be able to figure out the corpus encoding by itself. 
Basically, it would have to watch for a corpus activation command and then
access the corpus properties.  Currently, this can be done with the "info;"
command if one doesn't mind the overhead of reading and printing the entire
.info file.  But it would always be possible to add a few special tricks to
CQP for the sake of better CQPweb compatibility.

----------------------------------------------------------------------
