[CWB] [ cwb-Feature Requests-3585285 ] Make cwb-encode handle non-POSIX (win32) linebreaks
SourceForge.net
noreply at sourceforge.net
Thu Nov 8 03:52:39 CET 2012
Feature Requests item #3585285, was opened at 2012-11-07 18:52
Message generated for change (Tracker Item Submitted) made by andrewhardie
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=722306&aid=3585285&group_id=131809
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: CWB engine
Group: TODO-3.5
Status: Open
Priority: 5
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Andrew Hardie (andrewhardie)
Summary: Make cwb-encode handle non-POSIX (win32) linebreaks
Initial Comment:
Moving CWB input text files between Windows and *nix can result in CRLF (0x0d, 0x0a) linebreaks in the input. If this happens, the CR is encoded as part of the final p-attribute on each line. cwb-encode should be able to spot this and work round it. Likewise, the Windows build should be able to cope with POSIX linebreaks; this may already work, but needs checking.
Suggestions for fixing it by Stefan:
- We could extend -B to remove all whitespace characters around tokens, not just blanks.
- We should probably change line #46 of cwb-encode.c to
#define FIELDSEPS "\t\n\r"
These solutions need evaluating, and one or both need implementing, for v3.5.
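
To illustrate the two ideas, here is a minimal sketch in C. It is not taken from cwb-encode.c: the helper names chomp_line and field_length are hypothetical, and it assumes fields are delimited by any character in a separator string such as FIELDSEPS. The first helper strips a trailing LF, CRLF or bare CR before any further processing; the second shows why adding "\r" to the separator set stops the CR from ending up inside the last field.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of cwb-encode): strip any run of trailing
   CR and LF characters from a line buffer in place; returns the new length. */
size_t chomp_line(char *line)
{
    size_t len = strlen(line);
    while (len > 0 && (line[len - 1] == '\n' || line[len - 1] == '\r')) {
        line[--len] = '\0';
    }
    return len;
}

/* Hypothetical helper: length of the field starting at s, where any
   character in seps (or the end of the string) terminates the field. */
size_t field_length(const char *s, const char *seps)
{
    return strcspn(s, seps);
}
```

With the separator set "\t\n", the final field of a line ending in CRLF still includes the CR; with "\t\n\r" it does not. Calling chomp_line on the raw line first makes the choice of separator set irrelevant for the line terminator.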
----------------------------------------------------------------------