[CWB] [cwb:feature-requests] #47 Make cwb-encode handle non-POSIX (win32) linebreaks
Stefan Evert
schtepf at users.sf.net
Sat Jul 1 14:48:04 CEST 2017
Some old Mac software might also produce files with CR-only linebreaks, but these probably can't be fixed.
---
**[feature-requests:#47] Make cwb-encode handle non-POSIX (win32) linebreaks**
**Status:** open
**Group:** TODO-3.5
**Labels:** CWB engine
**Created:** Thu Nov 08, 2012 02:52 AM UTC by Andrew Hardie
**Last Updated:** Wed Dec 12, 2012 05:26 AM UTC
**Owner:** Andrew Hardie
Moving CWB input text files between Windows and *nix can leave CRLF (0x0d, 0x0a) linebreaks in the input: if this happens, the CR is encoded as part of the final p-attribute on each line. cwb-encode should be able to detect this and work around it (likewise, the Windows build should be able to cope with POSIX linebreaks; this may already work, but needs checking).
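To illustrate the problem, here is a minimal sketch of stripping the line terminator before a line is split into attributes. The helper name `strip_line_ending` is hypothetical and is not taken from cwb-encode.c; it only shows one way a trailing CR could be kept out of the last p-attribute.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not from cwb-encode.c): remove a trailing LF and,
 * if one is then left at the end, a trailing CR. Works in place and
 * returns the new string length. CR characters elsewhere in the line
 * are left untouched. */
size_t strip_line_ending(char *line)
{
    size_t len;

    assert(line != NULL);
    len = strlen(line);
    if (len > 0 && line[len - 1] == '\n')
        line[--len] = '\0';          /* POSIX LF */
    if (len > 0 && line[len - 1] == '\r')
        line[--len] = '\0';          /* CR left over from a CRLF pair */
    return len;
}
```

With this, both "word\tNN\r\n" and "word\tNN\n" end up as "word\tNN", so the final attribute would be "NN" in either case. Files that use CR as the only line terminator are not handled by this sketch.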
Suggestions for fixing it by Stefan:
- We could extend -B to remove all whitespace characters around tokens, not just blanks.
- We should probably change line #46 of cwb-encode.c to

    #define FIELDSEPS "\t\n\r"
These solutions need to be evaluated, and one or both implemented, for v3.5.
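For evaluation purposes, the two suggestions can be sketched roughly as follows. This is an illustration under assumptions, not the actual cwb-encode code: `split_fields` and `trim_token` are hypothetical names, and the splitter uses the standard `strtok`, which skips empty fields; the thread does not say whether cwb-encode's own tokenizer behaves the same way.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Second suggestion: CR counts as a field separator as well. */
#define FIELDSEPS "\t\n\r"

/* Hypothetical splitter: cut `line` in place at any FIELDSEPS character
 * and store up to `max_fields` field pointers. Returns the number of
 * fields stored. Note that strtok() collapses runs of separators, so
 * empty fields are dropped. */
size_t split_fields(char *line, char **fields, size_t max_fields)
{
    size_t n = 0;
    char *tok;

    assert(line != NULL && fields != NULL);
    tok = strtok(line, FIELDSEPS);
    while (tok != NULL && n < max_fields) {
        fields[n++] = tok;
        tok = strtok(NULL, FIELDSEPS);
    }
    return n;
}

/* First suggestion (extended -B behaviour, as assumed here): strip every
 * leading and trailing whitespace character, not just blanks. Works in
 * place and returns a pointer to the trimmed token. */
char *trim_token(char *tok)
{
    char *end;

    assert(tok != NULL);
    while (isspace((unsigned char)*tok))
        tok++;
    end = tok + strlen(tok);
    while (end > tok && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return tok;
}
```

In this sketch, either approach keeps the CR out of the last attribute of a CRLF-terminated line. The trimming variant also removes stray whitespace inside a line (for example a tab-delimited field padded with blanks), whereas the FIELDSEPS variant only changes where fields are cut.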
---