[CWB] [cwb:feature-requests] #47 Make cwb-encode handle non-POSIX (win32) linebreaks

Stefan Evert schtepf at users.sf.net
Sat Jul 1 14:48:04 CEST 2017


Some old Mac software might also produce files with CR-only linebreaks, but these probably can't be fixed.


---

**[feature-requests:#47] Make cwb-encode handle non-POSIX (win32) linebreaks**

**Status:** open
**Group:** TODO-3.5
**Labels:** CWB engine 
**Created:** Thu Nov 08, 2012 02:52 AM UTC by Andrew Hardie
**Last Updated:** Wed Dec 12, 2012 05:26 AM UTC
**Owner:** Andrew Hardie


Moving CWB input text files between Windows and \*nix can leave CRLF (0x0d, 0x0a) linebreaks in the input. If this happens, the CR is encoded as part of the final p-attribute on each line. cwb-encode should be able to spot this and work round it (likewise, the Windows build should be able to cope with POSIX linebreaks; this may already work, but needs checking).
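To make the symptom concrete, here is a minimal sketch of how a stray CR can end up in the last column. The helper `last_field` is hypothetical and is not taken from cwb-encode.c; it only assumes that a line is cut at the LF and that columns are TAB-separated.

```c
#include <string.h>

/* Hypothetical helper, not taken from cwb-encode.c: cut the line at the
 * first LF and return a pointer to the last TAB-separated field. */
static const char *last_field(char *line)
{
    char *tab;

    line[strcspn(line, "\n")] = '\0';   /* only the LF is removed */
    tab = strrchr(line, '\t');          /* start of the final column */
    return tab ? tab + 1 : line;
}
```

For the input `"dogs\tNNS\tdog\r\n"` this returns `"dog\r"` (four bytes), whereas the same line with an LF-only ending yields `"dog"`. The two values would be stored as different strings.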

Suggestions for fixing it by Stefan:

- We could extend -B to remove all whitespace characters around tokens, not just blanks.

- We should probably change line #46 of cwb-encode.c to

	#define FIELDSEPS  "\t\n\r"

These solutions need evaluating and one or both implementing for v 3.5.
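The two suggestions can be sketched roughly as follows. This is an illustration under assumptions, not the cwb-encode implementation: `split_fields` and `trim_whitespace` are hypothetical names, and only the `FIELDSEPS` value comes from the ticket. Note that `strtok()` merges adjacent separators, so empty columns are skipped here; the real encoder may need to treat them differently.

```c
#include <ctype.h>
#include <string.h>

#define FIELDSEPS "\t\n\r"   /* separator set proposed in the ticket */

/* Hypothetical splitter: cuts `line` in place at every FIELDSEPS character
 * and stores up to `max` non-empty fields in `fields`. Returns the count. */
static int split_fields(char *line, char **fields, int max)
{
    int n = 0;
    char *tok;

    for (tok = strtok(line, FIELDSEPS); tok != NULL && n < max;
         tok = strtok(NULL, FIELDSEPS))
        fields[n++] = tok;
    return n;
}

/* Hypothetical version of the extended -B behaviour: strip every leading
 * and trailing whitespace character as judged by isspace(), which includes
 * '\r', rather than blanks only. Modifies the string in place. */
static char *trim_whitespace(char *s)
{
    char *end;

    while (isspace((unsigned char)*s))
        s++;
    end = s + strlen(s);
    while (end > s && isspace((unsigned char)end[-1]))
        end--;
    *end = '\0';
    return s;
}
```

With `"\r"` in the separator set, splitting `"dogs\tNNS\tdog\r\n"` gives three fields and the last one is `"dog"`. The trimming approach reaches the same result for a single token such as `" dog\r"`, and it would also catch other stray whitespace that the separator change alone does not.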

