[CWB] [ cwb-Feature Requests-2891967 ] Read undump files without explicit line count

SourceForge.net noreply at sourceforge.net
Wed Nov 4 16:00:27 CET 2009


Feature Requests item #2891967, was opened at 2009-11-04 15:57
Message generated for change (Settings changed) made by schtepf
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722306&aid=2891967&group_id=131809

Category: CWB engine
Group: None
>Status: Closed
Priority: 6
Private: No
Submitted By: Stefan Evert (schtepf)
Assigned to: Stefan Evert (schtepf)
Summary: Read undump files without explicit line count

Initial Comment:
The "undump" command in CQP requires an explicit line count header in the first line of the undump file, so that arrays can be pre-allocated.  This is a major hassle when exchanging data with spreadsheets, SQL database engines, R, and other software that would otherwise work quite well with the TAB-delimited format of dump/undump files.  Without this restriction, dump files could also serve as a platform-independent serialization format for query results (unlike "save", which produces non-portable binary files that even store the registry directory of the base corpus).

----------------------------------------------------------------------

>Comment By: Stefan Evert (schtepf)
Date: 2009-11-04 16:00

Message:
Fixed in version 2.2.b101.

The header line is now optional if the undump is loaded from a regular
file.  CQP will automatically detect the new format and read the undump
file in two passes (first to determine the number of lines, then to read
the actual data).  The new format cannot be used when reading from a pipe
or from standard input, because pipes cannot be re-read.

----------------------------------------------------------------------


