[CWB] TEITOK

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Nov 20 15:31:18 CET 2015


>>>>>>
Without having made any concrete plans for CWB4 input formats, I expect that there will be a tool very similar to cwb-s-encode at least at the Ziggurat level, i.e. a tool which allows you to create a new layer with a given set of variables, and with base layer links specified in terms of layer positions.  For a segmentation layer with a single string variable, the input of this tool might look exactly like the input of cwb-s-encode.

(At least that's what I imagine for the first set of encoding tools. More sophisticated encoders to be added at a later stage. :)
<<<<<<
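
For readers who do not know the cwb-s-encode input mentioned in the quote: as far as I recall from CWB3, it is one region per line, giving the start and end corpus positions and (when annotated values are used) a string value, separated by tabs. The following is only a rough Python sketch of producing such lines; the function name and the sample regions are made up for illustration, and nothing here is meant to describe an actual CWB4 or Ziggurat tool.

```python
def format_regions(regions):
    """Render (start, end, value) triples as tab-separated lines.

    Assumption: one region per line, 'start<TAB>end<TAB>value', with
    start and end given as corpus positions (start <= end).
    """
    lines = []
    for start, end, value in regions:
        if start < 0 or end < start:
            raise ValueError(f"invalid region: {start}-{end}")
        lines.append(f"{start}\t{end}\t{value}")
    return "".join(line + "\n" for line in lines)


# Hypothetical sample: two sentence regions with an ID as the string variable.
sample = [(0, 4, "s1"), (5, 11, "s2")]
text = format_regions(sample)
```

The resulting text could be written to a file and handed to whichever encoder is in use; the range check is only a sanity test, not a requirement taken from the CWB documentation.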

Yeah, no specific plans yet, but in the back of my mind I have been assuming -

1) a set of simple zig-utils that are wafer-thin wrappers round the Ziggurat library functions, for creating layers and variables of different kinds based on very limited input formats, without any assumptions about the nature of a corpus (which we will need simply for developing and testing Ziggurat)

2) revised CWB-utilities for general use, with (a) one to create a corpus all at once from a compound input file format identical to what CWB3 accepts (like cwb-encode but going directly to the final indexed format); (b) one to add a new attribute or attributes to an existing corpus. (Unless perhaps one program does both a and b?)

The reason I said that cwb-s-encode was unlikely to exist is that, since all types of attributes are defined in terms of Ziggurat layers/variables, there does not seem to be a need for *separate* utilities for adding different types of extra attributes to an existing corpus. I.e. we can just have one util to add any kind of attribute.

Andrew.



