[CWB] Concordance printing -- opinions?

Peter Ljunglöf peter.ljunglof at gu.se
Tue Sep 21 12:35:10 CEST 2010


Hi,

Personally, I'm in favour of Stefan's version B. I think it's more standardized and easier to maintain. And, at least for me, it's easier to write XML transformations than to learn how to configure kwic formatting.

/Peter Ljunglöf

On 21 Sep 2010, at 12:13, Stefan Evert wrote:

> Version A: Keep the kwic-formatting subsystem mostly as is, and just clean up the implementation and make it as configurable as possible within this framework.  The current kwic formatter (and similar code for "group" output etc.) uses a common algorithm to determine context size and put together kwic lines.  Different output modes are implemented by specifying strings that are inserted in various places in the kwic line, e.g. to separate attributes, separate tokens, before/after XML tags, etc.  By clever use of these strings, you can produce output that looks very much like HTML, SGML or LaTeX.
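
To make the string-insertion idea concrete, here is a rough Python sketch of how such a formatter could work. It is not the actual CWB code: the names KwicStyle and format_kwic, and the particular set of separator strings, are made up for illustration.

```python
from dataclasses import dataclass


@dataclass
class KwicStyle:
    """Hypothetical set of strings that defines one output mode."""
    token_sep: str = " "     # inserted between tokens
    attr_sep: str = "/"      # inserted between attributes of one token
    match_open: str = "<"    # inserted before the match
    match_close: str = ">"   # inserted after the match
    line_open: str = ""      # inserted at the start of the kwic line
    line_close: str = ""     # inserted at the end of the kwic line


def format_kwic(tokens, match_start, match_end, style):
    """Assemble one kwic line; the match is tokens[match_start..match_end].

    Each token is a tuple of attribute values, e.g. ("cat", "NN").
    Attribute values are inserted verbatim, with no escaping.
    """
    def join(span):
        return style.token_sep.join(style.attr_sep.join(t) for t in span)

    left = join(tokens[:match_start])
    match = style.match_open + join(tokens[match_start:match_end + 1]) + style.match_close
    right = join(tokens[match_end + 1:])
    # Skip empty context so no stray separator appears at the line edges.
    parts = [p for p in (left, match, right) if p]
    return style.line_open + style.token_sep.join(parts) + style.line_close


tokens = [("the", "DT"), ("cat", "NN"), ("sat", "VBD")]
plain = format_kwic(tokens, 1, 1, KwicStyle())
html = format_kwic(tokens, 1, 1, KwicStyle(match_open="<b>", match_close="</b>",
                                           line_open="<li>", line_close="</li>"))
print(plain)
print(html)
```

The second call produces something that looks like HTML, but since nothing is escaped, a token containing "<" or "&" would break it. A sketch like this only imitates the markup.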
> 
> (...)
> 
> Version B: Reimplement kwic formatting from scratch, offering only two output modes: XML and plain text.
> 
> Plain text output would be targeted exclusively at interactive terminals and would have a special implementation (i) to produce correct and robust highlighting/colours and (ii) to display fixed-character context efficiently.  It wouldn't be intended for further automatic processing, offering no special escaping tricks to produce unambiguous output.  For most automatic processing needs, "tabulate" is a much better starting point than "cat" anyway.
> 
> XML output would be a standardised, precisely defined intermediate format for further processing by a GUI front-end etc.  This format need not be user-configurable, since all necessary transformations can easily be achieved with an XSLT stylesheet, Perl script, etc.  It would be shared between cwb-decode and CQP, possibly using a common implementation within the low-level library.
> 
> Importantly, the XML output would _not_ support fixed-character context -- that is just a form of presentation, not a sensible definition of context size.  It's easy enough to produce the required display with a short Perl or Python script, anyway.
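
As a rough illustration of that last point, a fixed-character display can be derived from XML output in a few lines of Python. The XML format has not been defined yet, so the element names used here (line, left, match, right) are purely my assumption, as is the function name fixed_width_kwic.

```python
import xml.etree.ElementTree as ET


def fixed_width_kwic(xml_line, width=20):
    """Render one concordance line with fixed-character context.

    Assumes a hypothetical format:
    <line><left>...</left><match>...</match><right>...</right></line>
    """
    line = ET.fromstring(xml_line)
    left = (line.findtext("left") or "").strip()
    match = (line.findtext("match") or "").strip()
    right = (line.findtext("right") or "").strip()
    # Keep the characters closest to the match, then pad to a fixed width.
    left = left[-width:].rjust(width)
    right = right[:width].ljust(width)
    return f"{left} [{match}] {right}"


example = ("<line><left>the quick brown</left><match>fox</match>"
           "<right>jumps over the lazy dog</right></line>")
print(fixed_width_kwic(example, width=10))
```

The context size is decided only at display time, so the XML itself can keep a linguistically meaningful context (tokens, sentences) and each front-end can cut it to whatever width it needs.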

________________________________________________________________________________
peter ljunglöf, språkbanken, göteborgs universitet



