[CWB] File format of encoded cwb corpora

Stefan Evert stefanML at collocations.de
Sat Jul 14 10:58:51 CEST 2012


I agree wholeheartedly.  The problem is just that for the NITE XML Toolkit we actually developed the (almost) formal specification _before_ we implemented the software.  For the CWB, it's the other way round (and worse yet, somebody else implemented the software).  In fact, the semantics of the query language are only defined by the existing (and sometimes buggy) implementation. :-(

Any chance of organising a one-week CWB hackathon?

Best wishes,
Stefan


On 13 Jul 2012, at 17:33, Serge Heiden wrote:

> It would be nice to write documents
> like the ones Stefan Evert produced for the NXT Search engine:
> http://www.ims.uni-stuttgart.de/projekte/nite
> 
> A) a CQP object model justifying a detailed description of the index file architecture
> (like the schema on p. 14 of the "CQP Corpus Administrator's Manual", but
> with real file names to begin with)
> Like this document:
> Formal specification of the NITE Object Model, the abstract data model used by the NITE XML Toolkit.
> -> http://www.ltg.ed.ac.uk/NITE/documents/NiteObjectModel.v2.1.pdf
> 
> B) a CQL formal specification
> Like this document:
> Formal specification of NiteQL, the query language that operates over data conforming to the NITE Object Model.
> -> http://www.ltg.ed.ac.uk/NITE/documents/NiteQL.v2.1.pdf
> I once started a list of all the CQL syntax features I know of
> in a Google Doc, but it hasn't evolved into something readable:
> https://docs.google.com/document/d/1rz39LixYl6uegx35kIj6JLYbMPEOsy2ycg4JuCBZ68Y/edit?hl=fr&pli=1
> 
> Best,
> Serge


