[CWB] Sample corpus for IMS Corpus Workbench

Stefan Evert stefanML at collocations.de
Sun May 20 11:40:04 CEST 2012


On 19 May 2012, at 23:08, Hardie, Andrew wrote:

> As you can see from the description here (http://cwb.sourceforge.net/files/CQP_Tutorial/node47.html#sec:appendix:tutorial-corpora), the Dickens corpus does not have the requisite text tags – it has <novel> tags instead, and these do not have unique identifiers (they have a title attribute instead). The CWB encoding tutorial was not designed with CQPweb in mind!

Hm, that's a bit inconvenient, isn't it?

I guess we should provide updated versions of the sample corpora with extra <text id="..."> tags and a corresponding metadata table, so people can easily take CQPweb for a test drive. Or is there some other public demo corpus for use with CQPweb?
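For illustration, the conversion could look roughly like the sketch below. It rests on several assumptions that are not confirmed above. The corpus is taken to be available as a one-token-per-line (.vrt) file. Each novel is taken to open with a line of exactly the form <novel title="...">, where title is the only attribute, and to close with </novel>. The function names, the "dickens_NNN" ID scheme, and the two-column (ID, title) tab-separated metadata layout are all hypothetical. The script wraps each <novel> region in a new <text id="..."> region and collects one metadata row per text. The sample at the bottom is toy data, not real corpus content.

```python
import re
from typing import Iterable, List, Tuple

# Assumption: opening tags look exactly like <novel title="..."> with no other attributes.
NOVEL_OPEN = re.compile(r'^<novel\s+title="([^"]*)"\s*>$')


def add_text_ids(
    lines: Iterable[str], prefix: str = "dickens"
) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Wrap each <novel> region in a <text id="..."> region (hypothetical helper).

    Returns the rewritten lines plus (text_id, title) rows for a metadata table.
    """
    out: List[str] = []
    meta: List[Tuple[str, str]] = []
    for raw in lines:
        line = raw.rstrip("\n")
        m = NOVEL_OPEN.match(line)
        if m:
            # Hypothetical ID scheme: prefix plus a zero-padded running number.
            text_id = f"{prefix}_{len(meta) + 1:03d}"
            meta.append((text_id, m.group(1)))
            out.append(line)
            out.append(f'<text id="{text_id}">')
        elif line == "</novel>":
            # Close <text> before </novel> so the two regions nest cleanly.
            out.append("</text>")
            out.append(line)
        else:
            out.append(line)
    return out, meta


def metadata_tsv(meta: List[Tuple[str, str]]) -> str:
    """Render metadata rows as tab-separated lines, text ID in the first column."""
    return "".join(f"{tid}\t{title}\n" for tid, title in meta)


# Toy sample (not real corpus data).
sample = ['<novel title="Oliver Twist">', "Please", "sir", "</novel>"]
rewritten, meta = add_text_ids(sample)
print("\n".join(rewritten))
print(metadata_tsv(meta), end="")
```

If the real <novel> tags carry further attributes, the regular expression would need to be relaxed accordingly.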

Andrew, would it be possible for you to document installation of the DICKENS corpus in CQPweb somewhere as a tutorial?  Perhaps as part of the HTML docs that come with CQPweb?

I can provide an updated version of the corpus within the next few days if necessary.

Best,
Stefan


