[CWB] Sample corpus for IMS Corpus Workbench

Hardie, Andrew a.hardie at lancaster.ac.uk
Sat May 19 23:08:13 CEST 2012


Hi Kurt,

There's a reasonably extended explanation, with an example, on pp. 11 to 15 of this draft paper:

http://www.lancs.ac.uk/staff/hardiea/cqpweb-paper.pdf

As you can see from the description here (http://cwb.sourceforge.net/files/CQP_Tutorial/node47.html#sec:appendix:tutorial-corpora), the Dickens corpus does not have the requisite text tags - it has <novel> tags instead, and these do not have unique identifiers (they have a title attribute instead). The CWB encoding tutorial was not designed with CQPweb in mind!

So if you want to put this corpus into CQPweb, you will need to change <novel> ... </novel> to <text> ... </text> and add single-word unique identifiers, e.g. for "A Christmas Carol" you could put <text id="ACC">.
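
A minimal sketch of that renaming in Python, assuming you have the corpus as plain input text with each <novel title="..."> and </novel> tag on a line of its own (the exact layout of your input file may differ). The function names and the "initials of the title" ID scheme are illustrative only, not something CWB or CQPweb prescribes; note also that the sketch drops the title attribute.

```python
import re

# Assumed layout: an opening tag alone on its line, e.g. <novel title="A Christmas Carol">
NOVEL_OPEN = re.compile(r'^<novel\s+title="([^"]*)"\s*>$')

def make_id(title, used):
    """Build a single-word ID from the title's initials; add a number on clashes."""
    words = re.findall(r"[A-Za-z0-9]+", title)
    base = "".join(w[0].upper() for w in words) or "TEXT"
    candidate, n = base, 2
    while candidate in used:
        candidate = f"{base}{n}"
        n += 1
    used.add(candidate)
    return candidate

def novel_to_text(lines):
    """Rewrite <novel title="..."> ... </novel> as <text id="..."> ... </text>."""
    used = set()
    out = []
    for line in lines:
        stripped = line.strip()
        match = NOVEL_OPEN.match(stripped)
        if match:
            out.append(f'<text id="{make_id(match.group(1), used)}">')
        elif stripped == "</novel>":
            out.append("</text>")
        else:
            # Every other line (tokens, other tags) passes through unchanged.
            out.append(line.rstrip("\n"))
    return out
```

You would read the input file into a list of lines, pass it to novel_to_text, and write the result out before indexing the corpus again.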

Alternatively, if you are not bothered about the system being aware of the text divisions, you can make CQPweb treat the whole thing as one giant text by simply adding an opening <text id="Dickens"> tag at the beginning of the corpus and a closing </text> tag at the end.
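
The single-text option is simpler still. A sketch under the same assumption (the corpus is available as a list of input lines; wrap_as_single_text is a hypothetical helper name):

```python
def wrap_as_single_text(lines, text_id="Dickens"):
    """Enclose the whole corpus in one <text id="..."> ... </text> pair."""
    body = [line.rstrip("\n") for line in lines]
    return [f'<text id="{text_id}">', *body, "</text>"]
```

The existing <novel> tags can stay where they are in this case; they simply end up nested inside the one big text.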

Hope that helps

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Kurt Sultana
Sent: 19 May 2012 21:27
To: cwb at sslmit.unibo.it
Subject: [CWB] Sample corpus for IMS Corpus Workbench

Hi all,

I'm new to CQP and CQPweb. I've installed the IMS Corpus Workbench. Now I'm trying to install a sample corpus, namely the Dickens corpus, available at: http://cwb.sourceforge.net/download.php#corpora

I've installed it in CQP as per the instructions in the corpus's readme file:

If you want to install the corpus permanently, copy the file
<registry/dickens> to the global registry directory, and insert the
correct absolute path to the data/ subdirectory in the HOME and INFO
entries.

Then I went to IMS Corpus Workbench Admin, chose Install Corpus and chose the option "Click here to install a corpus you have already indexed in CWB". After submitting, however, I get this error:

CQPweb encountered an error and could not continue.
Pre-indexed corpora require s-attributes text and text_id!!
... in file /var/www/cqpweb/lib/admin-lib.inc.php line 138.

So where should I add the text_id? I'd appreciate it if you could explain step by step, since I'm very new to this.

Or would someone happen to know of an English sample corpus available which is compatible with IMS corpus workbench?

Thanks in advance,
Kurt
