[CWB] Sample corpus for IMS Corpus Workbench

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon May 21 20:52:14 CEST 2012


Hi Kurt,

Hang steady: you do not need to start from scratch with the plain text of the Dickens. Adjusting the existing tutorial data to make it CQPweb-compatible is much easier, as outlined. And if you can hang on until Stefan and/or I find a suitable gap in our schedules (which, alas, can take a very long time, as neither of us works on CWB as our main job), we'll do it for you, as Stefan said!

If you do want to start from scratch with some other dataset, the easiest way to get CWB input format is to tag your data with TreeTagger: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/ Just remember to add the <text id="..."> tags around each text, so that the corpus can be installed in CQPweb later.
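
To illustrate the <text id="..."> step, here is a minimal Python sketch. It assumes you already have the tagger output as lines of tab-separated word, tag and lemma, one token per line, and it only adds the opening and closing tags on lines of their own. The function name, the sample tokens and the id value are made up for illustration; this is not part of CWB, CQPweb or TreeTagger.

```python
from xml.sax.saxutils import quoteattr


def wrap_as_text(tagged_lines, text_id):
    """Wrap tagger output lines in a <text id="..."> ... </text> region.

    tagged_lines: iterable of strings, one token per line (tab-separated columns).
    text_id: identifier for this text (hypothetical value chosen by the caller).
    """
    # Drop empty lines and trailing newlines so each token sits on exactly one line.
    body = [line.rstrip("\n") for line in tagged_lines if line.strip()]
    # The opening and closing tags each go on a line of their own.
    return "\n".join([f"<text id={quoteattr(text_id)}>", *body, "</text>"]) + "\n"


if __name__ == "__main__":
    # Illustrative sample only: word, part-of-speech tag, lemma.
    sample = ["It\tPP\tit", "was\tVBD\tbe", "the\tDT\tthe", "best\tJJS\tgood"]
    print(wrap_as_text(sample, "dickens_sample"), end="")
```

For a corpus made of several texts, you would call this once per text with a different id each time and concatenate the results into one input file.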

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Kurt Sultana
Sent: 20 May 2012 21:07
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Sample corpus for IMS Corpus Workbench

Hi all,

Thanks for your input, and yes an IMS-compatible corpus would be great :)

So I'd need to get the original Dickens text (or any text) and change it to the vertical format and proceed from there, is that right? I presume there are tools for this conversion... would nltk do the trick?

Thanks in advance,
Kurt

On Sun, May 20, 2012 at 11:56 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
>>> Andrew, would it be possible for you to document installation of the DICKENS corpus in CQPweb somewhere as a tutorial?
Yes, certainly. As with everything else, though, it's finding the time... (and also, since upgrading the install-corpus interface is prominent on the todo list, I don't want to invest too much time in documenting features that will be obsolete in a version or two's time).

But if you can prep a CQPweb-compatible version & put it on the website for download, I can put together a walkthrough when I get a few spare hours.

Andrew.

