[CWB] cqp and very large corpora

Hardie, Andrew a.hardie at lancaster.ac.uk
Sun Nov 11 22:16:33 CET 2012


Better hardware?

I know this sounds glib, but re-engineering CWB to make it multithreaded or to use ancillary database indexes would be a huge undertaking. Throwing better hardware at the problem will almost certainly cost you less than the programmer time to rewrite large chunks of CWB from the ground up.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Nikola Tulechki
Sent: 11 November 2012 07:58
To: Open source development of the Corpus WorkBench
Subject: [CWB] cqp and very large corpora

Hello

I am using cqp with the *WAC corpora (1.5G words) and, while not prohibitive, response times are still in the range of minutes.
Are there any ways to speed up the tool further?
Multithreading? Indexes stored in RAM or in a database?

Thanks
NT

