[CWB] [CQPWeb] dealing with large corpora

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Mar 30 19:05:52 CEST 2012


Hi Sylvain,

You can circumvent the timeout (which exists to limit how much server time any one user can occupy) by running frequency list creation outside the web interface. To do this:

- on the command line, go to the web directory of the corpus in question, e.g. /home/www/cqpweb/my_corpus or whatever

- run this command: php ../bin/offline-freqlists.php
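
The two steps above can be sketched as a small shell function. This is only an illustration: the function name run_offline_freqlists and the PHP override variable are hypothetical, and the corpus path is the example path from the message, not necessarily where your installation lives.

```shell
# Hypothetical helper: run the offline frequency-list script for one corpus.
# $1 is the corpus's web directory (e.g. /home/www/cqpweb/my_corpus).
run_offline_freqlists() {
    corpus_dir="$1"
    script="$corpus_dir/../bin/offline-freqlists.php"

    if [ ! -d "$corpus_dir" ]; then
        echo "no such corpus directory: $corpus_dir" >&2
        return 1
    fi
    if [ ! -f "$script" ]; then
        echo "script not found: $script" >&2
        return 1
    fi

    # Run from inside the corpus directory, using the relative path given above.
    # PHP can be set to a specific interpreter; it defaults to "php" on the PATH.
    ( cd "$corpus_dir" && "${PHP:-php}" ../bin/offline-freqlists.php )
}
```

It would be called as: run_offline_freqlists /home/www/cqpweb/my_corpus. For a corpus of this size the job may run for a long time, so starting it under nohup or inside a screen session is worth considering.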

That should do the trick. Let me know if it doesn't.

Alternatively, if you don't need the timeout protection, you can disable it in your php.ini file.
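
Before editing php.ini it can help to see what is currently set. The sketch below assumes the timeout in question is PHP's max_execution_time directive, which the message does not name explicitly; the function name show_time_limit is hypothetical.

```shell
# Hypothetical helper: print the active max_execution_time line(s) of a php.ini file.
# $1 is the path to the php.ini file to inspect.
show_time_limit() {
    ini_file="$1"
    # Match uncommented lines only; php.ini comments start with ';'.
    grep -E '^[[:space:]]*max_execution_time[[:space:]]*=' "$ini_file"
}
```

For example: show_time_limit /etc/php.ini (the path is an assumption; "php --ini" lists the files loaded by the command-line PHP, and the web server may read a different one). Setting max_execution_time = 0 removes the limit; the web server usually has to be restarted for the change to take effect.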

best

Andrew.


-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Sylvain LOISEAU
Sent: 29 March 2012 13:07
To: cwb at sslmit.unibo.it
Subject: [CWB] [CQPWeb] dealing with large corpora

Dear all,

I've installed a rather large corpus on CQPweb (500 million+ tokens). However, I've been able neither to categorize the corpus nor to create frequency lists: it seems that the corpus is too big and the timeout is reached before the processes have completed.

Do you know how to cope with that? Is a list of the command lines / SQL statements executed by CQPweb available, so that I could try to run them one by one in a terminal?

Best regards,
Sylvain Loiseau