[CWB] [CQPWeb] dealing with large corpora
Hardie, Andrew
a.hardie at lancaster.ac.uk
Fri Mar 30 19:05:52 CEST 2012
Hi Sylvain,
You can circumvent the timeout (which exists to limit how much server time any one user can occupy) by running frequency list creation outside the web interface. To do this:
- on the command line, go to the web directory of the corpus in question, e.g. /home/www/cqpweb/my_corpus or whatever
- run this command: php ../bin/offline-freqlists.php
That should do the trick. Let me know if it doesn't.
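If you want to script the two steps, a minimal sketch could look like the following. The function name run_offline_freqlists is hypothetical (it is not part of CQPweb), and the corpus path is only the example path from above; the one real command is the php call from the steps. The checks are there so that a mistyped directory fails early instead of running php in the wrong place.

```shell
# Hypothetical helper, not part of CQPweb: wraps the two manual steps.
# Argument: the web directory of the corpus, e.g. /home/www/cqpweb/my_corpus
run_offline_freqlists() {
    corpus_dir="$1"

    # Step 1 needs an existing corpus web directory.
    if [ ! -d "$corpus_dir" ]; then
        echo "no such corpus directory: $corpus_dir" >&2
        return 1
    fi

    # The script is expected one level up, in ../bin relative to the corpus.
    if [ ! -f "$corpus_dir/../bin/offline-freqlists.php" ]; then
        echo "offline-freqlists.php not found relative to $corpus_dir" >&2
        return 2
    fi

    # Run in a subshell so the caller's working directory is left unchanged.
    ( cd "$corpus_dir" && php ../bin/offline-freqlists.php )
}
```

You would then call it as: run_offline_freqlists /home/www/cqpweb/my_corpus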
Alternatively, if you don't need the timeout protection, you can disable it in your php.ini file.
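A sketch of that change, on the assumption that the limit being hit is PHP's own max_execution_time directive (the web server may impose a separate timeout of its own, which this does not affect):

```ini
; In the php.ini read by the web server (this may not be the same file
; that the command-line php reads).
; 0 means no limit on script execution time.
max_execution_time = 0
```

The web server usually has to be restarted or reloaded before the new value takes effect.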
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Sylvain LOISEAU
Sent: 29 March 2012 13:07
To: cwb at sslmit.unibo.it
Subject: [CWB] [CQPWeb] dealing with large corpora
Dear all,
I've installed a rather large corpus on CQPweb (500 million+ tokens). However, I have not been able to categorize the corpus or to create frequency lists: it seems that the corpus is too big and the timeout is reached before the processes have completed.
Do you know how to cope with that? Is a list of the command lines / SQL instructions performed by CQPweb available, so that I can try to execute them one by one in a terminal?
Best regards,
Sylvain Loiseau