[CWB] [CQPWeb] dealing with large corpora

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Apr 3 09:01:19 CEST 2012


Hi Sylvain,

I moved the scripts that aren't web-accessible from lib to bin a few versions ago. You should EITHER have a bin directory which contains the script, OR no bin directory at all. If you have a mixture, you may be between versions.

Assuming the latter, simply go to $root/my_corpus and run

php ../lib/offline-freqlists.php

This is the correct path *but only if you are already in one of the corpus directories*. You need to run it from inside the web directory of the corpus you want to operate on. Note the .. at the start of the path, not a single . , which I think might be your problem.

The directory layout should be like this:

$root
 |--lib
 |--doc
 |--corpus1
 |--corpus2
 ...

Your working directory needs to be $root/corpus1 in order to pick up the settings file for corpus1. That is why the path to the script is ../lib (or ../bin in more recent versions). This is what your error message indicates: the corpus settings file couldn't be found.
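
A minimal sketch of the steps above, written as a shell function. It is not part of CQPweb: the function name run_offline_freqlists and the optional PHP environment variable (to choose the interpreter) are assumptions made for illustration. It only checks that the corpus directory holds settings.inc.php, looks for the script in ../bin and then ../lib, and runs it with the corpus directory as the working directory.

```shell
#!/bin/sh
# Sketch only: run_offline_freqlists is a hypothetical helper, not shipped with CQPweb.
# $1 = web directory of the corpus (e.g. $root/corpus1)
run_offline_freqlists() {
    corpus_dir=$1

    # The script requires settings.inc.php relative to the working directory,
    # so the corpus directory must contain it.
    if [ ! -f "$corpus_dir/settings.inc.php" ]; then
        echo "no settings.inc.php in $corpus_dir" >&2
        return 1
    fi

    # More recent versions keep the script in bin, older ones in lib.
    for sub in bin lib; do
        if [ -f "$corpus_dir/../$sub/offline-freqlists.php" ]; then
            # Run in a subshell so the caller's working directory is unchanged.
            # PHP is an assumed override for the interpreter; it defaults to php.
            ( cd "$corpus_dir" && ${PHP:-php} "../$sub/offline-freqlists.php" )
            return $?
        fi
    done

    echo "offline-freqlists.php not found in ../bin or ../lib" >&2
    return 1
}
```

It could be called as, for example, run_offline_freqlists /home/www/cqpweb/my_corpus (the path is a placeholder for your own corpus directory).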

Best

Andrew.



> -----Original Message-----
> From: Sylvain Loiseau [mailto:sylvain.loiseau at univ-paris13.fr] 
> Sent: 03 April 2012 07:52
> To: Hardie, Andrew
> Cc: Open source development of the Corpus WorkBench
> Subject: Re: [CWB] [CQPWeb] dealing with large corpora
> 
> Hello,
> 
> Thanks a lot. This was exactly the kind of script I was hoping for.
> Two more questions, if I may:
> 
> - the script "offline-freqlists.php" is located under 
> $root/lib/ (and in bin, under my_corpus). Maybe I am running an 
> outdated version of CQPWeb?
> 
> - when I try "php ./lib/offline-freqlists.php" it outputs:
> 
> PHP Warning:  require(settings.inc.php): failed to open stream: No such file or directory in /home/sloiseau/www/cqpweb/lib/offline-freqlists.php on line 45
> PHP Fatal error:  require(): Failed opening required 'settings.inc.php' (include_path='.:/usr/share/php:/usr/share/pear') in /home/sloiseau/www/cqpweb/lib/offline-freqlists.php on line 45
> 
> Do you have any ideas?
> 
> Thanks a lot again,
> Best
> Sylvain
> 
> On 30 March 2012 at 19:05, Hardie, Andrew wrote:
> 
> > Hi Sylvain,
> > 
> > You can circumvent the timeout (which is to limit the amount of time on the server that users can occupy) by running frequency list creation outside the web interface. To do this:
> > 
> > - on the command line, go to the web directory of the corpus in question, e.g. /home/www/cqpweb/my_corpus or whatever
> > 
> > - run this command: php ../bin/offline-freqlists.php
> > 
> > That should do the trick. Let me know if it doesn't.
> > 
> > Alternatively, if you don't need the timeout protection, you can disable it in your php.ini file.
> > 
> > best
> > 
> > Andrew.
> > 
> > 
> > -----Original Message-----
> > From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Sylvain LOISEAU
> > Sent: 29 March 2012 13:07
> > To: cwb at sslmit.unibo.it
> > Subject: [CWB] [CQPWeb] dealing with large corpora
> > 
> > Dear all,
> > 
> > I've installed a rather large corpus on CQPWeb (500 million+ tokens). However, I've not been able to categorize the corpus or to create a frequency list: it seems that the corpus is too big and the "time out" is reached before the processes have completed.
> > 
> > Do you know how to cope with that? Is a list of the command lines / SQL instructions performed by CQPWeb available, so that I can try to execute them one by one in a terminal?
> > 
> > Best regards,
> > Sylvain Loiseau
> > _______________________________________________
> > CWB mailing list
> > CWB at sslmit.unibo.it
> > http://devel.sslmit.unibo.it/mailman/listinfo/cwb
> 
> 
> 
> -----
> Sylvain Loiseau
> sylvain.loiseau at univ-paris13.fr
> 
> Université Paris 13-Nord
> Laboratoire Lexiques, Dictionnaires, Informatique (UMR 7187 
> CNRS/Université Paris 13-Nord)
> 99 avenue Jean-Baptiste Clément
> F-93410 Villetaneuse
> 
> 

