[CWB] [CQPWeb] dealing with large corpora
Hardie, Andrew
a.hardie at lancaster.ac.uk
Tue Apr 3 09:01:19 CEST 2012
Hi Sylvain,
I moved the scripts that aren't web-accessible from lib to bin a few versions ago. You should EITHER have a bin directory which contains them, OR no bin directory at all. If you have a mixture, you may be between versions.
Assuming the latter, simply go to $root/my_corpus and run
php ../lib/offline-freqlists.php
This is the correct path *but only if you are already in one of the corpus directories*. You need to run it from inside the web directory of the corpus you want to operate on. Note the .. at the start of the path, rather than a single . as in your command, which I think might be your problem.
The directory layout should be like this:
$root
|--lib
|--doc
|--corpus1
|--corpus2
...
Your working directory needs to be $root/corpus1 in order to pick up the settings file for corpus1. That is why the path to the script is ../lib (or ../bin in more recent versions). It is also what your error message indicates: the corpus settings file couldn't be found.
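
To make the above concrete, here is a rough shell sketch of the check, not anything that ships with CQPweb. The function name find_freqlist_script is made up for illustration. It assumes the layout shown above, and it assumes the corpus settings file is called settings.inc.php, which is the name that appears in your error message.

```shell
#!/bin/sh
# Hypothetical helper (not part of CQPweb): given a corpus web directory,
# print the relative path to offline-freqlists.php as seen from inside it.
# Assumes the layout $root/{bin or lib} next to $root/<corpus>.
find_freqlist_script() {
    corpus_dir=$1

    # The script loads settings.inc.php from the working directory,
    # so the corpus directory must contain it.
    if [ ! -f "$corpus_dir/settings.inc.php" ]; then
        echo "no settings.inc.php in $corpus_dir: not a corpus directory?" >&2
        return 1
    fi

    # Newer versions keep the script in bin, older ones in lib.
    for sub in bin lib; do
        if [ -f "$corpus_dir/../$sub/offline-freqlists.php" ]; then
            printf '%s\n' "../$sub/offline-freqlists.php"
            return 0
        fi
    done

    echo "offline-freqlists.php not found in ../bin or ../lib" >&2
    return 1
}

# Usage, from inside the corpus directory (the path is a placeholder):
#   cd "$root/corpus1" && php "$(find_freqlist_script .)"
```

The important part is the cd: the relative path only resolves, and the settings file is only found, when the working directory is the corpus directory itself.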
Best
Andrew.
> -----Original Message-----
> From: Sylvain Loiseau [mailto:sylvain.loiseau at univ-paris13.fr]
> Sent: 03 April 2012 07:52
> To: Hardie, Andrew
> Cc: Open source development of the Corpus WorkBench
> Subject: Re: [CWB] [CQPWeb] dealing with large corpora
>
> Hello,
>
> Thanks a lot. This was exactly the kind of script I was hoping for.
> Two more questions, if I may:
>
> - the script "offline-freqlists.php" is located under
> $root/lib/ (and in bin, under my_corpus). Maybe I am running an
> outdated version of CQPWeb?
>
> - when I try "php ./lib/offline-freqlists.php" it outputs :
>
> PHP Warning: require(settings.inc.php): failed to open
> stream: No such file or directory in
> /home/sloiseau/www/cqpweb/lib/offline-freqlists.php on line
> 45 PHP Fatal error: require(): Failed opening required
> 'settings.inc.php'
> (include_path='.:/usr/share/php:/usr/share/pear') in
> /home/sloiseau/www/cqpweb/lib/offline-freqlists.php on line 45
>
> Do you have any ideas?
>
> Thanks a lot again,
> Best
> Sylvain
>
> Le 30 mars 2012 à 19:05, Hardie, Andrew a écrit :
>
> > Hi Sylvain,
> >
> > You can circumvent the timeout (which is to limit the
> amount of time on the server that users can occupy) by
> running frequency list creation outside the web interface. To do this:
> >
> > - go on the commandline to the webdirectory of the corpus
> in question,
> > e.g. /home/www/cqpweb/my_corpus or whatever
> >
> > - run this command: php ../bin/offline-freqlists.php
> >
> > That should do the trick. Let me know if it doesn't.
> >
> > Alternatively, if you don't need the timeout protection,
> you can disable it in your php.ini file.
> >
> > best
> >
> > Andrew.
> >
> >
> > -----Original Message-----
> > From: cwb-bounces at sslmit.unibo.it
> [mailto:cwb-bounces at sslmit.unibo.it]
> > On Behalf Of Sylvain LOISEAU
> > Sent: 29 March 2012 13:07
> > To: cwb at sslmit.unibo.it
> > Subject: [CWB] [CQPWeb] dealing with large corpora
> >
> > Dear all,
> >
> > I've installed a rather large corpus on CQPWeb (500
> million+ tokens). However, I've been able neither to categorize
> the corpus nor to create a frequency list: it seems that
> the corpus is too big and the "time out" is reached before the
> processes have completed.
> >
> > Do you know how to cope with that? Is a list of the
> command-lines / SQL instructions performed by CQPWeb
> available, so that I can try to execute them one by one in a terminal?
> >
> > Best regards,
> > Sylvain Loiseau
> > _______________________________________________
> > CWB mailing list
> > CWB at sslmit.unibo.it
> > http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>
>
>
> -----
> Sylvain Loiseau
> sylvain.loiseau at univ-paris13.fr
>
> Université Paris 13-Nord
> Laboratoire Lexiques, Dictionnaires, Informatique (UMR 7187
> CNRS/Université Paris 13-Nord)
> 99 avenue Jean-Baptiste Clément
> F-93410 Villetaneuse
>
>