[CWB] CQPweb: error with subcorpus creation

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Jun 15 10:50:53 CEST 2018


Ah, I see.

The issue is that when subcorpora are defined with XML elements, compiling frequency lists can take a very long time – even if compiling the FL for a similar-sized subcorpus of whole texts takes a second or two. FL compile for full-text corpora is supported by precalculated tables of raw frequencies. Compile for XML-based subcorpora isn’t (because there are too many different possibilities).

PHP’s internal time limit can be controlled within scripts. For CQPweb admin processes, the time limit is disabled. But for normal-user processes, it isn’t.
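
To illustrate the mechanism, here is a minimal sketch using PHP's built-in set_time_limit() and ini_get(). It is not CQPweb's code: the function name lift_time_limit_for() and the $is_admin flag are hypothetical stand-ins for whatever privilege check the application performs.

```php
<?php
/**
 * Hypothetical helper (not part of CQPweb): remove the execution time
 * limit for privileged requests only, and leave it in place otherwise.
 *
 * @param bool $is_admin stand-in for the application's own privilege check
 * @return bool true if the limit was lifted, false if it was left unchanged
 */
function lift_time_limit_for(bool $is_admin): bool
{
    if (!$is_admin) {
        // Ordinary users keep the max_execution_time configured in php.ini.
        return false;
    }
    // set_time_limit(0) means "no limit" for the remainder of this request.
    return set_time_limit(0);
}

// ini_get() reports the limit currently in force, in seconds.
$before = ini_get('max_execution_time');
$lifted = lift_time_limit_for(true);
echo "limit before: {$before}s, lifted: " . var_export($lifted, true) . "\n";
```

Note that the command-line PHP binary usually runs with no time limit at all, so the effect is only visible when the script runs under the web server.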

I am reluctant just to switch off the time limit for that procedure, since users are likely to assume that nothing is happening and re-click on things, which is bad. But obviously, not being able to compile FLs is just as bad.

I can imagine two ways around this: with an extra type of privilege, or with a configuration setting. I think that the latter will be less confusing. I’ll implement it in the next version.

FOR NOW here is how you can fix this:


  *   open the file freqtable.inc.php
  *   go to line 195, which should be an empty line if you are using the latest check-in
  *   add the following to that line:

if ($User->is_admin()) php_execute_time_unlimit();

… which will turn off the time limit for admin users.

Or, if you are confident turning it off for all users, just add:

php_execute_time_unlimit();

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of José Manuel Martínez Martínez
Sent: 14 June 2018 15:30
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CQPweb: error with subcorpus creation

Hi Andrew,

thank you very much for your answer. I've allocated more RAM to PHP by modifying the memory limit. But I was still getting some errors with bigger sizes (46,564 structure "s" units and 2,413,480 words). Then I looked into the log file while keeping an eye on the system with top.

Now the creation of the subcorpus works. What is failing is the compilation of the frequency list.

It seems that my CQPweb has enough RAM but it is failing due to the maximum execution time. I've modified the PHP variable max_execution_time. I started with 60 seconds, then 120, and it still fails with 600.

This is the error in the log:

[pid 1579] PHP Fatal error:  Maximum execution time of 600 seconds exceeded in /var/www/html/cqpweb/lib/subcorpus.inc.php on line 4037

This is some additional information on PID 1579:


ps -fp 1579
UID        PID  PPID  C STIME TTY          TIME CMD
www-data  1579  1431  9 12:03 ?        00:11:58 /usr/sbin/apache2 -k start

When I recreate the frequency lists for the whole corpus, it takes a fairly long time, but it normally does not fail. Could there be something different in the way the frequency list is compiled for a subcorpus, compared with the creation of frequency lists for the whole corpus?

Cheers,


--
José Manuel Martínez Martínez
https://chozelinek.github.io

On Thu, May 31, 2018 at 12:44 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
You’re probably running out of RAM. Wrangling subcorpora that use sub-text regions is very memory-intensive (I have some ideas in the works to make it less so). The way to check this is (a) look in php.ini to find out how much RAM each PHP process is allowed (the memory_limit setting), and (b) watch in “top” on your server as it runs, and note that it will probably fail when the CQPweb process hits that amount of allocated memory.

(Your httpd error log may also contain a note of this error, something like “Allowed memory size of BIGNUMBER bytes exhausted (tried to allocate BIGNUMBER bytes) in php”. Any http 500 error should leave an error message in the log!)
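
As a rough aid for check (a), the following sketch reads the memory_limit in force for the current PHP process and converts php.ini shorthand such as "128M" into bytes, so it can be compared with the figures in the "Allowed memory size" message. The helper shorthand_to_bytes() is illustrative only and belongs to neither PHP nor CQPweb.

```php
<?php
/**
 * Convert a php.ini shorthand size such as "128M" or "2G" to bytes.
 * Illustrative helper, not part of PHP or CQPweb.
 */
function shorthand_to_bytes(string $value): int
{
    $value = trim($value);
    if ($value === '-1') {
        return -1; // -1 means "no memory limit"
    }
    $number = (int) $value;                  // leading digits, e.g. 128 from "128M"
    $unit = strtoupper(substr($value, -1));  // trailing unit letter, if any
    switch ($unit) {
        case 'G': return $number * 1024 * 1024 * 1024;
        case 'M': return $number * 1024 * 1024;
        case 'K': return $number * 1024;
        default:  return $number;            // plain byte count
    }
}

// Report the limit for this process alongside its peak usage so far.
$limit = ini_get('memory_limit');
$peak = memory_get_peak_usage(true);
printf(
    "memory_limit = %s (%d bytes), peak so far = %d bytes\n",
    $limit,
    shorthand_to_bytes($limit),
    $peak
);
```

The web server's PHP may read a different php.ini from the command-line binary, so the value is only meaningful when the script is requested through Apache.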

The fix is to let PHP use more RAM (at least for CQPweb processes). I would not worry about over-allocating RAM, as long as you have an adequate swap disk on your server for virtual memory when needed!

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of José Manuel Martínez Martínez
Sent: 30 May 2018 10:48
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] CQPweb: error with subcorpus creation

Dear all,

I'm getting an internal server error when I try to create a subcorpus from a saved query.

The saved query has 58000 hits. I try to define the new subcorpus via "partial-text regions found in a saved query". I select the saved query and I use as sub-text region the structural attribute 's' that in my case denotes sentences.

After a few minutes I get an HTTP 500 ERROR.

However, if I try it with the same query but on a smaller set of hits (9615), the process is successful (the size of the resulting subcorpus is 402,802 tokens and 7700 sentences), although sometimes I get an error when I try to generate the frequency list. I tried with a slightly bigger saved query (11600 hits) and it fails too.

Is there a way to know what's going wrong?


Cheers,
--
José Manuel Martínez Martínez
https://chozelinek.github.io

