[CWB] Unable to index a corpus

Hardie, Andrew a.hardie at lancaster.ac.uk
Wed Jul 26 10:22:45 CEST 2017


As I noted before, the problem is actually error messages. Line 644 simply collects error messages – so an out-of-memory error here indicates you have generated > 4GB of error messages.

I suggested increasing the memory limit previously because it would let you see the problem, but with 4GB of error messages, raising it further is not likely to help much.

So what I would suggest instead is hacking the code to find out the error message.

Open admin-install.inc.php (in the lib folder, as named in your error log)
Go to line 644
Find the line nearby that says $output_lines_from_cwb = array($encode_command);

AFTER THAT LINE, but before the line that says exec($encode_command, $output_lines_from_cwb, $exit_status_from_cwb); add the following:

if (count($output_lines_from_cwb) > 1000) {show_var($output_lines_from_cwb); exiterror("abort"); }

What this line does is make things abort if it detects too many error messages.

If you then get a readable error message, that might give you a hint what the real problem is. If not, try again, moving the location of the hack line down the file, before the following lines:

before exec($makeall_command, $output_lines_from_cwb, $exit_status_from_cwb);
before exec($compress_command, $compression_output, $exit_status_from_cwb);
before the second instance of exec($makeall_command, $output_lines_from_cwb, $exit_status_from_cwb);
before } /* end else (from if cwb index already exists) */

Hopefully, as I say, doing this will get you a glimpse of the first 1,000 lines of error messages, which may tell you what the underlying problem is.
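
To illustrate the idea outside CQPweb, here is a minimal stand-alone sketch (PHP 7.1 or later). It rests on the documented behaviour of PHP's exec(), which appends each output line to the array passed as its second argument, so the array grows with every command that is run. The helper name first_lines_if_too_many and the simulated messages are my own inventions for illustration; they are not part of CQPweb. Unlike the hack line, which dumps the whole array with show_var(), this variant keeps only the first 1,000 lines.

```php
<?php
// Hypothetical helper, not part of CQPweb: returns the first $limit lines
// once the collected output has grown past $limit, or null otherwise.
function first_lines_if_too_many(array $lines, int $limit = 1000): ?array
{
    if (count($lines) > $limit) {
        return array_slice($lines, 0, $limit);
    }
    return null;
}

// Stand-in for the array that the install code fills via exec().
// The first element plays the role of the command line itself.
$output_lines_from_cwb = array('encode command (elided)');
for ($i = 1; $i <= 2500; $i++) {
    $output_lines_from_cwb[] = "simulated message $i";
}

$excerpt = first_lines_if_too_many($output_lines_from_cwb);
if ($excerpt !== null) {
    // In CQPweb itself, the hack line calls show_var() and exiterror() here.
    echo count($excerpt), " lines kept for inspection\n";
}
```

Because the array only grows when an exec() call has run, a check like this can only fire at a point in the file where one of the commands has already produced its output.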

Hope this helps

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of VIVALDI PALATRESI, JORGE
Sent: 26 July 2017 08:44
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Unable to index a corpus

Hi Andrew,

2017-07-25 17:37 GMT+02:00 Hardie, Andrew <a.hardie at lancaster.ac.uk>:
Did you restart Apache after changing the php.ini file?
Yes

Apache only loads php.ini on startup.
I also checked the change with phpinfo().
Currently the limit is set to 4096M and the error message is:

[Wed Jul 26 09:33:57.230338 2017] [:error] [pid 9248] [client 10.80.10.56:63665] PHP Fatal error:  Allowed memory size of 4294967296 bytes exhausted (tried to allocate 1073741832 bytes) in /var/www/html/UPFcorpora/lib/admin-install.inc.php on line 644, referer: http://corpora-tcl-iula.s.upf.edu/UPFcorpora/adm/index.php?thisF=installCorpus&uT=y
Does this make sense?
Should I increase the memory_limit value again?
My corpus size is:
-rw-r--r-- 1 root staff 666666964 jul 20 12:37 /var/local/CQPweb/upload/ca_allDocs.cqp

Best,
Jorge



best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of VIVALDI PALATRESI, JORGE
Sent: 25 July 2017 12:47
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Unable to index a corpus

Hi Andrew

I changed the permissions in the HTML document tree, changed php.ini as suggested, and tried to reinstall my corpus. Unfortunately, it still fails, again issuing the message:

[Tue Jul 25 13:27:23.297261 2017] [:error] [pid 28750] [client 10.80.10.56:60272] PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 33554440 bytes) in /var/www/html/UPFcorpora/lib/admin-install.inc.php on line 644, referer: http://corpora-tcl-iula.s.upf.edu/UPFcorpora/adm/index.php?thisF=installCorpus&uT=y
I tested several values of memory_limit in the php.ini file, but none seems to work. Currently it is set to 1024M (I also tested 2024M).
Is this value reasonable?
The corpus that I am using for testing contains slightly fewer than 30 M tokens.
Any suggestions are very welcome.
Thank you in advance,

Jorge



2017-07-24 19:04 GMT+02:00 Hardie, Andrew <a.hardie at lancaster.ac.uk>:
Hi Jorge,


•       PHP Warning:  chmod(): Operation not permitted in /var/www/html/UPFcorpora/lib/admin-install.inc.php on line 601, referer: http://corpora-tcl-iula.s.upf.edu/UPFcorpora/adm/index.php?thisF=installCorpus&uT=y

This arises from a permissions issue as you suspect, but the issue lies in your HTML document tree, not your data spaces. You need to make sure that the http daemon has permission to create files within the main “cqpweb” web folder (i.e. alongside lib, exe and friends).
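
One way to check this from the daemon's point of view is a small throwaway PHP script requested through the browser, so that it runs as the same user as CQPweb. This is only a sketch: the function name writable_report is hypothetical, and the directory list below is a placeholder that should be replaced with the actual web folder on the server.

```php
<?php
// Report, for each directory, whether the current process can write to it.
// Run through the web server, this reflects the http daemon's permissions;
// run from a shell, it reflects the permissions of the shell user instead.
function writable_report(array $dirs): array
{
    $report = array();
    foreach ($dirs as $dir) {
        $report[$dir] = is_dir($dir) && is_writable($dir);
    }
    return $report;
}

// Placeholder list (an assumption): the folder holding this script and the
// system temp folder. Substitute the main CQPweb web folder here.
$dirs = array(__DIR__, sys_get_temp_dir());
foreach (writable_report($dirs) as $dir => $ok) {
    echo $dir, ': ', ($ok ? 'writable' : 'NOT writable'), "\n";
}
```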


•       PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 33554440 bytes) in /var/www/html/UPFcorpora/lib/admin-install.inc.php on line 644, referer: http://corpora-tcl-iula.s.upf.edu/UPFcorpora/adm/index.php?thisF=installCorpus&uT=y
128 MB of RAM is very stingy for CQPweb if you are working with large corpora. I suggest increasing the limit in php.ini.

Fixing this will allow you to see if there is another error message at that point – I suspect there is but that the memory error is blocking you from seeing what it is.
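
As a quick sanity check on the figures in that message, the byte counts can be converted to MiB, and the limit PHP is actually running with can be read via ini_get(). A minimal sketch (the function name bytes_to_mib is my own, not anything from CQPweb):

```php
<?php
// Convert a byte count from a PHP "memory exhausted" message to MiB.
function bytes_to_mib(int $bytes): float
{
    return $bytes / (1024 * 1024);
}

// Figures taken from the error message quoted above.
echo bytes_to_mib(134217728), " MiB allowed\n";        // the configured limit
echo bytes_to_mib(33554440), " MiB in one request\n";  // the allocation that failed

// The limit in effect for this PHP process. The command-line PHP may read
// a different php.ini from the one Apache uses, so check via the web server.
echo 'memory_limit in effect: ', ini_get('memory_limit'), "\n";
```

So the limit in the log is exactly 128 MiB, and the failed request was for a single block of just over 32 MiB on top of what was already in use.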

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of VIVALDI PALATRESI, JORGE
Sent: 24 July 2017 10:01
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Unable to index a corpus

Andrew,

2017-07-23 0:56 GMT+02:00 Hardie, Andrew <a.hardie at lancaster.ac.uk>:
The most likely cause here is that CQP is not on the path *used by the http daemon* (which, at least when the daemon is Apache, can be and often is different to the usual shell path).

Have you tried explicitly setting the $path_to_cwb config variable?
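
To see which path the daemon really uses, a short PHP script served by Apache can print the PATH of its own process and look for cqp on it. This is a sketch under assumptions: find_on_path is a hypothetical helper, and the executable is assumed to be named plain "cqp".

```php
<?php
// Look for an executable called $name in each directory of a PATH string.
// Returns the full path of the first match, or null if there is none.
function find_on_path(string $name, string $path): ?string
{
    foreach (explode(PATH_SEPARATOR, $path) as $dir) {
        if ($dir === '') {
            continue;
        }
        $candidate = $dir . DIRECTORY_SEPARATOR . $name;
        if (is_file($candidate) && is_executable($candidate)) {
            return $candidate;
        }
    }
    return null;
}

// getenv() returns false when the variable is unset, hence the fallback.
$path = getenv('PATH') ?: '';
echo "PATH seen by this process: $path\n";

$found = find_on_path('cqp', $path);
echo $found === null ? "cqp not found on this PATH\n" : "cqp found at $found\n";
```

Run from a shell, the same script shows the shell's path instead, which makes any difference between the two easy to spot.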

Setting this config variable seems to have solved the problem.
Unfortunately, a new problem has arisen. The web client screen is still blank, and now the messages in error.log are:

[Mon Jul 24 10:44:35.349201 2017] [:error] [pid 18979] [client 10.80.10.56:50557] PHP Warning:  chmod(): Operation not permitted in /var/www/html/UPFcorpora/lib/admin-install.inc.php on line 601, referer: http://corpora-tcl-iula.s.upf.edu/UPFcorpora/adm/index.php?thisF=installCorpus&uT=y
[Mon Jul 24 10:44:41.850762 2017] [:error] [pid 18979] [client 10.80.10.56:50557] PHP Fatal error:  Allowed memory size of 134217728 bytes exhausted (tried to allocate 33554440 bytes) in /var/www/html/UPFcorpora/lib/admin-install.inc.php on line 644, referer: http://corpora-tcl-iula.s.upf.edu/UPFcorpora/adm/index.php?thisF=installCorpus&uT=y

It seems that the problem now concerns CQPweb disk locations/permissions. My locations and permissions are identical to those of a previous working installation (using CQPweb 3.1.12 and CWB 3.1) and are as follows:

jvivaldi at corpora-tcl-iula:~$ ls -l /var/local/CQPweb/
total 16
drwxrwsrwx 3 www-data www-data 4096 jul 24 10:44 index
drwxrwsrwx 2 www-data www-data 4096 jul 24 10:47 registry
drwxrwsrwx 2 www-data www-data 4096 jul 17 10:58 temp
drwxrwsrwx 2 www-data www-data 4096 jul 20 12:37 upload
jvivaldi at corpora-tcl-iula:~$
I am the owner of the root directory (/var/local/CQPweb/).

Are these permissions correct?


You can also try going to the admin control panel and then to "system diagnostics" and then "Run a system check on the CQP back-end process connection". This function does not yet check everything that can go wrong but it does check some of the major things.

Yes. I ran this check and it is ok.

Any suggestion will be welcomed
Thank you in advance
Jorge

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Stefan Evert
Sent: 21 July 2017 12:04
To: VIVALDI PALATRESI, JORGE
Cc: CWBdev Mailing List
Subject: Re: [CWB] Unable to index a corpus


> On 21 Jul 2017, at 12:10, VIVALDI PALATRESI, JORGE <jorge.vivaldi at upf.edu> wrote:
>
> I reverted CQPweb as suggested but now the message in the browser is just:
> ERROR: CQP backend startup failed
> Obviously, cqp and related programs are all on the path.
> Any suggestion?

No idea, this goes far beyond my knowledge of CQPweb, so we'll have to wait for Andrew's expertise.

Did the revert work cleanly or do you have local modifications that might cause the error?

Best,
Stefan


_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb



--
Jorge Vivaldi Palatresi
Institut Universitari de Lingüística Aplicada
Universitat Pompeu Fabra
C/ Roc Boronat, 138
08018 Barcelona
Espanya

+34 93 542 2332
https://www.upf.edu/pdi/iula/jorge.vivaldi/index_esp.htm
