[CWB] CQPweb: Problems with Metadata indexing for two different corpora

Hannah Kermes h.kermes at mx.uni-saarland.de
Thu Feb 11 08:30:37 CET 2016


Hi Andrew

On 10.02.2016 at 14:53, Hardie, Andrew wrote:
> Hi Hannah,
>
> ==========
> At first I couldn't create any metadata table. When I reduced the amount
> of metadata per text, I could at least create most of the Metadata table
> ==========
>
> What was the error message here? I have never encountered this problem, although I can imagine how it might theoretically arise due to limits in MySQL.....
We looked up the error messages in Apache2/logs/error_log; the ones 
shown in the browser, when we got any, were not helpful.

First error (not enough memory), which led to a blank page in the 
browser (the HTML source was empty as well):
error_log:
[Tue Jan 19 12:59:04.577561 2016] [fcgid:warn] [pid 3464:tid 140711420450560] [client 134.96.94.205:34044] mod_fcgid: stderr: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /data2/htdocs/cqpweb/lib/cqp.inc.php on line 478, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
Second error, after increasing the memory limit, which resulted in a 
standard Apache error page in the browser (... contact your service admin):
error_log:
[Tue Jan 19 16:49:56.767080 2016] [fcgid:warn] [pid 3562:tid 140711403665152] [client 134.96.94.205:34638] mod_fcgid: stderr: PHP Fatal error: Maximum execution time of 30 seconds exceeded in /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 738, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&createMetadataFromXml=1&uT=y

Third error, after increasing the execution time limit: either the 
standard Apache error page again, or the process simply stops in the 
browser with nothing having changed.
error_log:
[Wed Jan 27 20:13:35.300911 2016] [fcgid:warn] [pid 11272:tid 140361473201920] (104)Connection reset by peer: [client 95.208.248.46:62545] mod_fcgid: error reading data from FastCGI server, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y

[Wed Jan 27 20:13:35.301033 2016] [core:error] [pid 11272:tid 140361473201920] [client 95.208.248.46:62545] End of script output before headers: execute.php, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
>
> ==========
> My question is now. What is the problem and what can I do to generate
> the 'Text begin/end positions'
> ==========
>
> Use the "offline-freqlists" script. Admin manual sec 4.9, page 34.
Thanks for the hint, I had completely forgotten about it. We ran the 
script as described in the manual and got the error message below, which 
indicates (I assume) that it could not find the text_metadata_for table 
for the corpus. In the browser, however, the metadata table is shown as 
set up, and only one item has not been created (either the text 
begin/end positions or the frequency list itself).
We executed the script in the directory of the respective corpus.

Thanks in advance
Hannah

Error message for offline-freqlists.php:


php ../bin/offline-freqlists.php rsc_v1_17


/***********************/




This script runs all the setup for frequency lists for a corpus.

Full debug messages are printed.

Note, if you run this script before setting up the text metdata table, 
things WILL go badly wrong.


About to run the function populating corpus CQP positions...

PHP Warning:  Missing argument 1 for populate_corpus_cqp_positions(), 
called in /data2/htdocs/cqpweb/bin/offline-freqlists.php on line 96 and 
defined in /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 295
PHP Notice:  Undefined variable: corpus in 
/data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 297
About to run the following MySQL query:
     update text_metadata_for_
             set cqp_begin = 0, cqp_end = 2145
             where text_id = '101032'


A mySQL query did not run successfully!

Original query:

update text_metadata_for_
             set cqp_begin = 0, cqp_end = 2145
             where text_id = '101032'



Error # 1146: Table 'cqpweb.text_metadata_for_' doesn't exist



PHP debugging backtrace
=======================
array(4) {
   [0]=>
   array(4) {
     ["file"]=>
     string(42) "/data2/htdocs/cqpweb/lib/exiterror.inc.php"
     ["line"]=>
     int(289)
     ["function"]=>
     string(17) "exiterror_endpage"
     ["args"]=>
     array(0) {
     }
   }
   [1]=>
   array(4) {
     ["file"]=>
     string(40) "/data2/htdocs/cqpweb/lib/library.inc.php"
     ["line"]=>
     int(234)
     ["function"]=>
     string(20) "exiterror_mysqlquery"
     ["args"]=>
     array(3) {
       [0]=>
       &int(1146)
       [1]=>
       &string(47) "Table 'cqpweb.text_metadata_for_' doesn't exist"
       [2]=>
       &string(90) "update text_metadata_for_
             set cqp_begin = 0, cqp_end = 2145
             where text_id = '101032'"
     }
   }
   [2]=>
   array(4) {
     ["file"]=>
     string(42) "/data2/htdocs/cqpweb/lib/admin-lib.inc.php"
     ["line"]=>
     int(321)
     ["function"]=>
     string(14) "do_mysql_query"
     ["args"]=>
     array(1) {
       [0]=>
       &string(90) "update text_metadata_for_
             set cqp_begin = 0, cqp_end = 2145
             where text_id = '101032'"
     }
   }
   [3]=>
   array(4) {
     ["file"]=>
     string(46) "/data2/htdocs/cqpweb/bin/offline-freqlists.php"
     ["line"]=>
     int(96)
     ["function"]=>
     string(29) "populate_corpus_cqp_positions"
     ["args"]=>
     array(0) {
     }
   }
}
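
One possible reading of the output above: the two PHP messages ("Missing argument 1" and "Undefined variable: corpus") together with the truncated table name in the query suggest that the corpus name never reached the function, so the table name was built with an empty suffix. The following is a minimal sketch of that mechanism only. The function metadata_table_name is a hypothetical stand-in, not CQPweb code, and the naming pattern text_metadata_for_<corpus> is an assumption inferred from the query in the log.

```python
def metadata_table_name(corpus: str = "") -> str:
    """Build a metadata table name from a corpus handle.

    Hypothetical stand-in: the prefix is taken from the failing query in
    the log; the idea that the corpus handle is appended is an assumption.
    """
    return "text_metadata_for_" + corpus


# Called without a corpus name, as the backtrace suggests happened:
print(metadata_table_name())             # text_metadata_for_

# Called with the corpus handle given on the command line:
print(metadata_table_name("rsc_v1_17"))  # text_metadata_for_rsc_v1_17
```

If this reading is right, the metadata table itself may well exist, which would fit what the browser shows; the query simply addressed a table name with no corpus suffix.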

>
> best
>
> Andrew.
>
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
> Sent: 08 February 2016 14:09
> To: Open source development of the Corpus WorkBench
> Subject: [CWB] CQPweb: Problems with Metadata indexing for two different corpora
>
> Hi all,
>
> we recently had problems with the indexing of Metadata for two different
> corpora in CQPweb
>
> Both corpora are mid-sized (24 million and 35 million tokens) and
> have been pre-encoded successfully in CQP.
>
> The first corpus has approx. 24 million tokens, almost 700,000 texts,
> and quite a bit of metadata for each text (categorical and free text).
> At first I couldn't create any metadata table. When I reduced the amount
> of metadata per text, I could at least create most of the Metadata table
> except the 'Text begin/end positions'.
> We found out that we initially had a memory problem, which we solved by
> increasing the allowed memory size. Then we had a run-time problem
> originating from the server, so we increased the run-time here. Now we
> have a time-out originating from the browser.
> My question is now. What is the problem and what can I do to generate
> the 'Text begin/end positions'. I think it is not very useful to
> increase the time-out of my browser, as I would have to keep the
> connection open all the time. Is there a possibility to generate the
> metadata from the command-line on the server?
>
> For the second corpus with approx. 35 Million tokens, the 'text
> begin/end positions' are unproblematic. Here we have a problem
> generating the 'Frequency tables'. The corpus has 8 positional
> attributes, four of which contain floats.
>
> Thanks in advance
> Hannah
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb


