[CWB] CQPweb: Problems with Metadata indexing for two different corpora

Hannah Kermes h.kermes at mx.uni-saarland.de
Thu Feb 11 10:53:08 CET 2016


Hi Andrew,

On 11.02.2016 at 09:51, Hardie, Andrew wrote:
> Hi Hannah,
>
> Re the memory errors, I'll have a think about those. Re the error with offline freqlists,
>
>>>> called in /data2/htdocs/cqpweb/bin/offline-freqlists.php on line 96 and
> The call for that function has not been on line 96 for some time. What version are you running? It looks like 3.1.16.
It's version 3.1.4. We haven't updated for a while because we couldn't
find the time, and we were also a bit afraid to, because we lost all of
our users the last time we updated.

Best
Hannah
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
> Sent: 11 February 2016 07:31
> To: Open source development of the Corpus WorkBench
> Subject: Re: [CWB] CQPweb: Problems with Metadata indexing for two different corpora
>
> Hi Andrew
>
> On 10.02.2016 at 14:53, Hardie, Andrew wrote:
>> Hi Hannah,
>>
>> ==========
>> At first I couldn't create any metadata table. When I reduced the amount
>> of metadata per text, I could at least create most of the Metadata table
>> ==========
>>
>> What was the error message here? I have never encountered this problem, although I can imagine how it might theoretically arise due to limits in MySQL...
> We looked up the error messages in Apache2/logs/error_log; the ones shown in
> the browser, when we got any, were not helpful.
>
> First error (not enough memory), leading to a blank page in the browser
> (the HTML source was completely empty):
> error_log:
> [Tue Jan 19 12:59:04.577561 2016] [fcgid:warn] [pid 3464:tid 140711420450560]
> [client 134.96.94.205:34044] mod_fcgid: stderr: PHP Fatal error: Allowed
> memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in
> /data2/htdocs/cqpweb/lib/cqp.inc.php on line 478, referer:
> https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
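
For scale: the limit reported in that log line, 134217728 bytes, is exactly 128 MiB, which is also the value PHP ships as its default memory_limit (128M). A quick check of the arithmetic in Python (only a sketch, nothing CQPweb-specific):

```python
# Value copied from the "Allowed memory size of ... bytes exhausted" message.
limit_bytes = 134217728

# PHP's "M" suffix means mebibytes, i.e. multiples of 1024 * 1024 bytes.
limit_mib = limit_bytes / (1024 * 1024)

print(f"{limit_bytes} bytes = {limit_mib:g} MiB")
```
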
>
> Second error, after increasing the memory limit, which resulted in a
> standard Apache error page in the browser (... contact your service admin):
> error_log:
> [Tue Jan 19 16:49:56.767080 2016] [fcgid:warn] [pid 3562:tid 140711403665152]
> [client 134.96.94.205:34638] mod_fcgid: stderr: PHP Fatal error: Maximum
> execution time of 30 seconds exceeded in
> /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 738, referer:
> https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&createMetadataFromXml=1&uT=y
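
Both entries so far are PHP fatal errors relayed by mod_fcgid, and the useful parts are the message, the file and the line number. The following is a minimal Python sketch for pulling those three fields out of an error_log excerpt. It assumes each log entry sits on one unwrapped line; the function name and the abbreviated sample text (timestamps, pids and referers dropped) are made up for illustration.

```python
import re

# Matches the PHP fatal errors that mod_fcgid relays from stderr.
# Assumes each log entry is on a single, unwrapped line.
FATAL = re.compile(
    r"PHP Fatal error:\s+(?P<message>.+?) in (?P<file>/\S+) on line (?P<line>\d+)"
)

def fatal_errors(log_text):
    """Return (message, file, line) for every PHP fatal error in log_text."""
    return [
        (m.group("message"), m.group("file"), int(m.group("line")))
        for m in FATAL.finditer(log_text)
    ]

# Abbreviated copies of the two fatal errors, with the line wrapping undone.
sample = (
    "[fcgid:warn] mod_fcgid: stderr: PHP Fatal error: Allowed memory size of "
    "134217728 bytes exhausted (tried to allocate 71 bytes) in "
    "/data2/htdocs/cqpweb/lib/cqp.inc.php on line 478\n"
    "[fcgid:warn] mod_fcgid: stderr: PHP Fatal error: Maximum execution time "
    "of 30 seconds exceeded in "
    "/data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 738\n"
)

for message, path, line in fatal_errors(sample):
    print(f"{path}:{line}: {message}")
```
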
>
> Third error, after increasing the execution time limit: either the standard
> Apache error page again, or the process simply stops in the browser with
> nothing having changed.
> error_log:
> [Wed Jan 27 20:13:35.300911 2016] [fcgid:warn] [pid 11272:tid 140361473201920]
> (104)Connection reset by peer: [client 95.208.248.46:62545] mod_fcgid:
> error reading data from FastCGI server, referer:
> https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
>
> [Wed Jan 27 20:13:35.301033 2016] [core:error] [pid 11272:tid 140361473201920]
> [client 95.208.248.46:62545] End of script output before headers:
> execute.php, referer:
> https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
>> ==========
>> My question now is: what is the problem, and what can I do to generate
>> the 'Text begin/end positions'?
>> ==========
>>
>> Use the "offline-freqlists" script. Admin manual sec 4.9, page 34.
> Thanks for the hint, I had completely forgotten about it. We tried it out
> as described in the manual and got the following error message, which
> indicates (I assume) that it could not find the text_metadata_for_ table
> for the corpus. In the browser, however, the metadata table is set up and
> only one item has not been created (either the text begin/end positions
> or the frequency list itself).
> We executed the script in the directory of the respective corpus.
>
> Thanks in advance
> Hannah
>
> Error message for offline-freqlists.php:
>
>
> php ../bin/offline-freqlists.php rsc_v1_17
>
>
> /***********************/
>
>
>
>
> This script runs all the setup for frequency lists for a corpus.
>
> Full debug messages are printed.
>
> Note, if you run this script before setting up the text metdata table,
> things WILL go badly wrong.
>
>
> About to run the function populating corpus CQP positions...
>
> PHP Warning:  Missing argument 1 for populate_corpus_cqp_positions(),
> called in /data2/htdocs/cqpweb/bin/offline-freqlists.php on line 96 and
> defined in /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 295
> PHP Notice:  Undefined variable: corpus in
> /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 297
> About to run the following MySQL query:
>       update text_metadata_for_
>               set cqp_begin = 0, cqp_end = 2145
>               where text_id = '101032'
>
>
> A mySQL query did not run successfully!
>
> Original query:
>
> update text_metadata_for_
>               set cqp_begin = 0, cqp_end = 2145
>               where text_id = '101032'
>
>
>
> Error # 1146: Table 'cqpweb.text_metadata_for_' doesn't exist
>
>
>
> PHP debugging backtrace
> =======================
> array(4) {
>     [0]=>
>     array(4) {
>       ["file"]=>
>       string(42) "/data2/htdocs/cqpweb/lib/exiterror.inc.php"
>       ["line"]=>
>       int(289)
>       ["function"]=>
>       string(17) "exiterror_endpage"
>       ["args"]=>
>       array(0) {
>       }
>     }
>     [1]=>
>     array(4) {
>       ["file"]=>
>       string(40) "/data2/htdocs/cqpweb/lib/library.inc.php"
>       ["line"]=>
>       int(234)
>       ["function"]=>
>       string(20) "exiterror_mysqlquery"
>       ["args"]=>
>       array(3) {
>         [0]=>
>         &int(1146)
>         [1]=>
>         &string(47) "Table 'cqpweb.text_metadata_for_' doesn't exist"
>         [2]=>
>         &string(90) "update text_metadata_for_
>               set cqp_begin = 0, cqp_end = 2145
>               where text_id = '101032'"
>       }
>     }
>     [2]=>
>     array(4) {
>       ["file"]=>
>       string(42) "/data2/htdocs/cqpweb/lib/admin-lib.inc.php"
>       ["line"]=>
>       int(321)
>       ["function"]=>
>       string(14) "do_mysql_query"
>       ["args"]=>
>       array(1) {
>         [0]=>
>         &string(90) "update text_metadata_for_
>               set cqp_begin = 0, cqp_end = 2145
>               where text_id = '101032'"
>       }
>     }
>     [3]=>
>     array(4) {
>       ["file"]=>
>       string(46) "/data2/htdocs/cqpweb/bin/offline-freqlists.php"
>       ["line"]=>
>       int(96)
>       ["function"]=>
>       string(29) "populate_corpus_cqp_positions"
>       ["args"]=>
>       array(0) {
>       }
>     }
> }
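
Reading the warning and the failed query together: populate_corpus_cqp_positions() was called without its argument, so the corpus name was empty and the table name collapsed to the bare prefix text_metadata_for_. Below is a minimal Python sketch of that effect. It assumes the table name is built by appending the corpus name to the prefix; both helper functions are hypothetical and are not CQPweb code, and text_metadata_for_rsc_v1_17 is only the name one would expect under that assumption.

```python
def metadata_table(corpus=""):
    """Hypothetical helper: append the corpus name to the table-name prefix."""
    return "text_metadata_for_" + corpus

def update_query(corpus, text_id, begin, end):
    """Build a query with the same shape as the one in the error output."""
    return (
        f"update {metadata_table(corpus)} "
        f"set cqp_begin = {begin}, cqp_end = {end} "
        f"where text_id = '{text_id}'"
    )

# Empty corpus name: the table name is just the prefix, as in the error.
print(update_query("", "101032", 0, 2145))

# Corpus name supplied: the table name is complete.
print(update_query("rsc_v1_17", "101032", 0, 2145))
```
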
>
>> best
>>
>> Andrew.
>>
>>
>> -----Original Message-----
>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
>> Sent: 08 February 2016 14:09
>> To: Open source development of the Corpus WorkBench
>> Subject: [CWB] CQPweb: Problems with Metadata indexing for two different corpora
>>
>> Hi all,
>>
>> We recently had problems indexing metadata for two different corpora in
>> CQPweb.
>>
>> Both are mid-sized corpora (24 million and 35 million tokens) and have
>> been pre-encoded successfully in CQP.
>>
>> The first corpus has approx. 24 million tokens, almost 700,000 texts, and
>> quite a bit of metadata for each text (categorical and free text).
>> At first I couldn't create any metadata table. When I reduced the amount
>> of metadata per text, I could at least create most of the Metadata table
>> except the 'Text begin/end positions'.
>> We found out that we initially had a memory problem, which we solved by
>> increasing the allowed memory size. Then we had a run-time problem
>> originating from the server, so we increased the run-time limit there. Now
>> we have a time-out originating from the browser.
>> My question now is: what is the problem, and what can I do to generate
>> the 'Text begin/end positions'? I don't think it is very useful to
>> increase the time-out of my browser, as I would have to keep the
>> connection open the whole time. Is there a way to generate the
>> metadata from the command line on the server?
>>
>> For the second corpus, with approx. 35 million tokens, the 'Text
>> begin/end positions' are unproblematic. Here we have a problem
>> generating the 'Frequency tables'. The corpus has 8 positional
>> attributes, four of which contain floats.
>>
>> Thanks in advance
>> Hannah
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://devel.sslmit.unibo.it/mailman/listinfo/cwb


