[CWB] CQPweb: Problems with Metadata indexing for two different corpora

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Feb 11 11:04:20 CET 2016


The 3.2.6 branch is currently usable. The trunk (which has 3.2.7 in it) is undergoing a major rewrite and is severely broken!

I am aiming to have it unbroken in the next 3 days.

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
Sent: 11 February 2016 10:03
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] CQPweb: Problems with Metadata indexing for two different corpora

Hi Andrew,

Thanks for the help. So the solution would be to update to 3.2.7.
Is that a stable version? I remember you reported problems with updating
to one of the versions, but I can't remember which one you said would be safe.

Thank you again
Hannah

Am 11.02.2016 um 10:57 schrieb Hardie, Andrew:
> Alas, I can't help with bugs in old versions!
>
> I have put a fix for the timeout issues you've encountered in v. 3.2.7.
>
> Andrew.
>
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
> Sent: 11 February 2016 09:53
> To: Open source development of the Corpus WorkBench
> Subject: Re: [CWB] CQPweb: Problems with Metadata indexing for two different corpora
>
> Hi Andrew,
>
> Am 11.02.2016 um 09:51 schrieb Hardie, Andrew:
>> Hi Hannah,
>>
>> Re the memory errors, I'll have a think about those. Re the error with offline freqlists,
>>
>>>>> called in /data2/htdocs/cqpweb/bin/offline-freqlists.php on line 96 and
>> The call for that function has not been on line 96 for some time. What version are you running? It looks like 3.1.16.
> It's version 3.1.4. We haven't updated for a while because we couldn't
> find the time, and we were a bit afraid to, because we lost all of our
> users the last time we updated.
>
> Best
> Hannah
>> best
>>
>> Andrew.
>>
>> -----Original Message-----
>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
>> Sent: 11 February 2016 07:31
>> To: Open source development of the Corpus WorkBench
>> Subject: Re: [CWB] CQPweb: Problems with Metadata indexing for two different corpora
>>
>> Hi Andrew
>>
>> Am 10.02.2016 um 14:53 schrieb Hardie, Andrew:
>>> Hi Hannah,
>>>
>>> ==========
>>> At first I couldn't create any metadata table. When I reduced the amount
>>> of metadata per text, I could at least create most of the Metadata table
>>> ==========
>>>
>>> What was the error message here? I have never encountered this problem, although I can imagine how it might theoretically arise due to limits in MySQL.
>> We looked up the error messages in Apache2/logs/error_log; the ones shown
>> in the browser, if we got any, were not helpful.
>>
>> First error (not enough memory), leading to a blank page in the browser
>> (the HTML source is empty as well):
>> error_log:
>> [Tue Jan 19 12:59:04.577561 2016] [fcgid:warn] [pid 3464:tid 140711420450560] [client 134.96.94.205:34044] mod_fcgid: stderr: PHP Fatal error: Allowed memory size of 134217728 bytes exhausted (tried to allocate 71 bytes) in /data2/htdocs/cqpweb/lib/cqp.inc.php on line 478, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
>>
>> Second error, after increasing the memory, resulted in a standard Apache
>> error page in the browser (... contact your service admin):
>> error_log:
>> [Tue Jan 19 16:49:56.767080 2016] [fcgid:warn] [pid 3562:tid 140711403665152] [client 134.96.94.205:34638] mod_fcgid: stderr: PHP Fatal error: Maximum execution time of 30 seconds exceeded in /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 738, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&createMetadataFromXml=1&uT=y
>>
>> Third error, after increasing the runtime: either the standard Apache
>> error page again, or the process simply stops in the browser with nothing
>> having changed.
>> error_log:
>> [Wed Jan 27 20:13:35.300911 2016] [fcgid:warn] [pid 11272:tid 140361473201920] (104)Connection reset by peer: [client 95.208.248.46:62545] mod_fcgid: error reading data from FastCGI server, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
>>
>> [Wed Jan 27 20:13:35.301033 2016] [core:error] [pid 11272:tid 140361473201920] [client 95.208.248.46:62545] End of script output before headers: execute.php, referer: https://fedora.clarin-d.uni-saarland.de/cqpweb/OBC/index.php?thisQ=manageMetadata&uT=y
>>> ==========
>>> My question now is: what is the problem, and what can I do to generate
>>> the 'Text begin/end positions'
>>> ==========
>>>
>>> Use the "offline-freqlists" script. Admin manual sec 4.9, page 34.
>> Thanks for the hint, I had completely forgotten about it. We ran it as
>> described in the manual and got the following error message, which
>> indicates (I assume) that it could not find the text_metadata_for_ table
>> for the corpus. Yet in the browser the metadata table is set up, and only
>> one of the frequency lists has not been created (either the text
>> begin/end positions or the frequency list itself).
>> We executed the script in the directory of the respective corpus.
>>
>> Thanks in advance
>> Hannah
>>
>> Error message for offline-freqlists.php:
>>
>>
>> php ../bin/offline-freqlists.php rsc_v1_17
>>
>>
>> /***********************/
>>
>>
>>
>>
>> This script runs all the setup for frequency lists for a corpus.
>>
>> Full debug messages are printed.
>>
>> Note, if you run this script before setting up the text metdata table,
>> things WILL go badly wrong.
>>
>>
>> About to run the function populating corpus CQP positions...
>>
>> PHP Warning:  Missing argument 1 for populate_corpus_cqp_positions(),
>> called in /data2/htdocs/cqpweb/bin/offline-freqlists.php on line 96 and
>> defined in /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 295
>> PHP Notice:  Undefined variable: corpus in
>> /data2/htdocs/cqpweb/lib/admin-lib.inc.php on line 297
>> About to run the following MySQL query:
>>        update text_metadata_for_
>>                set cqp_begin = 0, cqp_end = 2145
>>                where text_id = '101032'
>>
>>
>> A mySQL query did not run successfully!
>>
>> Original query:
>>
>> update text_metadata_for_
>>                set cqp_begin = 0, cqp_end = 2145
>>                where text_id = '101032'
>>
>>
>>
>> Error # 1146: Table 'cqpweb.text_metadata_for_' doesn't exist
>>
>>
>>
>> PHP debugging backtrace
>> =======================
>> array(4) {
>>      [0]=>
>>      array(4) {
>>        ["file"]=>
>>        string(42) "/data2/htdocs/cqpweb/lib/exiterror.inc.php"
>>        ["line"]=>
>>        int(289)
>>        ["function"]=>
>>        string(17) "exiterror_endpage"
>>        ["args"]=>
>>        array(0) {
>>        }
>>      }
>>      [1]=>
>>      array(4) {
>>        ["file"]=>
>>        string(40) "/data2/htdocs/cqpweb/lib/library.inc.php"
>>        ["line"]=>
>>        int(234)
>>        ["function"]=>
>>        string(20) "exiterror_mysqlquery"
>>        ["args"]=>
>>        array(3) {
>>          [0]=>
>>          &int(1146)
>>          [1]=>
>>          &string(47) "Table 'cqpweb.text_metadata_for_' doesn't exist"
>>          [2]=>
>>          &string(90) "update text_metadata_for_
>>                set cqp_begin = 0, cqp_end = 2145
>>                where text_id = '101032'"
>>        }
>>      }
>>      [2]=>
>>      array(4) {
>>        ["file"]=>
>>        string(42) "/data2/htdocs/cqpweb/lib/admin-lib.inc.php"
>>        ["line"]=>
>>        int(321)
>>        ["function"]=>
>>        string(14) "do_mysql_query"
>>        ["args"]=>
>>        array(1) {
>>          [0]=>
>>          &string(90) "update text_metadata_for_
>>                set cqp_begin = 0, cqp_end = 2145
>>                where text_id = '101032'"
>>        }
>>      }
>>      [3]=>
>>      array(4) {
>>        ["file"]=>
>>        string(46) "/data2/htdocs/cqpweb/bin/offline-freqlists.php"
>>        ["line"]=>
>>        int(96)
>>        ["function"]=>
>>        string(29) "populate_corpus_cqp_positions"
>>        ["args"]=>
>>        array(0) {
>>        }
>>      }
>> }
>>
>>> best
>>>
>>> Andrew.
>>>
>>>
>>> -----Original Message-----
>>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
>>> Sent: 08 February 2016 14:09
>>> To: Open source development of the Corpus WorkBench
>>> Subject: [CWB] CQPweb: Problems with Metadata indexing for two different corpora
>>>
>>> Hi all,
>>>
>>> We recently had problems with the indexing of metadata for two different
>>> corpora in CQPweb.
>>>
>>> Both are mid-sized corpora (24 million and 35 million tokens) and
>>> have been pre-encoded successfully in CQP.
>>>
>>> The first corpus has approx. 24 million tokens, almost 700,000 texts, and
>>> quite a bit of metadata for each text (categorical and free text).
>>> At first I couldn't create any metadata table. When I reduced the amount
>>> of metadata per text, I could at least create most of the Metadata table
>>> except the 'Text begin/end positions'.
>>> We found out that we initially had a memory problem, which we solved by
>>> increasing the allowed memory size. Then we had a run-time problem
>>> originating from the server, so we increased the run-time here. Now we
>>> have a time-out originating from the browser.
>>> My question now is: what is the problem, and what can I do to generate
>>> the 'Text begin/end positions'? I think it is not very useful to
>>> increase the time-out of my browser, as I would have to keep the
>>> connection open all the time. Is there a possibility to generate the
>>> metadata from the command line on the server?
>>>
>>> For the second corpus, with approx. 35 million tokens, the 'Text
>>> begin/end positions' are unproblematic. Here we have a problem
>>> generating the 'Frequency tables'. The corpus has 8 positional
>>> attributes, four of which contain floats.
>>>
>>> Thanks in advance
>>> Hannah
>>> _______________________________________________
>>> CWB mailing list
>>> CWB at sslmit.unibo.it
>>> http://devel.sslmit.unibo.it/mailman/listinfo/cwb

_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb

