[CWB] problems with corpus word count
Benedikt Singpiel
benedikt.singpiel at uni-leipzig.de
Thu May 19 13:53:28 CEST 2016
Hello Andrew,
- concerning message 2: your instructions for manually resetting the
token count did the trick for the moment. My corpus is up and running,
and I can now work with the collocation tool. Thanks a lot for your
great help so far!
- concerning message 3: incorrect start/end points
Below are my query results for the metadata check, followed by the
error message from re-running the text-position step:
---
mysql> select text_id, words, cqp_begin, cqp_end from text_metadata_for_pilot;
+-----------+-------+-----------+---------+
| text_id | words | cqp_begin | cqp_end |
+-----------+-------+-----------+---------+
| SK_LEHR_2 | 1 | 0 | 0 |
| SK_LEHR_3 | 1 | 0 | 0 |
| SK_LEHR_4 | 1 | 0 | 0 |
| SK_LEHR_5 | 1 | 0 | 0 |
+-----------+-------+-----------+---------+
---
user at CQPwebInABox:/$ cd /var/www/html/cqpweb/bin
user at CQPwebInABox:/var/www/html/cqpweb/bin$ php execute-cli.php populate_corpus_cqp_positions pilot
CQPweb -- CQP reports errors!
=============================
<table class="concordtable" width="100%">
<tr>
<th class="concordtable">CQPweb encountered an error and could not
continue.</th>
</tr>
CQP sent back these error messages:
**** CQP ERROR ****
CQP Error:
No corpus activated
PHP debugging backtrace
=======================
array(8) {
[1]=>
array(2) {
["function"]=>
string(13) "exiterror_cqp"
["args"]=>
array(1) {
[0]=>
&array(3) {
[0]=>
string(19) "**** CQP ERROR ****"
[1]=>
string(10) "CQP Error:"
[2]=>
string(19) "No corpus activated"
}
}
}
[2]=>
array(4) {
["file"]=>
string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
["line"]=>
int(1021)
["function"]=>
string(14) "call_user_func"
["args"]=>
array(2) {
[0]=>
&string(13) "exiterror_cqp"
[1]=>
&array(3) {
[0]=>
string(19) "**** CQP ERROR ****"
[1]=>
string(10) "CQP Error:"
[2]=>
string(19) "No corpus activated"
}
}
}
[3]=>
array(7) {
["file"]=>
string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
["line"]=>
int(922)
["function"]=>
string(5) "error"
["class"]=>
string(3) "CQP"
["object"]=>
object(CQP)#14 (15) {
["process":"CQP":private]=>
resource(47) of type (process)
["handle":"CQP":private]=>
array(3) {
[0]=>
resource(44) of type (stream)
[1]=>
resource(45) of type (stream)
[2]=>
resource(46) of type (stream)
}
["major_version"]=>
int(3)
["minor_version"]=>
int(4)
["revision_version"]=>
int(8)
["revision_version_flagged_beta":"CQP":private]=>
bool(false)
["compile_date"]=>
NULL
["error_handler":"CQP":private]=>
string(13) "exiterror_cqp"
["status":"CQP":private]=>
string(5) "error"
["error_message"]=>
array(2) {
[0]=>
string(10) "CQP Error:"
[1]=>
string(19) "No corpus activated"
}
["progress_handler":"CQP":private]=>
bool(false)
["has_been_disconnected":"CQP":private]=>
bool(false)
["gzip_path":"CQP":private]=>
string(0) ""
["debug_mode":"CQP":private]=>
bool(false)
["corpus_charset":"CQP":private]=>
int(0)
}
["type"]=>
string(2) "->"
["args"]=>
array(1) {
[0]=>
&array(3) {
[0]=>
string(19) "**** CQP ERROR ****"
[1]=>
string(10) "CQP Error:"
[2]=>
string(19) "No corpus activated"
}
}
}
[4]=>
array(7) {
["file"]=>
string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
["line"]=>
int(492)
["function"]=>
string(8) "checkerr"
["class"]=>
string(3) "CQP"
["object"]=>
object(CQP)#14 (15) {
["process":"CQP":private]=>
resource(47) of type (process)
["handle":"CQP":private]=>
array(3) {
[0]=>
resource(44) of type (stream)
[1]=>
resource(45) of type (stream)
[2]=>
resource(46) of type (stream)
}
["major_version"]=>
int(3)
["minor_version"]=>
int(4)
["revision_version"]=>
int(8)
["revision_version_flagged_beta":"CQP":private]=>
bool(false)
["compile_date"]=>
NULL
["error_handler":"CQP":private]=>
string(13) "exiterror_cqp"
["status":"CQP":private]=>
string(5) "error"
["error_message"]=>
array(2) {
[0]=>
string(10) "CQP Error:"
[1]=>
string(19) "No corpus activated"
}
["progress_handler":"CQP":private]=>
bool(false)
["has_been_disconnected":"CQP":private]=>
bool(false)
["gzip_path":"CQP":private]=>
string(0) ""
["debug_mode":"CQP":private]=>
bool(false)
["corpus_charset":"CQP":private]=>
int(0)
}
["type"]=>
string(2) "->"
["args"]=>
array(0) {
}
}
[5]=>
array(7) {
["file"]=>
string(42) "/var/www/html/cqpweb/lib/admin-lib.inc.php"
["line"]=>
int(329)
["function"]=>
string(7) "execute"
["class"]=>
string(3) "CQP"
["object"]=>
object(CQP)#14 (15) {
["process":"CQP":private]=>
resource(47) of type (process)
["handle":"CQP":private]=>
array(3) {
[0]=>
resource(44) of type (stream)
[1]=>
resource(45) of type (stream)
[2]=>
resource(46) of type (stream)
}
["major_version"]=>
int(3)
["minor_version"]=>
int(4)
["revision_version"]=>
int(8)
["revision_version_flagged_beta":"CQP":private]=>
bool(false)
["compile_date"]=>
NULL
["error_handler":"CQP":private]=>
string(13) "exiterror_cqp"
["status":"CQP":private]=>
string(5) "error"
["error_message"]=>
array(2) {
[0]=>
string(10) "CQP Error:"
[1]=>
string(19) "No corpus activated"
}
["progress_handler":"CQP":private]=>
bool(false)
["has_been_disconnected":"CQP":private]=>
bool(false)
["gzip_path":"CQP":private]=>
string(0) ""
["debug_mode":"CQP":private]=>
bool(false)
["corpus_charset":"CQP":private]=>
int(0)
}
["type"]=>
string(2) "->"
["args"]=>
array(1) {
[0]=>
&string(28) "A = <text> [] expand to text"
}
}
[6]=>
array(2) {
["function"]=>
string(29) "populate_corpus_cqp_positions"
["args"]=>
array(1) {
[0]=>
&string(5) "pilot"
}
}
[7]=>
array(4) {
["file"]=>
string(40) "/var/www/html/cqpweb/lib/execute.inc.php"
["line"]=>
int(158)
["function"]=>
string(20) "call_user_func_array"
["args"]=>
array(2) {
[0]=>
&string(29) "populate_corpus_cqp_positions"
[1]=>
&array(1) {
[0]=>
string(5) "pilot"
}
}
}
[8]=>
array(4) {
["file"]=>
string(40) "/var/www/html/cqpweb/bin/execute-cli.php"
["line"]=>
int(60)
["args"]=>
array(1) {
[0]=>
string(40) "/var/www/html/cqpweb/lib/execute.inc.php"
}
["function"]=>
string(7) "require"
}
}
---
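In case it is useful for checking other corpora: a minimal Python sketch, assuming the mysql client output shown above is pasted in as plain text, that lists the texts whose cqp_begin and cqp_end are both zero. The function names are made up for illustration and are not part of CQPweb or CWB.

```python
# Hypothetical helper, not part of CQPweb: scan the ASCII table printed by
# the mysql client and report texts whose CQP positions were never filled in.

SAMPLE = """
+-----------+-------+-----------+---------+
| text_id | words | cqp_begin | cqp_end |
+-----------+-------+-----------+---------+
| SK_LEHR_2 | 1 | 0 | 0 |
| SK_LEHR_3 | 1 | 0 | 0 |
| SK_LEHR_4 | 1 | 0 | 0 |
| SK_LEHR_5 | 1 | 0 | 0 |
+-----------+-------+-----------+---------+
"""


def parse_mysql_table(text):
    """Turn mysql's ASCII table output into a list of dicts keyed by column name."""
    rows = []
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip +-----+ border lines and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # the first |...| line holds the column names
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def texts_with_missing_positions(rows):
    """Return the text_ids whose cqp_begin and cqp_end are both zero.

    Only the first text of a corpus can start at position 0, so more than
    one hit suggests the positions were not populated.
    """
    return [
        row["text_id"]
        for row in rows
        if int(row["cqp_begin"]) == 0 and int(row["cqp_end"]) == 0
    ]


if __name__ == "__main__":
    print(texts_with_missing_positions(parse_mysql_table(SAMPLE)))
```

With the table from this message, all four texts are flagged.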
Hope this helps to find the bug.
best
Benedikt
> Message: 2
> Date: Thu, 19 May 2016 00:09:59 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] problems with corpus word count
> Message-ID:
> <28078EC3FBF1B940A3EF3D0D19BE351D7FB3EAF4 at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Benedikt,
>
> The word-counting code has been updated recently. I am not sure, off
> the top of my head, what version is currently on the VM image. Looks
> to me like it is a version containing conflicting assumptions
> resulting in somehow the n of texts being inserted into the n of
> tokens field... I'll have to fix that. It's not something I've seen
> on my own server or my development machine so I am not 100% sure how
> it happened.
>
> In the meantime you can patch things manually by running the
> following SQL statement
>
> update corpus_info set size_texts = NUMBER_GOES_HERE, size_tokens =
> 458874 where corpus = "PILOT";
>
> and you can fix things for future corpora by running "svn up" within
> the VM's web-directory for CQPweb (enable networking to do this).
>
> best
>
> Andrew.
>
> Message: 3
> Date: Thu, 19 May 2016 00:22:09 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] problems with corpus word count
> Message-ID:
> <28078EC3FBF1B940A3EF3D0D19BE351D7FB3EB1D at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> Oh, one further note: having checked the code, a possible cause of
> this bug is that the text metadata table contains incorrect
> start/end points for the texts. You can check this with the
> following query:
>
> select text_id, words, cqp_begin, cqp_end from text_metadata_for_PILOT;
>
> If the three numeric columns contain zero, that explains your problem.
>
> The cause of this would be a failure to get accurate text-size
> information from CQP. To work out why *that* is, I'd need to see the
> errors from running the "Generate CWB text-position records"
> process - which is the first step in frequency list setup - and
> which you can re-run on its own by going to the CQPweb web-root and
> then typing:
>
> cd bin
> php execute-cli.php populate_corpus_cqp_positions PILOT
>
> and see what error message you get.
>
> One possible source of error is that it looks like you've used an
> all-uppercase corpus name, i.e. "PILOT" not "pilot"... this may
> interact badly with the way the CWB registry works, which in turn
> could have caused the problem. Possibly.
>
> best
>
> Andrew.