[CWB] problems with corpus word count

Benedikt Singpiel benedikt.singpiel at uni-leipzig.de
Thu May 19 13:53:28 CEST 2016


Hello Andrew,

- concerning Message 2: your instructions on the manual token reset
did the trick for the moment. My corpus is up and running, and I can now
work with the collocation tool. Thanks a lot for your great help so far!


- concerning Message 3: incorrect start/end points

These are my query results for the metadata check and the error  
message for the text-size generation:


---
mysql> select text_id, words, cqp_begin, cqp_end from text_metadata_for_pilot;
+-----------+-------+-----------+---------+
| text_id   | words | cqp_begin | cqp_end |
+-----------+-------+-----------+---------+
| SK_LEHR_2 |     1 |         0 |       0 |
| SK_LEHR_3 |     1 |         0 |       0 |
| SK_LEHR_4 |     1 |         0 |       0 |
| SK_LEHR_5 |     1 |         0 |       0 |
+-----------+-------+-----------+---------+

---
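
A side note on the table above: each row is consistent on its own (one word, positions 0 to 0), so the suspicious part seems to be that all four texts claim the same span. A minimal sketch of that check, assuming that two texts in one corpus should never share identical start/end points; the rows are copied from the query output and `suspicious_texts` is only an illustrative helper, not part of CQPweb:

```python
from collections import Counter

# Rows copied from the query output above: (text_id, words, cqp_begin, cqp_end)
ROWS = [
    ("SK_LEHR_2", 1, 0, 0),
    ("SK_LEHR_3", 1, 0, 0),
    ("SK_LEHR_4", 1, 0, 0),
    ("SK_LEHR_5", 1, 0, 0),
]


def suspicious_texts(rows):
    """Return ids of texts whose (cqp_begin, cqp_end) span is shared with another text."""
    # Assumption: distinct texts occupy distinct spans, so a repeated span is a red flag.
    span_counts = Counter((begin, end) for _, _, begin, end in rows)
    return [text_id for text_id, _, begin, end in rows if span_counts[(begin, end)] > 1]


print(suspicious_texts(ROWS))
```

With the rows shown, all four text ids are reported, which fits the "incorrect start/end points" explanation from Message 3.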

user at CQPwebInABox:/$ cd /var/www/html/cqpweb/bin
user at CQPwebInABox:/var/www/html/cqpweb/bin$ php execute-cli.php populate_corpus_cqp_positions pilot
CQPweb -- CQP reports errors!
=============================

	<table class="concordtable" width="100%">
		<tr>
			<th class="concordtable">CQPweb encountered an error and could not continue.</th>
		</tr>
	CQP sent back these error messages:

**** CQP ERROR ****

CQP Error:

No corpus activated



PHP debugging backtrace
=======================
array(8) {
   [1]=>
   array(2) {
     ["function"]=>
     string(13) "exiterror_cqp"
     ["args"]=>
     array(1) {
       [0]=>
       &array(3) {
         [0]=>
         string(19) "**** CQP ERROR ****"
         [1]=>
         string(10) "CQP Error:"
         [2]=>
         string(19) "No corpus activated"
       }
     }
   }
   [2]=>
   array(4) {
     ["file"]=>
     string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
     ["line"]=>
     int(1021)
     ["function"]=>
     string(14) "call_user_func"
     ["args"]=>
     array(2) {
       [0]=>
       &string(13) "exiterror_cqp"
       [1]=>
       &array(3) {
         [0]=>
         string(19) "**** CQP ERROR ****"
         [1]=>
         string(10) "CQP Error:"
         [2]=>
         string(19) "No corpus activated"
       }
     }
   }
   [3]=>
   array(7) {
     ["file"]=>
     string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
     ["line"]=>
     int(922)
     ["function"]=>
     string(5) "error"
     ["class"]=>
     string(3) "CQP"
     ["object"]=>
     object(CQP)#14 (15) {
       ["process":"CQP":private]=>
       resource(47) of type (process)
       ["handle":"CQP":private]=>
       array(3) {
         [0]=>
         resource(44) of type (stream)
         [1]=>
         resource(45) of type (stream)
         [2]=>
         resource(46) of type (stream)
       }
       ["major_version"]=>
       int(3)
       ["minor_version"]=>
       int(4)
       ["revision_version"]=>
       int(8)
       ["revision_version_flagged_beta":"CQP":private]=>
       bool(false)
       ["compile_date"]=>
       NULL
       ["error_handler":"CQP":private]=>
       string(13) "exiterror_cqp"
       ["status":"CQP":private]=>
       string(5) "error"
       ["error_message"]=>
       array(2) {
         [0]=>
         string(10) "CQP Error:"
         [1]=>
         string(19) "No corpus activated"
       }
       ["progress_handler":"CQP":private]=>
       bool(false)
       ["has_been_disconnected":"CQP":private]=>
       bool(false)
       ["gzip_path":"CQP":private]=>
       string(0) ""
       ["debug_mode":"CQP":private]=>
       bool(false)
       ["corpus_charset":"CQP":private]=>
       int(0)
     }
     ["type"]=>
     string(2) "->"
     ["args"]=>
     array(1) {
       [0]=>
       &array(3) {
         [0]=>
         string(19) "**** CQP ERROR ****"
         [1]=>
         string(10) "CQP Error:"
         [2]=>
         string(19) "No corpus activated"
       }
     }
   }
   [4]=>
   array(7) {
     ["file"]=>
     string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
     ["line"]=>
     int(492)
     ["function"]=>
     string(8) "checkerr"
     ["class"]=>
     string(3) "CQP"
     ["object"]=>
     object(CQP)#14 (15) {
       ["process":"CQP":private]=>
       resource(47) of type (process)
       ["handle":"CQP":private]=>
       array(3) {
         [0]=>
         resource(44) of type (stream)
         [1]=>
         resource(45) of type (stream)
         [2]=>
         resource(46) of type (stream)
       }
       ["major_version"]=>
       int(3)
       ["minor_version"]=>
       int(4)
       ["revision_version"]=>
       int(8)
       ["revision_version_flagged_beta":"CQP":private]=>
       bool(false)
       ["compile_date"]=>
       NULL
       ["error_handler":"CQP":private]=>
       string(13) "exiterror_cqp"
       ["status":"CQP":private]=>
       string(5) "error"
       ["error_message"]=>
       array(2) {
         [0]=>
         string(10) "CQP Error:"
         [1]=>
         string(19) "No corpus activated"
       }
       ["progress_handler":"CQP":private]=>
       bool(false)
       ["has_been_disconnected":"CQP":private]=>
       bool(false)
       ["gzip_path":"CQP":private]=>
       string(0) ""
       ["debug_mode":"CQP":private]=>
       bool(false)
       ["corpus_charset":"CQP":private]=>
       int(0)
     }
     ["type"]=>
     string(2) "->"
     ["args"]=>
     array(0) {
     }
   }
   [5]=>
   array(7) {
     ["file"]=>
     string(42) "/var/www/html/cqpweb/lib/admin-lib.inc.php"
     ["line"]=>
     int(329)
     ["function"]=>
     string(7) "execute"
     ["class"]=>
     string(3) "CQP"
     ["object"]=>
     object(CQP)#14 (15) {
       ["process":"CQP":private]=>
       resource(47) of type (process)
       ["handle":"CQP":private]=>
       array(3) {
         [0]=>
         resource(44) of type (stream)
         [1]=>
         resource(45) of type (stream)
         [2]=>
         resource(46) of type (stream)
       }
       ["major_version"]=>
       int(3)
       ["minor_version"]=>
       int(4)
       ["revision_version"]=>
       int(8)
       ["revision_version_flagged_beta":"CQP":private]=>
       bool(false)
       ["compile_date"]=>
       NULL
       ["error_handler":"CQP":private]=>
       string(13) "exiterror_cqp"
       ["status":"CQP":private]=>
       string(5) "error"
       ["error_message"]=>
       array(2) {
         [0]=>
         string(10) "CQP Error:"
         [1]=>
         string(19) "No corpus activated"
       }
       ["progress_handler":"CQP":private]=>
       bool(false)
       ["has_been_disconnected":"CQP":private]=>
       bool(false)
       ["gzip_path":"CQP":private]=>
       string(0) ""
       ["debug_mode":"CQP":private]=>
       bool(false)
       ["corpus_charset":"CQP":private]=>
       int(0)
     }
     ["type"]=>
     string(2) "->"
     ["args"]=>
     array(1) {
       [0]=>
       &string(28) "A = <text> [] expand to text"
     }
   }
   [6]=>
   array(2) {
     ["function"]=>
     string(29) "populate_corpus_cqp_positions"
     ["args"]=>
     array(1) {
       [0]=>
       &string(5) "pilot"
     }
   }
   [7]=>
   array(4) {
     ["file"]=>
     string(40) "/var/www/html/cqpweb/lib/execute.inc.php"
     ["line"]=>
     int(158)
     ["function"]=>
     string(20) "call_user_func_array"
     ["args"]=>
     array(2) {
       [0]=>
       &string(29) "populate_corpus_cqp_positions"
       [1]=>
       &array(1) {
         [0]=>
         string(5) "pilot"
       }
     }
   }
   [8]=>
   array(4) {
     ["file"]=>
     string(40) "/var/www/html/cqpweb/bin/execute-cli.php"
     ["line"]=>
     int(60)
     ["args"]=>
     array(1) {
       [0]=>
       string(40) "/var/www/html/cqpweb/lib/execute.inc.php"
     }
     ["function"]=>
     string(7) "require"
   }
}

---
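
Regarding the upper-case/lower-case question raised in Message 3: as far as I understand, CWB registry files are named in lower case while CQP refers to corpora in upper case, so a mismatch there might explain "No corpus activated". Below is a small sketch for listing registry entries that match a corpus name when case is ignored. The helper name `registry_candidates` is made up for illustration and is not part of CWB or CQPweb, and the demo uses a throwaway directory rather than the real registry, whose location I am not assuming here:

```python
import os
import tempfile


def registry_candidates(registry_dir, corpus_name):
    """Return registry entries whose name equals corpus_name when case is ignored."""
    wanted = corpus_name.lower()
    return sorted(name for name in os.listdir(registry_dir) if name.lower() == wanted)


# Demo on a temporary directory that stands in for the real registry directory.
with tempfile.TemporaryDirectory() as demo_dir:
    open(os.path.join(demo_dir, "pilot"), "w").close()
    open(os.path.join(demo_dir, "other"), "w").close()
    demo_matches = registry_candidates(demo_dir, "PILOT")

print(demo_matches)
```

Pointed at the actual registry directory instead of the temporary one, an empty result or an entry spelled "PILOT" would suggest that the corpus name's case is worth a closer look.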


Hope this helps to find the bug.

best



Benedikt









> Message: 2
> Date: Thu, 19 May 2016 00:09:59 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> 	<cwb at sslmit.unibo.it>
> Subject: Re: [CWB] problems with corpus word count
> Message-ID:
> 	<28078EC3FBF1B940A3EF3D0D19BE351D7FB3EAF4 at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Benedikt,
>
> The word-counting code has been updated recently. I am not sure, off
> the top of my head, what version is currently on the VM image. It looks
> to me like a version containing conflicting assumptions, which
> somehow result in the n of texts being inserted into the n of
> tokens field. I'll have to fix that. It's not something I've seen
> on my own server or my development machine, so I am not 100% sure how
> it happened.
>
> In the meantime you can patch things manually by running the
> following SQL statement:
>
> update corpus_info set size_texts = NUMBER_GOES_HERE, size_tokens = 458874 where corpus = "PILOT";
>
> and you can fix things for future corpora by running "svn up" within
> the VM's web-directory for CQPweb (enable networking to do this).
>
> best
>
> Andrew.
>
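
For the record, the manual patch from Message 2 can also be sketched as a parameterised statement. This is only an illustration under several assumptions: sqlite3 stands in for the MySQL server, the stand-in table has just the three columns named in the statement, `patch_corpus_sizes` is a made-up helper, and the text count of 4 is purely illustrative (it is the number of rows in my metadata query, not a value given in Message 2):

```python
import sqlite3


def patch_corpus_sizes(conn, corpus, size_texts, size_tokens):
    """Set the text and token counts for one corpus; return the number of rows changed."""
    cur = conn.execute(
        "UPDATE corpus_info SET size_texts = ?, size_tokens = ? WHERE corpus = ?",
        (size_texts, size_tokens, corpus),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table; the real corpus_info table in CQPweb has more columns.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE corpus_info (corpus TEXT PRIMARY KEY, size_texts INTEGER, size_tokens INTEGER)"
)
conn.execute("INSERT INTO corpus_info VALUES ('PILOT', 0, 0)")

# 4 is an illustrative text count; 458874 is the token count from Message 2.
changed = patch_corpus_sizes(conn, "PILOT", 4, 458874)
print(changed)
```

Checking the returned row count is a cheap way to notice when the corpus name in the WHERE clause does not match any row, for example because of a difference in spelling.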

> Message: 3
> Date: Thu, 19 May 2016 00:22:09 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> 	<cwb at sslmit.unibo.it>
> Subject: Re: [CWB] problems with corpus word count
> Message-ID:
> 	<28078EC3FBF1B940A3EF3D0D19BE351D7FB3EB1D at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> Oh, one further note: having checked the code, a possible cause of  
> this bug is that the text metadata table contains incorrect  
> start/end points for the texts. You can check this with the  
> following query:
>
> select text_id, words, cqp_begin, cqp_end from text_metadata_for_PILOT;
>
> If the three numeric columns contain zero, that explains your problem.
>
> The cause of this would be a failure to get accurate text-size
> information from CQP. To work out why *that* is, I'd need to see the
> errors from running the "Generate CWB text-position records"
> process, which is the first step in frequency list setup and
> which you can re-run on its own by going to the CQPweb web-root and
> then typing:
>
> cd bin
> php execute-cli.php populate_corpus_cqp_positions PILOT
>
> and see what error message you get.
>
> One possible source of error is that it looks like you've used an
> all-upper-case corpus name, i.e. "PILOT" not "pilot". This may interact
> badly with the way the CWB registry works, which in turn could have
> caused the problem. Possibly.
>
> best
>
> Andrew.







