[CWB] problems with corpus word count

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu May 19 16:21:06 CEST 2016


I now very strongly suspect it's a case issue. Can you do the following:

select corpus, cqp_name from corpus_info;

What I'd expect to see is "pilot" in the first column and "PILOT" in the second (the result of indexing with "pilot" as the corpus name). If you don't see that, then your error may have been using an uppercase corpus name.
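
A minimal shell sketch of that check, for one row of the query result. The helper name check_corpus_case is hypothetical (it is not part of CQPweb or CWB), and the rule it encodes is only the expectation stated above: a lowercase value in the corpus column and its uppercase form in cqp_name.

```shell
# Hypothetical helper, not part of CQPweb: checks one row of the
# "select corpus, cqp_name from corpus_info" result.
# Assumption: corpus should be all lowercase and cqp_name should be
# the uppercase form of the same name.
check_corpus_case() {
    corpus=$1
    cqp_name=$2
    lower=$(printf '%s' "$corpus" | tr '[:upper:]' '[:lower:]')
    upper=$(printf '%s' "$corpus" | tr '[:lower:]' '[:upper:]')
    if [ "$corpus" != "$lower" ]; then
        echo "corpus name '$corpus' is not all lowercase"
        return 1
    fi
    if [ "$cqp_name" != "$upper" ]; then
        echo "cqp_name '$cqp_name' is not the uppercase form of '$corpus'"
        return 1
    fi
    echo "ok"
}

# Example call with the values expected for this corpus
check_corpus_case pilot PILOT
```

Calling it with the two values from each row of the result shows which rows, if any, break the expected pattern.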

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Benedikt Singpiel
Sent: 19 May 2016 12:53
To: cwb at sslmit.unibo.it
Subject: [CWB] problems with corpus word count

Hello Andrew,

- concerning message 2: your instructions on the manual token reset did the trick for the moment. My corpus is up and running, I can now work with the collocation tool. Thanks a lot for your great help so far!


- concerning Message 3: incorrect start/end points

These are my query results for the metadata check and the error message for the text-size generation:


---
mysql> select text_id, words, cqp_begin, cqp_end from text_metadata_for_pilot;
+-----------+-------+-----------+---------+
| text_id   | words | cqp_begin | cqp_end |
+-----------+-------+-----------+---------+
| SK_LEHR_2 |     1 |         0 |       0 |
| SK_LEHR_3 |     1 |         0 |       0 |
| SK_LEHR_4 |     1 |         0 |       0 |
| SK_LEHR_5 |     1 |         0 |       0 |
+-----------+-------+-----------+---------+

---
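
For a larger corpus, the same symptom can be counted rather than read off by eye. The sketch below is only an illustration: count_zero_positions is a hypothetical helper that reads the boxed table output of the mysql client on standard input, and it assumes the column order shown above (text_id, words, cqp_begin, cqp_end). Since corpus positions are token offsets, more than one text with both values at 0 would suggest the positions were never populated.

```shell
# Hypothetical helper: reads mysql's boxed table output on stdin and
# counts data rows where both cqp_begin and cqp_end are exactly 0.
# Assumes the column order text_id, words, cqp_begin, cqp_end.
count_zero_positions() {
    awk -F'|' '
        # field 4 is cqp_begin, field 5 is cqp_end (field 1 is empty)
        $4 ~ /^ *0 *$/ && $5 ~ /^ *0 *$/ { n++ }
        END { print n + 0 }
    '
}

# Example using two of the rows from the table above
count_zero_positions <<'EOF'
+-----------+-------+-----------+---------+
| text_id   | words | cqp_begin | cqp_end |
+-----------+-------+-----------+---------+
| SK_LEHR_2 |     1 |         0 |       0 |
| SK_LEHR_3 |     1 |         0 |       0 |
+-----------+-------+-----------+---------+
EOF
```

Border and header lines are skipped because their fourth and fifth fields never consist of a lone zero.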

user at CQPwebInABox:/$ cd /var/www/html/cqpweb/bin
user at CQPwebInABox:/var/www/html/cqpweb/bin$ php execute-cli.php populate_corpus_cqp_positions pilot
CQPweb -- CQP reports errors!
=============================

	<table class="concordtable" width="100%">
		<tr>
			<th class="concordtable">CQPweb encountered an error and could not continue.</th>
		</tr>
	CQP sent back these error messages:

**** CQP ERROR ****

CQP Error:

No corpus activated



PHP debugging backtrace
=======================
array(8) {
   [1]=>
   array(2) {
     ["function"]=>
     string(13) "exiterror_cqp"
     ["args"]=>
     array(1) {
       [0]=>
       &array(3) {
         [0]=>
         string(19) "**** CQP ERROR ****"
         [1]=>
         string(10) "CQP Error:"
         [2]=>
         string(19) "No corpus activated"
       }
     }
   }
   [2]=>
   array(4) {
     ["file"]=>
     string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
     ["line"]=>
     int(1021)
     ["function"]=>
     string(14) "call_user_func"
     ["args"]=>
     array(2) {
       [0]=>
       &string(13) "exiterror_cqp"
       [1]=>
       &array(3) {
         [0]=>
         string(19) "**** CQP ERROR ****"
         [1]=>
         string(10) "CQP Error:"
         [2]=>
         string(19) "No corpus activated"
       }
     }
   }
   [3]=>
   array(7) {
     ["file"]=>
     string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
     ["line"]=>
     int(922)
     ["function"]=>
     string(5) "error"
     ["class"]=>
     string(3) "CQP"
     ["object"]=>
     object(CQP)#14 (15) {
       ["process":"CQP":private]=>
       resource(47) of type (process)
       ["handle":"CQP":private]=>
       array(3) {
         [0]=>
         resource(44) of type (stream)
         [1]=>
         resource(45) of type (stream)
         [2]=>
         resource(46) of type (stream)
       }
       ["major_version"]=>
       int(3)
       ["minor_version"]=>
       int(4)
       ["revision_version"]=>
       int(8)
       ["revision_version_flagged_beta":"CQP":private]=>
       bool(false)
       ["compile_date"]=>
       NULL
       ["error_handler":"CQP":private]=>
       string(13) "exiterror_cqp"
       ["status":"CQP":private]=>
       string(5) "error"
       ["error_message"]=>
       array(2) {
         [0]=>
         string(10) "CQP Error:"
         [1]=>
         string(19) "No corpus activated"
       }
       ["progress_handler":"CQP":private]=>
       bool(false)
       ["has_been_disconnected":"CQP":private]=>
       bool(false)
       ["gzip_path":"CQP":private]=>
       string(0) ""
       ["debug_mode":"CQP":private]=>
       bool(false)
       ["corpus_charset":"CQP":private]=>
       int(0)
     }
     ["type"]=>
     string(2) "->"
     ["args"]=>
     array(1) {
       [0]=>
       &array(3) {
         [0]=>
         string(19) "**** CQP ERROR ****"
         [1]=>
         string(10) "CQP Error:"
         [2]=>
         string(19) "No corpus activated"
       }
     }
   }
   [4]=>
   array(7) {
     ["file"]=>
     string(36) "/var/www/html/cqpweb/lib/cqp.inc.php"
     ["line"]=>
     int(492)
     ["function"]=>
     string(8) "checkerr"
     ["class"]=>
     string(3) "CQP"
     ["object"]=>
     object(CQP)#14 (15) {
       ["process":"CQP":private]=>
       resource(47) of type (process)
       ["handle":"CQP":private]=>
       array(3) {
         [0]=>
         resource(44) of type (stream)
         [1]=>
         resource(45) of type (stream)
         [2]=>
         resource(46) of type (stream)
       }
       ["major_version"]=>
       int(3)
       ["minor_version"]=>
       int(4)
       ["revision_version"]=>
       int(8)
       ["revision_version_flagged_beta":"CQP":private]=>
       bool(false)
       ["compile_date"]=>
       NULL
       ["error_handler":"CQP":private]=>
       string(13) "exiterror_cqp"
       ["status":"CQP":private]=>
       string(5) "error"
       ["error_message"]=>
       array(2) {
         [0]=>
         string(10) "CQP Error:"
         [1]=>
         string(19) "No corpus activated"
       }
       ["progress_handler":"CQP":private]=>
       bool(false)
       ["has_been_disconnected":"CQP":private]=>
       bool(false)
       ["gzip_path":"CQP":private]=>
       string(0) ""
       ["debug_mode":"CQP":private]=>
       bool(false)
       ["corpus_charset":"CQP":private]=>
       int(0)
     }
     ["type"]=>
     string(2) "->"
     ["args"]=>
     array(0) {
     }
   }
   [5]=>
   array(7) {
     ["file"]=>
     string(42) "/var/www/html/cqpweb/lib/admin-lib.inc.php"
     ["line"]=>
     int(329)
     ["function"]=>
     string(7) "execute"
     ["class"]=>
     string(3) "CQP"
     ["object"]=>
     object(CQP)#14 (15) {
       ["process":"CQP":private]=>
       resource(47) of type (process)
       ["handle":"CQP":private]=>
       array(3) {
         [0]=>
         resource(44) of type (stream)
         [1]=>
         resource(45) of type (stream)
         [2]=>
         resource(46) of type (stream)
       }
       ["major_version"]=>
       int(3)
       ["minor_version"]=>
       int(4)
       ["revision_version"]=>
       int(8)
       ["revision_version_flagged_beta":"CQP":private]=>
       bool(false)
       ["compile_date"]=>
       NULL
       ["error_handler":"CQP":private]=>
       string(13) "exiterror_cqp"
       ["status":"CQP":private]=>
       string(5) "error"
       ["error_message"]=>
       array(2) {
         [0]=>
         string(10) "CQP Error:"
         [1]=>
         string(19) "No corpus activated"
       }
       ["progress_handler":"CQP":private]=>
       bool(false)
       ["has_been_disconnected":"CQP":private]=>
       bool(false)
       ["gzip_path":"CQP":private]=>
       string(0) ""
       ["debug_mode":"CQP":private]=>
       bool(false)
       ["corpus_charset":"CQP":private]=>
       int(0)
     }
     ["type"]=>
     string(2) "->"
     ["args"]=>
     array(1) {
       [0]=>
       &string(28) "A = <text> [] expand to text"
     }
   }
   [6]=>
   array(2) {
     ["function"]=>
     string(29) "populate_corpus_cqp_positions"
     ["args"]=>
     array(1) {
       [0]=>
       &string(5) "pilot"
     }
   }
   [7]=>
   array(4) {
     ["file"]=>
     string(40) "/var/www/html/cqpweb/lib/execute.inc.php"
     ["line"]=>
     int(158)
     ["function"]=>
     string(20) "call_user_func_array"
     ["args"]=>
     array(2) {
       [0]=>
       &string(29) "populate_corpus_cqp_positions"
       [1]=>
       &array(1) {
         [0]=>
         string(5) "pilot"
       }
     }
   }
   [8]=>
   array(4) {
     ["file"]=>
     string(40) "/var/www/html/cqpweb/bin/execute-cli.php"
     ["line"]=>
     int(60)
     ["args"]=>
     array(1) {
       [0]=>
       string(40) "/var/www/html/cqpweb/lib/execute.inc.php"
     }
     ["function"]=>
     string(7) "require"
   }
}

---


Hope this helps to find the bug.

best



Benedikt









> Message: 2
> Date: Thu, 19 May 2016 00:09:59 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> 	<cwb at sslmit.unibo.it>
> Subject: Re: [CWB] problems with corpus word count
> Message-ID:
> 	<28078EC3FBF1B940A3EF3D0D19BE351D7FB3EAF4 at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Benedikt,
>
> The word-counting code has been updated recently. I am not sure, off
> the top of my head, what version is currently on the VM image. It looks
> to me like a version containing conflicting assumptions, which
> somehow results in the number of texts being inserted into the
> number-of-tokens field. I'll have to fix that. It's not something I've
> seen on my own server or my development machine, so I am not 100% sure
> how it happened.
>
> In the meantime you can patch things manually by running the
> following SQL statement:
>
> update corpus_info set size_texts = NUMBER_GOES_HERE, size_tokens =
> 458874 where corpus = "PILOT";
>
> and you can fix things for future corpora by running "svn up" within
> the VM's web directory for CQPweb (enable networking to do this).
>
> best
>
> Andrew.
>

> Message: 3
> Date: Thu, 19 May 2016 00:22:09 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> 	<cwb at sslmit.unibo.it>
> Subject: Re: [CWB] problems with corpus word count
> Message-ID:
> 	<28078EC3FBF1B940A3EF3D0D19BE351D7FB3EB1D at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> Oh, one further note: having checked the code, a possible cause of 
> this bug is that the text metadata table contains incorrect start/end 
> points for the texts. You can check this with the following query:
>
> select text_id, words, cqp_begin, cqp_end from 
> text_metadata_for_PILOT;
>
> If the three numeric columns contain zero, that explains your problem.
>
> The cause of this would be failure to get accurate text-size
> information from CQP. To work out why *that* is, I'd need to see the
> errors from running the "Generate CWB text-position records"
> process - which is the first step in frequency list setup - and which
> you can re-run on its own by going to the CQPweb web-root and then
> typing:
>
> cd bin
> php execute-cli.php populate_corpus_cqp_positions PILOT
>
> and see what error message you get.
>
> One possible source of error is that it looks like you've used an
> all-uppercase corpus name, i.e. "PILOT" not "pilot". This may interact
> badly with the way the CWB registry works, which in turn could have
> caused the problem. Possibly.
>
> best
>
> Andrew.






_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb

