[CWB] problems with corpus word count
Hardie, Andrew
a.hardie at lancaster.ac.uk
Thu May 19 02:22:09 CEST 2016
Oh, one further note: having checked the code, a possible cause of this bug is that the text metadata table contains incorrect start/end points for the texts. You can check this with the following query:
select text_id, words, cqp_begin, cqp_end from text_metadata_for_PILOT;
If the three numeric columns contain zero, that explains your problem.
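On a corpus with many texts it may be easier to list only the affected rows than to read the whole table. What follows is a minimal sketch of that check, not CQPweb code: it uses Python's built-in sqlite3 module with an in-memory database as a stand-in for the real MySQL database, the two sample rows are invented, and the helper name texts_missing_positions is hypothetical. Only the table and column names are taken from the query above. Note that cqp_begin = 0 on its own is normal for the first text, so the filter requires all three columns to be zero.

```python
import sqlite3

# In-memory SQLite stands in for the CQPweb MySQL database (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE text_metadata_for_PILOT "
    "(text_id TEXT, words INTEGER, cqp_begin INTEGER, cqp_end INTEGER)"
)
# Invented sample rows: t1 looks healthy, t2 shows the all-zero symptom.
conn.executemany(
    "INSERT INTO text_metadata_for_PILOT VALUES (?, ?, ?, ?)",
    [("t1", 120, 0, 119), ("t2", 0, 0, 0)],
)


def texts_missing_positions(connection):
    """Return the text_ids whose three numeric columns are all zero."""
    rows = connection.execute(
        "SELECT text_id FROM text_metadata_for_PILOT "
        "WHERE words = 0 AND cqp_begin = 0 AND cqp_end = 0 "
        "ORDER BY text_id"
    ).fetchall()
    return [row[0] for row in rows]


broken = texts_missing_positions(conn)
print(broken)
```

The same WHERE clause can be appended to the query above and run directly in the mysql client.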
The cause of this would be failure to get accurate text-size information from CQP. To work out why *that* is, I'd need to see the errors from running the "Generate CWB text-position records" process - which is the first step in frequency list setup - and which you can re-run on its own by going to the CQPweb web-root and then typing:
cd bin
php execute-cli.php populate_corpus_cqp_positions PILOT
and see what error message you get.
One possible source of error is that it looks like you've used an all-uppercase corpus name, i.e. "PILOT" not "pilot"... this may interact badly with the way the CWB registry works, which in turn could have caused the problem. Possibly.
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: 19 May 2016 01:10
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] problems with corpus word count
Hi Benedikt,
The word-counting code has been updated recently. I am not sure, off the top of my head, what version is currently on the VM image. Looks to me like it is a version containing conflicting assumptions, somehow resulting in the number of texts being inserted into the number-of-tokens field.... I'll have to fix that. It's not something I've seen on my own server or my development machine so I am not 100% sure how it happened.
In the meantime you can patch things manually by running the following SQL statement:
update corpus_info set size_texts = NUMBER_GOES_HERE, size_tokens = 458874 where corpus = "PILOT";
and you can fix things for future corpora by running "svn up" within the VM's web-directory for CQPweb (enable networking to do this).
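The token figure in that statement is simply the sum of the frequency table, so the patch can be derived rather than typed by hand. Below is a rough sketch of the idea, again using Python's sqlite3 with an in-memory database in place of MySQL. The sample rows, the "item" column name, and the helper repair_token_count are assumptions made for illustration; only the table names, the freq column, and the corpus_info columns come from this thread. It deliberately leaves size_texts alone.

```python
import sqlite3

# In-memory SQLite stands in for the CQPweb MySQL database (assumption).
conn = sqlite3.connect(":memory:")
# "item" is a hypothetical column name; "freq" is the one queried in this thread.
conn.execute("CREATE TABLE freq_corpus_pilot_word (item TEXT, freq INTEGER)")
conn.execute(
    "CREATE TABLE corpus_info "
    "(corpus TEXT, size_types INTEGER, size_tokens INTEGER, size_texts INTEGER)"
)
# Invented sample data: three word types, ten tokens in total.
conn.executemany(
    "INSERT INTO freq_corpus_pilot_word VALUES (?, ?)",
    [("the", 5), ("corpus", 3), ("word", 2)],
)
# size_tokens starts out wrong, mimicking the symptom reported here.
conn.execute("INSERT INTO corpus_info VALUES ('PILOT', 3, 4, 4)")


def repair_token_count(connection, corpus):
    """Set size_tokens for one corpus to the sum of the frequency table."""
    total = connection.execute(
        "SELECT SUM(freq) FROM freq_corpus_pilot_word"
    ).fetchone()[0]
    connection.execute(
        "UPDATE corpus_info SET size_tokens = ? WHERE corpus = ?",
        (total, corpus),
    )
    return total


new_size = repair_token_count(conn, "PILOT")
stored = conn.execute(
    "SELECT size_tokens FROM corpus_info WHERE corpus = 'PILOT'"
).fetchone()[0]
print(new_size, stored)
```

With the real tables, the sum is the 458874 reported by the sum(freq) query quoted below.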
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Benedikt Singpiel
Sent: 18 May 2016 11:58
To: cwb at sslmit.unibo.it
Subject: [CWB] problems with corpus word count
Hello Andrew,
Now I got it! Here are my values:
mysql> select count(*) from freq_corpus_pilot_word;
+----------+
| count(*) |
+----------+
|    37250 |
+----------+
1 row in set (0.14 sec)
mysql> select sum(freq) from freq_corpus_pilot_word;
+-----------+
| sum(freq) |
+-----------+
|    458874 |
+-----------+
1 row in set (0.07 sec)
mysql> select size_types, size_tokens, size_texts from corpus_info
where corpus = "PILOT";
+------------+-------------+------------+
| size_types | size_tokens | size_texts |
+------------+-------------+------------+
|      37250 |           4 |          4 |
+------------+-------------+------------+
Frequency tables seem complete, corpus log isn't.
best
Benedikt
> ------------------------------
>
> Message: 3
> Date: Tue, 17 May 2016 12:16:11 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] problems with lemma query and corpus word count
> Message-ID:
> <28078EC3FBF1B940A3EF3D0D19BE351D7FB3DA97 at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> You have to select the database first.
>
> http://dev.mysql.com/doc/refman/5.7/en/use.html
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it
> [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Benedikt Singpiel
> Sent: 17 May 2016 13:15
> To: cwb at sslmit.unibo.it
> Subject: [CWB] problems with lemma query and corpus word count
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb