[CWB] CQPweb offline-freqlists.php problems

Hardie, Andrew a.hardie at lancaster.ac.uk
Wed Apr 6 12:03:51 CEST 2022


Hi Peter,

Thanks for the bug report. This script had fallen out of sync with the main code (largely because I've not had cause to use it in a while ...). I've just updated it in SVN to fix all the bugs you report except the segfault and the creation of freqlists for annotations where they are not wanted. I hope those two were side effects of the earlier bugs, in which case they may now be fixed as well. If not, let me know.

(btw that instance of cwb-makeall was running after a segfault hit either cwb-encode or cwb-decode - so it can safely be killed)

best

Andrew.

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Uhrig, Peter
Sent: 04 April 2022 21:14
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] CQPweb offline-freqlists.php problems

Dear all,

The script offline-freqlists.php causes me some loss of sleep:

First, it throws the following error in my setup:
PHP Fatal error:  Uncaught TypeError: Argument 1 passed to drop_unneeded_corpus_freqtable_components() must be of the type int, string given, called in /var/www/html/web/bin/offline-freqlists.php on line 137 and defined in /var/www/html/web/lib/freqtable-lib.php:345
Stack trace:
#0 /var/www/html/web/bin/offline-freqlists.php(137): drop_unneeded_corpus_freqtable_components()
#1 {main}
  thrown in /var/www/html/web/lib/freqtable-lib.php on line 345

This seems to be a bug in the code because the function drop_unneeded_corpus_freqtable_components really requires a corpus_id, but is called with a corpus_name from offline-freqlists.php:

drop_unneeded_corpus_freqtable_components($corpus);

I have thus replaced the line with

$corpus_id = corpus_name_to_id($corpus);
drop_unneeded_corpus_freqtable_components($corpus_id);
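
For illustration, here is a minimal sketch of why the original call fails and why looking up the ID first helps. It does not use the real CQPweb code: needs_int_id(), lookup_corpus_id() and the $known mapping are hypothetical stand-ins that only mimic the int type declaration named in the error message.

```php
<?php
// Hypothetical stand-ins, NOT the real CQPweb functions. They only
// reproduce the situation from the error message: a parameter declared
// as int that is handed a corpus name (a non-numeric string).

// Stand-in for a name-to-ID lookup; $known is a made-up mapping.
function lookup_corpus_id(string $corpus_name, array $known): int
{
    if (!isset($known[$corpus_name])) {
        throw new InvalidArgumentException("Unknown corpus: $corpus_name");
    }
    return $known[$corpus_name];
}

// Stand-in for any library function that declares an int corpus ID.
function needs_int_id(int $corpus_id): string
{
    return "processing corpus #$corpus_id";
}

$known  = ['my_corpus' => 7];   // assumed example data
$corpus = 'my_corpus';

try {
    // A non-numeric string cannot be coerced to int, so PHP throws
    // a TypeError here, with or without strict_types.
    needs_int_id($corpus);
} catch (TypeError $e) {
    echo "TypeError: ", $e->getMessage(), PHP_EOL;
}

// Resolving the name to an integer ID first satisfies the declaration.
echo needs_int_id(lookup_corpus_id($corpus, $known)), PHP_EOL;
```

The same pattern applies to any other call in the script that still passes the corpus name where an integer ID is declared.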

With that change the script got past the first error, but I was greeted with a similar one:
About to run the function populating corpus CQP positions...

PHP Fatal error:  Uncaught TypeError: Argument 1 passed to populate_corpus_cqp_positions() must be of the type int, string given, called in /var/www/html/web/bin/offline-freqlists.php on line 150 and defined in /var/www/html/web/lib/corpus-lib.php:1098
Stack trace:
#0 /var/www/html/web/bin/offline-freqlists.php(150): populate_corpus_cqp_positions()
#1 {main}
  thrown in /var/www/html/web/lib/corpus-lib.php on line 1098

OK, same thing, replace $corpus with $corpus_id and try again. This time it gets further:

About to run the function populating corpus CQP positions...

Done populating corpus CQP positions.

Function calculating category sizes was not run because there aren't any text classifications.

According to my corpus metadata table, there ARE text classifications. Why does it say there aren't?

And finally:

About to run the function making the CWB text-by-text frequency index...

Beginning to filter data from decode to encode to build the frequency-by-text CWB index...
Segmentation fault
Encoding of the by-text CWB frequency index is now complete.

That segmentation fault most likely comes from CQP, but unfortunately it does not say what exactly was going on at the time. I noticed that the __freq folder contains indexes even for p-attributes for which I specifically selected "N" in the "Needs FT" column of the "Manage Annotation" dialogue. Is this expected?

It is still running, just not saying what it is doing, but I can see that "cwb-makeall -M 1000 -r /data/corpora/cqpweb/registry -V MY_CORPUS_NAME__FREQ" is currently running, so I guess the segmentation fault may not have been critical. I'll probably find out soon...

Any help would be greatly appreciated!

Thanks and all the best!
Peter