[CWB] CQPweb offline-freqlists.php problems
Uhrig, Peter
peter.uhrig at fau.de
Thu Aug 4 21:53:52 CEST 2022
Hi Andrew,
Thanks for your reply! I updated my SVN checkout and the database today and returned to this problem.
Frequency lists are definitely still being created for annotations that don't need them, and the segmentation fault persists. I suspect the segfault is a consequence of this, because some of my columns are known to cause problems in CQP.
In addition, I had to comment out two lines in general_lib.php:
/* NKD changes more things than ND does, e.g. sharp s to normal s, or fi-lig to fi */
#$str = normalizer_normalize($str, Normalizer::FORM_KD);
$str = preg_replace('/\p{M}/u', '', $str);
#$str = normalizer_normalize($str, Normalizer::FORM_C);
/* i.e. the last thing we do is re-combine anything that wasn't scrubbed */
}
It seems the normalizer_normalize() function is not available in my setup. It is part of PHP's intl extension, which may not be installed here.
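Because normalizer_normalize() and the Normalizer class are provided by PHP's intl extension, a missing function usually means the extension is not installed or not enabled (`php -m` lists the loaded modules). Rather than commenting the two lines out, the calls could be guarded with function_exists() so the same code runs either way. The following is a minimal sketch under that assumption; the name strip_diacritics is hypothetical and is not the function's actual name in general_lib.php.

```php
<?php
// Hypothetical helper (not the real general_lib.php function name):
// strips combining marks, using intl's Normalizer only when it is available.
function strip_diacritics(string $str): string
{
    // normalizer_normalize() is provided by the intl extension.
    $have_intl = function_exists('normalizer_normalize');

    if ($have_intl) {
        // NFKD splits accented letters into base letter + combining mark
        // and expands compatibility characters such as the fi ligature.
        $decomposed = normalizer_normalize($str, Normalizer::FORM_KD);
        if ($decomposed !== false) {
            $str = $decomposed;
        }
    }

    // Remove all combining marks. Without intl this only affects marks that
    // are already separate code points; precomposed letters survive.
    $str = preg_replace('/\p{M}/u', '', $str) ?? $str;

    if ($have_intl) {
        // Re-compose anything that was not scrubbed.
        $recomposed = normalizer_normalize($str, Normalizer::FORM_C);
        if ($recomposed !== false) {
            $str = $recomposed;
        }
    }
    return $str;
}
```

With intl enabled, strip_diacritics("café") should return "cafe"; without it, a precomposed "é" is left unchanged, so the fallback is weaker than the original behaviour rather than equivalent to it.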
I'd be grateful for any pointers or help!
Best wishes,
Peter
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Hardie, Andrew
Sent: Wednesday, 6 April 2022 12:04
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CQPweb offline-freqlists.php problems
Hi Peter,
Thanks for the bug report. I've just updated this script in SVN. It had fallen out of sync with the main code (largely cos I've not had cause to use it in a while ...), and the update should fix all the bugs you report except the segfault and the creation of freqlists for annotations that don't need them. But I hope those were side-effects of the earlier bugs, so there is a chance they will be fixed as well. If not, let me know.
(btw, that instance of cwb-makeall was still running after a segfault hit either cwb-encode or cwb-decode, so it can safely be killed)
best
Andrew.
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Uhrig, Peter
Sent: 04 April 2022 21:14
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] CQPweb offline-freqlists.php problems
Dear all,
The script offline-freqlists.php causes me some loss of sleep:
First, it throws the following error in my setup:
PHP Fatal error: Uncaught TypeError: Argument 1 passed to drop_unneeded_corpus_freqtable_components() must be of the type int, string given, called in /var/www/html/web/bin/offline-freqlists.php on line 137 and defined in /var/www/html/web/lib/freqtable-lib.php:345
Stack trace:
#0 /var/www/html/web/bin/offline-freqlists.php(137): drop_unneeded_corpus_freqtable_components()
#1 {main}
thrown in /var/www/html/web/lib/freqtable-lib.php on line 345
This seems to be a bug: drop_unneeded_corpus_freqtable_components() requires a corpus ID, but offline-freqlists.php calls it with the corpus name:
drop_unneeded_corpus_freqtable_components($corpus);
I therefore replaced that line with:
$corpus_id = corpus_name_to_id($corpus);
drop_unneeded_corpus_freqtable_components($corpus_id);
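To illustrate why the original call fails: PHP cannot coerce a non-numeric string such as a corpus name to an int parameter, so it throws a TypeError even without strict_types. The sketch below is a minimal, self-contained reproduction; the two function bodies and the name-to-ID mapping are hypothetical stand-ins for the CQPweb library functions (the real ones query the database), and only their parameter types matter here.

```php
<?php
// Hypothetical stand-ins for the CQPweb functions named above.
function corpus_name_to_id(string $corpus): int
{
    $ids = ['MY_CORPUS_NAME' => 42];   // made-up name => ID mapping
    if (!isset($ids[$corpus])) {
        throw new InvalidArgumentException("Unknown corpus: $corpus");
    }
    return $ids[$corpus];
}

function drop_unneeded_corpus_freqtable_components(int $corpus_id): void
{
    echo "Dropping unneeded freqtable components for corpus #$corpus_id\n";
}

$corpus = 'MY_CORPUS_NAME';

// Original call: a non-numeric string cannot be coerced to int -> TypeError.
try {
    drop_unneeded_corpus_freqtable_components($corpus);
} catch (TypeError $e) {
    echo "TypeError: ", $e->getMessage(), "\n";
}

// Patched call: resolve the name to an ID first, then pass the ID.
$corpus_id = corpus_name_to_id($corpus);
drop_unneeded_corpus_freqtable_components($corpus_id);
```

Resolving the ID once near the top of the script and reusing it avoids repeating the lookup for every library call that expects an ID.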
With that change the script got past the first error, but then hit a similar one:
About to run the function populating corpus CQP positions...
PHP Fatal error: Uncaught TypeError: Argument 1 passed to populate_corpus_cqp_positions() must be of the type int, string given, called in /var/www/html/web/bin/offline-freqlists.php on line 150 and defined in /var/www/html/web/lib/corpus-lib.php:1098
Stack trace:
#0 /var/www/html/web/bin/offline-freqlists.php(150): populate_corpus_cqp_positions()
#1 {main}
thrown in /var/www/html/web/lib/corpus-lib.php on line 1098
OK, same thing, replace $corpus with $corpus_id and try again. This time it gets further:
About to run the function populating corpus CQP positions...
Done populating corpus CQP positions.
Function calculating category sizes was not run because there aren't any text classifications.
According to my corpus metadata table, there ARE text classifications. Why does it say there aren't?
And finally:
About to run the function making the CWB text-by-text frequency index...
Beginning to filter data from decode to encode to build the frequency-by-text CWB index...
Segmentation fault
Encoding of the by-text CWB frequency index is now complete.
That segmentation fault most likely comes from CQP, but unfortunately the output does not say which program was running at the time. I also noticed that the __freq folder contains indexes even for p-attributes for which I explicitly selected "N" in the "Needs FT" column of the "Manage Annotation" dialogue. Is this expected?
The script is still running, though it no longer reports what it is doing. I can see that "cwb-makeall -M 1000 -r /data/corpora/cqpweb/registry -V MY_CORPUS_NAME__FREQ" is currently running, so I guess the segmentation fault may not have been critical. I'll probably find out soon...
Any help would be greatly appreciated!
Thanks and all the best!
Peter