[CWB] Indexing of metadata problem + no display of query results

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Nov 20 18:26:42 CET 2015


>>> it would be a good feature if the error message could be displayed on the web page;

This is something that you configure yourself in PHP - by default, on a production server, errors go to the log file but not the browser, and on a dev server it is the opposite. Find out about the relevant php.ini directives here:

http://php.net/manual/en/errorfunc.configuration.php

The ones you need are

display_errors
log_errors

which should both be present in your php.ini already, but commented out and/or set to production settings. Change them, restart Apache, and you will see errors onscreen.
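
A minimal php.ini sketch of those two directives for a development-style setup, assuming you want errors shown in the browser and still written to the log. Where php.ini lives depends on your installation, and any other error-related directives are left at whatever your file already sets:

```ini
; Development-style settings (assumption: this is not a public production server)
; Send PHP errors to the browser output
display_errors = On
; Keep writing them to the error log as well
log_errors = On
```

The change only takes effect once Apache has been restarted.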

>>> additionally, there is a (quite long) time lapse between the user 
query and the display; it would be good to have a (partial) display 
sooner, even for huge result sets, 

I thought it already did that? One of the steps in startup is to set implicit flush (i.e. flush buffer after every write) to On.

Of course, the entirety of any MySQL or CQP activity must be completed before CQPweb can even *begin* to write the output.

>>> for example with asynchronous queries

All queries are synchronous. CQPweb is single-threaded. When its process hands off work to other programs, it lies dormant till they are done. 
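
To make the hand-off concrete, here is a small sketch in Python (CQPweb itself is written in PHP, so this is only an analogy, not its actual code). The hypothetical function `run_query_blocking` starts a child process that stands in for CQP or MySQL and waits for it; nothing can be written to the page until the child has finished, and flushing after each write only helps from that point on:

```python
import subprocess
import sys
import time


def run_query_blocking(seconds):
    """Stand-in for a CQP or MySQL call: start a child process and wait for it."""
    start = time.monotonic()
    # subprocess.run() does not return until the child has exited,
    # so the calling process cannot write any output in the meantime.
    result = subprocess.run(
        [sys.executable, "-c", f"import time; time.sleep({seconds}); print('done')"],
        capture_output=True,
        text=True,
        check=True,
    )
    elapsed = time.monotonic() - start
    return result.stdout.strip(), elapsed


if __name__ == "__main__":
    output, elapsed = run_query_blocking(0.5)
    # Only now can output begin; flush=True pushes each write out immediately.
    print(f"child said {output!r} after {elapsed:.2f}s", flush=True)
```

The elapsed time is always at least as long as the child's own run time, which is the delay a user sees before the first line of results appears.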

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Emmanuel CARTIER
Sent: 20 November 2015 16:30
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] Indexing of metadata problem + no display of query results

Dear Andrew and Noah,

Thanks for your advice.

As for point A, I still have to investigate, as your suggestions do not 
apply in my case - perhaps I will try to change the PHP code to report the 
exact position of the problem in the input corpus.

As for point B, the error.log was clear: the error was triggered by a 
memory limit set too low in php.ini. I corrected that and it 
works (I had more than 7 million results...). But a suggestion: it would 
be a good feature if the error message could be displayed on the web 
page; additionally, there is a (quite long) time lapse between the user 
query and the display; it would be good to have a (partial) display 
sooner, even for huge result sets, for example with asynchronous queries.

Thanks a lot anyway!

Emmanuel

Emmanuel Cartier
Lecturer and Researcher in Computational Linguistics
LIPN CNRS UMR 7030 - équipe RCLN
http://lipn.univ-paris13.fr/fr/rcln
Université Paris 13 Sorbonne Paris Cité
99 avenue Jean-Baptiste Clement
93430 Villetaneuse
tél. : (+33) 06 46 79 12 86
email : emmanuel.cartier at univ-paris13.fr

On 20/11/2015 14:51, Hardie, Andrew wrote:
> Ah yes, that might well be the case.
>
> I can't remember about old versions, but the current text_id limit is 255.
>
> Other ID codes have shorter limits imposed by the limitations of MySQL (which does not allow identifiers longer than 64 characters).
>
> Another possible cause of trouble is non-allowed characters in the text ID. It must contain only ASCII alphanumerics plus the underscore (_).
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Noah Bubenhofer
> Sent: 20 November 2015 13:47
> To: Open source development of the Corpus WorkBench
> Subject: Re: [CWB] Indexing of metadata problem + no display of query results
>
> In addition to Andrew's answer: I often got the error complaining about
> lines outside <text>. The not-so-obvious reason for that (after having
> checked the XML structure in depth) was the text IDs: of course, a
> unique ID per text is necessary. But you should know that CQPweb allows
> IDs of a maximum length of, I think, about 25 characters, and I sometimes had
> longer ones. CQPweb then just truncates the IDs, and as a result you may
> have non-unique IDs in your data... CQP does not complain about that,
> but CQPweb does.
>
> Perhaps this is also the case in your data.
>
> Noah
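
The text ID constraints discussed in these two replies (a length limit, ASCII alphanumerics plus underscore only, and uniqueness even after truncation) can be checked before indexing. The following is a minimal sketch, not part of CWB or CQPweb: the function name `check_text_ids` is hypothetical, and the default limit of 255 is simply the figure quoted in this thread for the current version.

```python
import re
from collections import Counter

# Limit quoted in this thread for the current version; older versions may differ.
MAX_TEXT_ID_LEN = 255
# Only ASCII letters, digits and underscore are allowed.
VALID_ID = re.compile(r"[A-Za-z0-9_]+")


def check_text_ids(ids, max_len=MAX_TEXT_ID_LEN):
    """Return a list of (id, problem) pairs for a list of text IDs."""
    problems = []
    for tid in ids:
        if not VALID_ID.fullmatch(tid):
            problems.append((tid, "invalid characters"))
        if len(tid) > max_len:
            problems.append((tid, "too long"))
    # IDs that are distinct in the input may collide once cut to max_len.
    truncated = Counter(tid[:max_len] for tid in ids)
    for short, count in truncated.items():
        if count > 1:
            problems.append((short, "not unique after truncation"))
    return problems


if __name__ == "__main__":
    for tid, problem in check_text_ids(["doc_001", "doc-002", "doc_001"]):
        print(f"{tid}: {problem}")
```

Passing a smaller `max_len` (for example 25, the figure Noah recalls) shows which IDs would collide under an older, shorter limit.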
>
>
>
> On 20.11.15 at 13:10, Emmanuel CARTIER wrote:
>> Hi,
>>
>> I am currently working with the latest version of CWB and with CQPweb
>> version 3.0.16.
>> I managed to index big corpora (from 100 to 500 GB) on the command line
>> and install the corpora in CQPweb.
>>
>> I have two problems:
>> A.
>> When I run offline-freqlists.php (php
>> ../bin/offline-freqlists.php <corpus name in lowercase>) to generate
>> metadata indexes, it produces the following error:
>> </pre>
>> <p class="errormessage">CQPweb encountered an error and could not
>> continue.</p>
>>
>> <p class="errormessage">Unexpected line outside &lt;text&gt; tags while
>> creating corpus
>>                      POLOGNE_2015__FREQ! -- creation aborted</p>
>>
>> <p class="errormessage">... in file
>> <b>/var/www/CQPweb/lib/freqtable-cwb.inc.php</b> line <b>177</b>.</p>
>>
>> Afterwards, I am still able to query the corpus, but can you give
>> me some hints on how to debug it?
>>
>> B. When querying my corpus (POS-tagged with TreeTagger, then
>> post-processed to change the <unknown> lemma to "unknown") with the following
>> CQP query [lemma="unknown"], the web interface always ends with a blank
>> page. But when I use the command-line cqp utility, it outputs the
>> results normally. Can you give me some hints on that?
>>
>> Thanks a lot for your work and help,
>>
>> Emmanuel
>>
>>

_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb