[CWB] Indexing of metadata problem + no display of query results

Emmanuel CARTIER emmanuel.cartier at lipn.univ-paris13.fr
Fri Nov 20 17:30:28 CET 2015


Dear Andrew and Noah,

Thanks for your advice.

As for point A, I still need to investigate, as your suggestions do not
apply in my case; perhaps I will try to modify the PHP code so that it
reports the exact position of the problem in the input corpus.
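Instead of patching the PHP code, another option is to scan the input file before encoding. Below is a minimal Python sketch. It assumes the corpus was encoded from a verticalized (.vrt) file with one token per line and with `<text ...>` and `</text>` tags each on a line of their own. `find_lines_outside_text` and `report` are hypothetical helpers, not part of CWB or CQPweb. The script prints the line number of every token line that is not enclosed in a `<text>` region, and it also flags unbalanced `<text>` tags. As a heuristic, any line that consists of a single tag (`<s>`, `<corpus>`, `<?xml ...?>`) is treated as markup rather than as a token.

```python
import re
import sys

# Assumption: CWB-style .vrt input, one token per line, with <text ...> and
# </text> each on a line of their own.
TEXT_OPEN = re.compile(r"^<text(\s[^>]*)?>\s*$")
TEXT_CLOSE = re.compile(r"^</text>\s*$")
# Heuristic: a line made of a single tag (<s>, </p>, <corpus>, <?xml ...?>)
# is markup, not a token, so it may legitimately appear outside <text>.
TAG_LINE = re.compile(r"^<[^<>]*>\s*$")


def find_lines_outside_text(lines):
    """Return (line_number, message) pairs for token lines outside any
    <text> region and for unbalanced <text> tags (hypothetical helper)."""
    problems = []
    inside = False
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\r\n")
        if TEXT_OPEN.match(line):
            if inside:
                problems.append((lineno, "<text> opened before previous </text>"))
            inside = True
        elif TEXT_CLOSE.match(line):
            if not inside:
                problems.append((lineno, "</text> without matching <text>"))
            inside = False
        elif not inside and line.strip() and not TAG_LINE.match(line):
            # A token line (word, POS, lemma...) with no enclosing <text>.
            problems.append((lineno, line))
    if inside:
        problems.append((lineno, "file ends inside an unclosed <text>"))
    return problems


def report(path):
    """Print each problem as path:line: message."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        for lineno, message in find_lines_outside_text(handle):
            print(f"{path}:{lineno}: {message}")


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_vrt.py corpus.vrt [more.vrt ...]
    for vrt_path in sys.argv[1:]:
        report(vrt_path)
```

If the file is clean but the error persists, the cause probably lies elsewhere, for example in the text ids discussed in the quoted replies.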

As for point B, the error.log was clear: the error was triggered by a
memory limit set too low in php.ini. I corrected that and it now works
(I had more than 7 million hits...). A suggestion, though: it would be a
good feature to have the error message displayed on the web page.
Additionally, there is a (quite long) delay between the user's query and
the display; it would be nice to get a (partial) display sooner, even for
huge result sets, for example with asynchronous queries.

Thanks a lot anyway!

Emmanuel

Emmanuel Cartier
Lecturer-researcher in Computational Linguistics
LIPN CNRS UMR 7030 - RCLN team
http://lipn.univ-paris13.fr/fr/rcln
Université Paris 13 Sorbonne Paris Cité
99 avenue Jean-Baptiste Clement
93430 Villetaneuse
tel.: (+33) 06 46 79 12 86
email : emmanuel.cartier at univ-paris13.fr

On 20/11/2015 14:51, Hardie, Andrew wrote:
> Ah yes, that might well be the case.
>
> I can't remember about old versions, but the current text_id limit is 255.
>
> Other ID codes have shorter limits imposed by the limitations of MySQL (which does not allow identifiers longer than 64 characters).
>
> Another possible cause of trouble is disallowed characters in the text id. It must consist only of ASCII alphanumerics plus _ (underscore).
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Noah Bubenhofer
> Sent: 20 November 2015 13:47
> To: Open source development of the Corpus WorkBench
> Subject: Re: [CWB] Indexing of metadata problem + no display of query results
>
> In addition to Andrew's answer: I often got the error complaining about
> lines outside <text>. The not-so-obvious reason for that (after I had
> checked the XML structure in depth) was the text ids: of course, each
> text needs a unique id. But you should know that CQPweb allows ids
> with a maximum length of, I think, about 25 characters, and I sometimes had
> longer ones. CQPweb then simply truncates the ids, and as a result you may
> end up with non-unique ids in your data... CQP does not complain about that,
> but CQPweb does...
>
> Perhaps this is also the case in your data.
>
> Noah
>
>
>
> On 20.11.15 at 13:10, Emmanuel CARTIER wrote:
>> Hi,
>>
>> I am currently working with the latest version of CWB and with CQPweb
>> version 3.0.16.
>> I have managed to index big corpora (from 100 to 500 GB) on the command line
>> and install them in CQPweb.
>>
>> I have two problems:
>> A.
>> When I launch offline-freqlists.php (php
>> ../bin/offline-freqlists.php <corpus name in lowercase>) to generate
>> metadata indexes, it produces the following error:
>>
>>      CQPweb encountered an error and could not continue.
>>
>>      Unexpected line outside <text> tags while creating corpus
>>      POLOGNE_2015__FREQ! -- creation aborted
>>
>>      ... in file /var/www/CQPweb/lib/freqtable-cwb.inc.php line 177.
>>
>> This does not prevent me from querying the corpus afterwards, but can you
>> give me some hints on how to debug it?
>>
>> B. When querying my corpus (POS-tagged with TreeTagger, then
>> post-processed to replace the <unknown> lemma with "unknown") with the
>> CQP query [lemma="unknown"], the web interface always ends with a blank
>> page. But when I use the command-line cqp utility, it outputs the
>> results normally. Can you give me some hints on that?
>>
>> Thanks a lot for your work and help,
>>
>> Emmanuel
>>
>>
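The text-id checks suggested in the replies above can be run on the input before encoding. Below is a minimal Python sketch. It assumes the ids are stored in an attribute literally named `id` on the `<text>` lines of a .vrt input file. `ids_from_vrt` and `check_text_ids` are hypothetical helpers, not part of CWB or CQPweb. The script flags ids that break the rules Andrew gives (ASCII alphanumerics plus `_`, at most 255 characters). It also flags ids that are duplicated, and it can optionally check for duplicates after truncation to a given length. Truncation mimics the cropping Noah describes. His figure of about 25 characters is his own uncertain recollection and may only apply to older versions.

```python
import re
import sys
from collections import defaultdict

# Constraints as stated by Andrew Hardie: ASCII letters, digits and
# underscore only, at most 255 characters (current CQPweb limit).
VALID_ID = re.compile(r"[A-Za-z0-9_]+")
MAX_ID_LEN = 255
# Assumption: ids come from an attribute literally named "id" on <text> lines.
ID_ATTR = re.compile(r'^<text\b[^>]*\bid="([^"]*)"')


def ids_from_vrt(lines):
    """Collect the id attribute of every <text ...> line (hypothetical helper)."""
    return [m.group(1) for line in lines if (m := ID_ATTR.match(line))]


def check_text_ids(ids, truncate_to=None):
    """Return human-readable problems found in a list of text ids.

    truncate_to simulates the cropping Noah describes (ids cut to a fixed
    length); leave it as None to check only for exact duplicates.
    """
    ids = list(ids)
    problems = []
    for text_id in ids:
        if not VALID_ID.fullmatch(text_id):
            problems.append(f"invalid id {text_id!r} (only A-Z, a-z, 0-9 and _ allowed)")
        if len(text_id) > MAX_ID_LEN:
            problems.append(f"id longer than {MAX_ID_LEN} characters: {text_id[:40]!r}...")
    # Group ids by their (possibly truncated) form to detect collisions.
    groups = defaultdict(list)
    for text_id in ids:
        key = text_id if truncate_to is None else text_id[:truncate_to]
        groups[key].append(text_id)
    for key, members in groups.items():
        if len(members) > 1:
            problems.append(f"{len(members)} texts share id {key!r}: {members}")
    return problems


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_ids.py corpus.vrt [truncation_length]
    if len(sys.argv) > 1:
        limit = int(sys.argv[2]) if len(sys.argv) > 2 else None
        with open(sys.argv[1], encoding="utf-8", errors="replace") as handle:
            for problem in check_text_ids(ids_from_vrt(handle), truncate_to=limit):
                print(problem)
```

For example, `python check_ids.py corpus.vrt 25` would also report ids that only collide once they are cut to 25 characters.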


