[CWB] Indexing of metadata problem + no display of query results

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Nov 20 14:51:20 CET 2015


Ah yes, that might well be the case. 

I can't remember about old versions, but the current text_id limit is 255.

Other ID codes have shorter limits imposed by the limitations of MySQL (which does not allow identifiers longer than 64 characters).

Another possible cause of trouble is disallowed characters in the text ID. It must consist only of ASCII alphanumerics plus the underscore (_).
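
The two constraints above (maximum length, allowed characters) could be checked before indexing with something like the following Python sketch. It is only an illustration: the function name and constants are hypothetical and are not part of CQPweb, and the limit of 255 is simply the figure quoted above.

```python
import re

# Assumption: 255 is the text_id limit quoted in this message.
MAX_TEXT_ID_LENGTH = 255
# ASCII letters, digits and underscore only.
TEXT_ID_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def text_id_problems(text_id, max_length=MAX_TEXT_ID_LENGTH):
    """Return a list of problems found in text_id; empty if none were found."""
    problems = []
    if not text_id:
        problems.append("empty ID")
    elif not TEXT_ID_PATTERN.fullmatch(text_id):
        problems.append("contains characters other than A-Z, a-z, 0-9 and _")
    if len(text_id) > max_length:
        problems.append(f"longer than {max_length} characters")
    return problems


if __name__ == "__main__":
    for candidate in ["pologne_2015_001", "doc-1", "a" * 300]:
        print(candidate[:20], text_id_problems(candidate))
```

Running this over all text IDs extracted from the input files would flag the ones worth looking at before the corpus is installed.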

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Noah Bubenhofer
Sent: 20 November 2015 13:47
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Indexing of metadata problem + no display of query results

In addition to Andrew's answer: I have often had the error complaining about
lines outside <text>. The not-so-obvious reason for it (after I had
checked the XML structure in depth) was the text IDs. Of course, a
unique ID per text is necessary. But you should know that CQPweb allows
IDs with a maximum length of, I think, about 25 characters, and I sometimes had
longer ones. CQPweb then just crops the IDs, and as a result you may
have non-unique IDs in your data. CQP does not complain about that,
but CQPweb does.

Perhaps this is also the case in your data.
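
A quick way to test for this is to truncate every ID to the suspected limit and see which ones collapse onto the same prefix. The Python sketch below does that. The function name is hypothetical, and the limit is passed in as a parameter because the actual value depends on the CQPweb version.

```python
from collections import defaultdict


def find_truncation_collisions(text_ids, max_length):
    """Group IDs that become identical once cut to max_length characters.

    Returns a dict mapping each truncated ID to the list of original IDs
    that share it. Exact duplicates in the input are reported as well.
    """
    groups = defaultdict(list)
    for text_id in text_ids:
        groups[text_id[:max_length]].append(text_id)
    return {prefix: ids for prefix, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    ids = ["newspaper_2015_11_20_article_0001",
           "newspaper_2015_11_20_article_0002",
           "short_id"]
    # 25 is only the approximate figure mentioned above, not a confirmed limit.
    print(find_truncation_collisions(ids, 25))
```

An empty result means the IDs stay unique at that length; anything else lists the IDs that would clash.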

Noah



On 20.11.15 at 13:10, Emmanuel CARTIER wrote:
> Hi,
> 
> I am currently working with the latest version of CWB and with CQPweb
> version 3.0.16.
> I managed to index big corpora (from 100 to 500 GB) on the command line
> and install the corpora on CQPweb.
> 
> I have two problems:
> A.
> When I launch offline-freqlists.php (php
> ../bin/offline-freqlists.php <corpus name in lowercase>) to generate
> metadata indexes, it produces the following error:
> </pre>
> <p class="errormessage">CQPweb encountered an error and could not
> continue.</p>
> 
> <p class="errormessage">Unexpected line outside &lt;text&gt; tags while
> creating corpus
>                     POLOGNE_2015__FREQ! -- creation aborted</p>
> 
> <p class="errormessage">... in file
> <b>/var/www/CQPweb/lib/freqtable-cwb.inc.php</b> line <b>177</b>.</p>
> 
> Afterwards, this does not prevent me from querying the corpus, but can you
> give me some hints for debugging it?
> 
> B. When I query my corpus (POS-tagged with TreeTagger, then post-processed
> to transform the <unknown> lemma to "unknown") with the following
> CQP query [lemma="unknown"], the web interface always ends with a blank
> page. But when I use the command-line cqp utility, it outputs the
> results normally. Can you give me some hints on that?
> 
> Thanks a lot for your work and help,
> 
> Emmanuel
> 
> 

-- 
Universität Zürich
Institut für Computerlinguistik
Projekt "Visual Linguistics"
Binzmühlestr. 14
CH-8050 Zürich

www.bubenhofer.com
www.visual-linguistics.net
bubenhofer at cl.uzh.ch (PGP key available)
Tel. +41 44 635 67 18
Office 2.A.14
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb

