[CWB] CQPweb: Problems with Metadata indexing for two different corpora

Wed Feb 10 14:53:26 CET 2016

Hi Hannah,

==========
At first I couldn't create any metadata table. When I reduced the amount 
of metadata per text, I could at least create most of the Metadata table
==========

What was the error message here? I have never encountered this problem, although I can imagine how it might theoretically arise due to limits in MySQL.

==========
My question is now. What is the problem and what can I do to generate 
the 'Text begin/end positions'
==========

Use the "offline-freqlists" script, which runs from the command line on the server. See the admin manual, sec 4.9, page 34.
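
For orientation only: a text's begin/end positions are the corpus positions of its first and last token. The sketch below is a rough illustration of that idea, not CQPweb's actual code. It assumes vertical-format input with one token per line and texts wrapped in <text id="..."> ... </text>; the function name text_positions is hypothetical.

```python
import re

# Hypothetical helper for illustration; not part of CWB or CQPweb.
# Assumes vertical-format input: one token per line, XML tags on their own lines.
TEXT_OPEN = re.compile(r'<text\s+id="([^"]+)"')

def text_positions(vrt_lines):
    """Return (text_id, begin, end) for each <text> region.

    Positions are 0-based token offsets; end is inclusive.
    Simplification: every line starting with '<' is treated as a tag,
    so a literal '<' token would not be counted.
    """
    positions = []
    cpos = 0            # running token counter (corpus position)
    current_id = None
    begin = None
    for line in vrt_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        match = TEXT_OPEN.match(line)
        if match:
            current_id, begin = match.group(1), cpos
        elif line.startswith("</text>"):
            # Skip empty texts, which have no first or last token.
            if current_id is not None and cpos > begin:
                positions.append((current_id, begin, cpos - 1))
            current_id = None
        elif line.startswith("<"):
            continue    # other structural tags occupy no corpus position
        else:
            cpos += 1   # an ordinary token line
    return positions
```

Because this is a single pass over the input, the cost grows with the number of tokens rather than the number of texts, which is one reason to run such a step outside the browser for a corpus with several hundred thousand texts.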

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
Sent: 08 February 2016 14:09
To: Open source development of the Corpus WorkBench
Subject: [CWB] CQPweb: Problems with Metadata indexing for two different corpora

Hi all,

We recently had problems with the indexing of metadata for two different 
corpora in CQPweb.

Both are mid-sized corpora (24 million and 35 million tokens) and 
have been pre-encoded successfully in CQP.

The first corpus has approx. 24 million tokens, almost 700,000 texts, and 
quite a bit of metadata for each text (categorical and free text).
At first I couldn't create any metadata table. When I reduced the amount 
of metadata per text, I could at least create most of the metadata table, 
except the 'Text begin/end positions'.
We found out that we initially had a memory problem, which we solved by 
increasing the allowed memory size. Then we had a run-time problem 
originating from the server, so we increased the run-time limit there. Now 
we have a time-out originating from the browser.
My question now is: what is the problem, and what can I do to generate 
the 'Text begin/end positions'? I don't think it is very useful to 
increase the time-out of my browser, as I would have to keep the 
connection open all the time. Is it possible to generate the 
metadata from the command line on the server?

For the second corpus, with approx. 35 million tokens, the 'Text 
begin/end positions' are unproblematic. Here we have a problem 
generating the 'Frequency tables'. The corpus has 8 positional 
attributes, four of which contain floats.

Thanks in advance
Hannah
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb