[CWB] CQPweb: Problems with Metadata indexing for two different corpora
Hardie, Andrew
a.hardie at lancaster.ac.uk
Wed Feb 10 14:53:26 CET 2016
Hi Hannah,
==========
At first I couldn't create any metadata table. When I reduced the amount
of metadata per text, I could at least create most of the Metadata table
==========
What was the error message here? I have never encountered this problem, although I can imagine how it might theoretically arise due to limits in MySQL.
==========
My question now is: what is the problem, and what can I do to generate
the 'Text begin/end positions'?
==========
Use the "offline-freqlists" script. Admin manual sec 4.9, page 34.
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hannah Kermes
Sent: 08 February 2016 14:09
To: Open source development of the Corpus WorkBench
Subject: [CWB] CQPweb: Problems with Metadata indexing for two different corpora
Hi all,
we recently had problems with the indexing of metadata for two different
corpora in CQPweb.
Both are mid-sized corpora (24 million and 35 million tokens) and
have been pre-encoded successfully in CQP.
The first corpus has approx. 24 million tokens, almost 700,000 texts, and
quite a bit of metadata for each text (categorical and free text).
At first I couldn't create any metadata table. When I reduced the amount
of metadata per text, I could at least create most of the Metadata table
except the 'Text begin/end positions'.
We found out that we initially had a memory problem, which we solved by
increasing the allowed memory size. Then we had a run-time problem
originating from the server, so we increased the run-time here. Now we
have a time-out originating from the browser.
My question now is: what is the problem, and what can I do to generate
the 'Text begin/end positions'? I think it is not very useful to
increase the time-out of my browser, as I would have to keep the
connection open all the time. Is there a possibility to generate the
metadata from the command-line on the server?
For the second corpus, with approx. 35 million tokens, the 'Text
begin/end positions' are unproblematic. Here we have a problem
generating the 'Frequency tables'. The corpus has 8 positional
attributes, four of which contain floats.
Thanks in advance
Hannah
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb