[CWB] CQPweb: Problems with Metadata indexing for two different corpora
Hannah Kermes
h.kermes at mx.uni-saarland.de
Mon Feb 8 15:08:46 CET 2016
Hi all,
We recently had problems with the indexing of metadata for two different
corpora in CQPweb.
Both corpora are mid-sized (24 million and 35 million tokens) and have
been pre-encoded successfully in CQP.
The first corpus has approx. 24 million tokens, almost 700,000 texts, and
quite a bit of metadata for each text (categorical and free text).
At first I couldn't create any metadata table. When I reduced the amount
of metadata per text, I could at least create most of the metadata table,
everything except the 'Text begin/end positions'.
We found out that we initially had a memory problem, which we solved by
increasing the allowed memory size. Then we hit a run-time limit on the
server, so we increased that as well. Now we get a time-out originating
from the browser.
My question now is: what is the problem, and what can I do to generate
the 'Text begin/end positions'? I don't think it is very useful to
increase the time-out of my browser, as I would have to keep the
connection open the whole time. Is there a way to generate the
metadata from the command line on the server?
For the second corpus, with approx. 35 million tokens, the 'Text
begin/end positions' are unproblematic. Here we have a problem
generating the 'Frequency tables'. The corpus has 8 positional
attributes, four of which contain floats.
Thanks in advance
Hannah