[CWB] CQPweb: Problems with Metadata indexing for two different corpora

Hannah Kermes h.kermes at mx.uni-saarland.de
Mon Feb 8 15:08:46 CET 2016


Hi all,

We recently had problems indexing the metadata for two different 
corpora in CQPweb.

Both corpora are mid-sized (24 million and 35 million tokens) and 
have been pre-encoded successfully in CQP.

The first corpus has approx. 24 million tokens, almost 700,000 texts, 
and quite a bit of metadata for each text (categorical and free text).
At first I couldn't create any metadata table. When I reduced the amount 
of metadata per text, I could at least create most of the metadata table, 
except for the 'Text begin/end positions'.
We found out that we initially had a memory problem, which we solved by 
increasing the allowed memory size. Then we hit a run-time limit 
on the server, so we increased that limit as well. Now we 
get a time-out originating from the browser.
My question now is: what is the problem, and what can I do to generate 
the 'Text begin/end positions'? I don't think it is very useful to 
increase the time-out of my browser, as I would have to keep the 
connection open the whole time. Is it possible to generate the 
metadata from the command line on the server?

For the second corpus, with approx. 35 million tokens, the 'Text 
begin/end positions' are unproblematic. Here we have a problem 
generating the 'Frequency tables'. The corpus has 8 positional 
attributes, four of which contain floats.

Thanks in advance,
Hannah

