[CWB] Unable to index a corpus

Stefan Evert stefanML at collocations.de
Wed Aug 2 09:00:14 CEST 2017


> On 2 Aug 2017, at 02:54, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
> 
> At present you have a choice of 3 bodges available in command-line cwb-encode: (a) with +N, to automatically rename nested elements so you get tag1, tag2, tag3 as your attributes; (b) with no +N, to treat every new <tag> as the beginning of a new non-nested region even if the previous one is unclosed; (c) with +0, to totally ignore nested regions.

I think Jorge wants to go with the :0 solution (that is, declaring the element as -S tag:0; note ":0", not "+0"), which he found in the Corpus Encoding Tutorial. The main question was how to tell CQPweb to use this option when indexing the corpus.
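A minimal sketch of the local indexing step, assuming the corpus is a one-token-per-line .vrt file. The helper names, paths and the corpus ID MYCORPUS are hypothetical. The code only builds the two command lines: each XML element gets cwb-encode's -S flag with a :0 suffix (plus any annotation attributes), and cwb-makeall -V then builds and validates the indexes.

```python
import shlex


def cwb_encode_args(data_dir, vrt_file, registry_file, p_attrs, s_attrs):
    """Build a cwb-encode command line (sketch; all paths/names are placeholders).

    p_attrs: positional attributes after the word form, e.g. ["pos", "lemma"].
    s_attrs: maps each XML element to its annotation attributes, e.g. {"text": ["id"]}.
    Every element is declared with ":0", so nested occurrences are ignored
    instead of being renamed (tag1, tag2, ...) or closing the outer region.
    """
    args = ["cwb-encode", "-d", data_dir, "-f", vrt_file, "-R", registry_file]
    for p in p_attrs:
        args += ["-P", p]
    for element, annotations in s_attrs.items():
        # e.g. "text:0+id+genre": nesting depth 0 plus the listed attributes
        args += ["-S", element + ":0" + "".join("+" + a for a in annotations)]
    return args


def cwb_makeall_args(registry_dir, corpus_id):
    """Build the cwb-makeall call that creates (and, with -V, validates) the indexes."""
    return ["cwb-makeall", "-r", registry_dir, "-V", corpus_id]


if __name__ == "__main__":
    # Hypothetical local paths; adjust to your own machine.
    encode = cwb_encode_args("/corpora/data/mycorpus", "mycorpus.vrt",
                             "/corpora/registry/mycorpus",
                             ["pos", "lemma"], {"text": ["id"], "s": []})
    makeall = cwb_makeall_args("/corpora/registry", "MYCORPUS")
    # Print shell-ready commands; to run them from Python instead, use
    # subprocess.run(encode, check=True), then subprocess.run(makeall, check=True).
    print(shlex.join(encode))
    print(shlex.join(makeall))
```

Building argument lists rather than shell strings keeps paths with unusual characters safe if the commands are later passed to subprocess.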

Jorge, if you have admin access to the Web server (or at least to its data directories), it's usually better to index the CWB corpus yourself, perhaps even on your local computer, and then simply copy the index files and the registry entry to the Web server, put them into the right directories, and adjust the paths in the registry entry (HOME and INFO) accordingly.
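The path adjustment after copying can be sketched as follows, assuming a plain registry entry with unquoted paths that contain no spaces. The function name relocate_registry and all paths are hypothetical; only the HOME and INFO lines are rewritten, and everything else (ID, ATTRIBUTE and STRUCTURE declarations) is copied unchanged.

```python
from pathlib import PurePosixPath


def relocate_registry(entry: str, new_home: str) -> str:
    """Point the HOME (and INFO) lines of a CWB registry entry at new_home.

    Sketch only: assumes unquoted paths without spaces; every other line,
    including ID and the ATTRIBUTE declarations, is copied unchanged.
    """
    out = []
    for line in entry.splitlines():
        fields = line.split()
        if fields and fields[0] == "HOME":
            line = f"HOME {new_home}"
        elif fields and fields[0] == "INFO" and len(fields) > 1:
            # keep the info file's name, but look for it in the new data directory
            line = f"INFO {PurePosixPath(new_home) / PurePosixPath(fields[1]).name}"
        out.append(line)
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # Hypothetical registry entry as written by cwb-encode on a local machine.
    local_entry = """NAME "My corpus"
ID   mycorpus
HOME /home/jorge/cwb/data/mycorpus
INFO /home/jorge/cwb/data/mycorpus/.info

ATTRIBUTE word
ATTRIBUTE pos
STRUCTURE text
STRUCTURE text_id
"""
    print(relocate_registry(local_entry, "/srv/cqpweb/data/mycorpus"), end="")
```

The rewritten entry would then be saved under the same file name in the server's registry directory (for example with pathlib.Path.write_text), next to the registry entries of the other corpora.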

I always use this approach, even for small corpora, and try to put all the metadata into attributes of the <text> tags, so that the entire CQPweb installation procedure runs from the pre-indexed corpus and I don't have to upload any additional metadata files via the Web interface. Works very well for me.
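One way to get the metadata into the corpus is to write it as attributes of each <text> start tag in the input file. The sketch below assumes hypothetical metadata fields (genre, year) and a hypothetical helper name; the attributes would then be declared when encoding, e.g. -S text:0+id+genre+year. Attribute names are passed as Python keyword arguments, so they must be valid identifiers, which fits the usual lowercase CWB attribute names.

```python
from xml.sax.saxutils import escape


def _attr(value):
    # escape &, <, > and double quotes for use inside a "..." attribute value
    return escape(str(value), {'"': "&quot;"})


def text_start_tag(text_id, **metadata):
    """Return a <text> start tag with an id plus any metadata as attributes."""
    attrs = {"id": text_id, **metadata}
    parts = [f'{name}="{_attr(value)}"' for name, value in attrs.items()]
    return "<text " + " ".join(parts) + ">"


if __name__ == "__main__":
    # Hypothetical metadata for one text of the corpus.
    print(text_start_tag("t001", genre="news", year=2017))
```

Running it prints <text id="t001" genre="news" year="2017">, the start tag that opens this text's tokens in the .vrt file.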

Best,
Stefan

