[CWB] Unable to index a corpus
VIVALDI PALATRESI, JORGE
jorge.vivaldi at upf.edu
Thu Aug 3 11:04:41 CEST 2017
Andrew, Stefan,
As I have full access to the web server's data directories, I will try to
index my corpora manually, copy the index files and registry entries to the
right places, and adjust the paths accordingly. Following this suggestion,
the full procedure would be as follows:
- use CQPweb to register the corpus
- it will fail to index, so I will index it manually and copy the files
into the data directories
At this point, will CQPweb see the newly indexed corpus?
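For reference, the manual indexing step I have in mind could be sketched roughly as below. This only builds the two command lines and prints them; it does not run anything. The corpus name, file name and directory paths are placeholders I made up, and the -S text:0+id+title+domain declaration is my reading of the ":0" option from the Corpus Encoding Tutorial, so please correct me if it is wrong.

```python
import shlex


def build_index_commands(corpus, vrt_file, data_dir, registry_dir):
    """Build (but do not run) the cwb-encode and cwb-makeall command lines."""
    # Registry files are conventionally named after the lowercase corpus ID.
    registry_file = f"{registry_dir}/{corpus.lower()}"
    encode = [
        "cwb-encode",
        "-d", data_dir,        # directory that will hold the index files
        "-R", registry_file,   # registry entry to create
        "-c", "utf8",          # character encoding (assumed UTF-8 input)
        "-xsB",                # XML-aware input, skip blank lines, strip blanks
        "-f", vrt_file,        # one-token-per-line input file
        # <text> regions with no nesting (":0") and three metadata attributes
        "-S", "text:0+id+title+domain",
    ]
    makeall = ["cwb-makeall", "-r", registry_dir, "-V", corpus.upper()]
    return encode, makeall


# Hypothetical names and paths, for illustration only.
encode_cmd, makeall_cmd = build_index_commands(
    "mycorpus", "mycorpus.vrt", "/path/to/data/mycorpus", "/path/to/registry"
)
print(shlex.join(encode_cmd))
print(shlex.join(makeall_cmd))
```

After running the printed commands locally, the data directory and the registry file would be copied to the server, and the HOME path inside the registry file adjusted to the new location.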
Regarding the metadata, each corpus file must have its own metadata.
Therefore the corpus CQP file should have the following format:
<text id="m00105" title="title of document m00105" domain="medicine"> ... </text>
<text id="d00016" title="title of document d00016" domain="law"> ... </text>
...
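A small sketch of how such a file could be generated, so that the format is unambiguous. The function name and the sample tokens are invented for illustration; I am assuming the usual one-token-per-line input with each XML tag on a line of its own, and that attribute values only need &, <, > and double quotes escaped.

```python
from xml.sax.saxutils import escape


def text_region(doc_id, title, domain, tokens):
    """Return one <text> region in one-token-per-line format."""
    def attr(value):
        # Escape &, <, > and double quotes so the value is safe inside "..."
        return escape(value, {'"': "&quot;"})

    open_tag = '<text id="{}" title="{}" domain="{}">'.format(
        attr(doc_id), attr(title), attr(domain)
    )
    # Opening tag, then one token per line, then the closing tag.
    return "\n".join([open_tag, *tokens, "</text>"])


# Document IDs and domains from the example above; tokens are placeholders.
docs = [
    ("m00105", "title of document m00105", "medicine", ["First", "document", "."]),
    ("d00016", "title of document d00016", "law", ["Second", "document", "."]),
]
vrt = "\n".join(text_region(*d) for d in docs) + "\n"
print(vrt)
```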
Assuming this is correct, may I run the same queries on this corpus as on
any other corpus indexed with the regular CQPweb procedure?
Thank you very much for your help.
Best,
Jorge
2017-08-02 9:00 GMT+02:00, Stefan Evert <stefanML at collocations.de>:
>
>> On 2 Aug 2017, at 02:54, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
>>
>> At present you have a choice of 3 bodges available in command-line
>> cwb-encode: (a) with +N, to automatically rename nested elements so you
>> get tag1, tag2, tag3 as your attributes; (b) with no +N, to treat every
>> new <tag> as the beginning of a new non-nested region even if the previous
>> one is unclosed; (c) with +0, to totally ignore nested regions.
>
> I think Jorge wants to go with the :0 solution (not "+0", by the way), which
> he found in the Corpus Encoding Tutorial. The main question was how to tell
> CQPweb to use this option when indexing the corpus.
>
> Jorge, if you have admin access to the Web server('s data directories), it's
> usually better to index the CWB corpus yourself (perhaps even on your local
> computer), and then simply copy the index files and registry entry to the
> Web server, put them into the right directories and adjust paths
> accordingly.
>
> I always use this approach, even for small corpora, and try to put all the
> metadata into <text> tags so that the entire CQPweb installation procedure
> runs from the pre-indexed corpus and I don't have to upload any additional
> files via the Web interface. Works very well for me.
>
> Best,
> Stefan
--
Jorge Vivaldi Palatresi
Institut Universitari de Lingüística Aplicada
Universitat Pompeu Fabra
C/ Roc Boronat, 138
08018 Barcelona
Espanya
+34 93 542 2332