Hi Andrew,<div>I understand the suggested procedure.<br><div>Thank you for your valuable help.</div><div>Best,</div><div>Jorge</div><div><br><br>On Friday, 4 August 2017, Hardie, Andrew <<a href="mailto:a.hardie@lancaster.ac.uk">a.hardie@lancaster.ac.uk</a>> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Hi Jorge,<br>
<br>
No, it's rather simpler than that:<br>
<br>
Step 1 - index the corpus using command-line CWB (wherever you like on the system, as long as the files/directories you create are in a location on the file system where the web server's user account has permission to read them)<br>
<br>
Step 2 - go to the "Install new corpus" page in CQPweb, and click on the link at the top that says "Click here to install a corpus you have already indexed in CWB."<br>
<br>
Step 3 - specify the location of the registry file. (This will be copied into CQPweb's own registry if not already there; the index files themselves will not be copied or moved.)<br>
<br>
Step 4 - once you've installed the corpus this way, proceed to the other installation steps (generate your text metadata from the XML attributes on <text>, set up frequency lists, etc.)<br>
<br>
At this point the corpus ought to behave identically to one set up in CQPweb from the start.<br>
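<br>
As a rough illustration of Step 1 only, the command-line indexing could be scripted along these lines. This is a minimal Python sketch under stated assumptions: the paths, the corpus name and the two helper functions are hypothetical, the <text> attribute list is borrowed from the example elsewhere in this thread, and the cwb-encode options should be checked against the Corpus Encoding Tutorial for your CWB version. The helpers only build the argument lists; nothing is executed unless you pass them to subprocess yourself.<br>

```python
from typing import List, Sequence


def build_encode_command(
    vrt_file: str,
    data_dir: str,
    registry_file: str,
    p_attributes: Sequence[str] = (),
    text_attributes: Sequence[str] = ("id", "title", "domain"),
) -> List[str]:
    """Build a cwb-encode argument list for a corpus with flat <text> regions.

    All paths and attribute names are illustrative assumptions.
    """
    # "text:0+id+title+domain" declares <text> as a non-nested region (:0)
    # whose XML attributes id, title and domain are stored as annotations.
    s_spec = "text:0" + "".join("+" + name for name in text_attributes)
    cmd = [
        "cwb-encode",
        "-d", data_dir,        # directory that will hold the index files
        "-f", vrt_file,        # input file in verticalised (one token per line) format
        "-R", registry_file,   # registry entry to create
        "-c", "utf8",          # character encoding of the input
        "-xsB",                # XML-aware input, skip blank lines, strip whitespace
    ]
    for name in p_attributes:
        cmd += ["-P", name]    # extra token-level annotations, e.g. pos, lemma
    cmd += ["-S", s_spec]
    return cmd


def build_makeall_command(registry_dir: str, corpus_name: str) -> List[str]:
    """Build the cwb-makeall call that creates and validates the index files."""
    # CWB tools refer to the corpus by its upper-case ID.
    return ["cwb-makeall", "-r", registry_dir, "-V", corpus_name.upper()]


if __name__ == "__main__":
    # Hypothetical locations; the web server's user account must be able to read them.
    encode = build_encode_command(
        "mycorpus.vrt",
        "/corpora/data/mycorpus",
        "/corpora/registry/mycorpus",
        p_attributes=["pos", "lemma"],
    )
    makeall = build_makeall_command("/corpora/registry", "mycorpus")
    print(" ".join(encode))
    print(" ".join(makeall))
    # To actually run them: subprocess.run(encode, check=True), then the same for makeall.
```

The registry file written here is the one whose location you then give to CQPweb in Step 3.<br>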
<br>
best<br>
<br>
Andrew.<br>
<br>
-----Original Message-----<br>
From: <a href="mailto:cwb-bounces@sslmit.unibo.it">cwb-bounces@sslmit.unibo.it</a> [mailto:<a href="mailto:cwb-bounces@sslmit.unibo.it">cwb-bounces@sslmit.unibo.it</a>] On Behalf Of VIVALDI PALATRESI, JORGE<br>
Sent: 03 August 2017 10:05<br>
To: Open source development of the Corpus WorkBench<br>
Subject: Re: [CWB] Unable to index a corpus<br>
<br>
Andrew, Stefan,<br>
<br>
As I have full access to the web data directories, I will try to manually<br>
index my corpora, copy the index files and registry entries to the right<br>
places, and adjust paths accordingly. According to this suggestion, the<br>
full procedure would be as follows:<br>
- use CQPweb to register the corpus<br>
- it will fail to index so I will do it manually and copy files in the<br>
data directories<br>
At this point, will CQPweb see the new indexed corpus?<br>
<br>
Regarding the metadata, each corpus file must have its own metadata.<br>
Therefore the corpus cqp file should have the following format:<br>
<text id="m00105" title="title of document m00105"<br>
domain="medicine"> ... </text><br>
<text id="d00016" title="title of document d00016" domain="law"> ... </text><br>
...<br>
Assuming this is correct, may I run the same queries on this<br>
corpus as on any other corpus indexed with the regular CQPweb procedure?<br>
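<br>
To make the format above concrete, here is a minimal Python sketch of how each document could be wrapped in its own <text> element. The helper names are hypothetical, the token lines are placeholders, and escaping special characters in attribute values as XML entities is an assumption that should be verified against how your cwb-encode setup reads them.<br>

```python
from typing import Iterable
from xml.sax.saxutils import escape


def text_open_tag(doc_id: str, title: str, domain: str) -> str:
    """Return the opening <text> tag carrying one document's metadata."""
    def attr(value: str) -> str:
        # Escape &, < and > plus double quotes so the value stays inside its quotes.
        return '"' + escape(value, {'"': "&quot;"}) + '"'

    return "<text id={} title={} domain={}>".format(
        attr(doc_id), attr(title), attr(domain)
    )


def wrap_document(doc_id: str, title: str, domain: str,
                  token_lines: Iterable[str]) -> str:
    """Wrap already-verticalised token lines in one <text> ... </text> region."""
    lines = [text_open_tag(doc_id, title, domain)]
    lines.extend(token_lines)
    lines.append("</text>")
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder tokens; a real corpus file has one token (plus annotations) per line.
    print(wrap_document("m00105", "title of document m00105", "medicine",
                        ["first", "token", "lines"]))
    print(wrap_document("d00016", "title of document d00016", "law",
                        ["more", "tokens"]))
```

Each document then opens with a tag such as <text id="m00105" title="title of document m00105" domain="medicine"> and closes with </text>, with no <text> nested inside another.<br>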
<br>
Thank you very much for your help<br>
<br>
Best,<br>
Jorge<br>
<br>
<br>
2017-08-02 9:00 GMT+02:00, Stefan Evert <<a href="mailto:stefanML@collocations.de">stefanML@collocations.de</a>>:<br>
><br>
>> On 2 Aug 2017, at 02:54, Hardie, Andrew <<a href="mailto:a.hardie@lancaster.ac.uk">a.hardie@lancaster.ac.uk</a>> wrote:<br>
>><br>
>> At present you have a choice of 3 bodges available in command-line<br>
>> cwb-encode: (a) with +N, to automatically rename nested elements so you<br>
>> get tag1, tag2, tag3 as your attributes; (b) with no +N, to treat every<br>
>> new <tag> as the beginning of a new non-nested region even if the previous<br>
>> one is unclosed; (c) with +0, to totally ignore nested regions.<br>
><br>
> I think Jorge wants to go with the :0 solution (not "+0", by the way), which<br>
> he found in the Corpus Encoding Tutorial. The main question was how to tell<br>
> CQPweb to use this option when indexing the corpus.<br>
><br>
> Jorge, if you have admin access to the Web server('s data directories), it's<br>
> usually better to index the CWB corpus yourself (perhaps even on your local<br>
> computer), and then simply copy the index files and registry entry to the<br>
> Web server, put them into the right directories and adjust paths<br>
> accordingly.<br>
><br>
> I always use this approach, even for small corpora, and try to put all the<br>
> metadata into <text> tags so that the entire CQPweb installation procedure<br>
> runs from the pre-indexed corpus and I don't have to upload any additional<br>
> files via the Web interface. Works very well for me.<br>
><br>
> Best,<br>
> Stefan<br>
> ______________________________<wbr>_________________<br>
> CWB mailing list<br>
> <a href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><br>
> <a href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb" target="_blank">http://liste.sslmit.unibo.it/<wbr>mailman/listinfo/cwb</a><br>
><br>
<br>
<br>
--<br>
Jorge Vivaldi Palatresi<br>
Institut Universitari de Lingüística Aplicada<br>
Universitat Pompeu Fabra<br>
C/ Roc Boronat, 138<br>
08018 Barcelona<br>
Espanya<br>
<br>
+34 93 542 2332<br>
</blockquote></div></div><br><br>-- <br><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div><div dir="ltr"><div>Jorge Vivaldi Palatresi<br>Institut Universitari de Lingüística
Aplicada<br>Universitat Pompeu Fabra<br>C/ Roc Boronat, 138<br>08018
Barcelona<br>Espanya<br><br>+34 93 542 2332<br><br></div></div></div></div></div></div></div></div></div></div><br>