<div style="line-height:1.7;color:#000000;font-size:14px;font-family:arial">

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">I’m also
interested in this topic, as I can see that a pre-indexing approach is the
only(?) way to put a very big corpus online.</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">I did the
patch work and the </span><em><span style="font-style:normal;
mso-bidi-font-style:italic" lang="EN-US">new</span></em><i style="mso-bidi-font-style:normal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US"> </span></i><span style="mso-bidi-font-size:24.0pt" lang="EN-US">DICKENS corpus can be queried from the
terminal via cqp. Good work, Andrew. Next, I wanted to load it into CQPweb via
the browser.</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">Here is
the layout of the new DICKENS corpus on my computer:</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">/Dickens/DICKENS-cqbweb-edition$
ls</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">data<span style="mso-spacerun:yes">&nbsp; </span>registry</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">/Dickens/DICKENS-cqbweb-edition/registry$
ls</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">dickens</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">I changed
two lines in the registry file "dickens" to make it CQPweb-compatible:</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US"># data
file directory (relative or absolute path)</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">HOME
/home/ray/Dickens/DICKENS-cqbweb-edition/data </span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">#
optional info file (displayed by "info" command in CQP)</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">INFO
/home/ray/Dickens/DICKENS-cqbweb-edition/data/.info</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">My metadata
for DICKENS:</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">/usr/local/apache2/cqpweb_aux/upload$
cat dickens_meta.txt </span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt;mso-ansi-language:
DE" lang="DE">ACC<span style="mso-tab-count:1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>ACC<span style="mso-tab-count:
1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span></span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt;mso-ansi-language:
DE" lang="DE">DC<span style="mso-tab-count:1">&nbsp; </span>DC</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt;mso-ansi-language:
DE" lang="DE">DaS<span style="mso-tab-count:1"> </span>Das</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt;mso-ansi-language:
DE" lang="DE">GE<span style="mso-tab-count:1">&nbsp;&nbsp; </span>GE</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt;mso-ansi-language:
DE" lang="DE">HT<span style="mso-tab-count:1">&nbsp;&nbsp; </span>HT</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt;mso-ansi-language:
DE" lang="DE">MHC<span style="mso-tab-count:1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>MHC</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">NN<span style="mso-tab-count:1">&nbsp; </span>NN</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">OT<span style="mso-tab-count:1">&nbsp; </span>OT</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">OMF<span style="mso-tab-count:1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>OMF</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">BOZ<span style="mso-tab-count:1"> </span>BOZ</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">ToTC<span style="mso-tab-count:1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>ToTC</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">OCS<span style="mso-tab-count:1">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; </span>OCS</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">PP<span style="mso-tab-count:1">&nbsp;&nbsp; </span>PP</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">3GS<span style="mso-tab-count:1"> </span>3GS</span></p>
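<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">A metadata file like the one above can be sanity-checked before installation. The following is only a minimal sketch: the function name check_metadata is hypothetical, and the rule that text IDs may contain only letters, digits and underscores is an assumption about what CQPweb accepts, not something confirmed in this thread. The sample lines are copied from the listing above; their exact whitespace may have been altered by the mail formatting.</span></p>

```python
import re

# Assumption: text IDs may contain only letters, digits and underscores.
ID_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def check_metadata(text):
    """Return a list of problems found in tab-delimited metadata text."""
    problems = []
    seen = set()
    widths = set()
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        widths.add(len(fields))
        text_id = fields[0]
        if not ID_PATTERN.match(text_id):
            problems.append(f"line {number}: unexpected characters in ID {text_id!r}")
        if text_id in seen:
            problems.append(f"line {number}: duplicate ID {text_id!r}")
        seen.add(text_id)
    # Rows with differing numbers of columns (e.g. a trailing tab) are suspicious.
    if len(widths) > 1:
        problems.append(f"inconsistent column counts: {sorted(widths)}")
    return problems


# First three rows of the listing above; the first row ends with a tab.
sample = "ACC\tACC\t\nDC\tDC\nDaS\tDas\n"
for problem in check_metadata(sample):
    print(problem)
```

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">On the sample, the trailing tab after the first row is reported as an inconsistent column count. Whether CQPweb tolerates that is not known here; the check only makes such differences visible.</span></p>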

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">When
asked “Where is the registry file?”, I chose “In the directory specified
here:” and entered:</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">/home/ray/Dickens/DICKENS-cqbweb-edition/registry</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">After
hitting "Install corpus with settings above", I got the following
error message:</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">CQPweb
encountered an error and could not continue.</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">The data
directory specified in the registry file could not be found.</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">... in
file /usr/local/apache2/htdocs/cqp/lib/admin-install.inc.php line 146.</span></p>
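<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">One way to narrow down an error like this is to check whether the HOME path from the registry file is visible to the account that runs the web server. The sketch below is an illustration only: the helper names registry_paths and describe are hypothetical, and the idea that directory permissions are involved is a guess, not a diagnosis. To be meaningful it would have to be run as the web server's user rather than as the corpus owner.</span></p>

```python
import os


def registry_paths(registry_text):
    """Extract the HOME and INFO paths from the text of a CWB registry file."""
    paths = {}
    for line in registry_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in ("HOME", "INFO"):
            # Paths may be quoted in the registry; drop surrounding quotes.
            paths[parts[0]] = parts[1].strip().strip('"')
    return paths


def describe(path):
    """Report whether the current user can see and read a directory."""
    if not os.path.isdir(path):
        return "not found (or not reachable by this user)"
    if not os.access(path, os.R_OK | os.X_OK):
        return "exists but is not readable"
    return "ok"


# The two registry lines quoted earlier in this message.
registry = """
# data file directory (relative or absolute path)
HOME /home/ray/Dickens/DICKENS-cqbweb-edition/data
# optional info file (displayed by "info" command in CQP)
INFO /home/ray/Dickens/DICKENS-cqbweb-edition/data/.info
"""

home = registry_paths(registry)["HOME"]
print(home, "->", describe(home))
```

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">A directory under /home can exist and still be reported as not found if one of its parent directories cannot be traversed by the user running the check, which is why the first message is worded cautiously.</span></p>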

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US"><span style="mso-spacerun:yes">&nbsp;</span></span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">I looked
in /usr/local/apache2/htdocs/cqp (my CQPweb directory) and found that no
directory called dickens had been created. However, when I commented out
line 146 of admin-install.inc.php, the dickens directory was created in the
CQPweb program directory, but still no index was created in
/usr/local/apache2/cqpweb_aux/index (my CQPweb index directory for all corpora).</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">After I
manually moved all of the DICKENS index files into CQPweb's index
directory, I could use the DICKENS corpus via my browser.</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">Everything
worked except “Restricted Query”. When I searched for the word
"the" in ACC, the browser said:</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">Your
query had no results.</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">There are
no matches for your query.</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">It seems
that my metadata is not recognized. I guess this might be due to some
internal changes to the DICKENS corpus that the patch does not implement yet.
Am I correct?</span></p>
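<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">One thing that could be checked here is whether the IDs in the metadata file match the text_id values actually encoded in the corpus. The sketch below assumes that the text_id regions have been dumped to text with one region per line and the annotation value in the last tab-separated field (the layout cwb-s-decode is expected to produce; treat that as an assumption). The function names are hypothetical and the corpus positions in the sample are invented purely for illustration. Only the "DaS" / "Das" pair is taken from the metadata listing above.</span></p>

```python
def ids_from_decode(output):
    """Collect the last tab-separated field of each non-empty line."""
    return {
        line.split("\t")[-1].strip()
        for line in output.splitlines()
        if line.strip()
    }


def ids_from_metadata(text):
    """Collect the first tab-separated field of each non-empty line."""
    return {
        line.split("\t")[0].strip()
        for line in text.splitlines()
        if line.strip()
    }


def compare(decoded, metadata):
    """Return (IDs only in the corpus, IDs only in the metadata), both sorted."""
    corpus_ids = ids_from_decode(decoded)
    meta_ids = ids_from_metadata(metadata)
    return sorted(corpus_ids - meta_ids), sorted(meta_ids - corpus_ids)


# Invented region positions; the ID spellings illustrate a case mismatch.
decoded_sample = "0\t99\tACC\n100\t199\tDas\n"
metadata_sample = "ACC\tACC\nDaS\tDas\n"

only_in_corpus, only_in_metadata = compare(decoded_sample, metadata_sample)
print("only in corpus:  ", only_in_corpus)
print("only in metadata:", only_in_metadata)
```

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">If both lists come back empty for the real data, a mismatch of IDs can be ruled out and the cause of the empty restricted query lies elsewhere.</span></p>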

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">&nbsp;</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">Best,</span></p>

<p class="MsoNormal"><span style="mso-bidi-font-size:24.0pt" lang="EN-US">Ray</span></p>

<div id="divNeteaseMailCard"></div><br>At 2012-05-25 07:01:49,"Hardie,&nbsp;Andrew"&nbsp;&lt;a.hardie@lancaster.ac.uk&gt; wrote:<br> <blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">





<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Well, I was referring to editing the source, not realising that Stefan did not have it to hand. BUT you can still encode the two necessary s-attributes as extras,
 using the attached files.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Assuming you are in the root of the tutorial corpus, insert those two files there, and run these commands:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">cwb-s-encode -d data -f text.src -S text<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">cwb-s-encode -d data -f text_id.src -V text_id<o:p></o:p></span></b></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Then add the following lines to the registry/dickens file:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"># &lt;text id=".."&gt; ... &lt;/text&gt;<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"># (no recursive embedding allowed)<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">STRUCTURE text<o:p></o:p></span></b></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">STRUCTURE text_id&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; # [annotations]<o:p></o:p></span></b></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">(in with the other s-atts).<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">IF all the above works successfully, the corpus should become CQPweb-compatible. You can check whether it worked as follows:<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">cwb-describe-corpus -r registry -sd DICKENS | less<o:p></o:p></span></b></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">best<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p>&nbsp;</o:p></span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;" lang="EN-US">From:</span></b><span style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;" lang="EN-US"> <a href="mailto:cwb-bounces@sslmit.unibo.it">cwb-bounces@sslmit.unibo.it</a> [mailto:<a href="mailto:cwb-bounces@sslmit.unibo.it">cwb-bounces@sslmit.unibo.it</a>]
<b>On Behalf Of </b>Kurt Sultana<br>
<b>Sent:</b> 24 May 2012 15:27<br>
<b>To:</b> Open source development of the Corpus WorkBench<br>
<b>Subject:</b> Re: [CWB] Sample corpus for IMS Corpus Workbench<o:p></o:p></span></p>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<p class="MsoNormal">Thanks for your input guys.&nbsp;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">Andrew, when you said:<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
</div>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt">
<p class="MsoNormal">adjusting the existing tutorial data to make it CQPweb-compatible is much easier, as outlined<o:p></o:p></p>
</blockquote>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">I didn't quite get you there. The files in the /data directory seem to be encoded (I believe CWB encodes them in the process). Where should I do the changes from &lt;novel&gt; to &lt;text&gt;?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Kurt<o:p></o:p></p>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<p class="MsoNormal">On Tue, May 22, 2012 at 10:50 AM, Stefan Evert &lt;<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>&gt; wrote:<o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt"><br>
&gt; And if you can hang on till I and/or Stefan finds a suitable schedule hole (which alas can take a very long time as neither of us works on CWB as our main job), we’ll do it for you, as Stefan said!<o:p></o:p></p>
</div>
<p class="MsoNormal">I'm afraid this may have to wait until my laptop stops being dead -- apparently the motherboard is broken -- and I can get my hands on the source code of the demo corpora again. &nbsp;I might want to put them in a safer place then ...<br>
<br>
I'll set a reminder to look at the issue again in early June.<br>
<br>
Cheers,<br>
Stefan<o:p></o:p></p>
<div>
<div>
<p class="MsoNormal"><br>
_______________________________________________<br>
CWB mailing list<br>
<a href="mailto:CWB@sslmit.unibo.it" target="_blank">CWB@sslmit.unibo.it</a><br>
<a href="http://devel.sslmit.unibo.it/mailman/listinfo/cwb" target="_blank">http://devel.sslmit.unibo.it/mailman/listinfo/cwb</a><o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
</div>


</blockquote></div><br><br><span title="neteasefooter"><span id="netease_mail_footer"></span></span>