<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman",serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:#0563C1;
        text-decoration:underline;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Verdana",sans-serif;
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-GB" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">For monitor corpora the best approach is to create a new installed corpus at each update point.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Adding more text to an existing corpus is one of those things that sounds good until you think about how it would fit with all the
 rest of the system. For instance, if the system administrator were to append more text to a corpus, it would cause
<b>all</b> the data saved by users (saved queries, categorised queries, subcorpora) to suddenly no longer match the corpus it relates to. Not to mention that when running the same query on the same corpus produces different results on Tuesday than it did
 on Monday, replicability of analyses becomes a serious headache. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">So it will
<b>never</b> be possible to append additional text to an existing indexed corpus.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">What
<i>would</i> be possible, and I’ll add it to my list for the long term, would be a function to say “create a new corpus by taking the full content of this existing corpus and adding these new files to it”, i.e. without modifying the original corpus data in
 any way. <o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">But in fact, that’s already possible via the command line: use cwb-decode to save the existing corpus to a text file, then add
 in the extra files and run cwb-encode on the whole lot. There’s just no web interface for that use case at present.<o:p></o:p></span></p>
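<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">As a rough sketch only, the decode / add files / re-encode route could be scripted along the following lines. Everything corpus-specific here is an assumption made for illustration: the corpus names, the paths, the attribute lists and the helper function names are all hypothetical. The decoded output may also need adjusting before cwb-encode will accept it (for example, how s-attributes with values are written), so check the result against the CWB documentation for your version. By default the script only returns the commands it would run.<o:p></o:p></span></p>

```python
import subprocess
from pathlib import Path


def decode_command(corpus, registry_dir, p_attrs, s_attrs):
    """Build a cwb-decode call that dumps a corpus in XML-safe compact format."""
    cmd = ["cwb-decode", "-r", str(registry_dir), "-Cx", corpus]
    for att in p_attrs:
        cmd += ["-P", att]
    for att in s_attrs:
        cmd += ["-S", att]
    return cmd


def encode_command(data_dir, registry_file, input_file, p_attrs, s_decls,
                   charset="utf8"):
    """Build a cwb-encode call that indexes one combined input file."""
    cmd = ["cwb-encode", "-d", str(data_dir), "-f", str(input_file),
           "-R", str(registry_file), "-c", charset, "-xsB"]
    for att in p_attrs:
        if att != "word":  # cwb-encode declares the "word" attribute itself
            cmd += ["-P", att]
    for decl in s_decls:
        cmd += ["-S", decl]
    return cmd


def makeall_command(corpus, registry_dir):
    """Build the cwb-makeall call that creates the index files."""
    return ["cwb-makeall", "-r", str(registry_dir), "-V", corpus]


def rebuild(old_corpus, new_corpus, registry_dir, new_data_dir, extra_files,
            p_attrs, s_attrs, s_decls, dry_run=True):
    """Create new_corpus from old_corpus plus extra_files; old_corpus is untouched."""
    registry_dir = Path(registry_dir)
    new_data_dir = Path(new_data_dir)
    # Hypothetical location for the combined input file.
    combined = new_data_dir.parent / (new_corpus.lower() + ".vrt")

    decode = decode_command(old_corpus, registry_dir, p_attrs, s_attrs)
    encode = encode_command(new_data_dir, registry_dir / new_corpus.lower(),
                            combined, p_attrs, s_decls)
    makeall = makeall_command(new_corpus.upper(), registry_dir)
    commands = [decode, encode, makeall]
    if dry_run:
        return commands  # nothing is executed or written

    new_data_dir.mkdir(parents=True, exist_ok=True)
    with open(combined, "wb") as out:
        # Old corpus first, then the new files appended after it.
        subprocess.run(decode, stdout=out, check=True)
        for path in extra_files:
            out.write(Path(path).read_bytes())
    subprocess.run(encode, check=True)
    subprocess.run(makeall, check=True)
    return commands
```

<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Calling rebuild(...) with dry_run=True lets you inspect the three commands before anything is run. Because the result is a separate corpus with its own name, the saved queries and subcorpora attached to the original stay valid.<o:p></o:p></span></p>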
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">best<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Andrew.<br>
<br>
<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p>&nbsp;</o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif"> cwb-bounces@sslmit.unibo.it &lt;cwb-bounces@sslmit.unibo.it&gt;
<b>On Behalf Of </b>wu liangping<br>
<b>Sent:</b> 08 July 2021 10:15<br>
<b>To:</b> Open source development of the Corpus WorkBench &lt;cwb@sslmit.unibo.it&gt;<br>
<b>Subject:</b> Re: [CWB] How to add new data to a corpus without re-indexing it<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Hi Andrew,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black"><o:p>&nbsp;</o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Thanks for the clarification; now it all makes sense.&nbsp;<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">As monitor/dynamic corpora are becoming more visible, it would be great to find a way to periodically update the data behind CQPweb.<o:p></o:p></span></p>
</div>
<p style="margin:0cm"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black"><o:p>&nbsp;</o:p></span></p>
<p style="margin:0cm"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Best,<o:p></o:p></span></p>
<div>
<p class="MsoNormal"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">WU Liangping</span><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:gray"><br>
</span><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&nbsp;
<o:p></o:p></span></p>
</div>
<p style="margin:0cm"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black"><o:p>&nbsp;</o:p></span></p>
<p><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">At 2021-07-08 16:51:20, &quot;Hardie, Andrew&quot; &lt;</span><a href="mailto:a.hardie@lancaster.ac.uk"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif">a.hardie@lancaster.ac.uk</span></a><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&gt;
 wrote:<o:p></o:p></span></p>
<blockquote style="border:none;border-left:solid #CCCCCC 1.0pt;padding:0cm 0cm 0cm 6.0pt;margin-left:4.8pt;margin-top:5.0pt;margin-right:0cm;margin-bottom:5.0pt" id="isReplyContent">
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hi Liangping,</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">It’s not possible to
<b>append</b> text to an existing corpus. The “add data” function allows you to add
<i>new attributes</i> (annotation/xml) or <i>new metadata</i> to the existing corpus. It doesn’t allow you to extend the corpus.
</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">best</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Andrew.</span><span style="color:black"><o:p></o:p></span></p>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal" style="margin-left:4.8pt"><b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black">From:</span></b><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black">
</span><a href="mailto:cwb-bounces@sslmit.unibo.it"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">cwb-bounces@sslmit.unibo.it</span></a><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black">
 &lt;</span><a href="mailto:cwb-bounces@sslmit.unibo.it"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">cwb-bounces@sslmit.unibo.it</span></a><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black">&gt;
<b>On Behalf Of </b>wu liangping<br>
<b>Sent:</b> 08 July 2021 09:16<br>
<b>To:</b> </span><a href="mailto:cwb@sslmit.unibo.it"><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif">cwb@sslmit.unibo.it</span></a><span lang="EN-US" style="font-size:11.0pt;font-family:&quot;Calibri&quot;,sans-serif;color:black"><br>
<b>Subject:</b> [CWB] How to add new data to a corpus without re-indexing it</span><span style="color:black"><o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="color:black">&nbsp;<o:p></o:p></span></p>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Dear all,</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Has anyone managed to add new data to a corpus without re-indexing it?</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">In the &quot;Latest news&quot; of a recent 3.2 branch CQPweb installation, it reads that CQPweb has &quot;[c]ompleted the feature that adds new data to
 a corpus without re-indexing it (this can now be done for p-attributes as well as s-attributes and corpus metadata)&quot; since version 3.2.31. However, a previous discussion back in 2012 in the thread titled &quot;Appending text to an existing corpus&quot; clearly says
 that we &quot;need to re-index from scratch&quot; if we want to append text to an existing corpus.</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Has anyone tried the new feature with success? Or better still, is there any documentation for this new feature?&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Thanks for any hints before we decide to dive into the actual code.</span><span style="color:black"><o:p></o:p></span></p>
</div>
<p style="mso-margin-top-alt:0cm;margin-right:0cm;margin-bottom:0cm;margin-left:4.8pt">
<span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&nbsp;</span><span style="color:black"><o:p></o:p></span></p>
<div>
<p class="MsoNormal" style="margin-left:4.8pt"><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">Best,<br>
WU Liangping</span><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:gray"><br>
</span><span style="font-size:10.5pt;font-family:&quot;Arial&quot;,sans-serif;color:black">&nbsp;
</span><span style="color:black"><o:p></o:p></span></p>
</div>
</div>
</div>
</blockquote>
</div>
</div>
</div>
</body>
</html>