<div dir="ltr"><div><span style="font-size:14px">Part of the output of &quot;</span><span style="font-size:14px">cwb-decode -C CANTON1 -ALL | less</span><span style="font-size:14px">&quot;</span></div><div><span style="font-size:14px"><br></span></div><div><div><span style="font-size:14px">&lt;s&gt;</span></div><div><span style="font-size:14px">&lt;text&gt;</span></div><div><span style="font-size:14px">&lt;text_id T01&gt;</span></div><div><span style="font-size:14px">中環    N       中環</span></div><div><span style="font-size:14px">保育    V       保育</span></div><div><span style="font-size:14px">奇觀    N       奇觀</span></div><div><span style="font-size:14px">:      PU      :</span></div><div><span style="font-size:14px">孫中山  N       孫中山</span></div><div><span style="font-size:14px">史蹟    N       史蹟</span></div><div><span style="font-size:14px">徑      N       徑</span></div><div><span style="font-size:14px">至      CONJ    至</span></div><div><span style="font-size:14px">大館    N       大館</span></div><div><span style="font-size:14px">&lt;/text_id&gt;</span></div><div><span style="font-size:14px">&lt;/text&gt;</span></div><div><span style="font-size:14px">&lt;/s&gt;</span></div><div style="font-size:14px"><br></div></div><div><span style="font-size:14px"><br></span></div><div><span style="font-size:14px">Part of the</span><span style="font-size:14px"> </span><span style="font-size:14px">output of &quot;cwb-describe-corpus -s CANTON1&quot;</span><br></div><div><br></div><div>==============================<wbr>==============================</div><div>Corpus: CANTON1</div><div>==============================<wbr>==============================</div><div><br></div><div>description:    </div><div>registry file:  /usr/local/share/cwb/registry/<wbr>canton1</div><div>home directory: /usr/local/corpora/data/<wbr>canton1/</div><div>info file:      /usr/local/corpora/data/<wbr>canton1/.info</div><div>size (tokens):  23</div><div><br></div><div>  3 positional attributes</div><div>  3 
structural attributes</div><div>  0 alignment  attributes</div><div><br></div><div>p-ATT word                     23 tokens,       22 types</div><div>p-ATT pos                      23 tokens,        8 types</div><div>p-ATT lemma                    23 tokens,       22 types</div><div>s-ATT s                         2 regions</div><div>s-ATT text                      2 regions</div><div>s-ATT text_id                   2 regions (with annotations)</div><div><br></div><div><br></div><div>It seems that CWB can recognize the number of words but CQPweb doesn&#39;t.</div><div><br></div><div>Regards,</div><div>Lai</div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-06-19 15:43 GMT+08:00 Stefan Evert <span dir="ltr">&lt;<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">What does the corpus look like if you decode it from the CWB index with the following command?<br>
<br>
        cwb-decode -C CANTON1 -ALL | less<br>
<br>
Can you show us part of the output?  It would also be useful to see the output of<br>
<br>
        cwb-describe-corpus -s CANTON1<br>
<br>
<br>
One possibility I can think of is that your linebreaks are messed up so that CWB treats everything within the text region as a single long line. <br>
<br>
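One way to test the line-break hypothesis is to count the line terminators in the input file before encoding. The following is only a minimal sketch: the function names are made up for illustration, and the path in the usage note is simply the one from the cwb-encode command quoted further down.<br>

```python
def count_line_endings(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR line terminators in raw bytes."""
    crlf = data.count(b"\r\n")
    # Every CRLF pair contains one CR and one LF, so subtract the pairs
    # to get the terminators that stand on their own.
    lf = data.count(b"\n") - crlf
    cr = data.count(b"\r") - crlf
    return {"crlf": crlf, "lf": lf, "cr": cr}


def check_file(path: str) -> dict:
    """Read a file in binary mode (no newline translation) and count its line endings."""
    with open(path, "rb") as handle:
        return count_line_endings(handle.read())
```

For example, check_file("/home/user/Desktop/corpora/canton1/canton1.vrt") returns the three counts for the VRT file. A large "cr" count together with a near-zero "lf" count would suggest old Mac-style line endings, which is one way a whole text could end up being read as a single long line.<br>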
Best,<br>
Stefan<br>
<span class=""><br>
<br>
&gt; On 19 Jun 2018, at 09:26, Hermann Lai &lt;<a href="mailto:halflifelai@gmail.com">halflifelai@gmail.com</a>&gt; wrote:<br>
&gt; <br>
&gt; I am using CQPwebinabox and I have indexed a Traditional Chinese corpus called &quot;canton1&quot; using two commands:<br>
&gt; <br>
&gt; sudo cwb-encode -d /usr/local/corpora/data/<wbr>canton1 -f /home/user/Desktop/corpora/<wbr>canton1/canton1.vrt -R /usr/local/share/cwb/registry/<wbr>canton1 -c utf8 -xsB -P pos -P lemma -S s:0 -S text:0+id<br>
&gt; <br>
&gt; sudo cwb-make -V CANTON1<br>
&gt; <br>
&gt; After that, I installed the corpus in CQPweb. Most things are correct. However, the total number of corpus texts is reported as the same as the total number of words in all corpus texts.<br>
<br>
</span>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><blockquote style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><i><font face="times new roman, serif" size="4">Gaspard Germannson</font></i></blockquote></div></div></div>
</div>