<div dir="ltr"><div><span style="font-size:14px">Part of the output of "</span><span style="font-size:14px">cwb-decode -C CANTON1 -ALL | less</span><span style="font-size:14px">":</span></div><div><span style="font-size:14px"><br></span></div><div><div><span style="font-size:14px">&lt;s&gt;</span></div><div><span style="font-size:14px">&lt;text&gt;</span></div><div><span style="font-size:14px">&lt;text_id T01&gt;</span></div><div><span style="font-size:14px">中環 N 中環</span></div><div><span style="font-size:14px">保育 V 保育</span></div><div><span style="font-size:14px">奇觀 N 奇觀</span></div><div><span style="font-size:14px">: PU :</span></div><div><span style="font-size:14px">孫中山 N 孫中山</span></div><div><span style="font-size:14px">史蹟 N 史蹟</span></div><div><span style="font-size:14px">徑 N 徑</span></div><div><span style="font-size:14px">至 CONJ 至</span></div><div><span style="font-size:14px">大館 N 大館</span></div><div><span style="font-size:14px">&lt;/text_id&gt;</span></div><div><span style="font-size:14px">&lt;/text&gt;</span></div><div><span style="font-size:14px">&lt;/s&gt;</span></div><div style="font-size:14px"><br></div></div><div><span style="font-size:14px"><br></span></div><div><span style="font-size:14px">Part of the</span><span style="font-size:14px"> </span><span style="font-size:14px">output of "cwb-describe-corpus -s CANTON1":</span><br></div><div><br></div><div>==============================<wbr>==============================</div><div>Corpus: CANTON1</div><div>==============================<wbr>==============================</div><div><br></div><div>description: </div><div>registry file: /usr/local/share/cwb/registry/<wbr>canton1</div><div>home directory: /usr/local/corpora/data/<wbr>canton1/</div><div>info file: /usr/local/corpora/data/<wbr>canton1/.info</div><div>size (tokens): 23</div><div><br></div><div> 3 positional attributes</div><div> 3 structural attributes</div><div> 0 alignment attributes</div><div><br></div><div>p-ATT word 23 tokens, 22 types</div><div>p-ATT pos 23 tokens, 8 
types</div><div>p-ATT lemma 23 tokens, 22 types</div><div>s-ATT s 2 regions</div><div>s-ATT text 2 regions</div><div>s-ATT text_id 2 regions (with annotations)</div><div><br></div><div><br></div><div>It seems that CWB reports the correct number of words, but CQPweb does not.</div><div><br></div><div>Regards,</div><div>Lai</div></div><div class="gmail_extra"><br><div class="gmail_quote">2018-06-19 15:43 GMT+08:00 Stefan Evert <span dir="ltr">&lt;<a href="mailto:stefanML@collocations.de" target="_blank">stefanML@collocations.de</a>&gt;</span>:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">What does the corpus look like if you decode it from the CWB index with the following command?<br>
<br>
cwb-decode -C CANTON1 -ALL | less<br>
<br>
Can you show us part of the output? It would also be useful to see the output of<br>
<br>
cwb-describe-corpus -s CANTON1<br>
<br>
<br>
One possibility I can think of is that the line breaks in your input file are messed up, so that CWB treats everything within a text region as a single long line.<br>
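<br>
A minimal sketch for testing this hypothesis before re-encoding, assuming the input follows the usual VRT layout implied by the cwb-encode call quoted below (one token per line with word, pos and lemma separated by tabs; tags such as &lt;s&gt; and &lt;text id="..."&gt; on lines of their own). The Python helper check_vrt is hypothetical and not part of CWB or CQPweb: it counts carriage-return characters (a sign of Windows or classic Mac line endings, which may confuse line-based tools), token lines and opening text tags, and lists token lines that do not have exactly three tab-separated columns.<br>
<pre>
```python
import re
import sys

# Hypothetical helper, not part of CWB or CQPweb. It assumes a VRT file
# with one token per line (word, pos, lemma separated by tabs) and XML
# tags on lines of their own, matching "-P pos -P lemma" in cwb-encode.
TEXT_TAG = re.compile(r"<(/?)text[\s>]")


def check_vrt(path, n_columns=3):
    """Count CR characters, token lines and text regions; flag bad lines."""
    with open(path, "rb") as f:
        raw = f.read()
    report = {
        # CR characters suggest Windows (CRLF) or classic Mac (CR) line
        # endings, which may confuse line-based tools such as cwb-encode.
        "carriage_returns": raw.count(b"\r"),
        "tokens": 0,
        "texts": 0,
        "bad_lines": [],
    }
    # Split on LF only, as a line-based reader would.
    for lineno, line in enumerate(raw.decode("utf-8").split("\n"), start=1):
        line = line.rstrip("\r")
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("<"):
            match = TEXT_TAG.match(line)
            if match and not match.group(1):
                report["texts"] += 1  # opening text tag
            continue  # structural tag, not a token
        report["tokens"] += 1
        # A token line should have exactly n_columns - 1 tab characters.
        if line.count("\t") != n_columns - 1:
            report["bad_lines"].append(lineno)
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for key, value in check_vrt(sys.argv[1]).items():
        print(key, value)
```
</pre>
If "tokens" is far below the expected corpus size, "carriage_returns" is non-zero, or "bad_lines" is long, the line breaks or column separators in the file are a likely culprit. Example call (file name is illustrative): python3 check_vrt.py canton1.vrt<br>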
<br>
Best,<br>
Stefan<br>
<span class=""><br>
<br>
> On 19 Jun 2018, at 09:26, Hermann Lai &lt;<a href="mailto:halflifelai@gmail.com">halflifelai@gmail.com</a>&gt; wrote:<br>
> <br>
> I am using CQPwebinabox, and I have indexed a Traditional Chinese corpus called "canton1" with the following two commands:<br>
> <br>
> sudo cwb-encode -d /usr/local/corpora/data/<wbr>canton1 -f /home/user/Desktop/corpora/<wbr>canton1/canton1.vrt -R /usr/local/share/cwb/registry/<wbr>canton1 -c utf8 -xsB -P pos -P lemma -S s:0 -S text:0+id<br>
> <br>
> sudo cwb-make -V CANTON1<br>
> <br>
> After that, I installed the corpus in CQPweb. Most things are correct. However, the total number of corpus texts is the same as the total number of words in all corpus texts.<br>
<br>
</span>______________________________<wbr>_________________<br>
CWB mailing list<br>
<a href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><br>
<a href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb" rel="noreferrer" target="_blank">http://liste.sslmit.unibo.it/<wbr>mailman/listinfo/cwb</a><br>
</blockquote></div><br><br clear="all"><div><br></div>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><blockquote style="margin:0px 0px 0px 0.8ex;border-left-width:1px;border-left-color:rgb(204,204,204);border-left-style:solid;padding-left:1ex"><i><font face="times new roman, serif" size="4">Gaspard Germannson</font></i></blockquote></div></div></div>
</div>