<html><head><style>body{font-family:Helvetica,Arial;font-size:13px}</style></head><body style="word-wrap: break-word; -webkit-nbsp-mode: space; -webkit-line-break: after-white-space;"><div id="bloop_customfont" style="font-family:Helvetica,Arial;font-size:13px; color: rgba(0,0,0,1.0); margin: 0px; line-height: auto;"><div id="bloop_customfont" style="margin: 0px;">Hi everybody,</div><div id="bloop_customfont" style="margin: 0px;">I am trying to create a CQP corpus with XML attributes from 618 VRT files. The files look like this:</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">&lt;text encrypted_msg="0" contains_fra="true" content_msg="1838" user_msg="1873" no_consent_msg="0" consent_speakers="2" lang_100_and_more="fra" speakers="2" empty_msg="0" media_msg="35" system_msg="0" total_msg="1873"&gt;</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">&lt;msg msg_id="162577"&gt;</div><div id="bloop_customfont" style="margin: 0px;">token1 pos1</div><div id="bloop_customfont" style="margin: 0px;">token2 pos2</div><div id="bloop_customfont" style="margin: 0px;">&lt;/msg&gt;</div><div id="bloop_customfont" style="margin: 0px;">&lt;/text&gt;</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">An example file with two messages is available here: www.ueberwasser.eu/chat105_original.vrt</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">In this example file, the first message contains 3 tokens. The following query finds it:</div><div id="bloop_customfont" style="margin: 0px;">&lt;msg&gt;[]*&lt;/msg&gt;:: match.msg_msg_id = "162577"</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">The second message is 95 tokens long. 
The same query with its ID returns no results:</div><div id="bloop_customfont" style="margin: 0px;">&lt;msg&gt;[]*&lt;/msg&gt;:: match.msg_msg_id = "162578"</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">If I remove any five tokens from this message, the query finds this message as well. Is this normal behaviour? Is there a limit on the number of tokens within a structural attribute region? I could not find any information about this in the documentation.</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">Many thanks for any help,</div><div id="bloop_customfont" style="margin: 0px;">Simone</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">*********************************</div><div id="bloop_customfont" style="margin: 0px;">Setup:</div><div id="bloop_customfont" style="margin: 0px;">Xubuntu 16.04</div><div id="bloop_customfont" style="margin: 0px;">cwb-3.0.0-linux-x86_64</div><div id="bloop_customfont" style="margin: 0px;">CWB Perl-CWB-3.0</div><div id="bloop_customfont" style="margin: 0px;">CWB-CL Perl-CWB-CL-3.0</div><div id="bloop_customfont" style="margin: 0px;">CWB-Web Perl-CWB-Web-3.0</div><div id="bloop_customfont" style="margin: 0px;">CWB-CQI Perl-CWB-CQI-3.0</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">I had the same problem with CWB 3.0 and the Perl scripts 2.2 on a Mac.</div><div id="bloop_customfont" style="margin: 0px;"><br></div><div id="bloop_customfont" style="margin: 0px;">I create the corpus with:</div><div id="bloop_customfont" style="margin: 0px;">sudo -H cwb-encode -c utf8 -x -s -B -d PathToData -f /pathtofile.vrt -R PathToRegistry -P pos -S 
text:0+contains_fra+no_consent_msg+content_msg+empty_msg+total_msg+speakers+media_msg+system_msg+user_msg+encrypted_msg+consent_speakers+lang_100_and_more+demographics+lang_less_than_100+contains_gsw+contains_eng+contains_spa+contains_deu+contains_ita+contains_sla+contains_roh -S msg:0+msg_id</div></div><br><div class="bloop_sign" id="bloop_sign_1493448526811790080"><span style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.80000114440918px;">===========================================================</span><br style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.80000114440918px;"><a href="http://www.ueberwasser.eu/" target="_blank" style="font-family: arial, sans-serif; font-size: 12.80000114440918px;">www.ueberwasser.eu</a><br style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.80000114440918px;"><span style="color: rgb(34, 34, 34); font-family: arial, sans-serif; font-size: 12.80000114440918px;">===========================================================</span></div></body></html>