<div dir="ltr">Dear Stephanie,<div>Thanks for your response and clarification! Best of luck with implementing the Ziggurat backend!<br clear="all"><div><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><br></div><div><br></div><div>Best,</div>Austin Yang (楊承洋)<div>MS in Cognitive Neuroscience, NCU<br><div>BS in Psychology, CYCU</div></div></div></div></div><br></div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Mon, Feb 6, 2023 at 6:43 PM Stephanie Evert <<a href="mailto:stefanML@collocations.de">stefanML@collocations.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Dear Austin,<br>
<br>
I think you've been misreading the encoding tutorial, which says that<br>
<br>
The maximum corpus size is 2,147,483,647 tokens (the largest value that can be stored as a signed 32-bit integer). In the CWB source code, this is represented by the macro CL_MAX_CORPUS_SIZE.<br>
<br>
<a href="https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial/B.html" rel="noreferrer" target="_blank">https://cwb.sourceforge.io/files/CWB_Encoding_Tutorial/B.html</a><br>
<br>
So the maximum size is a hard upper limit, and there is no indication here that it would be sensible to modify CL_MAX_CORPUS_SIZE in the source code.<br>
<br>
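To illustrate why this is a hard limit, here is a minimal sketch in plain Python (not CWB code). It assumes, following the tutorial's wording, that corpus positions are stored as signed 32-bit integers; the helper names fits_in_int32 and wrap_to_int32 are hypothetical and exist only for this example. Under that assumption, raising a macro alone would not widen the stored values.<br>
<br>
<pre>
```python
import ctypes
import struct

# Largest value a signed 32-bit integer can hold; the tutorial gives this
# as the maximum corpus size in tokens.
INT32_MAX = 2**31 - 1


def fits_in_int32(position):
    """Return True if position can be packed as a signed 32-bit integer."""
    try:
        struct.pack("!i", position)
        return True
    except struct.error:
        return False


def wrap_to_int32(position):
    """Show what a 32-bit signed field would hold after overflow."""
    return ctypes.c_int32(position).value


print(INT32_MAX)                     # 2147483647
print(fits_in_int32(INT32_MAX))      # True
print(fits_in_int32(INT32_MAX + 1))  # False
print(wrap_to_int32(INT32_MAX + 1))  # -2147483648
```
</pre>
One position past the maximum no longer fits, and a 32-bit field would wrap around to a negative number instead.<br>
<br>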
Such limitations will be lifted by the new Ziggurat backend, once we finally get round to implementing it. Things are progressing, though, so I'm inclined to say “stay tuned”.<br>
<br>
Best,<br>
Stephanie<br>
<br>
<br>
> On 6 Feb 2023, at 09:53, Austin Yang <<a href="mailto:austin.yang.2014@gmail.com" target="_blank">austin.yang.2014@gmail.com</a>> wrote:<br>
> <br>
> Dear all,<br>
> I'm trying to encode a corpus larger than 2 GiB. The CWB encoding tutorial notes that this is possible by changing CL_MAX_CORPUS_SIZE in the CWB source code. I increased the parameter (CL_MAX_CORPUS_SIZE) in the cl.h file (I'm not sure whether this is the CWB source code mentioned in the tutorial) by a factor of 10, but the CQPweb site still shows a maximum of 2,147,483,647 tokens. Did I miss something in the tutorial? Any comments would be greatly appreciated!<br>
> <br>
> CWB version 3.5.0<br>
> <br>
> <br>
> Best,<br>
> Austin Yang (楊承洋)<br>
> MS in Cognitive Neuroscience, NCU<br>
> BS in Psychology, CYCU<br>
> _______________________________________________<br>
> CWB mailing list<br>
> <a href="mailto:CWB@sslmit.unibo.it" target="_blank">CWB@sslmit.unibo.it</a><br>
> <a href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb" rel="noreferrer" target="_blank">http://liste.sslmit.unibo.it/mailman/listinfo/cwb</a><br>
</blockquote></div>