<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Tahoma;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
@font-face
        {font-family:Consolas;
        panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p
        {mso-style-priority:99;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:12.0pt;
        font-family:"Times New Roman","serif";
        color:black;}
pre
        {mso-style-priority:99;
        mso-style-link:"HTML Preformatted Char";
        margin:0cm;
        margin-bottom:.0001pt;
        font-size:10.0pt;
        font-family:"Courier New";
        color:black;}
span.HTMLPreformattedChar
        {mso-style-name:"HTML Preformatted Char";
        mso-style-priority:99;
        mso-style-link:"HTML Preformatted";
        font-family:Consolas;
        color:black;}
span.EmailStyle20
        {mso-style-type:personal-reply;
        font-family:"Verdana","sans-serif";
        color:#1F497D;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body bgcolor="white" lang="EN-GB" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">And our Ziggurat project is designed to address – among other things - precisely this limitation.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><i><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Read all about it:
<a href="http://cwb.sourceforge.net/cwb4.php">http://cwb.sourceforge.net/cwb4.php</a>
</span></i><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">best<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D">Andrew.<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Verdana","sans-serif";color:#1F497D"><o:p> </o:p></span></p>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0cm 0cm 0cm">
<p class="MsoNormal"><b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext">From:</span></b><span lang="EN-US" style="font-size:10.0pt;font-family:"Tahoma","sans-serif";color:windowtext"> cwb-bounces@sslmit.unibo.it
[mailto:cwb-bounces@sslmit.unibo.it] <b>On Behalf Of </b>Vladimír Benko<br>
<b>Sent:</b> 30 March 2017 09:49<br>
<b>To:</b> ssadowsky@gmail.com<br>
<b>Cc:</b> Open source development of the Corpus WorkBench<br>
<b>Subject:</b> Re: [CWB] Maximum corpus size exceeded<o:p></o:p></span></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal" style="margin-bottom:12.0pt">Dear Scott,<br>
<br>
Yes, this is a documented limitation of the CWB software. One of the options for larger corpora is a system called NoSketch Engine, which is an open-source subset of the commercial Sketch Engine. The largest corpus we have in our installation of NoSkE is
the Russian 13.7 billion Araneum Russicum Maximum. You may want to try how the system feels here:<br>
<br>
<a href="http://unesco.uniba.sk/guest/index.html">http://unesco.uniba.sk/guest/index.html</a><br>
<br>
The software itself can be downloaded here:<br>
<br>
<a href="https://nlp.fi.muni.cz/trac/noske">https://nlp.fi.muni.cz/trac/noske</a><br>
<br>
Best,<br>
<br>
Vlado B, 10:45<o:p></o:p></p>
</div>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal">Hi all, <o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">I just got this warning for the first time:<o:p></o:p></p>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">WARNING: Maximal corpus size has been exceeded.</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New""> Input truncated to the first 2147483647 tokens (file /home/homebox/Corpora/source-files//input.vrt, line #3161375683).</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><span style="font-family:"Courier New"">Warning: missing </s> tag inserted at end of input.</span><o:p></o:p></p>
</div>
</div>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Is there any way around this, by chance? That's 2^31, just a bit shy of 32 bits, but I'm on a 64 bit system with ext4 filesystems, so I assume the issue is CQB related. <o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<div>
<p class="MsoNormal">Cheers!<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal">Scott<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><br>
<br>
<br>
<o:p></o:p></p>
<pre>_______________________________________________<o:p></o:p></pre>
<pre>CWB mailing list<o:p></o:p></pre>
<pre><a href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a><o:p></o:p></pre>
<pre><a href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb">http://liste.sslmit.unibo.it/mailman/listinfo/cwb</a><o:p></o:p></pre>
</blockquote>
<p class="MsoNormal"><o:p> </o:p></p>
<p><o:p> </o:p></p>
<div>
<p class="MsoNormal">-- <br>
<span style="color:navy">Vladimír Benko</span> <o:p></o:p></p>
<p>Université Comenius de Bratislava<br>
Chaire UNESCO de communication<br>
plurilingue et multiculturelle<o:p></o:p></p>
<p>Šafárikovo námestie 6, SK-81499 Bratislava<o:p></o:p></p>
<p><a href="http://unesco.uniba.sk/guest/">http://unesco.uniba.sk/guest/</a><br>
<a href="https://www.facebook.com/araneawebcorpora/">https://www.facebook.com/araneawebcorpora/</a><br>
<a href="https://vk.com/araneawebcorpora">https://vk.com/araneawebcorpora</a> <o:p>
</o:p></p>
</div>
</div>
</body>
</html>