[CWB] Maximum corpus size exceeded

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Mar 30 11:11:26 CEST 2017


And our Ziggurat project is designed to address, among other things, precisely this limitation.

Read all about it: http://cwb.sourceforge.net/cwb4.php

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Vladimír Benko
Sent: 30 March 2017 09:49
To: ssadowsky at gmail.com
Cc: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Maximum corpus size exceeded

Dear Scott,

Yes, this is a documented limitation of the CWB software.  One option for larger corpora is NoSketch Engine, an open-source subset of the commercial Sketch Engine.  The largest corpus in our NoSkE installation is the Russian Araneum Russicum Maximum, at 13.7 billion tokens.  You can try out the system here:

http://unesco.uniba.sk/guest/index.html

The software itself can be downloaded here:

https://nlp.fi.muni.cz/trac/noske

Best,

Vlado B, 10:45
Hi all,

I just got this warning for the first time:

WARNING: Maximal corpus size has been exceeded.
         Input truncated to the first 2147483647 tokens (file /home/homebox/Corpora/source-files//input.vrt, line #3161375683).
Warning: missing </s> tag inserted at end of input.

Is there any way around this, by chance? That's 2^31 - 1, the largest signed 32-bit integer, but I'm on a 64-bit system with ext4 filesystems, so I assume the issue is CWB-related.
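
For reference, a minimal Python sketch of the arithmetic behind the warning, plus a rough pre-check of a .vrt file before encoding. The helper names (count_vrt_tokens, exceeds_limit) are hypothetical, and the counting rule is an assumption: one token per line, with lines starting with "<" treated as structural tags and blank lines skipped. The encoder's own rules may differ, for example for a token that is literally "<".

```python
import io

# Largest value a signed 32-bit integer can hold; this matches the
# figure quoted in the warning (2147483647).
INT32_MAX = 2**31 - 1


def count_vrt_tokens(lines):
    """Count token lines in verticalized text.

    Assumption: one token per line; lines beginning with '<' are
    structural tags; blank lines are ignored.
    """
    count = 0
    for line in lines:
        stripped = line.strip()
        if stripped and not stripped.startswith("<"):
            count += 1
    return count


def exceeds_limit(token_count, limit=INT32_MAX):
    """Return True if the token count is above the given limit."""
    return token_count > limit


# Tiny in-memory sample standing in for a real .vrt file.
sample = io.StringIO("<s>\nHello\tUH\nworld\tNN\n</s>\n")
n = count_vrt_tokens(sample)
print(n, exceeds_limit(n))
```

For a real file, pass an open file handle instead of the in-memory sample; the function reads line by line, so it should not need to hold the whole input in memory.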

Cheers!
Scott





--
Vladimír Benko

Université Comenius de Bratislava
Chaire UNESCO de communication
plurilingue et multiculturelle

Šafárikovo námestie 6, SK-81499 Bratislava

http://unesco.uniba.sk/guest/
https://www.facebook.com/araneawebcorpora/
https://vk.com/araneawebcorpora