[CWB] IMS error - maximum file size exceeded

Stefan Evert stefanML at collocations.de
Fri Nov 5 15:50:01 CET 2010


> I'm using CWB 3.0 x64 on Ubuntu 10.10 x64.
> 
> The corpus contains about 800m tokens, and each of these has an entry for word, syntactic relationship, lemma and POS.
> 
> The file system is NTFS, which should be able to handle files up to about 16 TB, if I'm not mistaken, but the size of the largest files that CWB generates before the error occurs is 2 GB.

This should work in principle, but I haven't tested it on an NTFS file system yet.  I can give it a try over the weekend at home, but I suspect the problem may be elsewhere.

> I'm trying to encode a large corpus with IMS and after a while I get an error saying that the maximum file size has been exceeded and the file can't be written.


Does this happen during the cwb-encode phase?  I'm asking because this program just writes its files through the standard stream API and does not access them in any special way (which could otherwise be a source of problems).  If the error does occur in cwb-encode, you will probably find that other programs cannot create large files (> 2 GiB) on the NTFS volume either.
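One way to check this independently of CWB is to write a file slightly larger than 2 GiB to the volume with dd.  The following is only a sketch: the function name try_large_file, the name of the test file and the default size of 2100 MiB are arbitrary choices for illustration, and the volume needs a bit more than 2 GiB of free space.

```shell
# Hypothetical helper (not part of CWB): try to write SIZE_MIB mebibytes
# of zeros to a temporary file in DIR and report whether it worked.
try_large_file() {
    dir=$1
    size_mib=${2:-2100}        # default is just above 2 GiB (= 2048 MiB)
    testfile="$dir/cwb-largefile-test.$$"
    # bs is given in bytes (1 MiB) so that the call works with GNU and BSD dd
    if dd if=/dev/zero of="$testfile" bs=1048576 count="$size_mib" 2>/dev/null; then
        echo "OK: wrote ${size_mib} MiB to $dir"
        status=0
    else
        echo "FAILED: could not write ${size_mib} MiB to $dir"
        status=1
    fi
    rm -f "$testfile"          # clean up in both cases
    return $status
}
```

Call it with the directory that holds the corpus data files, e.g. try_large_file /path/to/data (a placeholder path).  If this fails as well, the problem lies with the volume or your account rather than with cwb-encode.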

The Linux NTFS driver should include large file support, but perhaps the volume has been mounted with large files disabled?  Or perhaps your account has a file size limit?  Can you show us the output of the following commands, please?

	mount

	ulimit -a
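If the full output is too long to post, the two relevant pieces are the file size limit and the mount entry for the volume that holds the data.  A minimal sketch for extracting them, assuming a Linux-style mount output ("... on /mountpoint type ...") and a mount point without spaces; the function name check_limits is just made up for this example:

```shell
# Hypothetical helper: print the file size limit of the current shell
# and the mount entry of the file system that contains DIR.
check_limits() {
    dir=${1:-.}
    # "unlimited" is what you want to see here; otherwise the value is in blocks
    echo "file size limit (ulimit -f): $(ulimit -f)"
    # df -P prints a header plus one line; its last field is the mount point
    mountpoint=$(df -P "$dir" | awk 'NR==2 {print $NF}')
    echo "mount point: $mountpoint"
    # show the matching line of the mount output, including mount options
    mount | grep -F " on $mountpoint " || echo "no matching line in mount output"
}
```

For example, check_limits /path/to/data (again a placeholder path) should show whether a limit is set and with which options the NTFS volume has been mounted.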

Best,
Stefan


