[CWB] [ cwb-Bugs-2906451 ] CQPweb: compilation of text-frequency-index CWB corpora

SourceForge.net noreply at sourceforge.net
Tue Dec 1 03:12:44 CET 2009


Bugs item #2906451, was opened at 2009-12-01 02:12
Message generated for change (Tracker Item Submitted) made by andrewhardie
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=722303&aid=2906451&group_id=131809

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: CQPweb
Group: None
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Andrew Hardie (andrewhardie)
Summary: CQPweb: compilation of text-frequency-index CWB corpora

Initial Comment:
This process seems very prone to (a) running out of PHP memory, (b) timing out, or (c) filling up the hard disk and then falling over.

A full, proper investigation is needed.

Stefan suggests two improvements:

First, you should use the -M switch for cwb-makeall so that it doesn't try to do the entire indexing in memory.
Second, it would be even better to use the cwb-make script from the CWB/Perl interface (or the corresponding Perl module directly), which minimises disk usage by compressing data files as early as possible.
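
As a rough illustration of the first suggestion, the sketch below builds a cwb-makeall command line that includes the -M switch. It is only a sketch under assumptions: the helper names, the corpus name, and the 500 value are hypothetical, and the reading of the -M argument as a limit in megabytes should be checked against the cwb-makeall documentation for the installed CWB version. It is written in Python for brevity; it is not CQPweb's actual code.

```python
import subprocess
from typing import List, Optional


def build_makeall_command(corpus: str,
                          memory_mb: int,
                          registry: Optional[str] = None) -> List[str]:
    """Build an argument list for cwb-makeall with a memory limit.

    Assumption: -M takes an approximate memory limit in megabytes,
    and -r names the registry directory. Verify both against the
    cwb-makeall usage text before relying on this.
    """
    if memory_mb <= 0:
        raise ValueError("memory_mb must be a positive integer")
    cmd = ["cwb-makeall", "-M", str(memory_mb)]
    if registry is not None:
        cmd += ["-r", registry]
    cmd.append(corpus)
    return cmd


def run_makeall(corpus: str,
                memory_mb: int,
                registry: Optional[str] = None) -> None:
    """Run cwb-makeall and raise CalledProcessError if it fails."""
    subprocess.run(build_makeall_command(corpus, memory_mb, registry),
                   check=True)


if __name__ == "__main__":
    # Hypothetical corpus name and limit, printed rather than executed.
    print(" ".join(build_makeall_command("MYCORPUS", 500)))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems with unusual corpus or registry paths. Whether a fixed limit is enough to avoid the PHP memory and timeout failures described above would still need the fuller investigation.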

----------------------------------------------------------------------

