[CWB] Re: Corpus size and filtering

Rui Chaves rpchaves at gmail.com
Thu Apr 10 16:02:30 CEST 2008


Hi,
thanks for all your valuable help.

The BNCweb approach that you mention (constructing a SQL query from
the metadata restrictions, retrieving a list of matching texts from
its MySQL database, and then running the CQP query on the
corresponding subcorpus) is exactly what we intend to do. We would
love to try this, perhaps using BNCweb (CQP edition), but it is not
clear to us how BNCweb (CQP edition) can be obtained or what issues
arise when using it with a corpus other than the BNC. Is there a manual
online that we can consult?
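To make sure we understand the pipeline correctly, here is a minimal Python sketch of it as we picture it. It is not BNCweb's code, and several parts are our own assumptions: sqlite3 stands in for MySQL, the table `texts` and its columns (`text_id`, `genre`, `year`, `medium`) are hypothetical, and the corpus is assumed to have an s-attribute `text_id`. Instead of building a real subcorpus, it restricts the CQP query with a global constraint (`:: match.text_id = "..."`) over an alternation of text IDs. This is a swapped-in technique that is fine for small text lists but would likely not scale to thousands of texts, where a proper subcorpus is the better route.

```python
import re
import sqlite3

# Assumption: text IDs are plain alphanumeric codes, so they can be joined
# into a CQP regex alternation without escaping.
SAFE_ID = re.compile(r"^[A-Za-z0-9_]+$")

# Hypothetical metadata columns; only whitelisted names reach the SQL string.
ALLOWED_FIELDS = {"genre", "year", "medium"}


def matching_text_ids(conn, restrictions):
    """Build a parameterized SQL query from metadata restrictions
    and return the IDs of all matching texts."""
    clauses, params = [], []
    for column, value in sorted(restrictions.items()):
        if column not in ALLOWED_FIELDS:
            raise ValueError(f"unknown metadata field: {column}")
        clauses.append(f"{column} = ?")  # value bound as a parameter
        params.append(value)
    sql = "SELECT text_id FROM texts"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    sql += " ORDER BY text_id"
    return [row[0] for row in conn.execute(sql, params)]


def restrict_cqp_query(cqp_query, text_ids):
    """Append a CQP global constraint so that only matches inside
    the given texts are kept (assumes an s-attribute text_id)."""
    if not text_ids:
        raise ValueError("no texts match the metadata restrictions")
    for tid in text_ids:
        if not SAFE_ID.match(tid):
            raise ValueError(f"unexpected text id: {tid!r}")
    alternation = "|".join(text_ids)
    return f'{cqp_query} :: match.text_id = "{alternation}";'


if __name__ == "__main__":
    # Toy metadata database standing in for the real MySQL one.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE texts (text_id TEXT, genre TEXT, year INTEGER, medium TEXT)"
    )
    conn.executemany(
        "INSERT INTO texts VALUES (?, ?, ?, ?)",
        [
            ("A01", "fiction", 1990, "book"),
            ("A02", "news", 1991, "newspaper"),
            ("A03", "fiction", 1993, "book"),
        ],
    )
    ids = matching_text_ids(conn, {"genre": "fiction"})
    print(ids)  # ['A01', 'A03']
    print(restrict_cqp_query('[lemma="run"]', ids))
```

The resulting string would then be passed to CQP; with the toy data above it reads `[lemma="run"] :: match.text_id = "A01|A03";`.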

There is a Corpus Encoding Tutorial for CWB available online at
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CWBTutorial/cwb-tutorial.pdf
but it seems to be a draft from 2002. Is there a more recent version?
It would be great if documentation like this were included in the
SourceForge package, along with the code.

Also, what hardware would you recommend for running the above web
interface on a corpus of 400M?


Many thanks,
Rui

