[CWB] Appending text to an existing corpus

Nik cqplist at nikvdp.com
Thu Nov 8 07:00:02 CET 2012


Hi all,
I have a pretty simple question: is there any way to append text to an
existing corpus?

We're working on a corpus built from data collected by a web crawler and
would like to update it periodically with new data from the crawler. In the
documentation I found information on how to add annotations to existing
corpora etc., but I can't find anything about simply appending new data to
an existing corpus.

Decoding the entire corpus, adding the new data to the generated file and
re-encoding the new file is an option, but the server we're running on
isn't exactly fast. Any way to save a few CPU cycles and directly insert
the new data into the existing corpus? Perhaps there's some functionality
to combine two corpora into one?
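
To make the round trip concrete, here is a rough sketch of the decode /
append / re-encode sequence I mean. It only builds the command lines and does
not run them. The helper name, the corpus name, all paths, and the choice of a
single <s> structural attribute are placeholders I made up for illustration;
the real attribute declarations would have to match our corpus.

```python
import shlex


def rebuild_commands(corpus, registry, data_dir, old_vrt, new_vrt, merged_vrt):
    """Return shell commands for a decode / append / re-encode round trip.

    Hypothetical helper: every argument is a placeholder, and the attribute
    declarations (-P word, -S s) are assumptions about the corpus layout.
    """
    q = shlex.quote
    # CWB tools take the corpus ID in upper case; the registry file is lower case.
    registry_file = f"{registry}/{corpus.lower()}"
    return [
        # 1. Dump the existing corpus as one-token-per-line text with XML tags.
        f"cwb-decode -r {q(registry)} -Cx {corpus} -P word -S s > {q(old_vrt)}",
        # 2. Append the newly crawled data to the dump.
        f"cat {q(old_vrt)} {q(new_vrt)} > {q(merged_vrt)}",
        # 3. Re-encode everything (the data directory should be emptied first).
        f"cwb-encode -d {q(data_dir)} -f {q(merged_vrt)} -R {q(registry_file)} -S s",
        # 4. Rebuild the index files and validate them.
        f"cwb-makeall -r {q(registry)} -V {corpus}",
    ]


if __name__ == "__main__":
    for command in rebuild_commands(
        "WEBCORP", "/corpora/registry", "/corpora/data/webcorp",
        "old.vrt", "new.vrt", "merged.vrt",
    ):
        print(command)
```

Every step here touches the whole corpus, which is exactly the cost I'd like
to avoid.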

Thanks,
Nik