[CWB] Performance of "expand to"

Stefan Evert stefanML at collocations.de
Wed Feb 15 16:01:31 CET 2012


> But how exactly do I find $TEXT_START_CPOS and $TEXT_END_CPOS in CQP (assuming the only thing I know is the text_id)?

The easiest and quickest solution (unless you care to store a list of all text regions in a MySQL table) is to use CQP:

	Text = <text id="..."> [] expand to text;
	dump Text;

which will print

	<start> TAB <end> TAB -1 TAB -1
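
If you capture that line from a script, the two corpus positions are easy to pull out. Here is a minimal Python sketch, assuming the four tab-separated fields shown above; the function name parse_dump_line and the sample positions are made up for illustration and are not part of CWB:

```python
def parse_dump_line(line):
    """Return (start, end) corpus positions from one line of `dump` output.

    Assumes tab-separated fields, with the first two holding the start
    and end positions of the region; any further fields are ignored.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 2:
        raise ValueError("expected at least two tab-separated fields: %r" % line)
    return int(fields[0]), int(fields[1])


# Hypothetical sample line, not real corpus output.
text_start_cpos, text_end_cpos = parse_dump_line("1024\t2047\t-1\t-1\n")
print(text_start_cpos, text_end_cpos)
```

The two values can then be used as $TEXT_START_CPOS and $TEXT_END_CPOS.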

You can easily get the full text of the document from CQP:

	tabulate Text match .. matchend word;

However, if you want fancier formatting (<s> ... </s> tags, paragraph breaks, token-level annotations, ...), you'll have to implement this yourself. E.g. you can obtain POS tags for all tokens with

	tabulate Text match .. matchend pos;

but then you have to combine the two separate lists for word forms and POS tags.
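
Combining the two lists could look roughly like this in Python. This is only a sketch: it assumes that each tabulate command prints the tokens of the range on a single line separated by blanks, and that no token contains whitespace itself. The function name combine_annotations, the "/" separator and the sample tokens are all invented for illustration:

```python
def combine_annotations(word_line, pos_line, sep="/"):
    """Pair up word forms and POS tags from two tabulate output lines.

    Both lines are assumed to hold blank-separated values for the same
    token range, so the i-th word belongs to the i-th tag.
    """
    words = word_line.split()
    tags = pos_line.split()
    if len(words) != len(tags):
        # Mismatched lengths mean the two lines do not cover the same tokens.
        raise ValueError("got %d words but %d tags" % (len(words), len(tags)))
    return [w + sep + t for w, t in zip(words, tags)]


# Hypothetical sample lines, not real corpus output.
tokens = combine_annotations("The cat sleeps .", "DT NN VBZ .")
print(" ".join(tokens))
```

The length check is there because a silent mismatch would shift every tag onto the wrong word.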

Cheers,
Stefan


PS: Compiling CWB on Mac OS X Lion is (almost) a piece of cake, using Homebrew for the required support libraries. Instructions and updated config settings to follow soon.

