[CWB] Performance of "expand to"
Stefan Evert
stefanML at collocations.de
Wed Feb 15 16:01:31 CET 2012
> But how exactly do I find $TEXT_START_CPOS and $TEXT_END_CPOS in CQP (assuming the only thing I know is the text_id)?
The easiest and quickest solution (unless you care to store a list of all text regions in a MySQL table) is to use CQP:
Text = <text id="..."> [] expand to text;
dump Text;
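which will print a single line of the form
<start> TAB <end> TAB -1 TAB -1
where <start> and <end> are the corpus positions of the first and last token of the text (i.e. your $TEXT_START_CPOS and $TEXT_END_CPOS), and the two -1 columns are the unset target and keyword anchors.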
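If you drive CQP from a script, you need to pick the two numbers out of that line. Here is a minimal Python sketch. It assumes you have already captured the `dump` output as a string; how you invoke CQP is up to you. `parse_dump_line` is a hypothetical helper and is not part of CWB.

```python
from typing import Tuple


def parse_dump_line(line: str) -> Tuple[int, int]:
    """Return (start, end) corpus positions from one line of CQP `dump` output.

    Expected format: match TAB matchend TAB target TAB keyword,
    where unset anchors are printed as -1.
    """
    fields = line.strip().split("\t")
    if len(fields) < 2:
        raise ValueError(f"unexpected dump line: {line!r}")
    # match = first token of the text, matchend = last token (inclusive)
    start, end = int(fields[0]), int(fields[1])
    return start, end


# Hypothetical example line; real positions depend on your corpus.
start, end = parse_dump_line("1024\t5873\t-1\t-1\n")
print(start, end, end - start + 1)  # start, end, number of tokens
```

The end position is inclusive, so the text covers end - start + 1 tokens.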
You can easily get the full text of the document from CQP:
tabulate Text match .. matchend word;
However, if you want fancier formatting (<s> ... </s> tags, paragraph breaks, token-level annotations, ...), you'll have to implement it yourself. For example, you can obtain the POS tags of all tokens with
tabulate Text match .. matchend pos;
but then you have to combine the two separate lists for word forms and POS tags.
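Combining the two lists is straightforward once you have the output of both `tabulate` commands. Here is a minimal Python sketch. It assumes that each command printed one line with tokens separated by blanks, and that no token contains whitespace. The helper names `combine_word_pos` and `format_word_pos` are hypothetical.

```python
from typing import List, Tuple


def combine_word_pos(words_line: str, pos_line: str) -> List[Tuple[str, str]]:
    """Pair word forms with POS tags from two `tabulate ... match .. matchend` lines.

    Assumes tokens are blank-separated and contain no whitespace themselves.
    """
    words = words_line.split()
    tags = pos_line.split()
    if len(words) != len(tags):
        raise ValueError(f"length mismatch: {len(words)} words vs {len(tags)} tags")
    return list(zip(words, tags))


def format_word_pos(pairs: List[Tuple[str, str]]) -> str:
    """Render (word, tag) pairs in the familiar word/TAG notation."""
    return " ".join(f"{w}/{t}" for w, t in pairs)


pairs = combine_word_pos("The cat sat .", "DT NN VBD SENT")
print(format_word_pos(pairs))  # The/DT cat/NN sat/VBD ./SENT
```

The length check catches the case where the two lists were taken from different ranges.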
Cheers,
Stefan
PS: Compiling CWB on Mac OS X Lion is (almost) a piece of cake if you use Homebrew for the required support libraries. Instructions and updated config settings will follow soon.