[CWB] How to properly encode XML-like tokens?

Richard Eckart de Castilho eckartde at tk.informatik.tu-darmstadt.de
Fri Jan 13 00:39:15 CET 2012


Hello,

I would like to know whether there is a proper way to encode corpora with arbitrary tokens, in particular tokens that look like XML.
For example, if I have a real token <RLS> in my corpus, I get messages like these:

	s-attribute <RLS> not declared, inserted literally (input line #80094547, warning issued only once).

In this case it is merely an aesthetic problem, but I also sometimes have tokens that are identical to structural tags, e.g. <text>.

Is there some way I can escape such XML-like tokens in the input to cwb-encode, so that such messages are avoided but the tokens are still
properly indexed and searchable as "<text>"?
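
To make the kind of escaping I have in mind concrete, here is a minimal sketch of a pre-processing step applied while writing the verticalized input. It rests on an assumption I have not verified: that cwb-encode, when run in its XML-aware mode (the -x option, as far as I understand it), decodes the standard entities &lt;, &gt; and &amp; back into literal characters. The helper name format_token_line is hypothetical and only for illustration.

```python
from xml.sax.saxutils import escape


def format_token_line(columns):
    """Build one line of verticalized input for a single token.

    `columns` holds the token's attribute values (word form first, then
    any annotations). Each value has &, < and > replaced by &amp;, &lt;
    and &gt;, so a literal token such as <text> can no longer be mistaken
    for a structural tag. ASSUMPTION: the encoder turns these entities
    back into the original characters when indexing.
    """
    return "\t".join(escape(value) for value in columns)


if __name__ == "__main__":
    # A literal token that looks like a tag, with a hypothetical POS column.
    print(format_token_line(["<RLS>", "NN"]))
```

Real structural tags would be written to the input unescaped, so only genuine tokens pass through this function. Whether the decoded form then matches a query for "<text>" is exactly what I would like confirmed.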

Best regards,

-- Richard

-- 
------------------------------------------------------------------- 
Richard Eckart de Castilho
Technical Lead
Ubiquitous Knowledge Processing Lab (UKP-TUD) 
FB 20 Computer Science Department      
Technische Universität Darmstadt 
Hochschulstr. 10, D-64289 Darmstadt, Germany 
phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
eckartde at tk.informatik.tu-darmstadt.de 
www.ukp.tu-darmstadt.de 
Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
------------------------------------------------------------------- 






