[CWB] How to properly encode XML-like tokens?
Hardie, Andrew
a.hardie at lancaster.ac.uk
Fri Jan 13 10:43:38 CET 2012
As standard for XML:
&amp;lt;
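
To make the escaping discussed in this thread concrete, here is a minimal Python sketch that escapes tokens before they are written to the input for cwb-encode. It assumes the corpus will be encoded with the -x option, so that the entities are turned back into literal characters in the index. The helper names escape_token and to_vertical_line are hypothetical, and the one-token-per-line, tab-separated layout is an assumption about the input file, not something stated in the thread.

```python
from xml.sax.saxutils import escape


def escape_token(token: str) -> str:
    """Escape &, < and > so the token cannot be mistaken for an XML tag.

    escape() replaces "&" first, so an existing "&lt;" in the data
    becomes "&amp;lt;" and is restored as the literal text "&lt;".
    """
    return escape(token)


def to_vertical_line(token: str, *annotations: str) -> str:
    """Build one input line: the escaped token followed by its annotations.

    Assumption: annotations are tab-separated columns after the token,
    and they are escaped the same way as the token itself.
    """
    return "\t".join(escape_token(field) for field in (token, *annotations))


if __name__ == "__main__":
    # Tokens from the thread that look like XML tags or entities.
    for tok in ["<RLS>", "<text>", "&lt;"]:
        print(to_vertical_line(tok))
```

Structural tags that really are meant as s-attributes should be written to the input unescaped; only tokens go through escape_token.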
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Richard Eckart de Castilho
Sent: 13 January 2012 08:29
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] How to properly encode XML-like tokens?
Hello Andrew,
sounds like what I need. But how would I escape a literal "&lt;" so it doesn't become a "<" in the index?
Best,
-- Richard
Am 13.01.2012 um 02:03 schrieb Hardie, Andrew:
> Hi Richard,
>
> Yes indeed there is a way to do this: &lt;text&gt;. The entities will be replaced by literal characters in the index iff you use the -x option with cwb-encode.
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Richard Eckart de Castilho
> Sent: 12 January 2012 23:39
> To: Open source development of the Corpus WorkBench
> Subject: [CWB] How to properly encode XML-like tokens?
>
> Hello,
>
> I would like to know if there is a proper way to encode corpora with arbitrary tokens, in particular ones that look like XML.
> For example, if I have a real token <RLS> in my corpus, I get messages like these:
>
> s-attribute <RLS> not declared, inserted literally (input line #80094547, warning issued only once).
>
> In this case it is rather an aesthetic problem, but I also sometimes have tokens that are identical to structural tags, e.g. <text>.
>
> Is there some way I can escape such XML-like tokens in the input to cwb-encode, so that such messages are avoided but the tokens are still properly indexed and searchable as "<text>"?
>
> Best regards,
>
> -- Richard
>
> --
> -------------------------------------------------------------------
> Richard Eckart de Castilho
> Technical Lead
> Ubiquitous Knowledge Processing Lab (UKP-TUD)
> FB 20 Computer Science Department
> Technische Universität Darmstadt
> Hochschulstr. 10, D-64289 Darmstadt, Germany
> phone [+49] (0)6151 16-7477, fax -5455, room S2/02/B117
> eckartde at tk.informatik.tu-darmstadt.de
> www.ukp.tu-darmstadt.de
> Web Research at TU Darmstadt (WeRC) www.werc.tu-darmstadt.de
> -------------------------------------------------------------------