[CWB] how to escape special characters in CQPweb

Ray Wu liangpingwu at 126.com
Sat Jun 15 02:48:47 CEST 2013


Thank you, Andrew.  It works.

But aren't (some) CQPweb files processed in the background by the CWB utilities, which, in turn, are written in C? OK, I'll catch up on my lessons. There is still a lot to figure out.


Best,
Ray


At 2013-06-15 03:36:36,"Hardie, Andrew" <a.hardie at lancaster.ac.uk> wrote:
This is not a CQPweb thing but a general XML thing. To escape quote marks within an XML attribute value, you need to use the XML entity &quot;


C escapes won't work at all in XML.
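
Applied to the example below, the attribute would be written as <s cn="&quot;亚洲&quot;的未来">. If the corpus files are generated by a script, the escaping can be automated. The following is only a minimal sketch in Python, assuming the attribute values are available as plain strings; the helper names escape_attr and s_open_tag are hypothetical and not part of CWB or CQPweb.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def escape_attr(value: str) -> str:
    """Escape a string for use inside a double-quoted XML attribute."""
    # escape() always converts &, < and >; the extra mapping adds the
    # double quote, which must not appear raw inside "..." delimiters.
    return escape(value, {'"': "&quot;"})


def s_open_tag(cn: str) -> str:
    """Build the opening <s> tag with an escaped cn attribute (hypothetical helper)."""
    return f'<s cn="{escape_attr(cn)}">'


if __name__ == "__main__":
    tag = s_open_tag('"亚洲"的未来')
    print(tag)

    # Sanity check: a standard XML parser recovers the original value.
    parsed = ET.fromstring(tag + "</s>")
    print(parsed.get("cn"))
```

The round trip through ElementTree only confirms that the output is well-formed XML; how the value is decoded for display depends on the corpus tools themselves.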


Best


Andrew.



Ray Wu <liangpingwu at 126.com> wrote:



hi all,

I'm preparing a parallel corpus for CQPweb. Everything went well until I ran into double quotes.

OK, I have a corpus like this, using \ (as in C) to escape the quotation marks in the translation:
<text id="test">
<s cn="\"亚洲\"的未来">
The    AT
Future    NN1
of    IO
Asia    NP1
</s>
</text>

When concordancing the corpus, I got the following:
The Future of Asia
 \

It seems that everything after the second quotation mark was silently ignored. However, if I change the input like this: <s cn="'亚洲'的未来">, I get
The Future of Asia
'亚洲'的未来

This is better, but at the cost of altering the original text. Does anyone know how to properly escape special characters such as quotation marks in CQPweb? Thanks.


Ray

