[CWB] how to escape special characters in CQPweb

Hardie, Andrew a.hardie at lancaster.ac.uk
Sun Jun 16 14:26:58 CEST 2013


Yes, true. But the fact that a program is written in C does not imply that it accepts C escapes in its input.

Best

Andrew.



Ray Wu <liangpingwu at 126.com> wrote:


Thank you, Andrew.  It works.

But aren't (some) CQPweb files processed in the background by the CWB utilities, which, in turn, are written in C? OK, I'll catch up on my lessons. There is still a lot to figure out.

Best,
Ray

At 2013-06-15 03:36:36,"Hardie, Andrew" <a.hardie at lancaster.ac.uk> wrote:
This is not a CQPweb thing but a general XML thing. To escape quote marks within an XML attribute value, you need to use the XML entity &quot;

C escapes won't work at all in XML.
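
As an illustration of the point above, here is a minimal Python sketch that writes the attribute value with XML entities instead of a backslash. The helper name s_open_tag is hypothetical and not part of CWB or CQPweb; it only uses the standard library's xml.sax.saxutils.escape, and it assumes the attribute is always delimited by double quotes.

```python
from xml.sax.saxutils import escape


def s_open_tag(cn: str) -> str:
    """Build an <s> start tag whose cn attribute value is XML-escaped.

    Hypothetical helper for illustration; assumes the attribute value
    is always wrapped in double quotes.
    """
    # escape() always replaces &, < and >; the extra mapping also
    # replaces the double quote with the XML entity &quot;.
    return '<s cn="{}">'.format(escape(cn, {'"': "&quot;"}))


print(s_open_tag('"亚洲"的未来'))
# <s cn="&quot;亚洲&quot;的未来">
```

Written by hand, the line from the question would therefore read <s cn="&quot;亚洲&quot;的未来"> rather than <s cn="\"亚洲\"的未来">.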

Best

Andrew.



Ray Wu <liangpingwu at 126.com> wrote:


Hi all,

I'm preparing a parallel corpus for CQPweb. Everything went well until I ran into double quotes.

OK, I have a corpus like this, using \ (as in C) to escape the quotation marks in the translation:
<text id="test">
<s cn="\"亚洲\"的未来">
The    AT
Future    NN1
of    IO
Asia    NP1
</s>
</text>

When concordancing the corpus, I got the following:
The Future of Asia
 \

It seems that everything after the second quotation mark was silently ignored. However, if I change the input to use single quotes, like this: <s cn="'亚洲'的未来">, I get
The Future of Asia
'亚洲'的未来

This is better, but at the cost of altering the original text. Does anyone know how to properly escape special characters such as quotation marks in CQPweb? Thanks.

Ray



