[CWB] encoding a corpus of Persian texts

Hardie, Andrew a.hardie at lancaster.ac.uk
Tue Aug 28 11:49:40 CEST 2012


RE

>>> I tried to encode a Persian text with the instructions in CWB_Encoding_Tutorial.pdf,
>>> so I changed the file name extension to .vrt and .xml (the extensions of the examples in the tutorials).

The file extension is immaterial; .vrt is just a convention, no more.
What matters is the internal format of the file.
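
To illustrate what "internal format" means here, below is a minimal Python sketch that writes a tiny verticalized file: one token per line, with the word form and its annotation separated by a TAB, and XML-style tags on their own lines marking structure. The function name to_vrt, the output file name, the sample sentence and the tag labels (N, ADJ, V) are all made up for illustration, and the sketch assumes the file is saved as UTF-8 (in which case the corpus would presumably also need to be declared as UTF-8 at encoding time, e.g. via cwb-encode's -c option).

```python
import os
import tempfile


def to_vrt(sentences, text_id):
    """Render tagged sentences as verticalized text.

    Each token goes on its own line as word<TAB>pos; <text> and <s>
    tags sit on separate lines to mark structural regions.
    """
    lines = [f'<text id="{text_id}">']
    for sentence in sentences:
        lines.append("<s>")
        for word, pos in sentence:
            lines.append(f"{word}\t{pos}")
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


# Hypothetical sample: one Persian sentence with illustrative POS labels.
sample = [[("کتاب", "N"), ("خوب", "ADJ"), ("است", "V")]]
vrt = to_vrt(sample, "demo")

# The extension is arbitrary; the explicit UTF-8 encoding is what matters.
# newline="\n" keeps plain LF line endings (a choice made for this sketch).
path = os.path.join(tempfile.gettempdir(), "persian_demo.vrt")
with open(path, "w", encoding="utf-8", newline="\n") as f:
    f.write(vrt)
```

The same content saved as persian_demo.txt would be just as usable; only the line-by-line layout and the character encoding of the file make a difference.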

RE

>>> Now the problem is this: Persian characters are not correctly displayed,
>>> and you can't type Persian and/or Arabic characters in the cmd command prompt.
>>> Is there a solution for this?

This is a limitation of the cmd terminal, I would imagine. Some users have had success
fixing this using the steps laid out here:

http://cwb.sourceforge.net/faq.php?hoist=windows_terminal#windows_terminal

Good luck!

best

Andrew.



-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of S Mollaei
Sent: 28 August 2012 10:41
To: cwb
Subject: [CWB] encoding a corpus of Persian texts

I encoded the text, but it wasn't possible to search it (not even the POS tags). Then I used the .txt file, and after encoding I was able to search the corpus in the cmd command prompt using CQP.

Now the problem is this: Persian characters are not correctly displayed, and you can't type Persian and/or Arabic characters in the cmd command prompt. Is there a solution for this?