[CWB] Escape "<" and ">" symbols
mansur
6688000 at gmail.com
Wed Feb 21 07:20:19 CET 2018
Hello, Ruprecht!
Thank you for the advice. Actually, it is Tatar, but you were very close :)
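
For the archive, here is a minimal sketch of how both suggestions from the quoted message could be applied to the vertical file before running cwb-encode. It assumes each token line has exactly three tab-separated columns (word, tags, lemma) and that structural lines such as <s> or <g/> contain no tab. The function names are hypothetical, not part of CWB or Apertium.

```python
import re
from xml.sax.saxutils import escape

# escape() handles &, < and > by default; the quote entities are added here.
QUOTES = {"'": "&apos;", '"': "&quot;"}

# Matches one Apertium-style tag such as <n> or <px3sp>.
TAG = re.compile(r"<([^<>]+)>")


def escape_token_line(line):
    """Escape XML special characters in every column of a token line."""
    if "\t" not in line:
        # Assumption: lines without a tab are structural tags, left untouched.
        return line
    return "\t".join(escape(col, QUOTES) for col in line.split("\t"))


def colon_tags(line):
    """Rewrite the tag column from <n><pl> to n:pl and escape word and lemma."""
    if "\t" not in line:
        return line
    # Assumption: exactly three columns; any other count raises ValueError.
    word, tags, lemma = line.split("\t")
    # Fall back to plain escaping if the column holds no <...> tags.
    joined = ":".join(TAG.findall(tags)) or escape(tags, QUOTES)
    return "\t".join([escape(word, QUOTES), joined, escape(lemma, QUOTES)])
```

Either function would be applied line by line, with the trailing newline stripped first (for example with rstrip("\n")) and the result written to a new file that is then passed to cwb-encode.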
With best wishes,
Mansur
On 20 February 2018 at 21:10, Ruprecht von Waldenfels <
ruprecht.waldenfels at gmx.net> wrote:
> Hi,
> escape them as xml entities (I am assuming you are compiling in XML mode).
>
> < &lt;
> > &gt;
> also
> ' &apos;
> " &quot;
> & &amp;
>
> better still, do this for the text but convert the tags to a better format,
> i.e.,
> n:pl:px3sp:nom
>
> (Is this Kyrgyz or Kazakh or something else?)
> Best,
> Ruprecht
>
>
> On 20.02.2018 at 17:57, mansur wrote:
>
> Hello!
>
> Could you explain how to escape the "<" and ">" symbols in the morphological tags
> that Apertium's analyser produces? For example:
>
> <s>
> 2008 <num> 2008
> елда <n><sg><loc> ел
> нефть <n><sg><nom> нефть
> табу <v><tv><ger><nom> тап
> эшләре <n><pl><px3sp><nom> эш
> өчен <post> өчен
> авыл <n><sg><attr> авыл
> хуҗалыгы <n><sg><px3sp><nom> хуҗалык
> җирләреннән <n><pl><px3sp><abl> җир
> якынча <adv> якынча
> 500 <num> 500
> гектарда <n><sg><loc> гектар
> 950 <num> 950
> җир <n><sg><attr> җир
> участогын <n><sg><px3sp><acc> участок
> <g/>
> , <cm> ,
> 2009 <num> 2009
> <g/>
> - <guio> -
> <g/>
> 2010 <num> 2010
> елларда <n><pl><loc> ел
> 100 <num> 100
> гектарда <n><sg><loc> гектар
> 400 <num> 400
> участокны <n><sg><acc> участок
> күчерергә <v><tv><inf> күчер
> ...
>
> cwb-encode tries to parse them as structural tags along with <s> and
> <text>.
>
> Thank you!
> Mansur
>
>
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>