[CWB] Escape "<" and ">" symbols

mansur 6688000 at gmail.com
Wed Feb 21 07:20:19 CET 2018


Hello, Ruprecht!

Thank you for the advice. Actually, it is Tatar, but you were very close :)
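
In case it helps anyone who finds this thread later, here is a minimal sketch of the conversion suggested in the message quoted below. It is only an illustration, not something taken from CWB or Apertium: the function names are made up, and it assumes the input is tab-separated with exactly three columns (word form, tags, lemma), and that any line without three columns is a structural tag such as <s> or <g/> that should pass through unchanged.

```python
import re
from xml.sax.saxutils import escape

# Matches one Apertium-style tag such as <n> or <px3sp> and captures its name.
TAG_RE = re.compile(r"<([^<>]+)>")

# Extra entities beyond &, < and >, which escape() already handles.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def convert_tags(tags: str) -> str:
    """Turn '<n><pl><px3sp><nom>' into 'n:pl:px3sp:nom'."""
    return ":".join(TAG_RE.findall(tags))


def convert_line(line: str) -> str:
    """Convert one 'word<TAB>tags<TAB>lemma' line.

    Assumption: anything that does not have exactly three tab-separated
    columns is a structural tag (<s>, <g/>, <text>, ...) and is left as-is.
    """
    line = line.rstrip("\n")
    cols = line.split("\t")
    if len(cols) != 3:
        return line
    word, tags, lemma = cols
    return "\t".join([
        escape(word, EXTRA_ENTITIES),   # XML-escape the word form
        convert_tags(tags),             # colon-separated tags, no angle brackets
        escape(lemma, EXTRA_ENTITIES),  # XML-escape the lemma
    ])
```

Each converted line can then be written to the file that is passed to cwb-encode. With this, "<n><pl><px3sp><nom>" becomes "n:pl:px3sp:nom", so the angle brackets in the tag column never reach the encoder.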

With best wishes,
Mansur

On 20 February 2018 at 21:10, Ruprecht von Waldenfels <
ruprecht.waldenfels at gmx.net> wrote:

> Hi,
> escape them as XML entities (I am assuming you are compiling in XML mode).
>
> < &lt;
> > &gt;
> also
> ' &apos;
> " &quot;
> & &amp;
>
> better still, do this for the text but convert the tags to a better format,
> i.e.,
> n:pl:px3sp:nom
>
> (Is this Kyrgyz or Kazakh or something else?)
> Best,
> Ruprecht
>
>
> On 20.02.2018 at 17:57, mansur wrote:
>
> Hello!
>
> Could you explain how to escape "<" and ">" symbols in the morphological tags
> that Apertium's analyser produces? For example:
>
> <s>
> 2008    <num>   2008
> елда    <n><sg><loc>    ел
> нефть   <n><sg><nom>    нефть
> табу    <v><tv><ger><nom>       тап
> эшләре  <n><pl><px3sp><nom>     эш
> өчен    <post>  өчен
> авыл    <n><sg><attr>   авыл
> хуҗалыгы        <n><sg><px3sp><nom>     хуҗалык
> җирләреннән     <n><pl><px3sp><abl>     җир
> якынча  <adv>   якынча
> 500     <num>   500
> гектарда        <n><sg><loc>    гектар
> 950     <num>   950
> җир     <n><sg><attr>   җир
> участогын       <n><sg><px3sp><acc>     участок
> <g/>
> ,       <cm>    ,
> 2009    <num>   2009
> <g/>
> -       <guio>  -
> <g/>
> 2010    <num>   2010
> елларда <n><pl><loc>    ел
> 100     <num>   100
> гектарда        <n><sg><loc>    гектар
> 400     <num>   400
> участокны       <n><sg><acc>    участок
> күчерергә       <v><tv><inf>    күчер
> ...
>
> cwb-encode tries to parse them as structural tags along with <s> and
> <text>.
>
> Thank you!
> Mansur
>
>
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>
>