[CWB] Experience encoding FreeLing-tagged corpora?

Scott Sadowsky ssadowsky at gmail.com
Sun Jul 17 11:41:31 CEST 2016


Does anyone have experience encoding FreeLing-tagged corpora? My main
questions are these:

1. FreeLing's plain text vertical output separates sentences with a blank
line, rather than enclosing them in any sort of tag (e.g. <s>...</s>). Can
CWB be configured to recognize this type of sentence encoding?

2. FreeLing's XML output looks a lot more complex than what I see in
tutorials. It has more attributes, which shouldn't be a problem, but it
also encodes each token as an XML element, with the annotations stored in
attributes rather than in columns, as seen below. Can CWB be used with this?

Thanks!
Scott

<text corpus="MY-CORPUS">
<sentence id="1">
  <token id="t1.1" form="La" lemma="el" tag="DA0FS0" ctag="DA"
pos="determiner" type="article" gen="feminine" num="singular" >
  </token>
  <token id="t1.2" form="secretaria" lemma="secretario" tag="NCFS000"
ctag="NC" pos="noun" type="common" gen="feminine" num="singular" >
  </token>
  <token id="t1.3" form="de" lemma="de" tag="SP" ctag="SP" pos="adposition"
type="preposition" >
  </token>
  [...]
</sentence>
<sentence id="2">
  <token id="t2.1" form="La" lemma="el" tag="DA0FS0" ctag="DA"
pos="determiner" type="article" gen="feminine" num="singular" >
  </token>
  <token id="t2.2" form="nueva" lemma="nuevo" tag="AQ0FS00" ctag="AQ"
pos="adjective" type="qualificative" gen="feminine" num="singular" >
  </token>
  <token id="t2.3" form="directora" lemma="director" tag="NCFS000"
ctag="NC" pos="noun" type="common" gen="feminine" num="singular" >
  </token>
</sentence>
</text>
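
In case neither format can be read directly, the fallback I have in mind is
to preprocess the FreeLing output into vertical text of the kind shown in the
tutorials (one token per line, tab-separated columns, structural tags on
their own lines). Below is a minimal, untested-against-CWB sketch of that
idea in Python. The function names are my own, the choice of form, lemma and
tag as the three columns is arbitrary, and I am assuming the plain text
output uses whitespace-separated columns. Forms containing characters such
as "<" or "&" are not given any special treatment here.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def freeling_xml_to_vertical(xml_text):
    """Turn FreeLing XML into vertical text: form<TAB>lemma<TAB>tag.

    Hypothetical helper; assumes the <text>/<sentence>/<token> layout
    shown in the sample above.
    """
    root = ET.fromstring(xml_text)
    lines = ["<text corpus=%s>" % quoteattr(root.get("corpus", ""))]
    for sentence in root.iter("sentence"):
        lines.append("<s id=%s>" % quoteattr(sentence.get("id", "")))
        for token in sentence.iter("token"):
            # One token per line, columns separated by tabs.
            columns = [token.get(name, "") for name in ("form", "lemma", "tag")]
            lines.append("\t".join(columns))
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


def blank_lines_to_s(vertical_text):
    """Wrap blank-line-separated sentences in <s>...</s> tags.

    Hypothetical helper; assumes whitespace-separated columns, which
    are rewritten as tab-separated ones.
    """
    out = []
    in_sentence = False
    for line in vertical_text.splitlines():
        if line.strip():
            if not in_sentence:
                out.append("<s>")
                in_sentence = True
            out.append("\t".join(line.split()))
        elif in_sentence:
            # A blank line closes the sentence that is currently open.
            out.append("</s>")
            in_sentence = False
    if in_sentence:
        out.append("</s>")
    return "\n".join(out) + "\n"
```

The result of either function would then be written to a file and passed to
the encoder, with the extra columns and the s and text regions declared
there. I would still prefer a solution that avoids this extra step, if CWB
offers one.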