[CWB] other kind of annotations in cwb corpus

Luigi Talamo luigi.talamo at libero.it
Tue Feb 15 23:03:22 CET 2011


Yannick Versley wrote:
> Hi Luigi,
>
> the easiest way to get tokenized and tagged data from raw text would be
> to use an existing toolkit, such as
> TextPro (http://textpro.fbk.eu/) or Tanl (http://medialab.di.unipi.it/wiki/Tanl)

Thank you Yannick and Andrew for your kind answers to my questions.
Now things are clearer: I'll start with a toolkit and then process the 
data with cwb-encode.

Maybe I'll take the time to write something about it, if it would be of 
any use... :)

By the way, some of the answers were already in the CQP tutorial, in the 
part where GERMAN-LAW is described: it clearly says that you can have 
other information such as lemma and agr, and the latter fits my needs 
perfectly...
Sorry for not RTFM ;), I'll post my progress to the list.
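
For the record, here is a rough sketch of the conversion step I have in
mind, not a tested pipeline. It assumes the toolkit's output has already
been read into (word, pos, lemma, agr) tuples; the function name
to_vertical, the sample sentence and the tag values are all made up for
illustration. It writes the one-token-per-line, tab-separated format that
cwb-encode reads, with each sentence wrapped in <s> ... </s>.

```python
def to_vertical(sentences):
    """Turn tagged sentences into one-token-per-line, tab-separated text.

    `sentences` is a list of sentences; each sentence is a list of
    (word, pos, lemma, agr) string tuples. This input shape is an
    assumption, not the output format of any particular toolkit.
    """
    lines = []
    for sentence in sentences:
        lines.append("<s>")                   # sentence start tag
        for token in sentence:
            lines.append("\t".join(token))    # one token per line, one column per attribute
        lines.append("</s>")                  # sentence end tag
    return "\n".join(lines) + "\n"


# Hypothetical sample; the tag values are invented for illustration.
sample = [[
    ("Il", "RD", "il", "m.sg"),
    ("gatto", "S", "gatto", "m.sg"),
    ("dorme", "V", "dormire", "3.sg"),
]]

if __name__ == "__main__":
    with open("corpus.vrt", "w", encoding="utf-8") as handle:
        handle.write(to_vertical(sample))
    # The file could then be encoded with something along the lines of
    # (paths are placeholders; check the cwb-encode manual for details):
    #   cwb-encode -d <data_dir> -f corpus.vrt -R <registry_file> \
    #              -P pos -P lemma -P agr -S s
```

The first column becomes the default word attribute, and each -P option
names one further column in order, so the order of the tuple fields has to
match the order of the -P options.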

Luigi

