[CWB] CWB-Encode and tokenization

Alberto Simões ambs at di.uminho.pt
Mon Sep 20 12:55:57 CEST 2010


Hello

So far, I have supplied all my CWB input files in tokenized form (one token
per line). Are there other input formats that can be used, for example ones
that would leave the tokenization to CWB itself?

I am asking because I am starting to write a script that encodes a TMX file
directly, and I would love not to have to deal with tokenization myself :)

For the moment I may just split on whitespace and pray :)
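
To make the fallback concrete, here is a minimal sketch of the kind of script I have in mind. It reads the segments of one language from a TMX document and writes them one token per line, with each segment wrapped in <s> ... </s> lines. The function names (tokenize, tmx_to_vertical) and the regular expression are only illustrative assumptions, not anything provided by CWB. The tokenizer is deliberately naive: it goes slightly beyond splitting on spaces by separating punctuation, but it knows nothing about abbreviations, clitics, or tokens that would need escaping.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical naive tokenizer: a word (optionally with internal
# apostrophes or hyphens) or a single punctuation character.
TOKEN_RE = re.compile(r"\w+(?:['-]\w+)*|[^\w\s]")

# ElementTree exposes the xml:lang attribute under this expanded name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_RE.findall(text)


def tmx_to_vertical(tmx_string, lang):
    """Return one-token-per-line text for all segments in `lang`.

    Each segment is wrapped in <s> ... </s> lines, on the assumption
    that segments will be encoded as a structural attribute named "s".
    """
    root = ET.fromstring(tmx_string)
    lines = []
    for tuv in root.iter("tuv"):
        # TMX 1.4 uses xml:lang; older files may use a plain lang attribute.
        tuv_lang = tuv.get(XML_LANG) or tuv.get("lang") or ""
        if tuv_lang.lower() != lang.lower():
            continue
        seg = tuv.find("seg")
        if seg is None:
            continue
        # itertext() also collects text nested inside inline markup.
        text = "".join(seg.itertext())
        lines.append("<s>")
        lines.extend(tokenize(text))
        lines.append("</s>")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = (
        '<tmx version="1.4"><header/><body><tu>'
        '<tuv xml:lang="en"><seg>Hello, world!</seg></tuv>'
        '<tuv xml:lang="pt"><seg>Olá, mundo!</seg></tuv>'
        "</tu></body></tmx>"
    )
    print(tmx_to_vertical(sample, "en"), end="")
```

The output of each language would go to its own file and be encoded separately; aligning the two resulting corpora is a further step that this sketch does not cover.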

Thanks
Alberto
-- 
Alberto Simões

