[CWB] Creating and importing Cyrillic corpus in CQPWeb

Nikolche Mickoski nmickoski at gmail.com
Mon Jan 16 18:34:15 CET 2017


Hello,

I’m trying to create a corpus in CQPweb for the Macedonian language and I
would like to ask for your help.

I’ve installed CQPwebInABox (Esmeralda). I tried to follow the CQPweb Admin
Manual, the CWB Encoding Tutorial and the Martínez tutorial
(http://chozelinek.github.io/sacoco/cqpwebsetup.html), but in vain. I tried
to annotate the corpus with TreeTagger but failed. I was able to split
small texts into sentences with MorphAdorner, but I still don’t know how to
use the output with CQPweb.

I obtained the MULTEXT-East non-commercial lexicon for Macedonian
(https://www.clarin.si/repository/xmlui/handle/11356/1042), which contains
over 1 million tagged lemmas. I also extracted the text from the Macedonian
Wikipedia dump file (from dumps.wikimedia.org) with Wikipedia Extractor. I
did all the preparatory work, but I wasn’t able to create the corpus in
CQPweb.

Having tried everything I could get my hands on, I decided to write to you
and ask for your help. I really hope that you can spare some time to help me
with this.

Thank you very much,

Nikolche

Nikolche Mickoski
Translator/Interpreter
GSM +389 70 357 406
nmickoski at gmail.com