[CWB] Creating and importing Cyrillic corpus in CQPWeb
Nikolche Mickoski
nmickoski at gmail.com
Mon Jan 16 18:34:15 CET 2017
Hello,
I'm trying to create a corpus of Macedonian in CQPWeb and I
would like to ask for your help.
I've installed CQPWeb in a box (Esmeralda). I tried to follow the CQPweb Admin
Manual, the CWB Encoding Tutorial and Martínez's tutorial
(http://chozelinek.github.io/sacoco/cqpwebsetup.html), but in vain. I tried
to annotate the corpus with TreeTagger, but I failed. I was able to split
small texts into sentences with MorphAdorner, but I still don't know how to
use them with CQPWeb.
I obtained the MULTEXT-East non-commercial lexicon for Macedonian
(https://www.clarin.si/repository/xmlui/handle/11356/1042), containing over 1
million tagged lemmas. I've extracted the Macedonian Wikipedia dump from
dumps.wikimedia.org with Wikipedia Extractor. I did all the preparatory
work, but I wasn't able to create the corpus in CQPWeb.
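One step between the extracted Wikipedia text and CQPWeb is converting it into the "vertical" format (one token per line, with XML-style tags for texts and sentences) that CWB's encoder reads. The sketch below assumes Wikipedia Extractor's usual output, where each article is wrapped in `<doc id="..." url="..." title="...">` ... `</doc>`. The tokenizer and sentence splitter are deliberately naive regular expressions, not a proper Macedonian tokenizer. The function name `to_vrt` and the script name `wiki2vrt.py` are hypothetical.

```python
import html
import re
import sys
from xml.sax.saxutils import escape, quoteattr

# Assumed Wikipedia Extractor article header: <doc id="..." url="..." title="...">
DOC_OPEN = re.compile(r'<doc id="([^"]*)" url="([^"]*)" title="([^"]*)">')
# Naive tokenizer: runs of word characters (Unicode-aware, so Cyrillic works)
# or single punctuation marks.
TOKEN = re.compile(r"\w+|[^\w\s]")
# Naive sentence boundary: whitespace following '.', '!' or '?'.
SENT_END = re.compile(r"(?<=[.!?])\s+")


def to_vrt(lines):
    """Convert Wikipedia Extractor output lines into CWB vertical format."""
    out = []
    in_doc = False
    for line in lines:
        line = line.strip()
        m = DOC_OPEN.match(line)
        if m:
            doc_id, _url, title = m.groups()
            # Unescape entities first so quoteattr does not double-escape them.
            out.append(f"<text id={quoteattr(doc_id)} title={quoteattr(html.unescape(title))}>")
            in_doc = True
            continue
        if line == "</doc>":
            out.append("</text>")
            in_doc = False
            continue
        if not in_doc or not line:
            continue
        # Each non-empty paragraph line is split into sentences, then tokens.
        for sentence in SENT_END.split(line):
            tokens = TOKEN.findall(sentence)
            if tokens:
                out.append("<s>")
                # Escape '<', '>' and '&' so stray tokens are not read as tags.
                out.extend(escape(tok) for tok in tokens)
                out.append("</s>")
    return out


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical paths): python wiki2vrt.py extracted/AA/wiki_00 > mkwiki.vrt
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8") as f:
            for vrt_line in to_vrt(f):
                print(vrt_line)
```

The resulting file could then be encoded with something like `cwb-encode -c utf8 -x -d <data dir> -f mkwiki.vrt -R <registry>/mkwiki -S text:0+id+title -S s` followed by `cwb-makeall`, where the directories and the corpus name are placeholders. If the tokens are later tagged (for example with TreeTagger, which writes token, POS and lemma as tab-separated columns), the extra columns would be declared as positional attributes with `-P pos -P lemma`.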
After I tried everything I could get my hands on, I decided to write to you
and ask for your help. I really hope that you can spare some time to help me
with this.
Thank you very much,
Nikolche
Nikolche Mickoski
Translator/Interpreter
GSM +389 70 357 406
nmickoski at gmail.com