[CWB] Help with CWB under linux

Gassan Tabajah gtabajah at cs.technion.ac.il
Mon Nov 30 18:44:37 CET 2009


Hi Serge,

My input format looks like this:
<corpus>
<text id="http://www.foo.org/index.html">
<s>
volunteers      NN2     volunteer
work    VVB     work
as      PRP     as
part    NN1     part
of      PRF     of
a       AT0     a
team    NN1     team
and     CJC     and
provide VVB     provide
help    NN1-VVB help
</s>
</text>
</corpus>

I used the following commands under the bin directory: 
$ cwb-encode -d /usr/local/mycorpus -f filename.xml \
    -R /usr/local/share/cwb/registry/mycorpus \
    -P pos -P lemma -V text -S s -S corpus
$ cwb-makeall -V MYCORPUS

Then I ran cqp -e and activated MYCORPUS.
When I entered a regular expression like "a.*", I got the following output:
MYCORPUS> "a.*";
        2: teer work    VVB     work <as      PRP     as> part    NN1   part of
        5:   part of      PRF     of <a       AT0     a> team    NN1    team and
        7:    a team    NN1     team <and     CJC     and> provide VVB     provide

But when I tried something simple like "a", I got no matches:
MYCORPUS> "a";
0 matches.

I don't understand why I got these results. Do you have any ideas?
What should the output of 'cwb-decode' look like? Do you have an example of
how to use it?
(BTW, I am using version cwb-2.2.b99-RC1 under Cygwin.)
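
One possible reading of the two query results, offered only as a sketch: CQP matches a regular expression against the entire token, and cwb-encode splits columns on TAB characters. Assuming the columns in the input file were separated by spaces rather than TABs (this cannot be confirmed from the listing above), each whole line would be stored as a single "word" token. The Python snippet below imitates that behaviour with re.fullmatch; the token list and the helper name cqp_like_match are illustrative, not part of CWB.

```python
import re

# Assumption: columns were separated by spaces, not TABs, so each whole
# input line ended up as one "word" token in the encoded corpus.
tokens = [
    "as      PRP     as",
    "a       AT0     a",
    "and     CJC     and",
]

def cqp_like_match(pattern, token):
    # Hypothetical helper: CQP requires the pattern to match the whole
    # token, which re.fullmatch imitates here.
    return re.fullmatch(pattern, token) is not None

# "a.*" matches every token that starts with "a", including the
# trailing columns that were swallowed into the token.
hits_wild = [t for t in tokens if cqp_like_match("a.*", t)]

# "a" only matches a token that is exactly "a"; none of these are.
hits_exact = [t for t in tokens if cqp_like_match("a", t)]

print(len(hits_wild))
print(len(hits_exact))
```

Under that assumption, "a.*" finds all three tokens while "a" finds none, which is the pattern seen in the CQP session above.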


Regards,
Ghassan Tabajah
Software Engineer - Mila Center
Computer Science Faculty - Technion
Room 644, Tel: (829) 3969

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On
Behalf Of Serge HEIDEN
Sent: Monday, November 30, 2009 6:58 PM
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Help with CWB under linux

Dear Ghassan,

From: "Gassan Tabajah" <gtabajah at cs.technion.ac.il>
>> Also, I noticed that the following files under the "mycorpus" directory
>> (lemma.corpus, pos.corpus, word.corpus) contain only <nul>'s. Is that
>> an error?

Yes, this is an error.
Try using the 'cwb-decode' tool to decode your indexes independently
of 'cqp'.
It seems that your 'cwb-encode' or 'cwb-makeall' process had a problem.
Are you sure about your input format? Do you have an excerpt of it?
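
One way to check the input format before encoding is to confirm that every token line has the expected number of TAB-separated columns, since cwb-encode splits columns on TABs. The Python sketch below is only an illustration: the function name check_vertical_line is hypothetical, and the default of three columns assumes a word form plus the two attributes declared with -P pos -P lemma.

```python
def check_vertical_line(line, expected_columns=3):
    """Return True if a token line has the expected number of
    TAB-separated columns (hypothetical helper, not part of CWB)."""
    stripped = line.rstrip("\n")
    # Lines such as <s> or </text> are structural tags, not tokens.
    if stripped.startswith("<") and stripped.endswith(">"):
        return True
    return len(stripped.split("\t")) == expected_columns

good = "volunteers\tNN2\tvolunteer"          # TAB-separated columns
bad = "volunteers      NN2     volunteer"    # space-separated columns

print(check_vertical_line(good))
print(check_vertical_line(bad))
```

A line that fails this check would be read as a single column, so the whole line would become the word form.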

Best,
Serge

-- 
Dr. Serge Heiden, slh at ens-lsh.fr, http://textometrie.ens-lsh.fr
ENS-LSH/CNRS - ICAR UMR5191, Institut de Linguistique Française
15, parvis René Descartes 69342 Lyon BP7000 Cedex, tél. +33(0)622003883
