[CWB] Problem with indexing corpora
Stefan Evert
stefanML at collocations.de
Sun Jan 31 15:19:47 CET 2010
Hi Benoit,
there is a simple reason for your problem: you specified a _relative_
data path for the cwb-encode program (with "-d .", i.e. the current
directory). This relative path is embedded in the automatically
generated registry entry (~/corpora/registry/ep), so whenever you run
a CWB program (cqp, cwb-describe-corpus, etc.) on the EP corpus, it
expects to find the data files in _its own_ working directory. This
is fine in your first session, but when you start a new terminal
session, the working directory reverts to your home directory.
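The effect can be reproduced without CWB at all. The following is only a sketch under stated assumptions: the temporary directory layout is hypothetical, and the empty file word.corpus merely stands in for real corpus data. It shows that a relative path such as "." resolves against whatever the current working directory happens to be.

```shell
# Hypothetical throwaway layout: one data directory and one unrelated directory.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/corpora/ep" "$demo_root/home"
: > "$demo_root/corpora/ep/word.corpus"   # stand-in for an encoded data file

relative_home="."   # the value that "-d ." leaves in the registry entry

check_from() {
    # Resolve the relative path against the given working directory.
    if (cd "$1" && [ -e "$relative_home/word.corpus" ]); then
        echo "found"
    else
        echo "missing"
    fi
}

first_session=$(check_from "$demo_root/corpora/ep")   # started in the data directory
new_session=$(check_from "$demo_root/home")           # started somewhere else

echo "first session: $first_session"
echo "new session:   $new_session"

rm -rf "$demo_root"
```

The same relative path points at the data in one working directory and at nothing in the other, which matches the behaviour described above.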
Registry files should always include absolute paths, e.g.
"/Users/benoit/corpora/ep". You can either pass this absolute path to
cwb-encode (taking a guess, you might want to use "-d ~/corpora/ep"
instead of "-d ."; or simply "-d `pwd`" if you're in the right working
directory); or you can edit the registry file after encoding in order
to replace
HOME .
by the appropriate absolute path, e.g.
HOME /Users/benoit/corpora/ep
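If you prefer not to edit the file by hand, the replacement can be scripted. This is a minimal sketch, assuming the HOME line reads exactly "HOME ." as shown above; the two-line file created here is a simplified stand-in for a real registry entry (which contains more lines), and the temporary location is hypothetical. On a real installation you would point registry_file at ~/corpora/registry/ep and keep a backup first.

```shell
# Simplified stand-in for a registry entry with a relative HOME line.
demo_dir=$(mktemp -d)
registry_file="$demo_dir/ep"
printf 'ID ep\nHOME .\n' > "$registry_file"

# Absolute data directory, taken from the example above.
data_dir="/Users/benoit/corpora/ep"

# Rewrite only the line that is exactly "HOME ."; write to a new file and
# move it into place, which avoids the non-portable "sed -i".
sed "s|^HOME \.\$|HOME $data_dir|" "$registry_file" > "$registry_file.new" \
    && mv "$registry_file.new" "$registry_file"

home_line=$(grep '^HOME' "$registry_file")
echo "$home_line"

rm -rf "$demo_dir"
```

Using "|" as the sed delimiter avoids having to escape the slashes in the path.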
Note that the "~/corpora/..." notation only works in the shell (as an
argument to cwb-encode), but not in the registry entry.
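The difference comes from the shell expanding "~" before the program ever sees the argument, while text stored in a file is read literally. A small sketch of this, using printf as a stand-in for any program receiving the argument (the variable names are only illustrative):

```shell
# Unquoted: the shell replaces ~ with your home directory before printf runs.
expanded=$(printf '%s' ~/corpora/ep)

# Quoted: the characters are passed through unchanged, which is how a path
# written inside a registry file would be read.
literal=$(printf '%s' '~/corpora/ep')

echo "as a shell argument: $expanded"
echo "as literal text:     $literal"
```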
Best wishes,
Stefan
On 29 Jan 2010, at 16:02, Benoit Crabbé wrote:
> Hi all,
>
> I am a casual user of CQP, and it turns out I have the following
> problem with corpus indexing
>
> The indexing process works fine :
>
> 1)
> cwb-encode -d . -f EP.ims -R ~/corpora/registry/ep -P lemma -P pos -S text -S corpus -S s
>
> 2)
> cwb-makeall -r ~/corpora/registry EP
>
> 3)
> cqp -r ~/corpora/registry
>
> I can access the indexed corpus under its name within cqp and
> perform common queries.
> So far so good.
>
> however, as soon as I use another terminal or relaunch another
> session on my computer, it happens that :
>
> 4) cqp -r ~/corpora/registry
>
> fails to provide access to the encoded corpus while trying to access
> it with:
>
> indeed, the show corpora command does not show the previously
> indexed corpus anymore;
> the registry directory on my filesystem, however, still contains an
> entry for this corpus.
>
> This problem is recurrent and I can reproduce it on other machines.
> Can anyone point out my mistake in the installation process?