[CWB] Problem with indexing corpora

Stefan Evert stefanML at collocations.de
Sun Jan 31 15:19:47 CET 2010


Hi Benoit,

there is a simple reason for your problem: you specified a _relative_  
data path for the cwb-encode program (with "-d .", i.e. the current  
directory).  This relative path is embedded in the automatically  
generated registry entry (~/corpora/registry/ep), so whenever you run  
a CWB program (cqp, cwb-describe-corpus, etc.) on the EP corpus, it  
expects to find the data files in _its own_ working directory.  This  
is fine in your first session, but when you start a new terminal  
session, the working directory reverts to your home directory.
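
To illustrate the effect without involving CWB at all, here is a minimal sketch (the temporary directories and variable names are made up for the demonstration) showing that the same relative path "." resolves to a different directory depending on where a program is started:

```shell
# Hypothetical demo directories; nothing here touches a real corpus.
demo_root=$(mktemp -d)
mkdir -p "$demo_root/ep" "$demo_root/elsewhere"

# Session 1: started in the data directory, so "." is the data directory.
first=$(cd "$demo_root/ep" && cd . && pwd -P)

# Session 2: started somewhere else, so "." points somewhere else.
second=$(cd "$demo_root/elsewhere" && cd . && pwd -P)

echo "session 1 resolves '.' to: $first"
echo "session 2 resolves '.' to: $second"
```

The two lines printed differ, which is what happens to a registry entry containing "HOME ." when cqp is launched from a new terminal.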

Registry files should always include absolute paths, e.g.
"/Users/benoit/corpora/ep".  You can either pass this absolute path to
cwb-encode (taking a guess, you might want to use "-d ~/corpora/ep"
instead of "-d ."; or simply "-d `pwd`" if you're in the right working
directory); or you can edit the registry file after encoding in order
to replace

	HOME .

by the appropriate absolute path, e.g.

	HOME /Users/benoit/corpora/ep

Note that the "~/corpora/..." notation only works in the shell (as an  
argument to cwb-encode), but not in the registry entry.
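
If you prefer not to edit the file by hand, the replacement could be scripted roughly as follows.  This is only a sketch: fix_registry_home is a hypothetical helper (not part of CWB), and it assumes the data path contains no whitespace and none of the characters "|", "&" or "\".

```shell
# fix_registry_home REGISTRY_FILE DATA_DIR
# Hypothetical helper: rewrites the HOME line of a registry file so that
# it holds the absolute path of DATA_DIR.
fix_registry_home() {
    registry_file=$1
    data_dir=$2

    # Turn the data directory into an absolute path; fail if it does not exist.
    abs_dir=$(cd "$data_dir" && pwd -P) || return 1

    tmp_file=$(mktemp) || return 1

    # Replace the line that starts with "HOME" plus whitespace;
    # all other lines are copied unchanged.
    sed "s|^HOME[[:space:]].*|HOME $abs_dir|" "$registry_file" > "$tmp_file" || return 1

    # Write the result back in place so the file keeps its permissions.
    cat "$tmp_file" > "$registry_file" && rm -f "$tmp_file"
}
```

With the paths guessed above, the call would be "fix_registry_home ~/corpora/registry/ep ~/corpora/ep"; the shell expands "~" before the helper sees it, so the registry file ends up with a literal absolute path.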

Best wishes,
Stefan

On 29 Jan 2010, at 16:02, Benoit Crabbé wrote:

> Hi all,
>
> I am a casual user of CQP, and it turns out I have the following  
> problem with corpus indexing
>
> The indexing process works fine :
>
> 1)
> cwb-encode -d . -f EP.ims -R ~/corpora/registry/ep -P lemma -P pos -S text -S corpus -S s
>
> 2)
> cwb-makeall -r ~/corpora/registry EP
>
> 3)
> cqp -r ~/corpora/registry
>
> I can access the indexed corpus under its name within cqp and
> perform common queries.
> So far so good.
>
> however, as soon as I use another terminal or relaunch another  
> session on my computer, it happens that :
>
> 4) cqp -r ~/corpora/registry
>
> fails to provide access to the encoded corpus while trying to access
> it with:
>
> indeed the "show corpora" command does not show the previously
> indexed corpus anymore;
> the registry directory on my filesystem, however, still contains an
> entry for this corpus.
>
> This problem is recurrent and I can reproduce it on other machines.
> Can anyone point out my mistake in the installation process?


