[CWB] CQPweb - managing metadata

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Jul 26 03:44:22 CEST 2010


Hi Claudia,

 

Apologies for the delay replying; I have been off email. I’ll take your original email and updates in reverse order, and answer queries even if you answered them yourself later – just in case the answers are useful for others.

 

“Corpus ``EXAMPLESIMPLE'' is undefined”

 

An error like this does indeed indicate that the problem was at the corpus-indexing stage.

 

On how to tell whether a corpus has been created or not:

 

“When using the terminal version, cwb-encode and cwb-make create a bunch of additional files....but I am not seeing this happening in the web version - how can I check this?”

 

These files are placed in the registry and data directories specified in your configuration file (lib/config.inc.php) by the variables $cwb_registry and $cwb_datadir. The registry file for each corpus goes in the former; the latter holds one directory of index files per corpus. Both live outside the normal web directory (/var/www or whatever) so that anyone browsing your site cannot get at the internal workings of your CWB setup! The PHP files created in the web directory are just pointers to different bits of the interface.
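
One way to check this from outside CQPweb is to look for the registry file and the per-corpus data directory directly. The following is only a rough Python sketch, not part of CQPweb: the function name and the example paths are assumptions for illustration, and it assumes that the registry file and the data directory are both named after the lowercase corpus handle (as with 'examplesimple' in this thread).

```python
import os

def corpus_install_status(registry_dir, data_dir, corpus_name):
    """Report whether CWB index files appear to exist for a corpus."""
    handle = corpus_name.lower()  # assumed: files are named after the lowercase handle
    registry_file = os.path.join(registry_dir, handle)
    corpus_data = os.path.join(data_dir, handle)
    has_registry = os.path.isfile(registry_file)
    # The data directory must exist and contain at least one file.
    has_data = os.path.isdir(corpus_data) and len(os.listdir(corpus_data)) > 0
    return {"registry_file": has_registry, "index_files": has_data}

if __name__ == "__main__":
    # Example paths only; use the values of $cwb_registry and $cwb_datadir.
    print(corpus_install_status("/corpora/registry", "/corpora/data", "EXAMPLESIMPLE"))
```

If either value comes back False, the problem is most likely at the indexing stage rather than in the metadata step.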

 

“mkdir($datadir, 0775); - this is not working for me, and in fact my /corpora/data/ is empty, when it should have the folder 'examplesimple' in it”

 

As you later worked out, this is a permissions problem. The setup manual puts it as follows:

The username of the webserver (in the case of Apache, usually something like www or www-data) need to have read-write-execute access to all these directories [that is, the ones you create for CQPweb]. The username of the mysqld process (usually something like mysql) also needs read and write access if you want MySQL to use file-access functions. So, these new folders must have the write permission set for either “all” or – if you are worried about security – for “group” (where the file is assigned to some group that both the mysql server's account and the web server's account belong to).
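
To see whether a directory's mode bits give the group the access described above, something like the following Python sketch could be used. It is not part of CQPweb; the function name is made up for illustration, and it only inspects permission bits. It does not check which group owns the directory, or whether the web server and MySQL accounts actually belong to that group.

```python
import os
import stat

def group_has_rwx(path):
    """Return True if path is a directory whose group has read, write and execute."""
    mode = os.stat(path).st_mode
    needed = stat.S_IRGRP | stat.S_IWGRP | stat.S_IXGRP
    return stat.S_ISDIR(mode) and (mode & needed) == needed

if __name__ == "__main__":
    # Directory names taken from the list later in this thread; adjust to your setup.
    for directory in ("/corpora/data", "/corpora/registry", "/corpora/system/temp"):
        if os.path.isdir(directory):
            print(directory, "ok" if group_has_rwx(directory) else "group lacks rwx")
        else:
            print(directory, "does not exist")
```

A directory whose mode is 0775 passes this check; one with mode 0755 does not.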

 

“function create_text_metadata_for_minimalist() contains the call to:     create_text_metadata_check_text_ids($corpus); however $corpus is an empty string and so this was resulting in an sql error”

 

Yes, this is a bug in v2.12; it was fixed in 2.13. The solution was exactly what you thought – change to $corpus_sql_name.

 

“if I try say a word lookup, I get an error:

Error message

Syntax error
Sorry, your simple query ' f* ' contains a syntax error.
Usage: $grammar->SetParam($name, $value) at - line 10

I will try and find out what's causing this. any tips?”

 

It looks as if one of two things is not working with regard to the tertiary annotation and the mapping table. 

 

EITHER your CEQL setup for this corpus is not correct – though it should be if you used default p-attributes. Click on “Manage annotation” in the admin section of the main menu to check this. 

 

OR the mapping tables that are needed for the query don’t exist. In the admin interface under “Misc” click on “mapping tables” and then press the button at the bottom of the screen to regenerate built-in mapping tables.

 

(And more on all of this here: http://cwb.svn.sourceforge.net/viewvc/cwb/gui/cqpweb/trunk/doc/CQPweb-CEQL-manual.html )

 

best

 

Andrew.

 

 

 

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Claudia Borg
Sent: 23 July 2010 18:04
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] CQPweb - managing metadata

 

[second update]

Apologies for the several emails.

I have sorted out the PHP mkdir problem by running chmod 777 on the following directories:

/corpora/data
/corpora/registry
/corpora/system
/corpora/system/access
/corpora/system/temp
/corpora/system/upload

That sorted out reading and writing to those directories. In fact, I now see the corpus files in the directory /corpora/data/examplesimple, which shows that at least the input file has been processed.

Next problem encountered when creating minimalist metadata:

in admin-lib.inc.php:

line 1454 (approx, since I added some checks):

function create_text_metadata_for_minimalist()

contains the call to:
    create_text_metadata_check_text_ids($corpus);
however $corpus is an empty string and so this was resulting in an sql error.

I changed this to $corpus_sql_name and it seems to have worked. 

I don't know if this is actually a bug or not. But it kind of works so far... still not out of the woods though! The metadata installation seems to be OK, but if I try, say, a word lookup, I get an error:

Error message

Syntax error
Sorry, your simple query ' f* ' contains a syntax error.
Usage: $grammar->SetParam($name, $value) at - line 10

I will try and find out what's causing this. any tips?

thanks
Claudia




On 23 July 2010 17:22, Claudia Borg <claudiaborg at gmail.com> wrote:

[update]

As a matter of fact, when admin-lib.inc.php runs, around line 380+ there is a call to mkdir($datadir, 0775) that is not working for me; in fact my /corpora/data/ is empty, when it should contain the folder 'examplesimple'.

I am running PHP version 5.3.2-1ubuntu4.2. I'll try to check why the directory is not being created...

claudia

 

On 23 July 2010 16:35, Claudia Borg <claudiaborg at gmail.com> wrote:

Hi Andrew, Thanks for your reply. Some more questions from my side to help me understand better: 

Metadata can be extra information about the corpus (e.g. URL, author) rather than linguistic information present in the text, correct? Linguistic information goes in the p-attributes, and the structure of the text (chapters, etc.) in the s-attributes, correct?

Re removing text_id and text_lang: I've done that and am now getting a different error :(

What I did:
1. placed a new file in the upload section
2. installed a new corpus using this file, leaving the default s- and p-attributes
3. clicked on the 'design and insert a text-metadata table' link
4. left everything as is on the form and just clicked the minimalist metadata button at the bottom

The result is this:

Error message

**** CQP ERROR ****
CQP Error:
Corpus ``EXAMPLESIMPLE'' is undefined



I'll try to have a look at the code. I suspect that the problem is actually in the previous step, when installing a new corpus. How can I confirm that a corpus has indeed been installed correctly? At this stage all I notice is that in my /var/www/ I have a new folder for the corpus, with some PHP files inside it; apart from that, I don't see any other changes. When using the terminal version, cwb-encode and cwb-make create a bunch of additional files... but I am not seeing this happening in the web version. How can I check this? In the meantime, help is much appreciated!

Claudia




On 21 July 2010 15:44, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:

	Hi Claudia,

	 

	Could you try encoding without the explicit text_id and text_lang elements in your input file? CQPweb assumes that input files will be valid XML, and that s-attributes like text_id and text_lang are to be inferred from the attributes of the <text> element. So spelling them out may have caused the problem. The file ___install_temp_metadata_illum01 should have been created by cwb-s-decode from the text_id s-attribute, so the fact that it was missing suggests that this s-attribute is not available.
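
As an illustration of the input shape described above, here is a small Python sketch that writes one text in vertical format, with id and lang given only as attributes of the <text> element. The function name is hypothetical and this is not a CQPweb tool; escaping tokens as XML is an assumption made here to keep the output valid XML.

```python
from xml.sax.saxutils import escape, quoteattr

def to_vertical(text_id, lang, sentences):
    """Build one <text> element in one-token-per-line (vertical) format."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    lines = ["<text id=%s lang=%s>" % (quoteattr(text_id), quoteattr(lang))]
    for sentence in sentences:
        lines.append("<s>")
        # Assumption: tokens are XML-escaped so that '&' and '<' stay valid.
        lines.extend(escape(token) for token in sentence)
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"

print(to_vertical("illum01", "Maltese", [["word1", "word2"]]), end="")
```

The output contains no separate text_id or text_lang elements; those s-attributes are left to be inferred from the attributes of <text>.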

	 

	On the more general point about metadata: in this case the “minimalist metadata” is probably what you want, so you are going about it the right way. As the manual explains: “The metadata file should be a tab-delimited database. The first column should be the text id-codes, with a line for each text. You can then have as many columns of metadata as you need.” If you haven’t got a table of information like this, then the minimalist-metadata option generates a dummy table for you. “Entering metadata fields” simply means specifying what the columns in your table of information contain, so it is not relevant if you don’t have such a table.
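
For anyone who does want to supply such a table, the layout quoted from the manual could be produced along these lines. This is a Python sketch with a hypothetical function name; the second text id ("illum02") is invented for the example, and the absence of a header row is an assumption.

```python
def metadata_table(rows):
    """Return tab-delimited text: one line per text, text id in column one."""
    if not rows:
        return ""
    width = len(rows[0])
    lines = []
    for row in rows:
        if len(row) != width:
            raise ValueError("every text needs the same number of columns")
        if any("\t" in value or "\n" in value for value in row):
            raise ValueError("values must not contain tabs or newlines")
        lines.append("\t".join(row))
    return "\n".join(lines) + "\n"

# First column: text id; second column: one metadata field (language).
table = metadata_table([["illum01", "Maltese"], ["illum02", "Maltese"]])
print(table, end="")
```

The text ids in the first column need to match the id attributes of the <text> elements in the indexed corpus.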

	 

	best

	 

	Andrew.

	 

	From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Claudia Borg
	Sent: 20 July 2010 15:32
	To: CWB mailing list
	Subject: [CWB] CQPweb - managing metadata

	 

	Hi all,
	
	I am trying to install my own corpus through CQPweb. I have a simple vertical text file with the following structure:
	
	<text id="illum01" lang="Maltese">
	<text_id "illum01">
	<text_lang "Maltese">
	<s>
	word1
	word2
	...
	</s>
	</text_lang>
	</text_id>
	</text>
	
	There is no annotation (POS, lemma, etc.), so it's basically like a word list. The corpus installation process goes well (I used the default p-attributes, even though in reality I only have the word attribute; in future I will add POS and lemma, but for the time being I am just trying to get used to CQPweb), but then I need to install the metadata, and I cannot quite understand what is required here.
	
	If I try to create a minimalist metadata table without specifying anything on the manage metadata page, then I get this error:

	A mySQL query did not run successfully!

	Error # 2: 
	File '/home/mlrs/corpora/system/temp/___install_temp_metadata_illum01' not found (Errcode: 2) 

	
	
	From MySQL admin, I see that the table text_metadata_for_illum01 has been created, but it is empty (no rows).
	
	If I try to enter some metadata fields (I cannot clearly understand what is meant to go here), then I still get the above error.
	
	I cannot seem to find anything specific to this problem in the documentation (i.e. an explanation of what the metadata should look like, etc.). I am mainly following:
	http://cwb.svn.sourceforge.net/viewvc/cwb/gui/cqpweb/trunk/doc/CQPweb-installing-corpora.html
	
	Any pointers would be appreciated.
	
	Regards 
	Claudia

	 

	_______________________________________________
	CWB mailing list
	CWB at sslmit.unibo.it
	http://devel.sslmit.unibo.it/mailman/listinfo/cwb

 

 

 
