[CWB] Issues when installing metadata restrictions

Hardie, Andrew a.hardie at lancaster.ac.uk
Wed Apr 6 10:08:48 CEST 2016


The reason those restrictions appear twice is that text metadata and XML metadata are two separate entities (because the notion of “text” is a special one in CQPweb), but “text” is still an XML element. When you create text metadata from the XML, the XML attributes are still there. If you don’t want them to show up in restricted query, switch their datatype back to “free text”.

The error in your fig 5 is because manual installation requires a file containing columnar data, but you appear to have supplied a file containing XML.
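
For illustration only, here is a minimal Python sketch of turning the <text> start tags into columnar data. It assumes the columnar file is plain text with one line per text, tab-separated, with the text ID in the first column; that layout is an assumption and is not stated in this thread. The function name and the column list are hypothetical. A regular expression is used rather than an XML parser because the corpus input has no single root element.

```python
import re

# Matches a complete <text ...> start tag and captures its attribute string.
TEXT_TAG = re.compile(r'<text\s+([^<>]*)>')
# Matches one name="value" pair inside the tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def text_tags_to_rows(corpus_source, columns):
    """Return one tab-delimited line per <text> tag, text ID first."""
    rows = []
    for tag in TEXT_TAG.finditer(corpus_source):
        attrs = dict(ATTR.findall(tag.group(1)))
        # Assumed layout: ID column, then the requested attributes in order.
        rows.append("\t".join([attrs["id"]] + [attrs.get(c, "") for c in columns]))
    return rows

sample = (
    '<text id="test25" period="2" organization="eu" category="Lawmaking" '
    'genre="monitoring" lang="Spanish">\nthis\nis\n</text>\n'
    '<text id="test26" period="3" organization="wto" category="Adjudication" '
    'genre="adjudication" lang="English">\nthas\n</text>\n'
)
rows = text_tags_to_rows(sample, ["period", "organization", "category", "genre", "lang"])
print("\n".join(rows))
```

A start tag that is missing its closing angle bracket is not matched by this pattern, so such a text would be skipped silently rather than reported.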

I’m sorry that the manual sections on metadata are rather underdeveloped. Once they are written, they will explain all of this. Time, alas, is limited.

best

Andrew.


From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Giorgina Cerutti Benitez
Sent: 06 April 2016 08:55
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Issues when installing metadata restrictions

Hi Andrew,

I have fixed my corpus (figure 1), retried the installation (figures 2 to 4) and now the problem has been solved, so thank you very much. Nonetheless, I was wondering why the system creates two restriction tables (figure 3) since, according to the tests I have performed, both yield the same results when using them to launch restricted queries.

Furthermore, even though we are going to use the metadata embedded in our corpus, I also tried installing metadata manually with this new corpus and I got the same error as before (figure 5) – this is not going to be a problem for us, but I just wanted to report the error in case someone else encounters the same problem.

Regards,

Giorgina


[cid:image001.jpg at 01D18FE3.C28BE030]
Figure 1

[cid:image002.jpg at 01D18FE3.C28BE030]
Figure 2


Figure 3

[cid:image004.jpg at 01D18FE3.C28BE030]
Figure 4


Figure 5





-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: Tuesday 5 April 2016 16:18
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] Issues when installing metadata restrictions



Hi Giorgina,



Category codes in classification fields can only contain letters and numbers. I see spaces in some of the values in your XML. EG " Monitoring and application".



Might that explain the problem?
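
A quick way to check for this before indexing is sketched below in Python. It is a hedged illustration, not CQPweb's own validation: it assumes "letters and numbers" means unaccented ASCII letters and digits, and the function name and the set of fields treated as classifications are hypothetical.

```python
import re

# Matches one name="value" pair inside a start tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')
# Assumption: a valid category code is one or more ASCII letters or digits.
VALID_CODE = re.compile(r'[A-Za-z0-9]+')

def invalid_category_codes(tag, fields):
    """Return (field, value) pairs whose value is not purely letters/digits."""
    return [(name, value) for name, value in ATTR.findall(tag)
            if name in fields and not VALID_CODE.fullmatch(value)]

tag = ('<text id="test13" period="1" organization="un" '
       'category="Monitoring and application" genre="legislative" lang="French">')
problems = invalid_category_codes(
    tag, {"period", "organization", "category", "genre", "lang"})
print(problems)
```

For the tag above, only the category value is reported, because it contains spaces.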



best



Andrew.



-----Original Message-----

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Giorgina Cerutti Benitez

Sent: 05 April 2016 10:39

To: Open source development of the Corpus WorkBench

Subject: Re: [CWB] Issues when installing metadata restrictions



Hi Matt,



Yes, you're right. However, I have also tested with other minimal corpora that appear to be well formed, and I still had the same issues.



Thank you again.



Regards,



Giorgina



-----Original Message-----

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Timperley, Matt

Sent: Tuesday 5 April 2016 11:14

To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>

Subject: Re: [CWB] Issues when installing metadata restrictions



Hi Giorgina,



Sorry if I'm mistaken about your issue but it looks to me like there is an angle bracket missing from the end of the first line. Just after lang="French". I think it should be: lang="French">.
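
A check for this kind of slip can be sketched in a few lines of Python. This is only a rough illustration under stated assumptions: the input has at most one tag per line, and no attribute value contains a literal ">". The function name is hypothetical.

```python
def unclosed_tags(lines):
    """Return (line_number, line) for lines that open a tag but never close it."""
    return [(n, line.strip()) for n, line in enumerate(lines, start=1)
            if line.lstrip().startswith("<") and ">" not in line]

sample = [
    '<text id="test13" lang="French"',
    'this',
    '</text>',
    '<text id="test25" lang="Spanish"> this is also a test .',
]
print(unclosed_tags(sample))
```

Only the first line of the sample is reported; the last line is accepted because its tag is closed before the tokens that follow it.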



I hope this helps,

Matt

________________________________________

From: cwb-bounces at sslmit.unibo.it [cwb-bounces at sslmit.unibo.it] on behalf of Giorgina Cerutti Benitez [Giorgina.Cerutti at unige.ch]

Sent: 05 April 2016 09:49

To: cwb at sslmit.unibo.it

Subject: [CWB] Issues when installing metadata restrictions



Hello everyone,



I am writing to you because we are having issues when installing our metadata classifications. We are currently testing metadata installation with corpus T39 (figure 1). Even though we manage to specify our s-attributes (see figure 2), only three of them are recognized as classifications when installing the metadata from the embedded XML (see figure 3); and in other tests none of them is recognized at all (see figure 4).



[cid:image010.jpg at 01D18E88.2B7242E0]

Figure 1:



<text id="test13" period="1" organization="un" category="Monitoring and application" genre="legislative" lang="French"

this

is

a

test

</text>

<text id="test25" period="2" organization="eu" category="Lawmaking" genre="monitoring" lang="Spanish"> this is also a test .

thas

worked

</text>

<text id="test26" period="3" organization="wto" category="Adjudication" genre="adjudication" lang="English"> thas thus shalala muajajaja </text>



[cid:image012.jpg at 01D18E88.2B7242E0]

Figure 2



[cid:image013.jpg at 01D18E88.2B7242E0]

Figure 3



[cid:image015.jpg at 01D18E88.2B7242E0]

Figure 4



We have then tried to install metadata by specifying the desired settings by hand (see figure 5), but we encounter an error (see figure 6).



[cid:image016.jpg at 01D18E88.2B7242E0]

Figure 5



[cid:image020.jpg at 01D18E88.2B7242E0]

Figure 6



The data source you specified for the text metadata contains badly-formatted text ID codes, as follows: <strong> '.'; '</text>'; '<text id="test13" period="1" organization="un" category="Monitoring and application" genre="legislative" lang="French"'; '<text id="test25" period="2" organization="eu" category="Lawmaking" genre="monitoring" lang="Spanish">'; '<text id="test26" period="3" organization="wto" category="Adjudication" genre="adjudication" lang="English">';</strong> (text ids can only contain unaccented letters, numbers, and underscore).



Since we cannot identify the error, we were wondering if any of you has had the same problem (I couldn't find any thread or information in the manual about this). I would also be grateful if you could tell us if this is a bug or if the system only accepts three classifications.



Thank you very much.



Regards,





Giorgina Cerutti

Assistant

Department of Translation - Spanish Unit

Faculty of Translation and Interpreting

University of Geneva

Office 6242 - Uni Mail

40 bd du Pont d'Arve

CH-1211 Genève 4


_______________________________________________

CWB mailing list

CWB at sslmit.unibo.it<mailto:CWB at sslmit.unibo.it>

http://devel.sslmit.unibo.it/mailman/listinfo/cwb