[CWB] problem at managing corpus metadata

Hardie, Andrew a.hardie at lancaster.ac.uk
Sat Jan 4 17:24:41 CET 2014


Thanks, that's clearer.

1.    This has nothing to do with one field vs. another on the form. This has to do with the fact that the text element is given special treatment in CQPweb, namely (a) it's compulsory, (b) it's used as the unit for metadata and for query restriction (also for subcorpus creation). The text element is the only one that is treated this way. Thus, necessarily, if you want to introduce text metadata via XML, you have to introduce it at the level of the text (or higher) in the XML hierarchy. Since the text is the unit of metadata, nothing can be treated as metadata if it has more than one value within the scope of a single text.

2.    No, that would make no difference at all. It doesn't matter how the corpus has been created, CQPweb cannot currently perform restrictions or handle metadata for any unit other than "text".

3.    Only items of metadata that have been designated as "classifications" are available for use in restricted queries. You designate this on the form for creating metadata (whether from a file or from the XML). There is an entry on each line with a dropdown which has two options: "classification" and "free text". Only the ones specified as "classification" are available in the Restricted Query screen, and only the one selected as the "primary classification" is available in the shortcut dropdown on the Standard Query screen. BUT NOTE that the contents of a "classification field" must be a single "handle" consisting only of alphanumeric characters.
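The constraint in point 3 on classification values can be checked before a metadata file is uploaded. What follows is a minimal Python sketch, not CQPweb code. It assumes a tab-delimited metadata file with the text id in the first column, and it reads "alphanumeric" strictly as ASCII letters and digits. The sample data and the helper names (`is_valid_handle`, `to_handle`, `bad_classification_values`) are hypothetical.

```python
import csv
import io
import re

# Assumption: "alphanumeric only" is read strictly as ASCII letters and digits.
HANDLE = re.compile(r"[A-Za-z0-9]+")

# Hypothetical tab-delimited metadata: text id first, then one column per field.
SAMPLE_METADATA = "text_1\tes\twinter\ntext_2\ten\tlate summer\n"


def is_valid_handle(value: str) -> bool:
    """True if the whole value is a single alphanumeric handle."""
    return HANDLE.fullmatch(value) is not None


def to_handle(value: str) -> str:
    """Hypothetical clean-up: drop every character that is not ASCII alphanumeric."""
    return re.sub(r"[^A-Za-z0-9]", "", value)


def bad_classification_values(rows, column):
    """List (row index, value) pairs in `column` that are not valid handles."""
    return [(i, row[column]) for i, row in enumerate(rows)
            if not is_valid_handle(row[column])]


rows = list(csv.reader(io.StringIO(SAMPLE_METADATA), delimiter="\t"))
print(bad_classification_values(rows, 2))  # season column -> [(1, 'late summer')]
print(to_handle("late summer"))            # -> latesummer
```

A column that passes this check is a candidate for "classification". Anything else would have to be cleaned up first or declared as "free text".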
best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Andres Chandia
Sent: 04 January 2014 16:07
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] problem at managing corpus metadata

1. Do you remember that, following your instructions, I entered "text:0+id+name+lang+season" in the first field of s-attributes (XML elements), and then "s:0+id+type" in the second field? That didn't work. So the first field is usable, but what about the following ones? Can they be used in some way, and how?

2. You understood what I intended by using the second field, as explained in the previous point. Could this be achieved by indexing the corpus with CWB on the command line and then uploading it to the CQP interface through "Install a corpus pre-indexed in CWB"?

3. Once the corpus is indexed, the Standard Query screen has an option labelled "Restriction:", but it shows no options for me. How do I make this option available? What do I need to do to put some restrictions here?
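The two field values quoted in point 1 follow the notation of the `-S` option of `cwb-encode`. Each one gives an element name, then an optional `:n` for the maximum embedding depth, then `+`-separated attribute names. Each attribute becomes its own s-attribute, named `element_attribute`. The minimal Python sketch below spells out which s-attributes each declaration produces, and it shows where names such as `s_id` come from. The class and function names are hypothetical, and the sketch ignores `cwb-encode`'s other annotation modifiers.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SAttributeDecl:
    element: str                  # XML element name, e.g. "text" or "s"
    max_depth: Optional[int]      # the ":n" part (maximum embedding), if given
    attributes: List[str] = field(default_factory=list)

    def s_attribute_names(self) -> List[str]:
        """Names of the s-attributes produced: the element plus element_attr."""
        return [self.element] + [f"{self.element}_{a}" for a in self.attributes]


def parse_decl(decl: str) -> SAttributeDecl:
    """Split a declaration like "text:0+id+name" into its parts."""
    head, *attrs = decl.split("+")
    element, _, depth = head.partition(":")
    return SAttributeDecl(element, int(depth) if depth else None, attrs)


for decl in ["text:0+id+name+lang+season", "s:0+id+type"]:
    print(decl, "->", parse_decl(decl).s_attribute_names())
```

Both declarations are indexed in the same way. The difference raised in this thread lies in how CQPweb treats the resulting attributes, not in how they are encoded.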

I hope this is clearer than before; if not, I will try again...

thanks


On Sat, 4 January 2014, 16:46, Hardie, Andrew wrote:
I'm sorry, I don't understand the questions. Can you rephrase/elaborate?
best
Andrew.
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Andres Chandia
Sent: 04 January 2014 15:31
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] problem at managing corpus metadata
Some questions then...

1. If the metadata has to be indexed as an s-attribute in only one field when you index a corpus, what are the rest of the fields for? How can they be used, and for what?

2. If you index the corpus on the command line with CWB, would the s-attributes be available in the way I intended to index them, which you say is not yet possible through the CQP interface?

3. On the Standard Query screen there is the Restriction option, but it does not get activated when the corpus is indexed. How should I proceed during indexing to activate it?

Thanks


On Tue, 31 December 2013, 20:50, Hardie, Andrew wrote:

1. Yes, of course, because metadata works at the text level. So if you index text metadata from something on the element, then naturally only 1 element per text will actually have any effect. There is a warning in the interface to this effect:
The following XML annotations are indexed in the corpus.
Select the ones which you wish to use as text-metadata fields.
Note: you must only select annotations that occur at or above
the level of in the XML hierarchy of your corpus
What you seem to actually want is to be able to restrict your queries to particular elements depending on their attributes. CQPweb can't do this. Queries can only be restricted to particular *texts*, not to sub-parts of texts. XML Restricted Queries is a much-requested feature and one I hope to be able to implement once the database reorganisation in v3.1 is done. But it can't be done now.

2. Either switch your checkout over from the trunk to the URL of the 3.0 branch, or just manually copy the code available in the download tarball.
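Point 1 can be checked on the input data before indexing. The minimal Python sketch below is not CQPweb code. It assumes a hypothetical VRT-style input, with one token per line and XML tags on their own lines. The sample is laid out like the corpus described in this thread (text_1 holding S1 and S2, text_2 holding S3 and S4), and the `type` values are invented. For each attribute of a chosen element, the sketch reports whether it has at most one value inside every text, which is the condition for it to be usable as text metadata.

```python
import re
from collections import defaultdict

# Hypothetical VRT-style sample: tokens one per line, tags on their own lines.
SAMPLE_VRT = """<text id="text_1" lang="es">
<s id="S1" type="question">
Hola
</s>
<s id="S2" type="answer">
Adiós
</s>
</text>
<text id="text_2" lang="en">
<s id="S3" type="question">
Hello
</s>
<s id="S4" type="question">
Bye
</s>
</text>
"""

OPEN_TAG = re.compile(r'^<(\w+)((?:\s+\w+="[^"]*")*)\s*>$')  # opening tags only
ATTR = re.compile(r'(\w+)="([^"]*)"')


def values_per_text(vrt: str, element: str) -> dict:
    """Map each text id to the set of values seen for every attribute of `element`."""
    result = defaultdict(lambda: defaultdict(set))
    current_text = None
    for line in vrt.splitlines():
        m = OPEN_TAG.match(line.strip())
        if not m:
            continue  # token lines and closing tags are skipped
        name, attrs = m.group(1), dict(ATTR.findall(m.group(2)))
        if name == "text":
            current_text = attrs.get("id")
        if name == element and current_text is not None:
            for key, value in attrs.items():
                result[current_text][f"{element}_{key}"].add(value)
    return result


def usable_as_text_metadata(vrt: str, element: str) -> dict:
    """True for an attribute only if it has at most one value within every text."""
    per_text = values_per_text(vrt, element)
    fields = {f for attrs in per_text.values() for f in attrs}
    return {f: all(len(per_text[t].get(f, set())) <= 1 for t in per_text)
            for f in sorted(fields)}


print(usable_as_text_metadata(SAMPLE_VRT, "text"))  # every text_* attribute: True
print(usable_as_text_metadata(SAMPLE_VRT, "s"))     # s_id and s_type: False
```

An attribute that comes out `False` varies inside a text. If it is used as text metadata anyway, only one value per text can take effect, which matches the S1/S3 behaviour reported below.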
best
Andrew.
From: Andres Chandia [mailto:andres at chandia.net]
Sent: 31 December 2013 19:32
To: Hardie, Andrew
Cc: Open source development of the Corpus WorkBench
Subject: RE: [CWB] problem at managing corpus metadata
Thanks, I went back to the version I had before, CQPweb v3.0.7 © 2008-2012, and everything went well
except for this:

1. I entered the s-attributes on the second line as follows: s:0+id+type. In the restricted query, s_id then shows only the data for S1 and S3, not for S2 and S4. As you can see in the corpus, text_1 contains S1 and S2 and text_2 contains S3 and S4, so only the first S of each text appears.


I'd also like to take the opportunity to ask how I upgrade with svn to the version you are recommending.


Thanks, and if you can't answer right now I'll understand. Have a great New Year's Eve!!!



_______________________
andrés chandía
chandia.net<http://www.chandia.net>
administrator of
parles.upf.edu<http://parles.upf.edu>
psicoaching.net<http://psicoaching.net>
mapuche koyaktu<http://koyaktumapuche.net>
ong mapuche koyaktu<http://corporacionkoyaktu.net>
Please do not print unnecessarily. Take care of the environment!