[CWB] problem at managing corpus metadata

Andres Chandia andres at chandia.net
Sat Jan 4 17:58:03 CET 2014



1. Ok, just to be clear:
<corpus id="1">
         <text id="1">
               <s id="1">
                 bla
                 bla
               </s>
          </text>
</corpus>

In this case, is the higher level <corpus> or <s>? I think it is <corpus>, so I can only introduce text metadata at the <text> or <corpus> level. If so,
what could I include at the <corpus> level, and how?

2. OK.

3. When indexing, under "Create metadata table from corpus XML
annotations", I did declare the metadata fields as classifications and gave one of them primary status.
Even so, the restrictions are not available in the Standard Query, although they are
present in "Restricted Query".


Thanks again

On Sat, 4 January 2014, 17:24, Hardie, Andrew wrote:


Thanks, that’s clearer.
 1. This has nothing to do with one field vs. another on the form. This has to do with the fact that
the text element is given special treatment in CQPweb, namely (a) it’s compulsory, (b)
it’s used as the unit for metadata and for query restriction (also for subcorpus
creation). The text element is the only one that is treated this way. Thus, necessarily, if
you want to introduce text metadata via XML, you have to introduce it at the level of the
text (or higher) in the XML hierarchy. Since the text is the unit of metadata, nothing can be
treated as metadata if it has more than one value within the scope of a single text.
 2. No, that would make no difference at all. It doesn’t matter how the corpus has been created,
CQPweb cannot currently perform restrictions or handle metadata for any unit other
than “text”.
 3. Only items of metadata that have been designated as “classifications” are available for
use in restricted queries. You designate this on the form for creating metadata (whether from
a file or from the XML). There is an entry on each line with a dropdown which has two options:
“classification” and “free text”. Only the ones specified as
“classification” are available in the Restricted Query screen, and only the one
selected as the “primary classification” is available in the shortcut dropdown on
the Standard Query screen. BUT NOTE that the contents of a “classification field”
must be a single “handle” consisting only of alphanumeric characters.
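
The handle requirement in point 3 can be checked before the metadata is loaded. The following is only a minimal sketch, not part of CQPweb: the function names are hypothetical, and it assumes "alphanumeric" means ASCII letters and digits only, which may be stricter or looser than what CQPweb actually accepts.

```python
import re

# Assumption: a "handle" is one non-empty run of ASCII letters and digits.
# CQPweb's real rule may differ; adjust the pattern if it does.
HANDLE_RE = re.compile(r"[A-Za-z0-9]+")


def is_valid_handle(value: str) -> bool:
    """Return True if value consists only of ASCII letters and digits."""
    return HANDLE_RE.fullmatch(value) is not None


def invalid_handles(values):
    """Return, in input order, the values that fail the handle check."""
    return [v for v in values if not is_valid_handle(v)]


if __name__ == "__main__":
    # Hypothetical values for a "season" classification field.
    seasons = ["winter", "Summer2013", "late autumn", "n/a", ""]
    print(invalid_handles(seasons))
```

Values containing spaces or punctuation, and empty values, are reported, so they can be fixed in the XML or metadata file before the field is declared as a classification.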
best


Andrew.


From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Andres Chandia
Sent: 04 January 2014 16:07
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] problem at managing corpus metadata

1. Do you remember that, following your instructions, I put "text:0+id+name+lang+season"
in the first field of s-attributes (XML elements), and then "s:0+id+type" in the
second field? That didn't work. So the first field is the one to use, but
what about the others: can they be used in some way, and how?
 
 2. You understood what my intention was in using the second field, as explained in the
previous point. Can this be achieved by indexing the corpus with CWB on the command line
and then loading it into the CQP interface through "Install a corpus pre-indexed in CWB"?
 
 3. Once the corpus is indexed, the Standard Query has an option labelled
"Restriction:", but it shows me no choices. How is this option made available?
What do I need to do to get some restrictions in there?
 
 I hope this is clearer than before; if not, I will try again.
 
 Thanks
 
 
 On Sat, 4 January 2014, 16:46, Hardie, Andrew wrote:


I’m sorry, I don’t understand the questions. Can you rephrase/elaborate?
best
Andrew.
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Andres Chandia
Sent: 04 January 2014 15:31
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] problem at managing corpus metadata

Some questions then...
 
 1. If the metadata has to be indexed as an s-attribute in only one field
when you index a corpus, what are the other fields for, and how can they be
used?
 
 2. If you index the corpus with CWB on the command line, would the s-attributes
be available in the way I intended to index them, the way you say is not yet available
through the CQP interface?
 
 3. The Standard Query has a restriction option, but it
is not activated when the corpus is indexed. How should I proceed during the indexing
process to activate it?
 
 Thanks
 
 
 On Tue, 31 December 2013, 20:50, Hardie, Andrew wrote:


1. Yes, of course, because metadata works at the text level. So if you index text metadata from
something on the <s> element, then naturally only 1 <s> element per text will actually have any
effect. There is a warning in the interface to this effect:
 The following XML annotations are indexed in the corpus.
 Select the ones which you wish to use as text-metadata fields.
 Note: you must only select annotations that occur at or above
 the level of <text> in the XML hierarchy of your corpus
What you seem to actually want is to be able to restrict your queries to particular elements
depending on their attributes. CQPweb can’t do this. Queries can only be restricted to
particular *texts*, not to sub-parts of texts. XML Restricted Queries is a much-requested
feature and one I hope to be able to implement once the database reorganisation in v3.1 is
done. But it can’t be done now.
2. Either switch your checkout over from the trunk to the URL of the 3.0 branch, or just manually copy
the code available in the download tarball.
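
Point 1 above (an annotation is only usable as text metadata if it has a single value within each text) can be tested on a corpus file before indexing. This is a rough sketch under several assumptions: the input is well-formed XML rather than CWB's one-token-per-line input format, the sample corpus mirrors the S1 to S4 layout described in this thread, the `type` values "a" and "b" are invented, and both function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: text 1 holds s 1 and s 2, text 2 holds s 3 and s 4.
SAMPLE = """
<corpus id="1">
  <text id="1">
    <s id="1" type="a">bla bla</s>
    <s id="2" type="a">bla bla</s>
  </text>
  <text id="2">
    <s id="3" type="b">bla bla</s>
    <s id="4" type="b">bla bla</s>
  </text>
</corpus>
"""


def values_per_text(xml_source, element, attribute):
    """Map each <text> id to the set of values `attribute` takes on
    `element` nodes inside that text (the <text> node itself included)."""
    root = ET.fromstring(xml_source)
    found = defaultdict(set)
    for text in root.iter("text"):
        for node in text.iter(element):
            if attribute in node.attrib:
                found[text.get("id")].add(node.get(attribute))
    return dict(found)


def usable_as_text_metadata(xml_source, element, attribute):
    """True if no text contains more than one value for element/attribute."""
    per_text = values_per_text(xml_source, element, attribute)
    return all(len(values) <= 1 for values in per_text.values())


if __name__ == "__main__":
    print(usable_as_text_metadata(SAMPLE, "s", "id"))    # several ids per text
    print(usable_as_text_metadata(SAMPLE, "s", "type"))  # one type per text
```

In this sample the `id` attribute of `<s>` takes two values in each text, so it fails the check, which matches the behaviour reported in the thread where only the first `<s>` of each text shows up.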
best
Andrew.
From: Andres Chandia [mailto:andres at chandia.net]
Sent: 31 December 2013 19:32
To: Hardie, Andrew
Cc: Open source development of the Corpus WorkBench
Subject: RE: [CWB] problem at managing corpus metadata

Thanks, I went back to the previous version I had, CQPweb v3.0.7 © 2008-2012, and all went well
except for this:
 
 1. In the second s-attributes line I entered s:0+id+type. In the Restricted Query,
s_id then shows only the data for S1 and S3, but not for S2 and S4. As you can see in the
corpus, text_1 contains S1 and S2, and text_2 contains S3 and S4, so
only the first S of each text appears.
 
 
 I would also like to take the opportunity to ask
how I can upgrade with svn to the version you are recommending.
 
 
 Thanks, and if you don't answer right away I will understand. Have a great New Year's
Eve!



 


 _______________________
 andrés chandía
 administrator of
 parles.upf.edu
 psicoaching.net
 mapuche koyaktu
 ong mapuche koyaktu
 Please do not print unnecessarily. Protect the environment!



 
 

