[CWB] Display options for structural attributes

Yannick Versley yversley at gmail.com
Fri Sep 17 17:21:42 CEST 2010


> I'd like to have id=n connected to that token.  The information is encoded
> implicitly but to get it I'd have to use "show +doc_id" and "expand to doc"
> for every query.  Or I could add another column "n" to every line before
> encoding but that is what I meant by redundant information.

Oh, I see - you'd probably want to do that with a short script like
the following then:

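--- 8< --- xform_matches.py
import sys
from CWB.CL import Corpus

crp=Corpus('STORIES')
p_attrs=[crp.attribute(name,'p') for name in ['word','pos','lemma']]
s_attrs=[crp.attribute(name,'s') for name in ['s_id','text_id']]

# each input line is one match from a CQP dump: start and end corpus position
for l in sys.stdin:
  line=l.strip().split()
  if len(line)<2: continue # skip blank or incomplete lines
  start=int(line[0]) ; end=int(line[1])
  print('<match start="%d" end="%d">'%(start,end))
  for i in range(start,end+1):
    result=[]
    for attr in p_attrs: result.append(attr[i])
    for attr in s_attrs:
      # assumption: cpos2struc may raise KeyError when no region covers i
      try:
        struc=attr.cpos2struc(i)
      except KeyError:
        struc=None
      if struc is None or struc<0: # region 0 is valid, so don't test truthiness
        result.append('--') # no matching s-attribute
      else:
        vals=attr[struc] # region tuple; a third element holds the value, if any
        if len(vals)>2:
          result.append(vals[2])
        else:
          # s-attribute with no value -> just put in the ID of that span
          result.append(str(struc))
    print('/'.join(result))
  print("</match>")
--- 8< ---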
OK, it's more than half a dozen lines to postprocess the CQP dump
(although I suspect a determined Perl hacker would have a terser style
than the Python code above), but it does mostly what you need. You could
also make it do exactly what you want (e.g. include more context, escape
attribute values appropriately, convert to JSON, or include the code for
actually doing the query instead of reading a dump) without having to
wait for that specific feature to become part of CQP.
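
As a rough sketch of the "escaping" and "convert to JSON" variants
mentioned above, assuming the per-token values have already been
collected into lists of strings as in the script: the helper names
match_to_xml and match_to_json and the names parameter are made up for
this illustration and are not part of CWB or cwb-python.

```python
import json
from xml.sax.saxutils import escape, quoteattr

def match_to_xml(start, end, rows):
    """Render one match as XML-ish text, escaping markup characters.

    rows is a list of per-token value lists, e.g.
    [word, pos, lemma, s_id, text_id] for every token in the match.
    """
    lines = ['<match start=%s end=%s>'
             % (quoteattr(str(start)), quoteattr(str(end)))]
    for row in rows:
        # escape() replaces &, < and > so a token like "<" cannot break the markup
        lines.append('/'.join(escape(value) for value in row))
    lines.append('</match>')
    return '\n'.join(lines)

def match_to_json(start, end, rows, names):
    """Render one match as a JSON object with one dict per token.

    names gives the (hypothetical) column labels, in the same order as
    the values inside each row.
    """
    tokens = [dict(zip(names, row)) for row in rows]
    return json.dumps({'start': start, 'end': end, 'tokens': tokens},
                      sort_keys=True)
```

The JSON form also sidesteps the problem that a token containing "/"
is ambiguous in the slash-separated output.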
The script uses the Python version of CWB::CL, which can be checked out
from http://bitbucket.org/yannick/cwb-python. No guarantee for correct
indentation, either: I just typed this in so that the indentation would
look plausible in the mail program, but I suspect it is correct.
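
For the "include more context" variant, a minimal sketch of the two
pieces that do not need the corpus itself: reading the start and end
positions from one dump line, and widening the span. This assumes the
dump lines begin with the match start and end positions (any further
columns are ignored); parse_dump_line, with_context and the corpus_size
parameter are names invented here, and how the corpus size is obtained
is left to the caller.

```python
def parse_dump_line(line):
    """Return (start, end) from one line of a CQP dump, or None if the
    line is blank or has fewer than two fields."""
    fields = line.split()
    if len(fields) < 2:
        return None
    return int(fields[0]), int(fields[1])

def with_context(start, end, size, corpus_size):
    """Widen a match by `size` tokens on each side, clamped so that the
    result stays within corpus positions 0 .. corpus_size-1."""
    return max(0, start - size), min(corpus_size - 1, end + size)
```

The widened pair can then replace start and end in the loop over corpus
positions, while the original pair is still printed in the <match> tag.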

Best,
Yannick
>
> To cut a long story short: I want S-attribute information that can be
> accessed like P-attribute information ;).
>
> Regards,
>
> Lukas

