[CWB] AttributeSeparator
Stefan Evert
stefanML at collocations.de
Tue Apr 20 01:28:23 CEST 2010
Thanks for your quick answer, Andrew!
> You're right that there's no way to reset this; there are 4
> hardcoded option sets for formatting concordances (ascii, html, sgml
> and latex modes) but / is the p-attribute separator in all of them.
The only way to change this separator at the moment is to recompile
CQP after modifying the "AttributeSeparator" entry in the
ASCIIPrintDescriptionRecord structure in cqp/ascii-print.c (and
corresponding files for the other print modes).
With a little more work, you could add a user-configurable option for
this attribute separator entry to CQP ...
> Unfortunately the way all this is handled in the code makes it
> rather more tricky to have this as a changeable option, although
> clearly it would make sense. No quick fix, alas!
... the reason we haven't tried this so far is that the entire KWIC
formatting code needs to be revised (amongst other things to keep it
from crashing on large contexts) and if parts of the
PrintDescriptionRecord are put under user control, it should be
possible to change all sensible options there.
However, I'd be more in favour of implementing a decent XML (or
Perl-friendly) output mode without configuration options, which could
then easily be transformed into any desired format with external XSLT
or Perl scripts.
Concerning your original problem, if you need e.g. sentence context
with different attribute values for further processing, I've often
found the tabulate command to provide an efficient and reliable
solution (esp. if you're doing some postprocessing anyway).
Something along these lines:
> Result = ...; # the original query results
> Context = Result expand to s; # full sentence context including the match
>                               # (separate left/right contexts are a bit tricky)
> tabulate Context match .. matchend word, match .. matchend pos;
The last command returns a line of the form:
word1 word2 ... wordN <TAB> pos1 pos2 ... posN
for every match of the query.
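
As a rough sketch of the postprocessing step, the following Python script re-joins each word with its tag using a separator of your choice. It assumes the tabulate output has been saved to a file or piped in with exactly two TAB-separated fields per line, as shown above, and that tokens within a field are separated by single blanks. The function names and the sample line are made up for illustration and are not part of CWB.

```python
import sys


def parse_tabulate_line(line, sep="_"):
    """Pair up words and tags from one line of tabulate output.

    Assumes the line has the form 'w1 w2 ... wN<TAB>p1 p2 ... pN'.
    """
    words_field, pos_field = line.rstrip("\n").split("\t")
    words = words_field.split(" ")
    tags = pos_field.split(" ")
    if len(words) != len(tags):
        raise ValueError("word and tag counts differ: %r" % line)
    return [w + sep + t for w, t in zip(words, tags)]


def rejoin(lines, sep="_"):
    """Yield one 'word<sep>tag ...' string per non-empty input line."""
    for line in lines:
        if line.strip():
            yield " ".join(parse_tabulate_line(line, sep))


if __name__ == "__main__":
    # Hypothetical usage: python rejoin.py '|' < tabulate_output.txt
    separator = sys.argv[1] if len(sys.argv) > 1 else "_"
    for out in rejoin(sys.stdin, separator):
        print(out)
```

With "|" as the separator, a line such as "The cat sleeps <TAB> DT NN VBZ" would come out as "The|DT cat|NN sleeps|VBZ", so the separator can be chosen freely without recompiling CQP.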
Best wishes,
Stefan