[CWB] AttributeSeparator

Stefan Evert stefanML at collocations.de
Tue Apr 20 01:28:23 CEST 2010


Thanks for your quick answer, Andrew!

> You're right that there's no way to reset this; there are 4  
> hardcoded option sets for formatting concordances (ascii, html, sgml  
> and latex modes) but / is the p-attribute separator in all of them.

The only way to change this separator at the moment is to recompile  
CQP after modifying the "AttributeSeparator" entry in the  
ASCIIPrintDescriptionRecord structure in cqp/ascii-print.c (and  
corresponding files for the other print modes).

With a little more work, you could add a user-configurable option for  
this attribute separator entry to CQP ...

> Unfortunately the way all this is handled in the code makes it  
> rather more tricky to have this as a changeable option, although  
> clearly it would make sense. No quick fix, alas!

... the reason we haven't tried this so far is that the entire kwic  
formatting code needs to be revised (amongst other things to keep it  
from crashing on large contexts) and if parts of the  
PrintDescriptionRecord are put under user control, it should be  
possible to change all sensible options there.

However, I'd be more in favour of implementing a decent XML (or Perl- 
friendly) output mode without configuration options, which could then  
easily be transformed into any desired format with external XSLT or  
Perl scripts.


Concerning your original problem, if you need e.g. sentence context  
with different attribute values for further processing, I've often  
found the tabulate command to provide an efficient and reliable  
solution (esp. if you're doing some postprocessing anyway).

Something along these lines:

 > Result = ...;  # these are the original query results
 > Context = Result expand to s;  # full sentence context including the match
 > # (separate left/right contexts are a bit tricky)
 > tabulate Context match .. matchend word, match .. matchend pos;

The last command returns a line of the form:

   word1 word2 ... wordN <TAB> pos1 pos2 ... posN

for every match of the query.
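
If you postprocess in Python rather than Perl, a minimal sketch for reading such lines might look like the following. It assumes that the two columns are separated by a single TAB, that the tokens within a column are separated by single spaces, and that no token contains a space or TAB itself. The function name parse_tabulate_line is made up for this illustration and is not part of CWB.

```python
import sys
from typing import List, Tuple


def parse_tabulate_line(line: str) -> List[Tuple[str, str]]:
    """Split one line of tabulate output into (word, pos) pairs.

    Assumed layout (see the note above): word1 ... wordN <TAB> pos1 ... posN
    """
    columns = line.rstrip("\n").split("\t")
    if len(columns) != 2:
        raise ValueError(f"expected 2 TAB-separated columns, got {len(columns)}")
    words = columns[0].split(" ")
    tags = columns[1].split(" ")
    if len(words) != len(tags):
        raise ValueError(f"{len(words)} words but {len(tags)} pos tags")
    return list(zip(words, tags))


if __name__ == "__main__":
    # Read saved tabulate output from stdin, one match per line,
    # and print each sentence as word/pos pairs.
    for raw in sys.stdin:
        if raw.strip():
            pairs = parse_tabulate_line(raw)
            print(" ".join(f"{word}/{tag}" for word, tag in pairs))
```

Checking that both columns have the same number of tokens catches the case where an attribute value contains a space and the simple split no longer lines up.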

Best wishes,
Stefan


