[CWB] aligned regions: more context and highlighting matched parts
Stefan Evert
stefan.evert at uos.de
Mon Jul 7 14:24:18 CEST 2008
Hi again & sorry about the late reply!
> Adding constraints to the extended context of aligned regions is not
> really necessary. I just need to show additional context and your
> suggestion using the direct access module works fine! It's working
> now in the OPUS interface: http://logos.uio.no/cgi-bin/opus/opuscqp.pl
> (if you go to advanced queries). I could send the little code snippet
> for getting the additional alignment context if anyone would like to
> save time writing this subroutine (it's very basic, though).
>
> Highlighting would be nice but the solution with intermediate
> corpora sounds quite complicated and I don't have the time to look
> into this right now. Did someone else try to implement this already?
I guess not, or have you got a response from anyone?
> Is it very complicated to change CWB/CQP internals to not throw away
> matching parts in aligned regions? It sounds like it shouldn't be a
> big deal to save these positions somewhere in the data structure
> instead of throwing them away. But, of course, I don't know
> anything about the implementation ....
Yes, I'm afraid it would be fairly complicated to make this change in
CQP. Not so much because of the data structures and code that would be
needed (although this behaviour should be optional because it might
waste a lot of RAM for large queries), but because the internals of
CQP are so terribly messy.
The same holds for all the other extensions & improvements that CQP
users (including myself) are impatiently waiting for: handling
gigaword corpora on 32-bit platforms, Unicode support, etc. For a while
I believed that some of these would be relatively easy, but in the
meantime I've realised that all extensions will require a full
revision of the CWB source code.
If a few of us could spare some time and work together on this, it
would probably be easier to reimplement CQP with a better design and
proper encapsulation. I'm sure that such a rewrite would allow others
to add individual features or make small improvements without
understanding the full complexity of the source code. Guess we should
have a kind of summer workshop ...
Best wishes,
Stefan