[CWB] aligned regions: more context and highlighting matched parts

Stefan Evert stefan.evert at uos.de
Mon Jul 7 14:24:18 CEST 2008


Hi again & sorry about the late reply!

> Adding constraints to the extended context of aligned regions is not  
> really necessary. I just need to show additional context and your  
> suggestion using the direct access module works fine! It's working  
> now in the OPUS interface: http://logos.uio.no/cgi-bin/opus/opuscqp.pl
> (if you go to advanced queries) I could send the little code snippet  
> for getting the additional alignment context if anyone likes to save  
> time writing this subroutine (it's very basic, though).
>
> Highlighting would be nice but the solution with intermediate  
> corpora sounds quite complicated and I don't have the time to look  
> into this right now. Did someone else try to implement this already?

I guess not, or have you got a response from anyone?

> Is it very complicated to change CWB/CQP internals to not throw away  
> matching parts in aligned regions? It sounds like it shouldn't be a  
> big deal to save these positions somewhere in the data structure  
> instead of throwing them away. But, of course, I don't know  
> anything about the implementation ....

Yes, I'm afraid it would be fairly complicated to make this change in  
CQP. Not so much because of the data structures and code that would be  
needed (although this behaviour should be optional because it might  
waste a lot of RAM for large queries), but because the internals of  
CQP are so terribly messy.

The same holds for all the other extensions & improvements that CQP  
users (including myself) are impatiently waiting for: handling  
gigaword corpora on 32-bit platforms, Unicode support, etc. For a while  
I believed that some of these would be relatively easy, but in the  
meantime I've realised that all extensions will require a full  
revision of the CWB source code.

If a few of us could spare some time and work together on this, it  
would probably be easier to reimplement CQP with a better design and  
proper encapsulation. I'm sure that such a rewrite would allow others  
to add individual features or make small improvements without  
understanding the full complexity of the source code. Guess we should  
have a kind of summer workshop ...

Best wishes,
Stefan 

