[CWB] aligned regions: more context and highlighting matched parts

Joerg Tiedemann j.tiedemann at rug.nl
Fri May 16 11:40:50 CEST 2008


Thanks Stefan for your reply.

Adding constraints to the extended context of aligned regions is not 
really necessary. I just need to show additional context, and your 
suggestion of using the direct access module works fine! It's working 
now in the OPUS interface (if you go to advanced queries): 
http://logos.uio.no/cgi-bin/opus/opuscqp.pl
I could send the little code snippet for getting the additional 
alignment context if anyone would like to save the time of writing this 
subroutine (it's very basic, though).
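
To give an idea of what such a subroutine involves, here is a minimal 
sketch of the context calculation only. It is not the code running in 
the OPUS interface. It is written in Python rather than Perl, and it 
assumes that the sentence boundaries of the target corpus have already 
been read (e.g. via the direct access module) into a sorted list of 
inclusive (start, end) corpus positions. The function name and the 
sample data are made up for illustration.

```python
from bisect import bisect_right

def extend_region(sentences, region, before=1, after=1):
    """Widen an aligned region by whole sentences (hypothetical helper).

    sentences: sorted, non-overlapping (start, end) corpus positions, inclusive
    region:    (start, end) corpus positions of the aligned region, inclusive
    """
    starts = [start for start, _ in sentences]
    # Index of the last sentence that starts at or before each region boundary.
    first = bisect_right(starts, region[0]) - 1
    last = bisect_right(starts, region[1]) - 1
    if first < 0:
        raise ValueError("region starts before the first sentence")
    # Clamp to the corpus so we never run past the first or last sentence.
    lo = max(first - before, 0)
    hi = min(last + after, len(sentences) - 1)
    return sentences[lo][0], sentences[hi][1]

# Made-up sentence boundaries for a tiny target corpus.
sentences = [(0, 4), (5, 11), (12, 15), (16, 23)]
print(extend_region(sentences, (5, 11)))  # one sentence of context on each side
```

The returned pair of corpus positions can then be displayed in the same 
way as the original aligned region.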

Highlighting would be nice, but the solution with intermediate corpora 
sounds quite complicated and I don't have the time to look into this 
right now. Has anyone else tried to implement this already?

Would it be very complicated to change the CWB/CQP internals so that 
the matching parts in aligned regions are kept? It sounds like it 
shouldn't be a big deal to save these positions somewhere in the data 
structure instead of throwing them away. But, of course, I don't know 
anything about the implementation ...

cheers,

jorg



Stefan Evert wrote:
> Hi!
> 
> I'm afraid I can only offer negative answers ... :o(
> 
>> I use CWB for searching through sentence-aligned parallel corpora 
>> (using the Perl/CGI interface). Is there an easy way to increase the 
>> context of aligned regions? For example, I would like to add another 
>> sentence before and/or after the aligned region.
> 
> Do you want to increase the context displayed by the "cat" command, or 
> increase the context for an alignment constraint in the query?
> 
> Neither is possible in CQP, but the first requirement -- extended 
> context display -- can easily be simulated with a Perl script (if you're 
> running this through a Web interface anyway): just get the query 
> matches, map them to the aligned regions, and then do the standard 
> context calculations (as e.g. in the CQPdemo Web interface) in the 
> aligned corpus.
> 
> I don't see an easy way to modify the context for alignment constraints 
> in CQP queries. You might be able to simulate this in Perl, but this 
> will require a substantial amount of extra work:
> 
> 1) Run query on the source corpus and download corpus positions of 
> matches into Perl.
> 2) "Map" the matches to aligned regions in the target corpus, using 
> direct access to the corpus annotation with CL.pm (I've long had plans 
> to implement a similar "mapping" feature in CQP, but never got round to 
> it, sadly).
> 3) Extend the aligned regions as necessary in the target corpus (using 
> the CQPdemo code).
> 4) Upload the resulting corpus positions into CQP as a subcorpus ("named 
> query result" in the new terminology).
> 5) Activate this subcorpus and run a query consisting of the alignment 
> constraint (or filter regions matching the query with "!").
> 6) Download corpus positions for the alignment constraint queries and 
> match them against the list from step 3); this allows you to remove the 
> items filtered out by the alignment constraint as well as the 
> corresponding items among the original corpus positions from step 1).
> 7) Upload the filtered list from step 1) back into CQP.
> 
> Et voila!
> 
>> Also, I would like to highlight matching parts in aligned regions. 
>> However, there does not seem to be markup for splitting aligned 
>> regions into before/after context and matched part. This is of course 
>> only relevant if additional search constraints are defined for the 
>> aligned corpora.
> 
> Unfortunately, CQP doesn't remember where exactly the alignment 
> constraint matched -- these corpus positions are immediately thrown away 
> when the constraint has been evaluated.
> 
> If you use the ugly Perl workaround above, it's possible to keep track 
> of the matches of the alignment constraint (in step 5/6), so that you 
> can highlight them in the output of the Web interface.
> 
> 
> The CWB is clearly showing its limitations here, isn't it?
> 
> 
> Best,
> Stefan
> 
> 
> [ stefan.evert at uos.de | http://purl.org/stefan.evert ]
> 
> 
> 


-- 

Jörg


***********/\/\/\/\/\/\/\/\/\/\/\************************************
**  Jörg Tiedemann                 j.tiedemann at rug.nl              **
**  Alfa-Informatica               http://www.let.rug.nl/~tiedeman **
**  Rijksuniversiteit Groningen    Harmoniegebouw, room 1311-429   **
**  Postbus 716                    phone: +31 (0)50-363 5935       **
**  9700 AS Groningen              fax:   +31 (0)50-363 6855       **
*************************************/\/\/\/\/\/\/\/\/\/\/\**********

