[CWB] aligned regions: more context and highlighting matched parts

Tue May 13 00:17:01 CEST 2008

Hi!

I'm afraid I can only offer negative answers ... :o(

> I use cwb for searching through sentence aligned parallel corpora  
> (using the perl/cgi interface). is there an easy way to increase  
> the context of aligned regions? For example, I would like to add  
> another sentence before and/or after the aligned region?

Do you want to increase the context displayed by the "cat" command,  
or increase the context for an alignment constraint in the query?

Neither is possible in CQP, but the first requirement -- extended  
context display -- can easily be simulated with a Perl script (if  
you're running this through a Web interface anyway): just get the  
query matches, map them to the aligned regions, and then do the  
standard context calculations (as e.g. in the CQPdemo Web interface)  
in the aligned corpus.
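
A minimal sketch of that mapping and context calculation, in Python rather than Perl to keep it self-contained. It assumes the alignment beads and the target-corpus sentence regions have already been read into plain lists of corpus positions. The function names and data layout are hypothetical and are not part of CWB or CL.pm.

```python
from bisect import bisect_right

def aligned_region(beads, cpos):
    """Return the target region aligned to source position cpos, or None.

    beads: list of (src_start, src_end, tgt_start, tgt_end) tuples,
    sorted by src_start, with inclusive corpus positions (assumed layout).
    """
    starts = [b[0] for b in beads]
    i = bisect_right(starts, cpos) - 1
    if i >= 0 and beads[i][0] <= cpos <= beads[i][1]:
        return beads[i][2], beads[i][3]
    return None

def extend_by_sentences(region, sentences, before=1, after=1):
    """Widen a target region by whole sentences on either side.

    sentences: sorted, non-overlapping (start, end) regions of the
    target corpus; the result is clamped to the corpus boundaries.
    """
    starts = [s[0] for s in sentences]
    first = bisect_right(starts, region[0]) - 1
    last = bisect_right(starts, region[1]) - 1
    first = max(first - before, 0)
    last = min(last + after, len(sentences) - 1)
    return sentences[first][0], sentences[last][1]

# Toy data: three aligned sentence pairs.
beads = [(0, 4, 0, 5), (5, 9, 6, 12), (10, 14, 13, 17)]
target_sentences = [(0, 5), (6, 12), (13, 17)]

region = aligned_region(beads, 7)                      # (6, 12)
print(extend_by_sentences(region, target_sentences))   # (0, 17)
```

The extended region can then be printed from the target corpus in place of the plain aligned region.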

I don't see an easy way to modify the context for alignment  
constraints in CQP queries. You might be able to simulate this in  
Perl, but this will require a substantial amount of extra work:

1) Run query on the source corpus and download corpus positions of  
matches into Perl.
2) "Map" the matches to aligned regions in the target corpus, using  
direct access to the corpus annotation with CL.pm (I've long had  
plans to implement a similar "mapping" feature in CQP, but never got  
round to it, sadly).
3) Extend the aligned regions as necessary in the target corpus  
(using the CQPdemo code).
4) Upload the resulting corpus positions into CQP as a subcorpus  
("named query result" in the new terminology).
5) Activate this subcorpus and run a query consisting of the  
alignment constraint (or filter regions matching the query with "!").
6) Download corpus positions for the alignment constraint queries and  
match them against the list from step 3); this allows you to remove  
the items filtered out by the alignment constraint as well as the  
corresponding items among the original corpus positions from step 1).
7) Upload the filtered list from step 1) back into CQP.

Et voilà!
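
The bookkeeping in steps 6 and 7 can be sketched as follows, again in Python and on plain lists of corpus positions. It assumes the source matches from step 1 and the extended target regions from step 3 are kept in parallel lists, and that the hits of the alignment constraint query from step 5 have been downloaded as (start, end) pairs. All names here are hypothetical. The transfer of positions to and from CQP is left out, since it depends on the interface you use.

```python
def filter_matches(source_matches, extended_regions, constraint_hits, negate=False):
    """Keep the source matches whose extended target region contains a hit.

    source_matches[i] corresponds to extended_regions[i]; a region of
    None means the match had no aligned region and is dropped.
    With negate=True, keep the matches whose region contains no hit
    (the equivalent of a constraint negated with "!").
    Returns a list of (source_match, hits_inside_region) pairs.
    """
    kept = []
    for match, region in zip(source_matches, extended_regions):
        if region is None:
            continue
        r_start, r_end = region
        hits = [(s, e) for (s, e) in constraint_hits
                if r_start <= s and e <= r_end]
        if bool(hits) != negate:
            kept.append((match, hits))
    return kept

# Toy data: three source matches with their extended target regions,
# and a single hit of the alignment constraint in the target corpus.
source_matches = [(3, 3), (7, 8), (12, 12)]
extended_regions = [(0, 12), (0, 17), (6, 17)]
constraint_hits = [(14, 15)]

for match, hits in filter_matches(source_matches, extended_regions, constraint_hits):
    print(match, hits)
```

The first element of each returned pair is what goes back into CQP in step 7. The second is worth keeping if you want to highlight the constraint matches later.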

> Also, I would like to highlight matching parts in aligned regions.  
> However, there does not seem to be markup for splitting aligned  
> regions into before/after-context and matched part. This is of  
> course only if additional search constraints are defined for the  
> aligned corpora.

Unfortunately, CQP doesn't remember where exactly the alignment  
constraint matched -- these corpus positions are immediately thrown  
away when the constraint has been evaluated.

If you use the ugly Perl workaround above, it's possible to keep  
track of the matches of the alignment constraint (in steps 5 and 6),  
so that you can highlight them in the output of the Web interface.
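
Once the constraint matches are known, splitting an aligned region for highlighting is simple arithmetic on corpus positions. A minimal Python sketch, assuming the tokens of the displayed region and the corpus position of its first token are at hand (the function name and arguments are hypothetical):

```python
def split_for_highlight(tokens, region_start, hit):
    """Split a region's tokens into (before, matched, after).

    tokens: the tokens of the displayed target region, in order.
    region_start: corpus position of tokens[0].
    hit: inclusive (start, end) corpus positions of the constraint match;
    offsets are clamped to the region so a partial overlap still works.
    """
    lo = max(hit[0] - region_start, 0)
    hi = min(hit[1] - region_start + 1, len(tokens))
    return tokens[:lo], tokens[lo:hi], tokens[hi:]

before, matched, after = split_for_highlight(
    ["das", "ist", "ein", "kleiner", "Test"], 13, (14, 15)
)
print(" ".join(before), "<b>" + " ".join(matched) + "</b>", " ".join(after))
```

The three parts can then be wrapped in whatever markup the Web interface uses for the match and its context.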

The CWB is clearly showing its limitations here, isn't it?

Best,
Stefan

[ stefan.evert at uos.de | http://purl.org/stefan.evert ]