[CWB] Regex/backreferencing

Susanne Flach susanne.flach at fu-berlin.de
Mon Mar 13 10:48:46 CET 2017


Hi Andrew,

Ah, ok, that might be it — we’re using 3.0 at the moment. I’ll test this.

Thanks!
Susanne


> On 13 Mar 2017, at 10:45, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
> 
> The query
>  
> [word="(.+)-?\1"]
>  
> works as expected for me, both via the CQPweb interface and on the command line: i.e. it finds words consisting of the same element twice, with or without an intervening hyphen.
>  
> What version of CWB are you using? If it’s a version that predates the use of PCRE as the regex engine, that could explain this…
>  
> best
>  
> Andrew
>  
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Susanne Flach
> Sent: 13 March 2017 09:10
> To: Open source development of the Corpus WorkBench
> Subject: [CWB] Regex/backreferencing
>  
> Dear CWBists,
>  
> Can you backreference on the token level in CQP?
>  
> This question has been nagging me from time to time; now a student wants to investigate reduplication. For querying across token boundaries, labels sort of do the trick except they don’t seem to ignore case (i.e. a:[] b:[] :: a.word = b.word), but for reduplication within a token the pattern [word="(.+)-?\1"] only returns strings ending in 1.
>  
> Is this possible in CQP? Plus, is there a way to ignore case in labels?
>  
> Any ideas or pointers on (advanced) functions would be much appreciated.
>  
> Best & thanks
> Susanne
>  
>  
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
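
For anyone who wants to check what the pattern from this thread, (.+)-?\1, should match before running it against a corpus, here is a minimal sketch in Python. It is only an illustration under some assumptions: Python's re module stands in for PCRE (the two agree on backreference behaviour for this simple pattern), fullmatch is used because CQP matches a regex against the whole token, and the helper name is_reduplicated and its ignore_case option are made up for this example and are not part of CQP.

```python
import re

# Same pattern as in the CQP query [word="(.+)-?\1"].
# CQP anchors the regex to the whole token, hence fullmatch below.
REDUP = re.compile(r"(.+)-?\1")


def is_reduplicated(token: str, ignore_case: bool = False) -> bool:
    """Return True if the token is the same element twice, optionally hyphenated.

    ignore_case is a hypothetical convenience: it lowercases the token first,
    so that "Bye-bye" counts as a reduplication.
    """
    if ignore_case:
        token = token.lower()
    return REDUP.fullmatch(token) is not None


tokens = ["bye-bye", "mama", "Bye-bye", "hello", "section1"]
print([t for t in tokens if is_reduplicated(t)])
print([t for t in tokens if is_reduplicated(t, ignore_case=True)])
```

With a working backreference, "section1" is not a hit; getting only strings that end in 1 suggests the regex engine is reading \1 as a literal character rather than as a backreference.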
