[CWB] Regex/backreferencing

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Mar 13 10:45:07 CET 2017


The query

[word="(.+)-?\1"]

works as expected for me -- both via the CQPweb interface, and on the command line: i.e., it finds words consisting of the same element twice, with or without an intervening hyphen.
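
For anyone who wants to check the pattern outside CQP, here is a minimal sketch in Python. It assumes that CQP anchors the regex to the whole token (hence `fullmatch`), and that Python's `re` module treats this particular pattern the same way PCRE does. The function name and the token list are made up for illustration.

```python
import re

# Same pattern as in the CQP query: some element, an optional hyphen,
# then the same element again via the backreference \1.
REDUP = re.compile(r"(.+)-?\1")


def is_reduplicated(token: str) -> bool:
    # fullmatch stands in for CQP matching the regex against the whole token.
    return REDUP.fullmatch(token) is not None


# Hypothetical sample tokens, not taken from any corpus.
tokens = ["bye-bye", "couscous", "hello", "so-so", "banana"]
hits = [t for t in tokens if is_reduplicated(t)]
print(hits)
```

An engine without backreference support would most likely read `\1` as an escaped literal `1`, which fits the symptom of only getting strings ending in 1.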

What version of CWB are you using? If it’s a version that predates the use of PCRE as the regex engine, that could explain this…

best

Andrew

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Susanne Flach
Sent: 13 March 2017 09:10
To: Open source development of the Corpus WorkBench
Subject: [CWB] Regex/backreferencing

Dear CWBists,

Can you backreference on the token level in CQP?

This question has been nagging me from time to time; now a student wants to investigate reduplication. For querying across token boundaries, labels sort of do the trick except they don’t seem to ignore case (i.e. a:[] b:[] :: a.word = b.word), but for reduplication within a token the pattern [word="(.+)-?\1"] only returns strings ending in 1.

Is this possible in CQP? Plus, is there a way to ignore case in labels?

Any ideas or pointers on (advanced) functions would be much appreciated.

Best & thanks
Susanne

