[CWB] Regex/backreferencing
Hardie, Andrew
a.hardie at lancaster.ac.uk
Mon Mar 13 10:45:07 CET 2017
The query
[word="(.+)-?\1"]
works as expected for me, both via the CQPweb interface and on the command line, i.e. it finds words consisting of the same element twice, with or without an intervening hyphen.
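To make the intended semantics of [word="(.+)-?\1"] concrete, here is a minimal Python sketch, not CWB code. It assumes that Python's `re` backreferences behave like PCRE's for this simple pattern, and it uses `re.fullmatch` to mimic the way CQP matches a regex against the whole token. The `ignore_case` flag is only loosely analogous to CQP's `%c` modifier. `LITERAL_ONE` is a hypothetical pattern showing what happens if an engine without backreference support reads `\1` as a plain "1", which is one plausible explanation for hits that end in "1".

```python
import re

# CQP anchors a regex to the whole token; re.fullmatch emulates that here.
# Assumption: Python's re handles this backreference the way PCRE does.
PATTERN = r"(.+)-?\1"


def is_reduplication(token: str, ignore_case: bool = False) -> bool:
    """True if token is some string X twice, optionally joined by '-'."""
    flags = re.IGNORECASE if ignore_case else 0
    return re.fullmatch(PATTERN, token, flags) is not None


# Hypothetical: an engine that does not understand \1 might treat it as a
# literal "1", so the query would only find tokens ending in "1".
LITERAL_ONE = r"(.+)-?1"

words = ["bye-bye", "tuktuk", "so-so", "hello", "Bye-bye", "route1", "x-1"]
print([w for w in words if is_reduplication(w)])
# → ['bye-bye', 'tuktuk', 'so-so']
print([w for w in words if re.fullmatch(LITERAL_ONE, w)])
# → ['route1', 'x-1']
```

Note that "Bye-bye" is only found when `ignore_case=True`, because the backreference must repeat the captured text exactly unless case-insensitive matching is switched on.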
What version of CWB are you using? If it’s a version that predates the use of PCRE as the regex engine, that could explain this…
best
Andrew
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Susanne Flach
Sent: 13 March 2017 09:10
To: Open source development of the Corpus WorkBench
Subject: [CWB] Regex/backreferencing
Dear CWBists,
Can you backreference on the token level in CQP?
This question has been nagging me from time to time; now a student wants to investigate reduplication. For querying across token boundaries, labels sort of do the trick, except that they don't seem to ignore case (e.g. a:[] b:[] :: a.word = b.word). For reduplication within a token, however, the pattern [word="(.+)-?\1"] only returns strings ending in 1.
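The cross-token condition a.word = b.word compares two adjacent tokens exactly, which is why case differences block a match. As an illustration only, here is a plain-Python sketch (not a CQP feature) of the comparison being asked for: it scans a hypothetical token list for adjacent repeats, optionally folding case with `str.casefold`. The function name and sample tokens are invented for the example.

```python
def adjacent_repeats(tokens: list[str], ignore_case: bool = True) -> list[tuple[int, str, str]]:
    """Return (index, a, b) for each pair of adjacent tokens a, b that match.

    Mirrors the idea of a:[] b:[] :: a.word = b.word; with ignore_case=True
    the two tokens are compared after case folding.
    """
    hits = []
    for i in range(len(tokens) - 1):
        a, b = tokens[i], tokens[i + 1]
        same = a.casefold() == b.casefold() if ignore_case else a == b
        if same:
            hits.append((i, a, b))
    return hits


tokens = ["Bye", "bye", "now", "now", "and", "then"]
print(adjacent_repeats(tokens))         # → [(0, 'Bye', 'bye'), (2, 'now', 'now')]
print(adjacent_repeats(tokens, False))  # → [(2, 'now', 'now')]
```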
Is this possible in CQP? Plus, is there a way to ignore case in labels?
Any ideas or pointers on (advanced) functions would be much appreciated.
Best & thanks
Susanne