[CWB] [ cwb-Bugs-3514300 ] Numbered backrefs in string-level regex not working
SourceForge.net
noreply at sourceforge.net
Tue Apr 3 10:25:38 CEST 2012
Bugs item #3514300, was opened at 2012-04-02 16:53
Message generated for change (Comment added) made by sheiden
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=722303&aid=3514300&group_id=131809
Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: CL low-level library
Group: TODO-3.5
Status: Open
Resolution: None
Priority: 7
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Andrew Hardie (andrewhardie)
Summary: Numbered backrefs in string-level regex not working
Initial Comment:
Only one of the various possible syntaxes for backrefs within a regex seems to be working.
For instance:
"(.)\1"
should in theory find all forms consisting of the same character twice. However, it does not work. Neither do the following, which according to man pcre should be equivalent:
"(.)\g1"
"(.)\g{1}"
The following, however, DOES work, even though it SHOULD be identical to the preceding:
"(?P<name>.)\g{name}"
I suspect this is due to the regex optimiser and its lack of full PCRE-awareness (in cl/regopt.c) -- i.e. it is doing an incorrect optimisation and doing simple string matching on the first three but not on the fourth -- but cannot be sure without further investigation.
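
For reference, the expected behaviour described above can be sketched outside CQP. This is only an illustration in Python's re module, not a test against CWB or PCRE itself: it assumes that whole-token matching can be modelled with re.fullmatch, and it uses (?P=name) for the named backreference because Python's re does not accept the \g1, \g{1} or \g{name} forms in a pattern. The helper name matching_forms and the sample word list are hypothetical.

```python
import re

# Backreference forms that Python's re module accepts in a pattern.
# (The PCRE-only forms \g1, \g{1} and \g{name} are deliberately not used here.)
NUMBERED = r"(.)\1"
NAMED = r"(?P<name>.)(?P=name)"


def matching_forms(pattern, forms):
    """Return the forms that the pattern matches in full (whole-token match)."""
    compiled = re.compile(pattern)
    return [form for form in forms if compiled.fullmatch(form)]


# Hypothetical sample of word forms; only two-character doubles should match.
forms = ["aa", "ab", "oo", "a", "aaa", "11"]

numbered_hits = matching_forms(NUMBERED, forms)
named_hits = matching_forms(NAMED, forms)

print(numbered_hits)  # ['aa', 'oo', '11']
print(named_hits)     # ['aa', 'oo', '11']
```

If the numbered and named forms are truly equivalent, both lists should be identical, which is the behaviour the report expects from CQP as well.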
----------------------------------------------------------------------
Comment By: Serge Heiden (sheiden)
Date: 2012-04-03 01:25
Message:
"(.)\2" actually does what "(.)\1" should do.
There is apparently a +1 shift in the numbering of the regex group buffers in PCRE
[TXM 0.6b2, CQP 3.4, Linux]
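
One way a +1 shift of this kind could arise is if the user's pattern were wrapped in an extra capturing group before compilation, for example to anchor it. Whether CWB actually does this is an assumption, not something confirmed in this thread. The sketch below uses Python's re module rather than PCRE, and the functions wrap_capturing, wrap_non_capturing and hits are hypothetical names introduced only for illustration.

```python
import re


def wrap_capturing(pattern):
    """Hypothetical anchoring that adds an extra capturing group (becomes group 1)."""
    return "^(" + pattern + ")$"


def wrap_non_capturing(pattern):
    """Hypothetical anchoring with a non-capturing group, leaving numbering intact."""
    return "^(?:" + pattern + ")$"


def hits(regex, forms):
    """Return matching forms, or None if the regex does not compile."""
    try:
        compiled = re.compile(regex)
    except re.error:
        return None
    return [form for form in forms if compiled.search(form)]


forms = ["aa", "ab", "oo"]

# With the extra capturing group, the user's "(.)" is group 2, so \2 works.
shifted_ref2 = hits(wrap_capturing(r"(.)\2"), forms)

# \1 now points at the enclosing wrapper group. Python's re rejects a
# reference to a still-open group; other engines may simply fail to match.
shifted_ref1 = hits(wrap_capturing(r"(.)\1"), forms)

# With a non-capturing wrapper, \1 refers to the user's own group again.
unshifted_ref1 = hits(wrap_non_capturing(r"(.)\1"), forms)

print(shifted_ref2)    # ['aa', 'oo']
print(shifted_ref1)    # None
print(unshifted_ref1)  # ['aa', 'oo']
```

Under this assumption, anchoring with a non-capturing group would keep the user's group numbers unchanged.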
----------------------------------------------------------------------