[CWB] Alternative and pattern order in queries

Sébastien Jacquot sebastien.jacquot at univ-fcomte.fr
Tue Apr 7 11:35:55 CEST 2015


Hi,
in a corpus with this structure:

<text date="1900">
<body>
<p>...<q>...</q>...</p>
<p>...</p>
</body>
</text>

<text date="1901">
<body>
<p>...</p>
<p>...</p>
</body>
</text>
...

I'd like to get the tokens outside the "q" tags.
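
As a reference for the expected result, here is a minimal Python sketch (not CQP) of the tokens the queries below are meant to return. The function name `tokens_outside_q` and the toy sample text are hypothetical, introduced only for illustration; the sketch assumes whitespace-separated tokens and well-formed <q> tags.

```python
import re

def tokens_outside_q(markup):
    """Return the word tokens of `markup` that are not inside a <q>...</q> region."""
    outside = []
    depth = 0  # current nesting depth of <q> regions
    # Split the input into tags (<...>) and whitespace-separated words.
    for item in re.findall(r"<[^>]+>|[^<\s]+", markup):
        if item == "<q>":
            depth += 1
        elif item == "</q>":
            depth -= 1
        elif item.startswith("<"):
            continue  # other structural tags (text, body, p) are not tokens
        elif depth == 0:
            outside.append(item)
    return outside

# Hypothetical toy text following the structure shown above.
sample = '<text date="1900"><body><p>a b <q>c d</q> e</p><p>f g</p></body></text>'
print(tokens_outside_q(sample))  # ['a', 'b', 'e', 'f', 'g']
```

Comparing this list with the tokens covered by the matches of each query shows which parts of the text a given query misses.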

Do you know why these two queries don't return the same tokens?

<text>[!q]+<q> | </q>[!q]+<q> | </q>[!q]+</text>;

</q>[!q]+</text> | <text>[!q]+<q> | </q>[!q]+<q>;

The first query doesn't work as expected: the returned tokens match only 
the first alternative, <text>[!q]+<q>,
as if the pipe character acted as a boolean OR condition instead 
of regular-expression alternation.

The second query seems to work as expected and returns all the tokens 
outside the "q" tag.

Could these two behaviors differ because of the matching strategy 
configuration?
Thanks in advance for the help.
Cheers,
Sebastian

-- 
ELLIADD, EA 4661
UFR SLHS - Université de Franche-Comté
30-32 rue Mégevand
25030 Besançon cedex
03.81.66.54.22


