<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML><HEAD>
<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 9.00.8112.16872">
<STYLE></STYLE>
</HEAD>
<BODY bgColor=#ffffff>
<DIV><FONT size=2 face=Arial>I would like to suggest/request a facility in CWB
(or its successor) where a user can intervene in the construction of an
index.</FONT></DIV>
<DIV> </DIV>
<DIV><FONT size=2 face=Arial>I envisage allowing the user to supply a script
which receives each token extracted from the text and destined for the
index, and can transform it. The transformed token, rather than the
original form, would then be placed in the index.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>The attached concordance output (tobar.jpg), if
attachments are allowed on the list, was made by another program, and
shows an example of why I need this facility.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>In my example, under the keyword "bean" are
indexed/concorded several different forms, including "bean" and "bhean" and
"mbean" and "Bean", among others. As far as I am aware, this cannot be
achieved with CWB at present.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>In my texts, "bhean" is marked up as "b^hean", and
"mbean" as "^mbean". I would like to be able to supply a script which, in
my case, would drop the character "^" and the letter immediately following
it.</FONT></DIV>
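<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>
<DIV><FONT size=2 face=Arial>As an illustration only, the index transformation
just described might be sketched in Python as follows. The function name
index_form is hypothetical and is not an existing CWB interface; the sketch
assumes that "^" is always followed by exactly one character to be
dropped.</FONT></DIV>
<PRE>
```python
import re

def index_form(token):
    """Return the form to be indexed (hypothetical helper, not a CWB API).

    Drops each "^" together with the single character that follows it.
    """
    return re.sub(r"\^.", "", token)

print(index_form("b^hean"))  # bean
print(index_form("^mbean"))  # bean
```
</PRE>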
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>In displayed contexts, I would need to be able
to drop the character "^" but retain the letter following it. This is
what happens in the program which produced the screenshot.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>In my case again, I would also make my script
lower-case the token, bringing "Bean" into the family.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>It would further be necessary to allow the script
to return more than one keyword. For example, the text might contain
"seanbhean", which I encode as "sean+b^hean". My script here would act on
the character "+" and return TWO words for the index, "sean" and "bean".
Contexts would show "seanbhean", with "^" and "+" both deleted.</FONT></DIV>
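<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>
<DIV><FONT size=2 face=Arial>A rough Python sketch of a script that returns
more than one keyword is given here, purely for illustration. The names
keywords_for and build_index are hypothetical, and the toy index (keyword to
token positions) is an assumption made for the example, not a description of
how CWB stores its indexes.</FONT></DIV>
<PRE>
```python
import re
from collections import defaultdict

def keywords_for(token):
    """Return every keyword a token should be indexed under (hypothetical)."""
    parts = token.split("+")  # "+" separates the elements of a compound
    # Drop "^" plus the character after it, then lower-case each part.
    return [re.sub(r"\^.", "", part).lower() for part in parts]

def build_index(tokens):
    """Map each keyword to the positions of the tokens indexed under it."""
    index = defaultdict(list)
    for position, token in enumerate(tokens):
        for keyword in keywords_for(token):
            index[keyword].append(position)
    return dict(index)

tokens = ["bean", "b^hean", "^mbean", "Bean", "sean+b^hean"]
print(keywords_for("sean+b^hean"))  # ['sean', 'bean']
print(build_index(tokens))          # {'bean': [0, 1, 2, 3, 4], 'sean': [4]}
```
</PRE>
<DIV><FONT size=2 face=Arial>In this toy example all five tokens are found
under "bean", and the compound is also found under "sean".</FONT></DIV>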
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>For contexts, it might suffice (for my
needs) to give CWB a list of characters to be dropped from contexts,
without going to the lengths of allowing a user script for contexts, in addition
to the script for keywords.</FONT></DIV>
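<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>
<DIV><FONT size=2 face=Arial>The simpler drop-list approach for contexts
could look something like the following Python sketch. The constant
DROP_FROM_CONTEXTS and the function display_form are hypothetical names
chosen for the example; no such CWB setting exists as far as this message
is aware.</FONT></DIV>
<PRE>
```python
DROP_FROM_CONTEXTS = "^+"  # assumed drop list, following the encoding above

def display_form(token, drop=DROP_FROM_CONTEXTS):
    """Return the token as shown in contexts (hypothetical helper)."""
    # str.maketrans with a third argument maps those characters to None,
    # so translate() deletes them and leaves everything else untouched.
    return token.translate(str.maketrans("", "", drop))

print(display_form("b^hean"))       # bhean
print(display_form("sean+b^hean"))  # seanbhean
```
</PRE>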
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>With thanks,</FONT></DIV>
<DIV><FONT size=2 face=Arial>Ciarán Ó Duibhín.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV></BODY></HTML>