<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML><HEAD>

<META content="text/html; charset=iso-8859-1" http-equiv=Content-Type>

<META name=GENERATOR content="MSHTML 9.00.8112.16872">

<STYLE></STYLE>

</HEAD>

<BODY bgColor=#ffffff>

<DIV><FONT size=2 face=Arial>I would like to suggest/request a facility in CWB 

(or its successor) where a user can intervene in the construction of an 

index.</FONT></DIV>

<DIV>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>I envisage allowing the user to supply a script 

which can receive the token, extracted from the text and&nbsp;destined to be 

placed in an index, and can transform it.&nbsp; The transformed&nbsp;token would 

be placed in the index, rather than the original form.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>The attached concordance output (tobar.jpg), if 

attachments are allowed on the list,&nbsp;was made by another program and 

shows an example of why I need this facility.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>In my example, under the keyword "bean" are 

indexed/concorded several different forms, including "bean" and "bhean" and 

"mbean" and "Bean", among others.&nbsp; As far as I am aware, this cannot be 

achieved with CWB at present.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>In my texts, "bhean" is marked up as "b^hean", and 

"mbean" as "^mbean".&nbsp; I would like to be able to supply a script which, in 

my case,&nbsp;would drop the character "^" and the letter immediately following 

it.</FONT></DIV>
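<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>
<DIV><FONT size=2 face=Arial>As a rough sketch of the kind of keyword script meant here: Python is an arbitrary choice, and the function name index_form is hypothetical, since CWB has no such hook at present. It assumes "^" is always followed by exactly one letter to be dropped.</FONT></DIV>
<PRE>
```python
import re

# "^" followed by a single letter; [^\W\d_] matches any Unicode letter.
# Assumption: the markup never places "^" before a digit or punctuation.
MUTATION = re.compile(r"\^[^\W\d_]")


def index_form(token):
    """Hypothetical hook: return the form under which the token is indexed."""
    return MUTATION.sub("", token)


for raw in ("b^hean", "^mbean", "bean"):
    # Each of these prints "bean" as the index form.
    print(raw, "is indexed as", index_form(raw))
```
</PRE>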

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>In&nbsp;displayed contexts, I would need to be able 

to drop the character "^" but retain the letter following it.&nbsp; This is 

what happens in the program which produced the screenshot.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>In my case again, I would also make my script 

lower-case the token, bringing "Bean" into the family.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>It would further be necessary to allow the script 

to return more than one keyword.&nbsp; For example, the text might contain 

"seanbhean", which I encode as "sean+b^hean".&nbsp; My script here would act on 

the character "+" and return TWO words for the index, "sean" and "bean".&nbsp; 

Contexts would show "seanbhean", with "^" and "+" both deleted.</FONT></DIV>
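<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>
<DIV><FONT size=2 face=Arial>A script returning several keywords might look roughly like the following sketch. The name index_keywords is hypothetical and not part of CWB; the sketch assumes "+" only ever separates the parts of a compound, and it also applies the lower-casing described earlier.</FONT></DIV>
<PRE>
```python
import re

# "^" plus the single letter after it is dropped from index keywords.
MUTATION = re.compile(r"\^[^\W\d_]")


def index_keywords(token):
    """Hypothetical hook: return a list of keywords for one text token."""
    # Split compounds on "+", ignoring empty parts from a stray "+".
    parts = [part for part in token.split("+") if part]
    # Strip the mutation markup from each part, then lower-case it.
    return [MUTATION.sub("", part).lower() for part in parts]


print(index_keywords("sean+b^hean"))  # ['sean', 'bean']
print(index_keywords("Bean"))         # ['bean']
```
</PRE>
<DIV><FONT size=2 face=Arial>Returning a list even for simple tokens keeps the interface uniform: the indexer would add one entry per returned keyword, all pointing at the same position in the text.</FONT></DIV>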

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>For contexts, it might suffice (for my 

needs)&nbsp;to give CWB a list of characters to be dropped from contexts, 

without going to the lengths of allowing a user script for contexts, in addition 

to the script for keywords.</FONT></DIV>
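<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>
<DIV><FONT size=2 face=Arial>The simpler drop-list option for contexts could behave like this sketch. The helper name make_context_filter and the idea of passing the list as a plain string are assumptions for illustration only, not an existing CWB setting.</FONT></DIV>
<PRE>
```python
def make_context_filter(drop_chars):
    """Build a function that deletes every character in drop_chars."""
    # str.maketrans with a third argument maps those characters to None.
    table = str.maketrans("", "", drop_chars)

    def clean(text):
        return text.translate(table)

    return clean


# Hypothetical drop list for the markup described in this message.
clean = make_context_filter("^+")
print(clean("sean+b^hean ^mbean Bean"))  # seanbhean mbean Bean
```
</PRE>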

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>With thanks,</FONT></DIV>

<DIV><FONT size=2 face=Arial>Ciarán Ó Duibhín.</FONT></DIV>

</BODY></HTML>