<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">

<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v = 

"urn:schemas-microsoft-com:vml" xmlns:o = 

"urn:schemas-microsoft-com:office:office" xmlns:w = 

"urn:schemas-microsoft-com:office:word" xmlns:m = 

"http://schemas.microsoft.com/office/2004/12/omml"><HEAD>

<META content="text/html; charset=utf-8" http-equiv=Content-Type>

<META name=GENERATOR content="MSHTML 9.00.8112.16872">


</HEAD>

<BODY lang=EN-GB bgColor=white vLink=purple link=blue>

<DIV><FONT size=2 face=Arial>Thank you, Vlado.&nbsp; That's a really neat feature of NoSketch, but for my purposes it is better still to split "seanbhean" and "sean-bhean" into two tokens each in the vertical file:</FONT></DIV>

<DIV><FONT size=2 face=Arial>&nbsp;(1) word="sean" or "sean-", respectively; demut="sean"</FONT></DIV>

<DIV><FONT size=2 face=Arial>&nbsp;(2) word="bhean"; demut="bean"</FONT></DIV>

<DIV><FONT size=2 face=Arial>The query is then made on the "demut" p-attribute ("demut" plays the role that "lemma" usually plays, although linguistically it is not a lemma). This results in:<BR>• a search for "sean" and "bean" together will retrieve all of "seanbhean", "sean-bhean" and "sean bhean" (the NoSketch query "sean--bhean" can do that too)<BR>• a search for "bean" will retrieve all of those, as well as all the other examples of "bean"; and correspondingly for a search for "sean".<BR>That is what will best suit the lexicographical user of the corpus.</FONT></DIV>
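<DIV><FONT size=2 face=Arial>&nbsp;<BR>A minimal Python sketch of that tokenisation, to make the two-column (word, demut) layout concrete. Everything in it is an assumption made for illustration: the prefix table PREFIXES, the toy rule in demutate (which only strips an "h" after an initial lenitable consonant) and the function names are hypothetical and are not part of CWB or NoSketch. A real tokeniser would need a lexicon to decide which words may be split.</FONT></DIV>

<PRE>
```python
# Illustrative only: the prefix list and the lenition rule are assumptions.
PREFIXES = ("sean",)


def demutate(word):
    """Undo initial lenition in a toy way, e.g. 'bhean' -> 'bean'."""
    if len(word) > 2 and word[1] == "h" and word[0] in "bcdfgmpst":
        return word[0] + word[2:]
    return word


def to_vertical(token):
    """Return the (word, demut) rows for one running-text token."""
    lower = token.lower()
    for prefix in PREFIXES:
        # Try the hyphenated form first so that 'sean-' keeps its hyphen.
        for joined in (prefix + "-", prefix):
            rest = token[len(joined):]
            if lower.startswith(joined) and rest:
                return [(token[:len(joined)], prefix),
                        (rest, demutate(rest.lower()))]
    return [(token, demutate(lower))]


def vertical_lines(text):
    """Turn whitespace-separated text into tab-separated vertical lines."""
    rows = []
    for token in text.split():
        rows.extend(to_vertical(token))
    return ["\t".join(row) for row in rows]
```
</PRE>

<DIV><FONT size=2 face=Arial>With this sketch, "seanbhean", "sean-bhean" and "sean bhean" all end up as a token with demut="sean" followed by a token with demut="bean", which is what lets one query on "demut" retrieve all three.</FONT></DIV>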

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>Thank you, Andrew, for showing how to display 

p-attributes in the kwic line; and for clarifying that CWB/Perl has not been 

made to work under Windows.&nbsp; I have only a couple of comments.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>&gt;&gt; It will avoid having a permanent 

multi-column file outside the corpus, but won't the multiple columns still exist 

internally in some form within the corpus?&nbsp; :-(<BR><FONT color=#0000ff>&gt; 

Yes, but it has to. If you want to store more than one item of 

separately-searchable information about each token – in this case, your 

word/demut combination – then you have to have multiple attributes. 

</FONT></FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>OK, I see that CWB's architecture requires that.<BR>&nbsp;<BR><FONT color=#0000ff>&gt; If you want to avoid at all costs 

multiple attributes being stored under the hood then…. you don’t want to use 

CWB! (Or Manatee, since that works on precisely the same 

principle.)</FONT></FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>Yes, I don't need to search on "word", and the program I use in Windows stores only "demut" in the index, fetching the kwic contexts from a copy of the running text.&nbsp; (Incidentally, this means that "non-original spaces" never enter the contexts.)&nbsp; Given that CWB doesn't need a copy of the running text, the storage requirements of the two methods should be similar.</FONT></DIV>

<DIV><FONT size=2 face=Arial></FONT>&nbsp;</DIV>

<DIV><FONT size=2 face=Arial>A script to post-process output from CQP before displaying it:<BR><FONT color=#0000ff>&gt;<BR>&gt;- read input line from user standard input<BR>&gt;- pass input line to CQP slave process (either directly, or via a library)<BR>&gt;- if necessary, read output line(s) from CQP slave process<BR>&gt;- modify output line(s) as per whatever requirements you have*<BR>&gt;- print output line(s) to standard output<BR>&gt;- print prompt for next user input.<BR>&gt;<BR>&gt;The user then runs your script instead of running CQP.<BR>&gt;<BR>&gt;(*) If you use one of the libraries, an easy way to do this is by specifying a "line handler" function when you call "exec/execute()" or "query()".</FONT></FONT></DIV>

<DIV><FONT color=#00ffff size=2 face=Arial></FONT>&nbsp;</DIV>
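<DIV><FONT size=2 face=Arial>For my own reference, the quoted steps can be outlined in Python roughly as follows. This is only a sketch under stated assumptions: run_wrapper, send_to_cqp and fake_cqp are hypothetical names, and the part that actually talks to the CQP process (directly or through one of the libraries) is passed in as a function rather than implemented here.</FONT></DIV>

<PRE>
```python
import io


def run_wrapper(user_input, send_to_cqp, line_handler, out, prompt="> "):
    """Read queries, pass each to CQP, post-process the output, print it.

    user_input   -- iterable of input lines (e.g. sys.stdin)
    send_to_cqp  -- hypothetical callable: query string -> list of output lines
    line_handler -- callable applied to every output line before printing
    out          -- writable text stream (e.g. sys.stdout)
    """
    out.write(prompt)
    for line in user_input:
        query = line.strip()
        if query:
            # Pass the input line on and read back the output line(s).
            for result in send_to_cqp(query):
                # Modify each output line, then print it.
                out.write(line_handler(result) + "\n")
        # Print the prompt for the next user input.
        out.write(prompt)


def fake_cqp(query):
    """Stand-in for the CQP child process (for this demo only)."""
    return ["hit for " + query]


# Demo with an in-memory stream; str.upper stands in for a real handler.
demo_out = io.StringIO()
run_wrapper(["sean\n"], fake_cqp, str.upper, demo_out)
```
</PRE>

<DIV><FONT size=2 face=Arial>Keeping the line handler as a separate argument means that the fourth step can be changed without touching the loop itself.</FONT></DIV>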

<DIV><FONT size=2 face=Arial>Many thanks for that explanation.&nbsp; I should also look at existing front-ends to CQP, which must do something like this series of steps and may allow the user some control over the fourth step.&nbsp; The most promising of these may be TXM.<BR>&nbsp;<BR>The need to post-process kwic output in order to remove non-original spaces may not arise at all if I am correct in thinking that David's suggestion, some time ago, of &lt;g/&gt; as glue is actually implemented in the Sketch Engine as meaning "leave no space between the preceding and following tokens".&nbsp; In CWB, the " &lt;g/&gt; " comes through unchanged into the kwic context and needs post-processing to remove it.&nbsp; It might be an idea to have CQP implement &lt;g/&gt; like that too.</FONT></DIV><FONT size=2 

face=Arial>
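<DIV><FONT size=2 face=Arial>The post-processing needed to remove the glue marker from a kwic line could be as small as the following Python sketch. It assumes that the marker arrives as the literal text of the glue tag surrounded by whitespace, as described above; the name remove_glue is hypothetical.</FONT></DIV>

<PRE>
```python
import re

# The glue marker together with any whitespace on either side of it.
GLUE = re.compile(r"\s*<g/>\s*")


def remove_glue(kwic_line):
    """Join the tokens on either side of each glue marker."""
    return GLUE.sub("", kwic_line)
```
</PRE>

<DIV><FONT size=2 face=Arial>A function like this could serve as the "line handler" in a wrapper script.</FONT></DIV>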

<DIV><BR>Regards,<BR>Ciarán.</DIV></FONT>

<DIV>&nbsp;</DIV></BODY></HTML>