<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML xmlns="http://www.w3.org/TR/REC-html40" xmlns:v =
"urn:schemas-microsoft-com:vml" xmlns:o =
"urn:schemas-microsoft-com:office:office" xmlns:w =
"urn:schemas-microsoft-com:office:word" xmlns:m =
"http://schemas.microsoft.com/office/2004/12/omml"><HEAD>
<META content="text/html; charset=utf-8" http-equiv=Content-Type>
<META name=GENERATOR content="MSHTML 9.00.8112.16872">
<STYLE>@font-face {
        font-family: Cambria Math;
}
@font-face {
        font-family: Calibri;
}
@font-face {
        font-family: Verdana;
}
@page WordSection1 {size: 612.0pt 792.0pt; margin: 72.0pt 72.0pt 72.0pt 72.0pt; }
P.MsoNormal {
        MARGIN: 0cm 0cm 0pt; FONT-FAMILY: "Times New Roman",serif; FONT-SIZE: 12pt
}
LI.MsoNormal {
        MARGIN: 0cm 0cm 0pt; FONT-FAMILY: "Times New Roman",serif; FONT-SIZE: 12pt
}
DIV.MsoNormal {
        MARGIN: 0cm 0cm 0pt; FONT-FAMILY: "Times New Roman",serif; FONT-SIZE: 12pt
}
A:link {
        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlink {
        COLOR: blue; TEXT-DECORATION: underline; mso-style-priority: 99
}
A:visited {
        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
SPAN.MsoHyperlinkFollowed {
        COLOR: purple; TEXT-DECORATION: underline; mso-style-priority: 99
}
P.MsoListParagraph {
        MARGIN: 0cm 0cm 0pt 36pt; FONT-FAMILY: "Times New Roman",serif; FONT-SIZE: 12pt; mso-style-priority: 34
}
LI.MsoListParagraph {
        MARGIN: 0cm 0cm 0pt 36pt; FONT-FAMILY: "Times New Roman",serif; FONT-SIZE: 12pt; mso-style-priority: 34
}
DIV.MsoListParagraph {
        MARGIN: 0cm 0cm 0pt 36pt; FONT-FAMILY: "Times New Roman",serif; FONT-SIZE: 12pt; mso-style-priority: 34
}
P.msonormal0 {
        FONT-FAMILY: "Times New Roman",serif; MARGIN-LEFT: 0cm; FONT-SIZE: 12pt; MARGIN-RIGHT: 0cm; mso-style-name: msonormal; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto
}
LI.msonormal0 {
        FONT-FAMILY: "Times New Roman",serif; MARGIN-LEFT: 0cm; FONT-SIZE: 12pt; MARGIN-RIGHT: 0cm; mso-style-name: msonormal; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto
}
DIV.msonormal0 {
        FONT-FAMILY: "Times New Roman",serif; MARGIN-LEFT: 0cm; FONT-SIZE: 12pt; MARGIN-RIGHT: 0cm; mso-style-name: msonormal; mso-margin-top-alt: auto; mso-margin-bottom-alt: auto
}
SPAN.EmailStyle19 {
        FONT-STYLE: normal; FONT-FAMILY: "Verdana",sans-serif; COLOR: #1f497d; FONT-WEIGHT: normal; TEXT-DECORATION: none; mso-style-type: personal
}
SPAN.EmailStyle20 {
        FONT-STYLE: normal; FONT-FAMILY: "Verdana",sans-serif; COLOR: #1f497d; FONT-WEIGHT: normal; TEXT-DECORATION: none; mso-style-type: personal
}
SPAN.EmailStyle21 {
        FONT-STYLE: normal; FONT-FAMILY: "Verdana",sans-serif; COLOR: #1f497d; FONT-WEIGHT: normal; TEXT-DECORATION: none; mso-style-type: personal-reply
}
.MsoChpDefault {
        FONT-SIZE: 10pt; mso-style-type: export-only
}
DIV.WordSection1 {
        page: WordSection1
}
OL {
        MARGIN-BOTTOM: 0cm
}
UL {
        MARGIN-BOTTOM: 0cm
}
</STYLE>
</HEAD>
<BODY lang=EN-GB bgColor=white vLink=purple link=blue>
<DIV><FONT size=2 face=Arial>Thank you, Vlado. </FONT><FONT size=2
face=Arial>That's a really neat feature of NoSketch but for me it is better
still to break "seanbhean" and "sean-bhean" each into two tokens in the vertical
file:</FONT></DIV>
<DIV><FONT size=2 face=Arial> (1) word="sean" or "sean-";
demut="sean"</FONT></DIV>
<DIV><FONT size=2 face=Arial> (2) word="bhean"; demut="bean"</FONT></DIV>
<DIV><FONT size=2 face=Arial>The query is then made on the "demut" p-attribute
("demut" plays the role that "lemma" usually plays, although linguistically it
is not a lemma). This results in:<BR>• a search for "sean" and "bean" together will
retrieve all of "seanbhean", "sean-bhean" and "sean bhean" (NoSketch
"sean--bhean" can do that too)<BR>• a search for "bean" will retrieve all of
those, as well as all the other examples of "bean"; and correspondingly for a
search for "sean".<BR>That is what will best suit the lexicographical user of
the corpus.</FONT></DIV>
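<DIV><FONT size=2 face=Arial>As a rough illustration of the effect described above, here is a minimal Python sketch. The token list is invented sample data standing in for the "word" and "demut" columns of a vertical file, and find_demut is a hypothetical helper, not part of CWB or NoSketch. In CQP itself the first search would presumably be written as [demut="sean"] [demut="bean"].</FONT></DIV>
<PRE>
```python
# Each token is a (word, demut) pair, mirroring two columns of a vertical file.
# The sample tokens are invented for illustration only.
TOKENS = [
    ("an", "an"),
    ("sean", "sean"), ("bhean", "bean"),     # "seanbhean" split into two tokens
    ("agus", "agus"),
    ("sean-", "sean"), ("bhean", "bean"),    # "sean-bhean" split into two tokens
    ("agus", "agus"),
    ("sean", "sean"), ("bhean", "bean"),     # "sean bhean", already two tokens
    ("agus", "agus"),
    ("bean", "bean"),                        # unmutated "bean" on its own
]


def find_demut(tokens, query):
    """Return the start positions where consecutive demut values equal query."""
    n = len(query)
    demuts = [demut for _, demut in tokens]
    return [i for i in range(len(tokens) - n + 1) if demuts[i:i + n] == list(query)]


pair_hits = find_demut(TOKENS, ["sean", "bean"])   # all three spellings
bean_hits = find_demut(TOKENS, ["bean"])           # those, plus plain "bean"
print(pair_hits)
print(bean_hits)
```
</PRE>
<DIV><FONT size=2 face=Arial>Because the match is made on "demut" alone, the hyphen and the mutation in the "word" column play no part in retrieval.</FONT></DIV>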
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>Thank you, Andrew, for showing how to display
p-attributes in the kwic line; and for clarifying that CWB/Perl has not been
made to work under Windows. I have only a couple of comments.</FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>>> It will avoid having a permanent
multi-column file outside the corpus, but won't the multiple columns still exist
internally in some form within the corpus? :-(<BR><FONT color=#0000ff>>
Yes, but it has to. If you want to store more than one item of
separately-searchable information about each token – in this case, your
word/demut combination – then you have to have multiple attributes.
</FONT></FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>OK, I see that cwb's architecture requires
that.<BR> <BR><FONT color=#0000ff>> If you want to avoid at all costs
multiple attributes being stored under the hood then…. you don’t want to use
CWB! (Or Manatee, since that works on precisely the same
principle.)</FONT></FONT></DIV>
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>Yes, I don't need to search on "word", and
the program I use in Windows stores only "demut" in the index, and fetches the
kwic contexts from a copy of the running text. (Incidentally, this means
that "non-original spaces" never enter the contexts.) Given that cwb
doesn't need a copy of the running text, the storage requirements of the two
methods should be similar.</FONT></DIV>
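<DIV><FONT size=2 face=Arial>The index-plus-running-text approach can be sketched roughly as follows. This is only an assumption about how such a program might work, not a description of the actual Windows program: the sample text, the character offsets and the helper names build_index and kwic are all invented for illustration.</FONT></DIV>
<PRE>
```python
from collections import defaultdict

# Invented running text; contexts are cut straight out of this string.
TEXT = "an seanbhean agus sean-bhean eile"

# (start, end, demut) for each token; the offsets point into TEXT.
TOKENS = [
    (0, 2, "an"),
    (3, 7, "sean"), (7, 12, "bean"),
    (13, 17, "agus"),
    (18, 23, "sean"), (23, 28, "bean"),
    (29, 33, "eile"),
]


def build_index(tokens):
    """Map each demut value to the character spans where it occurs."""
    index = defaultdict(list)
    for start, end, demut in tokens:
        index[demut].append((start, end))
    return index


def kwic(index, text, demut, width=5):
    """Build kwic lines by slicing the original text around each hit."""
    lines = []
    for start, end in index.get(demut, []):
        left = text[max(0, start - width):start]
        right = text[end:end + width]
        lines.append(left + "[" + text[start:end] + "]" + right)
    return lines


INDEX = build_index(TOKENS)
bean_lines = kwic(INDEX, TEXT, "bean")
for line in bean_lines:
    print(line)
```
</PRE>
<DIV><FONT size=2 face=Arial>Since the contexts are slices of the original string, "seanbhean" and "sean-bhean" come back exactly as written, with no space inserted between the two tokens.</FONT></DIV>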
<DIV><FONT size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>A script to post-process output from cqp before
displaying it:<BR><FONT color=#0000ff>><BR>>- read input line from user
standard input<BR>>- pass input line to CQP slave process (either directly,
or via a library)<BR>>- if necessary, read output line(s) from CQP slave
process<BR>>- modify output line(s) as per whatever requirements you
have*<BR>>- print output line(s) to standard output<BR>>- print prompt for
next user input.<BR>><BR>>The user then runs your script instead of
running CQP.<BR>><BR>>(*) If you use one of the libraries, an easy way to
do this is by specifying a "line handler" function when you call
"exec/execute()" or "query()".</FONT></FONT></DIV>
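<DIV><FONT size=2 face=Arial>The steps quoted above could be sketched in Python along these lines. This is a minimal sketch under stated assumptions: run_query stands for whatever actually talks to the CQP slave process (directly or via a library) and is left abstract here, fake_engine is a hypothetical stand-in that returns a canned kwic line so the loop can be tried without CQP, and strip_glue is one possible line handler.</FONT></DIV>
<PRE>
```python
import io
import sys

GLUE = "<g/>"  # glue marker as it comes through in the kwic context


def strip_glue(line, glue=GLUE):
    """Line handler: drop each glue marker and the spaces on either side of it."""
    return line.replace(" " + glue + " ", "")


def run_session(run_query, line_handler, inp=sys.stdin, out=sys.stdout, prompt="> "):
    """Read queries from inp, post-process every result line, write it to out."""
    out.write(prompt)
    out.flush()
    for user_line in inp:                       # read input line from the user
        query = user_line.strip()
        if query:
            for result in run_query(query):     # pass it on, read the output lines
                out.write(line_handler(result) + "\n")   # modify, then print
        out.write(prompt)                       # print prompt for the next input
        out.flush()


def fake_engine(query):
    """Hypothetical stand-in for CQP: one canned kwic line for any query."""
    return ["sean " + GLUE + " bhean"]


# Demonstration with in-memory streams instead of a terminal.
demo_out = io.StringIO()
run_session(fake_engine, strip_glue, inp=io.StringIO("q1\n"), out=demo_out)
print(demo_out.getvalue())
```
</PRE>
<DIV><FONT size=2 face=Arial>In real use, run_session would be called with its default streams and with a run_query function that forwards each line to CQP; only the line handler needs to change to suit other post-processing requirements.</FONT></DIV>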
<DIV><FONT color=#00ffff size=2 face=Arial></FONT> </DIV>
<DIV><FONT size=2 face=Arial>Many thanks for that explanation. I should
also look at existing front-ends to cqp, which must do something like this
series of steps, and may allow the user some control over the
fourth step. The most promising of these may be TXM.<BR> <BR>The need
to post-process kwic output in order to remove non-original spaces may not
arise at all if I am correct in thinking that David's suggestion, some time
ago, of <g/> as glue is actually implemented in the Sketch Engine as
meaning "leave no space between the preceding and following tokens". In
cwb, the " <g/> " comes through unchanged into the kwic context,
and needs post-processing to remove it. It might be an idea to have
cqp implement <g/> like that too.</FONT></DIV>
<DIV><FONT size=2 face=Arial><BR>Regards,<BR>Ciarán.</FONT></DIV>
<DIV> </DIV></BODY></HTML>