<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body text="#000000" bgcolor="#FFFFFF">
<div class="moz-cite-prefix">Dear All,<br>
<br>
</div>
<blockquote type="cite"
cite="mid:443efbfb-03d8-dc0c-e425-c43067da8e3b@ff.cuni.cz">> A
word like "seanbhean" is actually more frequently spelled with a
hyphen between the parts
<br>
<br>
This sounds like people will be most likely to search for
"sean-bhean" (especially if they encounter it displayed as
"sean-bhean" within the corpus itself), which will yield no results
if it's actually split into "sean" and "bhean" under the hood.
<br>
</blockquote>
<p>NoSketch Engine (with the already mentioned Manatee backend) can
handle queries like this. For example, you can enter a "Simple"
query like this:</p>
<p><img src="cid:part1.7CAE0C98.1ABBC408@uniba.sk" alt=""></p>
<p>And the resulting frequency distribution may be:</p>
<p><img src="cid:part2.A72590F8.1DFDC757@uniba.sk" alt=""></p>
<p>That is, the query finds the word written as one token, with or without a hyphen, and even split into two tokens :-)</p>
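<p>For readers who want to see the idea outside NoSketch Engine, here is a minimal Python sketch, assuming the corpus is simply a list of token strings. It is not how Manatee implements this; the function names <code>find_compound</code> and <code>frequency_distribution</code> and the sample tokens are hypothetical. It matches a compound written as one token, as one hyphenated token, or as two consecutive tokens, and then counts the surface forms found:</p>
<pre>
```python
import re
from collections import Counter


def find_compound(tokens, first, second):
    """Hypothetical helper: return (position, surface form) for each place
    where first+second occurs as one token ("seanbhean"), as one hyphenated
    token ("sean-bhean"), or as two consecutive tokens ("sean", "bhean").
    Matching is case-insensitive."""
    # One token, optionally with a hyphen between the two parts.
    one_token = re.compile(
        re.escape(first) + "-?" + re.escape(second), re.IGNORECASE
    )
    lowered = [t.lower() for t in tokens]
    hits = []
    for i, tok in enumerate(tokens):
        if one_token.fullmatch(tok):
            # Written as a single token, with or without a hyphen.
            hits.append((i, tok))
        elif (
            len(tokens) > i + 1
            and lowered[i] == first.lower()
            and lowered[i + 1] == second.lower()
        ):
            # Split into two separate tokens; report them joined by a space.
            hits.append((i, tokens[i] + " " + tokens[i + 1]))
    return hits


def frequency_distribution(hits):
    """Count how often each surface form was found."""
    return Counter(form for _, form in hits)


# Toy token list (hypothetical sample, not real corpus data).
sample = ["seanbhean", "agus", "sean-bhean", "agus", "sean", "bhean", "."]
hits = find_compound(sample, "sean", "bhean")
print(hits)
print(frequency_distribution(hits))
```
</pre>
<p>On the toy list, each of the three spellings is found exactly once. Depending on the tokenizer, a hyphen may also end up as a token of its own; the sketch does not cover that case.</p>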
<p>You may want to try it yourself here:</p>
<p><a class="moz-txt-link-freetext" href="http://unesco.uniba.sk/guest/">http://unesco.uniba.sk/guest/</a></p>
<p>(No password needed in guest mode.)</p>
<p>Sorry for being a bit off-topic ;-)</p>
<p>Best,</p>
<p>Vlado B, 17:20<br>
</p>
<div class="moz-signature">-- <br>
<font color="navy">Vladimír Benko</font>
<p>
Comenius University in Bratislava<br>
UNESCO Chair in Plurilingual<br>
and Multicultural Communication</p>
<p>
Šafárikovo námestie 6, SK-81499 Bratislava</p>
<p>
<a class="moz-txt-link-freetext" href="http://unesco.uniba.sk/guest/">http://unesco.uniba.sk/guest/</a><br>
<a class="moz-txt-link-freetext" href="https://www.facebook.com/araneawebcorpora/">https://www.facebook.com/araneawebcorpora/</a><br>
<a class="moz-txt-link-freetext" href="https://vk.com/araneawebcorpora">https://vk.com/araneawebcorpora</a>
</p>
</div>
</body>
</html>