<html><body><div style="color:#000; background-color:#fff; font-family:times new roman, new york, times, serif;font-size:12pt"><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">Andrew,</div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; ">Thanks for the suggestion. It may be a good idea to include this info in the instruction page for cwb-encode. The corpus was encoded just fine. However, I'm still having a hell of a problem getting cqp to accept Cyrillic characters, even in utf8. Has anyone been successful in encoding and searching a Cyrillic corpus on Windows? I didn't encounter any such problems on Unix. Below are my encoding command and the search error:</div><div><div>cwb-encode -d "C:\CWB\ANGELINA\data" -f "C:\CWB\ANGELINA\angelina.txt" -c utf8 -R "C:\CWB\registry\angelina" -xsB -S s:0 -S text:0+id+title+author+genre -S subject:0 -S publisher:0 -S dateOrigonal:0 -S
dateDigital:0 -S identifier:0 -S citation:0 -S source:0 -S relation:0 -S hasPart:0 -S isPartOf:0</div><div><br></div><div>C:\Windows\system32>cqp</div><div>[no corpus]> ANGELINA;</div><div>ANGELINA> "што";</div><div>CL: Regex Compile Error: unrecognized character after (? or (?-</div><div>CQP Error:</div><div> Illegal regular expression: ???</div><div><br></div><div style="color: rgb(0, 0, 0); font-size: 16px; font-family: 'times new roman', 'new york', times, serif; background-color: transparent; font-style: normal; ">Regards,</div><div>George.</div></div><div style="font-family: 'times new roman', 'new york', times, serif; font-size: 12pt; "><br></div><div style="font-size: 12pt; font-family: 'times new roman', 'new york', times, serif; "><div style="font-size: 12pt; font-family: 'times new roman', 'new york', times, serif; "><font size="2" face="Arial"><hr size="1"><b><span
style="font-weight:bold;">From:</span></b> "Hardie, Andrew" &lt;a.hardie@lancaster.ac.uk&gt;<br><b><span style="font-weight: bold;">To:</span></b> Open source development of the Corpus WorkBench &lt;cwb@sslmit.unibo.it&gt;<br><b><span style="font-weight: bold;">Cc:</span></b> George Goce Mitrevski &lt;podmocani@yahoo.com&gt;<br><b><span style="font-weight: bold;">Sent:</span></b> Friday, April 8, 2011 4:15 PM<br><b><span style="font-weight: bold;">Subject:</span></b> Re: [CWB] Encoding error in Windows<br></font><br>
<div id="yiv306971045">
<div dir="ltr" align="left"><span class="yiv306971045390151121-08042011"><font face="Verdana" color="#000080" size="2">It means the encoding hasn't been set to utf8. This is
possibly because you haven't specified the encoding using <b>-c utf8</b>
(cwb-encode defaults to Latin-1 unless it is told explicitly which encoding
to use).</font></span></div>
<div dir="ltr" align="left"><span class="yiv306971045390151121-08042011"></span><span class="yiv306971045390151121-08042011"><font face="Verdana" color="#000080" size="2"></font></span> </div>
<div dir="ltr" align="left"><span class="yiv306971045390151121-08042011"><font face="Verdana" color="#000080" size="2">On the other hand, if you <b><em>have</em></b>
specified that it is utf-8, then it may be a bug. If this is the case,
could you specify precisely what command line you've been using?
Thanks.</font></span></div>
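<div dir="ltr" align="left">One way to rule out the input file itself before re-running cwb-encode with -c utf8 is to check that its bytes really are valid UTF-8. The following is only a minimal Python sketch of such a pre-check; the helper names is_valid_utf8 and check_corpus_file are hypothetical and are not part of CWB, and the cp1251 sample is just an assumed example of a legacy Windows encoding for Cyrillic text.</div>
<pre>
```python
def is_valid_utf8(data):
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        return False


def check_corpus_file(path):
    """Read a corpus input file as raw bytes and test it for UTF-8 validity."""
    with open(path, "rb") as handle:
        return is_valid_utf8(handle.read())


if __name__ == "__main__":
    # The same Cyrillic word in UTF-8 and in an assumed legacy code page.
    utf8_bytes = "што".encode("utf-8")
    cp1251_bytes = "што".encode("cp1251")
    print(is_valid_utf8(utf8_bytes))
    print(is_valid_utf8(cp1251_bytes))
```
</pre>
<div dir="ltr" align="left">If the check fails for the real input file, the file was probably saved in some other encoding and would need converting before it is encoded as utf8; if it passes, the problem is more likely to lie elsewhere.</div>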
<div dir="ltr" align="left"><span class="yiv306971045390151121-08042011"><font face="Verdana" color="#000080" size="2"></font></span> </div>
<div dir="ltr" align="left"><span class="yiv306971045390151121-08042011"><font face="Verdana" color="#000080" size="2">best</font></span></div>
<div dir="ltr" align="left"><span class="yiv306971045390151121-08042011"><font face="Verdana" color="#000080" size="2"></font></span> </div>
<div dir="ltr" align="left"><span class="yiv306971045390151121-08042011"><font face="Verdana" color="#000080" size="2">Andrew.</font></span></div><br>
<blockquote dir="ltr" style="PADDING-LEFT:5px;MARGIN-LEFT:5px;BORDER-LEFT:#000080 2px solid;MARGIN-RIGHT:0px;">
<div class="yiv306971045OutlookMessageHeader" lang="en-us" dir="ltr" align="left">
<hr tabindex="-1">
<font face="Tahoma" size="2"><b>From:</b> cwb-bounces@sslmit.unibo.it
[mailto:cwb-bounces@sslmit.unibo.it] <b>On Behalf Of </b>George Goce
Mitrevski<br><b>Sent:</b> 08 April 2011 22:09<br><b>To:</b> Open source
development of the Corpus WorkBench<br><b>Subject:</b> [CWB] Encoding error in
Windows<br></font><br></div>
<div style="font-size: 12pt; color: rgb(0, 0, 0); background-color: rgb(255, 255, 255); font-family: 'times new roman', 'new york', times, serif; ">
<div style="font-size: 12pt; font-family: times, serif; ">Can
someone please explain what's causing this encoding error when I try to encode
a corpus in utf8 on Windows?</div>
<div style="font-size: 12pt; font-family: times, serif; "><br></div>
<div style="font-family: times, serif; ">
<div style="font-family: times, serif; ">
<div id="yiv306971045">
<div class="yiv306971045Section1" dir="rtl">
<div class="yiv306971045MsoNormal" dir="ltr" style="DIRECTION:ltr;unicode-bidi:embed;TEXT-ALIGN:left;"><font class="yiv306971045Apple-style-span" face="Arial"><font class="yiv306971045Apple-style-span" size="2">"Encoding error: an invalid byte or byte sequence for charset "latin1"
was encountered."</font><br></font></div>
<div class="yiv306971045MsoNormal" dir="ltr" style="DIRECTION:ltr;unicode-bidi:embed;TEXT-ALIGN:left;"><font class="yiv306971045Apple-style-span" face="Arial"><font class="yiv306971045Apple-style-span" size="2"><br></font></font></div>
<div class="yiv306971045MsoNormal" dir="ltr" style="DIRECTION:ltr;unicode-bidi:embed;TEXT-ALIGN:left;"><font class="yiv306971045Apple-style-span" face="Arial"><font class="yiv306971045Apple-style-span" size="2">Thanks
much.</font></font></div></div></div></div></div></div></blockquote>
</div><br><br></div></div></div></body></html>