[CWB] CWB Digest, Vol 77, Issue 7

Gertrud Faaß faassg at uni-hildesheim.de
Tue May 28 13:13:53 CEST 2013


Dear all,
thanks very much for the hint, and please accept my apologies for not
trying other corpora first. The corpus I used was originally encoded in
latin-1 and I re-encoded it in utf-8 at a later stage, which might be
the reason for these problems. I have now tried the query with other
corpora, encoded with cwb3.0, and it works fine with them, so it appears
to be a corpus-encoding problem.
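
For anyone hitting the same error, here is a minimal Python sketch of how
leftover latin-1 bytes in a supposedly utf-8 corpus can be spotted before
indexing. It is not part of CWB; the function name find_invalid_utf8 and
the sample strings are hypothetical and only illustrate the idea.

```python
def find_invalid_utf8(lines):
    """Return (line_number, raw_bytes) for every line that is not valid UTF-8."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        try:
            raw.decode("utf-8")
        except UnicodeDecodeError:
            bad.append((number, raw))
    return bad


# Hypothetical sample data: the same lemma in three byte representations.
sample = [
    "Gefühl".encode("utf-8"),    # valid UTF-8: ü is the two bytes C3 BC
    "Gefühl".encode("latin-1"),  # latin-1 leftover: ü is the single byte FC
    b"Gefuhl",                   # plain ASCII is valid in both encodings
]

result = find_invalid_utf8(sample)
print(result)
```

Only the second entry is reported, because the single byte FC cannot
occur in well-formed utf-8. Running such a check over the input files
before encoding the corpus should show whether the re-encoding step left
any latin-1 bytes behind.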

Best
Gertrud



On 28.05.2013 12:00, cwb-request at sslmit.unibo.it wrote:
> Message: 1
> Date: Mon, 27 May 2013 19:04:42 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> 	<cwb at sslmit.unibo.it>
> Subject: Re: [CWB] %d does not work??
> Message-ID:
> 	<28078EC3FBF1B940A3EF3D0D19BE351D1CE4D4 at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="iso-8859-1"
>
> What this indicates is that some strings in the index of the "lemma" attribute are such that, with the accents removed, an invalid UTF-8 string is produced. This is quite confusing, and shouldn't happen if the corpus was properly encoded. Is this the same corpus index that you created with V3.0? Can you try checking the registry file to see what character set is declared there?
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: Gertrud Faaß [mailto:faassg at uni-hildesheim.de]
> Sent: 27 May 2013 18:45
> To: Hardie, Andrew
> Subject: Re: [CWB] %d does not work??
>
> Dear Andrew, Stefan & all,
> thanks a lot for your advice - I've installed the beta-version 3.4
> successfully. Now we're using the double quotes in the call of the macro
> and the macro itself, and all the Umlauts work fine!
>
> However, I tried the %d version of the query, too, and unfortunately
> it still does not seem to work properly (a syntax problem again?)
>
> HGC> [lemma="Gefühl"];
> (leads to 15595 results)
>
> HGC> [lemma="Gefuhl" %d];
> CL: major error, invalid UTF8 string passed to cl_string_canonical...
> CL: major error, invalid UTF8 string passed to cl_string_canonical...
> CL: major error, invalid UTF8 string passed to cl_string_canonical...
> CL: major error, invalid UTF8 string passed to cl_string_canonical...
>
> (the dots are really there)
>
> This line is printed to the terminal many times; after that,
> however, some correct results are shown.
>
> Best
> Gertrud
>
>
>
> On 25.05.2013 12:11, Hardie, Andrew wrote:
>>>> We are working on a linux (ubuntu 12) system with CWB 3.0, the shell and the corpus are both utf-8.
>> Version 3.0 does not have UTF-8 support. Please upgrade to the latest version in the 3.4 series and see if the problem persists.
>>
>> best
>>
>> Andrew.
>>
>> -----Original Message-----
>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of "Gertrud Faaß"
>> Sent: 25 May 2013 07:58
>> To: cwb at sslmit.unibo.it
>> Subject: [CWB] %d does not work??
>>
>> Dear all,
>> I am a bit confused: this week students showed me a case where the %d flag no longer seems to work, and I could not find the error.
>> The problem came up because umlauts did not work when used in the parameters passed to a macro, e.g.
>>
>> HGC> /testmac3[führen];
>> CQP Error:
>>    Macro syntax error.
>> CQP Error:
>>    CQP Syntax Error: syntax error, unexpected UNDEFINED_MACRO  /testmac3[f? <-- Synchronizing to end of line ...
>>
>> Any word without Umlaut works with this macro.
>>
>> I then tried online queries:
>>
>> [lemma="führen"] => worked (77,408 results)
>> [lemma="fuhren" %d]; => worked, but 0 results.
>> [lemma="f\"uhren]; => worked, but 0 results
>> "fuhren" %d; by the way only shows "fuhren", though "führen" exists (see above)
>>
>> We are working on a linux (ubuntu 12) system with CWB 3.0, the shell and the corpus are both utf-8. We are using the syntax described in the CWB online tutorial.
>>
>> There seem to be two problems (in case we got the syntax right): first, umlauts cannot be entered in macro arguments, and second, neither the %d method nor the LaTeX-style escape works anymore(?).
>>
>> We'd be very thankful for any hints,
>>
>> best
>> Gertrud
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>


-- 
Dipl. Ling Gertrud Faaß / PhD University of Pretoria
Universität Hildesheim
Institut für Informationswissenschaft und Sprachtechnologie
Marienburger Platz 22
31141 Hildesheim
gertrud.faass at uni-hildesheim.de
https://www.uni-hildesheim.de/index.php?id=faass
tel 05121/883-906
fax 05121/883-829
----------------------