[CWB] %d does not work??

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon May 27 21:04:42 CEST 2013


What this indicates is that some strings in the index of the "lemma" attribute are such that, with the accents removed, an invalid UTF-8 string is produced. This is quite confusing, and shouldn't happen if the corpus was properly encoded. Is this the same corpus index that you created with V3.0? Can you try checking the registry file to see what character set is declared there?
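As a rough illustration of the registry check, the following Python sketch pulls the declared character set out of a registry file's text. It assumes the corpus property is written in the form `##:: charset = "utf8"`; the helper name `declared_charset` and the sample registry contents are hypothetical, used only for illustration.

```python
import re
from typing import Optional

# Assumed format of the corpus property line:  ##:: charset = "utf8"
_CHARSET_RE = re.compile(r'^##::\s*charset\s*=\s*"([^"]*)"')


def declared_charset(registry_text: str) -> Optional[str]:
    """Return the charset declared in the registry text, or None if absent."""
    for line in registry_text.splitlines():
        match = _CHARSET_RE.match(line.strip())
        if match:
            return match.group(1)
    return None


# Hypothetical registry contents, for illustration only.
SAMPLE = '''NAME "HGC"
ID   hgc
##:: charset = "utf8"
ATTRIBUTE word
ATTRIBUTE lemma
'''

if __name__ == "__main__":
    print(declared_charset(SAMPLE))
```

To check a real corpus, read the registry file into a string and pass it to the function. If nothing is returned, the file has no charset declaration in the assumed form.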

best

Andrew. 

-----Original Message-----
From: Gertrud Faaß [mailto:faassg at uni-hildesheim.de] 
Sent: 27 May 2013 18:45
To: Hardie, Andrew
Subject: Re: [CWB] %d does not work??

Dear Andrew, Stefan & all,
thanks a lot for your advice - I've installed the beta version 3.4 
successfully. Now we're using double quotes in the call of the macro 
and in the macro itself, and all the umlauts work fine!

However, I also tried the %d version of the query, and unfortunately 
this still does not seem to work properly (a syntax problem again?)

HGC> [lemma="Gefühl"];
(leads to 15595 results)

HGC> [lemma="Gefuhl" %d];
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...

(the dots are really there)

This line is printed to the terminal many times; after that, 
however, some correct results are shown.
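For readers trying to reproduce this outside CQP, here is a small Python sketch of the two ideas involved. It is not CWB's implementation of %d or of cl_string_canonical. It only shows one common way of folding diacritics (Unicode decomposition, then dropping combining marks) and one assumed way an "invalid UTF-8" complaint could arise: bytes that are really Latin-1 being treated as UTF-8. The function names are hypothetical.

```python
import unicodedata


def fold_diacritics(text: str) -> str:
    """Strip accents by decomposing to NFD and dropping combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def is_valid_utf8(raw: bytes) -> bool:
    """Return True if the byte string decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__":
    # Folding the lemma gives the form one would type with %d.
    print(fold_diacritics("Gefühl"))
    # The same lemma stored as UTF-8 bytes versus Latin-1 bytes.
    print(is_valid_utf8("Gefühl".encode("utf-8")))
    print(is_valid_utf8("Gefühl".encode("latin-1")))
```

If the lemma index held Latin-1 bytes while the corpus was declared as UTF-8, a check like `is_valid_utf8` would fail on every lemma containing an umlaut. That would fit an error being printed many times, but it is only one possible explanation.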

Best
Gertrud



Am 25.05.2013 12:11, schrieb Hardie, Andrew:
>>> We are working on a linux (ubuntu 12) system with CWB 3.0, the shell and the corpus are both utf-8.
> Version 3.0 does not have UTF-8 support. Please upgrade to the latest version in the 3.4 series and see if the problem persists.
>
> best
>
> Andrew.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of "Gertrud Faaß"
> Sent: 25 May 2013 07:58
> To: cwb at sslmit.unibo.it
> Subject: [CWB] %d does not work??
>
> Dear all,
> I am a bit confused: students showed me a case this week where the %d flag does not seem to work anymore, and I could not find the error.
> The problem came up because umlauts did not work in arguments passed to a macro, e.g.
>
> HGC> /testmac3[führen];
> CQP Error:
>   Macro syntax error.
> CQP Error:
>   CQP Syntax Error: syntax error, unexpected UNDEFINED_MACRO  /testmac3[fü <-- Synchronizing to end of line ...
>
> Any word without Umlaut works with this macro.
>
> I then tried online queries:
>
> [lemma="führen"] => worked (77,408 results)
> [lemma="fuhren" %d]; => worked, but 0 results.
> [lemma="f\"uhren]; => worked, but 0 results
> "fuhren" %d; by the way only shows "fuhren", though "führen" exists (see above)
>
> We are working on a linux (ubuntu 12) system with CWB 3.0, the shell and the corpus are both utf-8. We are using the syntax described in the CWB online tutorial.
>
> There seem to be two problems (assuming we got the syntax right): first, umlauts cannot be entered in macro arguments, and second, neither %d nor the LaTeX-style escape works (anymore?).
>
> We'd be very thankful for any hints,
>
> best
> Gertrud
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb


-- 
Dipl. Ling Gertrud Faaß / PhD University of Pretoria
Universität Hildesheim
Institut für Informationswissenschaft und Sprachtechnologie
Marienburger Platz 22
31141 Hildesheim
gertrud.faass at uni-hildesheim.de
https://www.uni-hildesheim.de/index.php?id=faass
tel 05121/883-906
fax 05121/883-829
----------------------



