[CWB] invalid UTF8 string passed to cl_string_canonical...
"Andrés Chandía"
andres at chandia.net
Mon May 9 16:36:58 CEST 2016
Yes, I encode with: cwb-encode -c utf8
So, what should I do?
On Mon, 9 May 2016, 15:55, Hardie, Andrew wrote:
Is the corpus declared as UTF-8?
If so, the problem is likely to be that, in testing letter n-grams, the aligner is slicing up UTF characters. (I'm not quite sure why this causes an error with cl_string_canonical, as I wasn't aware that the aligner used that function, but possibly I've just forgotten.)
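
To illustrate the hypothesis above, here is a minimal Python sketch. It is not the aligner's actual code; the helper names (byte_ngrams, char_ngrams, is_valid_utf8) and the sample word are made up for the illustration. It shows how taking n-grams over the raw bytes of a UTF-8 string can cut a multi-byte character in half, whereas taking n-grams over characters cannot.

```python
def byte_ngrams(word, n):
    """Slide an n-byte window over the UTF-8 encoding of `word`."""
    data = word.encode("utf-8")
    return [data[i:i + n] for i in range(len(data) - n + 1)]


def char_ngrams(word, n):
    """Slide an n-character window over `word` (never splits a character)."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def is_valid_utf8(chunk):
    """Return True if `chunk` decodes cleanly as UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# Hypothetical sample token; "ä" occupies two bytes (0xC3 0xA4) in UTF-8.
word = "Mädchen"

# Byte-level 3-grams that are not valid UTF-8 on their own.
broken = [g for g in byte_ngrams(word, 3) if not is_valid_utf8(g)]

# Character-level 3-grams are always valid strings.
safe = char_ngrams(word, 3)

print(broken)
print(safe)
```

If the aligner does build its n-grams byte by byte, any fragment like the one collected in `broken` would be an invalid UTF-8 string by the time it reaches a function that checks its input, which would be consistent with the repeated error message.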
best
Andrew.
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of "Andrés Chandía"
Sent: 09 May 2016 14:31
To: Open source development of the Corpus WorkBench
Subject: [CWB] invalid UTF8 string passed to cl_string_canonical...
I'm getting this error message when aligning, but I don't know how to deal with it. I found only one comment about it, and it didn't help me. Thanks.
OPENING btcataladeutsch_ca [205899 tokens, 7733 regions]
OPENING btcataladeutsch_de [112264 tokens, 4951 regions]
LEXICON SIZE: 24709 / 19889
FEATURE: character count, weight=1 ... [1]
FEATURE: Shared words, threshold=40.0%, weight=50 ... [6]
FEATURE: 3-grams, weight=3 ... CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
[21952]
FEATURE: 4-grams, weight=4 ... CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
[614656]
[636615 features allocated]
[290636 entries in source text feature map]
[296034 entries in target text feature map]
PASS 2: Setting character count weight.
PASS 2: Processing shared words (th=40.0%).
PASS 2: Processing 3-grams.
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
PASS 2: Processing 4-grams.
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
CL: major error, invalid UTF8 string passed to cl_string_canonical...
PASS 2: Creating character counts.
_______________________
andrés chandía
administrator of:
parles.upf | delingua | amind terapia | mapuche koyaktu | mail ong mapuche koyaktu | mail psicoaching |
Do not print unnecessarily. Take care of the environment!
_______________________