[CWB] RE: Badly-formatted text ID codes

Emiliano Guevara emiguevara at gmail.com
Mon Jan 16 14:15:21 CET 2012


What about a corpus-wide re-encoding of the URLs into ASCII-safe characters?

Something like this:

http://www.albionresearch.com/misc/urlencode.php
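
A minimal sketch of that idea in Python, under a few assumptions. Plain percent-encoding (what the page above does, and what urllib.parse.quote does) still emits "%", "." and "-", which would not pass the "ASCII letters, numbers and underscore" rule quoted below. The variant here is a hypothetical scheme, not something CWB or CQPweb provides: every byte outside A-Z, a-z and 0-9 becomes "_XX" (two hex digits), and the underscore itself is encoded too, so the mapping stays reversible. The function names encode_id and decode_id are made up for illustration.

```python
import re


def encode_id(url: str) -> str:
    """Encode every byte outside [A-Za-z0-9] as _XX (uppercase hex).

    Hypothetical scheme: the underscore is encoded as _5F as well,
    so that every "_" in the result starts an escape sequence.
    """
    out = []
    for byte in url.encode("utf-8"):
        ch = chr(byte)
        if ch.isascii() and ch.isalnum():
            out.append(ch)
        else:
            out.append(f"_{byte:02X}")
    return "".join(out)


def decode_id(text_id: str) -> str:
    """Reverse encode_id: turn each _XX escape back into its byte."""
    raw = re.sub(
        rb"_([0-9A-F]{2})",
        lambda m: bytes([int(m.group(1), 16)]),
        text_id.encode("ascii"),
    )
    return raw.decode("utf-8")


if __name__ == "__main__":
    original = "http://a.b/c_d"
    encoded = encode_id(original)
    print(encoded)
    print(decode_id(encoded) == original)
```

The encoded IDs come out noticeably longer than the original URLs, so this only helps if the longer strings are acceptable as text IDs.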

E.



On Jan 16, 2012, at 13:34 PM, Eros Zanchetta wrote:

> OK, thanks for the tip!
> 
> Best,
> Eros
> 
> On Jan 16, 2012, at 1:21 PM, Hardie, Andrew wrote:
> 
>> There isn't one. You have to have text IDs that contain only ASCII letters, numbers and underscores.
>> 
>> The "easy" way is to rename the attribute containing the URL to url="" and then add a separate id attribute alongside it. When I installed itWaC, I just used numbers for the IDs.
>> 
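
The renumbering described above could be sketched roughly as follows. This assumes the corpus input has one opening tag per line of the form <text id="URL" ...>, which is a guess about the file layout rather than something confirmed in this thread; the function name renumber_text_ids is made up for illustration. Each URL is moved to a url attribute and replaced by a running number as the id.

```python
import re
from typing import Iterable, Iterator

# Assumed layout: an opening tag alone on its line, with id as the first attribute.
TEXT_TAG = re.compile(r'^<text\s+id="([^"]*)"(.*)>\s*$')


def renumber_text_ids(lines: Iterable[str]) -> Iterator[str]:
    """Replace each URL-valued text id with a running number.

    The original URL is kept in a new url attribute; any other
    attributes on the tag are passed through unchanged.
    """
    counter = 0
    for line in lines:
        match = TEXT_TAG.match(line)
        if match is None:
            # Token lines, closing tags and other markup are left alone.
            yield line
            continue
        counter += 1
        url, rest = match.group(1), match.group(2)
        yield f'<text id="{counter}" url="{url}"{rest}>\n'


if __name__ == "__main__":
    sample = [
        '<text id="http://a.b/c">\n',
        "word\n",
        "</text>\n",
    ]
    for out_line in renumber_text_ids(sample):
        print(out_line, end="")
```

Because it works line by line, the function can be fed an open file object directly, which matters for corpora of this size. The new url attribute would presumably also have to be declared when the corpus is indexed.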
>> best
>> 
>> Andrew.
>> 
>> -----Original Message-----
>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Eros Zanchetta
>> Sent: 16 January 2012 12:11
>> To: Open source development of the Corpus WorkBench
>> Subject: [CWB] Badly-formatted text ID codes
>> 
>> Hi everyone,
>> 
>> I'm trying to install itWaC and deWaC on CQPweb, but I keep getting the following error when I click on "Create minimalist metadata table":
>> 
>> "The data source you specified for the text metadata contains badly-formatted text ID codes"
>> 
>> The text IDs of the corpus are URLs; the problem seems to be that CQPweb doesn't like underscores and slashes.
>> 
>> Can anyone suggest a workaround that doesn't include changing the text IDs?
>> 
>> Best,
>> Eros Zanchetta
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
> 


