[CWB] RE: Badly-formatted text ID codes
    Eros Zanchetta
    eros.zanchetta2 at unibo.it
    Mon Jan 16 14:28:13 CET 2012

On Jan 16, 2012, at 2:21 PM, Hardie, Andrew wrote:
> No, because URL-encoding allows non-word characters (like % and +). Even if you rolled your own recoding that avoided those characters, many of the resulting values would still be too long, and truncating them might produce duplicates. You need to add a proper ID attribute.
> 
> I can send you the script I wrote to do this in itWaC if you'd like.
Yes, that would be very helpful, thank you!
E
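
For readers of the archive, the points in the quoted message can be illustrated with a short Python sketch. This is not the itWaC script mentioned above, which is not shown in this thread. The example URLs, the 20-character truncation length, the `text` prefix, and the helper name `assign_ids` are all assumptions made up for illustration.

```python
import re
from urllib.parse import quote_plus

# Matches values made only of ASCII letters, digits and underscores.
WORD_ONLY = re.compile(r"^\w+$", re.ASCII)


def assign_ids(urls, prefix="text"):
    """Map each distinct URL to a short sequential ID (hypothetical scheme)."""
    ids = {}
    for url in urls:
        if url not in ids:
            ids[url] = f"{prefix}{len(ids) + 1:06d}"
    return ids


# Made-up example URLs; real corpus URLs would come from the text metadata.
urls = [
    "http://example.org/some/long/path/page one.html",
    "http://example.org/some/long/path/page two.html",
]

encoded = [quote_plus(u) for u in urls]

# URL-encoding leaves characters such as '%' and '+' in the value,
# so the result does not consist of word characters only.
print([bool(WORD_ONLY.match(e)) for e in encoded])

# Truncating to an arbitrary length (20 here) can make distinct URLs collide.
print(len({e[:20] for e in encoded}))

# A separate ID attribute sidesteps both problems; the URL can be kept
# alongside it as ordinary metadata.
ids = assign_ids(urls)
print(ids)
```

Sequential numbering is only one possible scheme; any mapping that yields short, unique, word-character-only values would serve the same purpose.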
    
    