[CWB] RE: Badly-formatted text ID codes

Eros Zanchetta eros.zanchetta2 at unibo.it
Mon Jan 16 14:28:13 CET 2012


On Jan 16, 2012, at 2:21 PM, Hardie, Andrew wrote:

> No, because url-encoding allows non-word characters (like % and +); if you rolled your own recoding that avoided those characters, many of the resulting values would still be too long (and truncation might lead to duplicates). You need to add a proper ID attribute.
> 
> I can send you the script I wrote to do this in itWaC if you'd like.

Yes, that would be very helpful, thank you!

E
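
The point in the quoted message can be illustrated with a minimal Python sketch. It is not the itWaC script mentioned above; the function name assign_ids, the "t" prefix, the 7-digit width, the example URLs and the word-character regex are all assumptions made for illustration. It shows that url-encoding leaves % and + in the value, and that assigning a short sequential ID per distinct URL avoids both the non-word characters and the duplicates that truncation could cause.

```python
import re
from urllib.parse import quote_plus

# Assumed constraint for illustration: IDs may contain only ASCII
# letters, digits and underscore.
WORD_ONLY = re.compile(r"^\w+$", re.ASCII)


def assign_ids(urls, prefix="t", width=7):
    """Map each distinct URL to a short sequential ID (hypothetical scheme)."""
    ids = {}
    for url in urls:
        if url not in ids:
            # Number URLs in order of first appearance, zero-padded.
            ids[url] = f"{prefix}{len(ids) + 1:0{width}d}"
    return ids


# Made-up example URLs, not taken from any corpus.
urls = ["http://example.it/a b?x=1", "http://example.it/pagina/due"]

# Url-encoding keeps non-word characters such as % and +.
encoded = [quote_plus(u) for u in urls]

# Sequential IDs are short, unique and word-character-only.
ids = assign_ids(urls)

for url in urls:
    print(ids[url], url)
```

The original URL can then be kept in a separate attribute next to the new ID, so nothing is lost by not using it as the identifier.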

