[CWB] Finding bad non-category-handle values

Scott Sadowsky ssadowsky at gmail.com
Sat Sep 24 17:48:02 CEST 2016


On Sat, Sep 24, 2016 at 3:07 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:

Hi Andrew,

> Try a CQP query for
>
> <whichever_att=".*[^a-zA-Z0-9_].*">[]
>

The s-attribute in question is *text_source*, so I ran the following in CQP:

<text_source=".*[^a-zA-Z0-9_].*">[]

And it produced 0 hits. Same happens with this:

<text_source=".*[^a-z0-9_].*">[]

This would seem to indicate that all the values of *text_source* are licit,
but CQPweb disagrees.
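
One possible explanation for the mismatch, sketched in Python: the query pattern only fires when a value contains at least one bad character, so an empty value slips through with 0 hits. The handle rule used here (a non-empty run of ASCII letters, digits and underscores) is an assumption for illustration, not CQPweb's confirmed check, and the sample values are made up.

```python
import re

# The pattern from the CQP query above. CQP matches a regex against the
# whole attribute value, which re.fullmatch imitates here.
QUERY_PATTERN = re.compile(r".*[^a-zA-Z0-9_].*")

# Assumption: a valid category handle is a non-empty run of ASCII letters,
# digits and underscores. CQPweb's real rule may differ.
HANDLE_PATTERN = re.compile(r"[a-zA-Z0-9_]+")


def flagged_by_query(value):
    """True if the CQP-style query pattern would hit this value."""
    return QUERY_PATTERN.fullmatch(value) is not None


def is_valid_handle(value):
    """True if the value satisfies the assumed handle rule."""
    return HANDLE_PATTERN.fullmatch(value) is not None


# Hypothetical sample values, including an empty one.
samples = ["newspaper", "web_forum", "radio show", ""]
for value in samples:
    print(repr(value), flagged_by_query(value), is_valid_handle(value))
```

If this assumption holds, an empty text_source value would be neither flagged by the query nor accepted as a handle.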


> and then tabulate *match whichever_att* ?
>

This just gives me an error:

tabulate match source_text ?;
CQP Error:
CQP Syntax Error: syntax error, unexpected FIELD, expecting ID or NQRID
tabulate match  <--
Synchronizing to end of line ...
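
As an alternative route to a list of values outside CQP, here is a rough Python sketch. It assumes the values have been dumped as tab-separated lines of the form start, end, value (which is what I would expect from something like cwb-s-decode, but I have not verified that here), and it assumes a valid handle is a non-empty run of ASCII letters, digits and underscores. The function name and the sample dump are invented for illustration.

```python
import re
from collections import Counter

# Assumed handle rule: non-empty, ASCII letters, digits, underscore only.
HANDLE_PATTERN = re.compile(r"[a-zA-Z0-9_]+")


def suspicious_values(lines):
    """Count every value that fails the assumed handle rule.

    Each line is expected to look like "start<TAB>end<TAB>value".
    A missing third field is treated as an empty value.
    """
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        value = fields[2] if len(fields) > 2 else ""
        if HANDLE_PATTERN.fullmatch(value) is None:
            counts[value] += 1
    return counts


# Hypothetical dump: two good regions, one with a space, one empty.
dump = [
    "0\t99\tnewspaper\n",
    "100\t199\tradio show\n",
    "200\t299\t\n",
    "300\t399\tnewspaper\n",
]
for value, n in suspicious_values(dump).most_common():
    print(repr(value), n)
```

With real data, the lines could be read from sys.stdin instead of the sample list.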

Cheers,
Scott


>
>
> *From:* cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] *On
> Behalf Of *Scott Sadowsky
> *Sent:* 24 September 2016 04:10
> *To:* Open source development of the Corpus WorkBench
> *Cc:* Open source development of the Corpus WorkBench
> *Subject:* [CWB] Finding bad non-category-handle values
>
>
>
> I'm attempting to import a corpus into CQPweb, and when I try to change
> one of the s-attributes from "free text" to "classification", I get the
> following error:
>
>
>
> *The datatype of text_source cannot be changed to [classification],
> because there are non-category-handle values in the CWB index.*
>
>
>
> I understand this to mean that in one or more values of text_source,
> there's a character that's not a-z or _. My question is simply how do I get
> a list of these values in order to figure out which one is causing the
> problem and then fix it?
>
>
>
> Thanks in advance!
>
> Scott