[CWB] Finding bad non-category-handle values

Hardie, Andrew a.hardie at lancaster.ac.uk
Sat Sep 24 18:25:18 CEST 2016


Sorry, I was unclear: the question mark was not part of the command, it was part of my sentence containing it! And the command wasn’t complete anyway.

Tabulate command instructions are here:

http://cwb.sourceforge.net/files/CQP_Tutorial/node39.html

But if you get zero results for the query, then tabulate won’t give you anything anyway.

One possibility, given that the query for a non-handle character returns no results, is that the bad values are empty strings, e.g. if the original data contains instances of <text> that have no source.

Another possibility is that some of the text_source values are longer than the maximum handle length.
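
Neither of these two cases can be caught by a query of the form ".*[^a-zA-Z0-9_].*", since that pattern needs at least one non-handle character to match. As a rough illustration only, here is a minimal Python sketch that checks a list of attribute values for all three problems. It assumes the values have already been exported to plain strings (for instance with cwb-s-decode); the names classify_value and find_bad_values are made up for this example, and MAX_HANDLE_LEN is a placeholder, not the actual limit used by CQPweb.

```python
import re

# Assumption: a category handle consists only of ASCII letters, digits and
# underscore, i.e. the character class used in the CQP query in this thread.
HANDLE_RE = re.compile(r"[a-zA-Z0-9_]+")

# Hypothetical limit, for illustration only; check your CQPweb version.
MAX_HANDLE_LEN = 64


def classify_value(value, max_len=MAX_HANDLE_LEN):
    """Return a reason string if value is not a usable handle, else None."""
    if value == "":
        return "empty"
    if not HANDLE_RE.fullmatch(value):
        return "bad character"
    if len(value) > max_len:
        return "too long"
    return None


def find_bad_values(values, max_len=MAX_HANDLE_LEN):
    """Map each distinct problematic value to the reason it was rejected."""
    bad = {}
    for v in values:
        reason = classify_value(v, max_len)
        if reason is not None:
            bad[v] = reason
    return bad


if __name__ == "__main__":
    # Made-up sample values, not taken from any real corpus.
    sample = ["newspaper", "", "blog post", "x" * 100]
    for value, reason in find_bad_values(sample).items():
        print(repr(value[:20]), reason)
```

Running find_bad_values over the exported text_source values should list each offending value once, together with the reason it fails.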

Incidentally, I’ve just checked in an amendment to the code which fixes a bug (identified while answering your question!) where a too-low maximum handle length was imposed, and also changes the “because there are non-category-handle values in the CWB index” error message to actually say what the bad value was. So, if you are using the bleeding edge code, you can svn up, try to change datatype again, and find out what the problem is that way.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 24 September 2016 16:48
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Finding bad non-category-handle values

On Sat, Sep 24, 2016 at 3:07 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:

Hi Andrew,

> Try a CQP query for
>
> <whichever_att=".*[^a-zA-Z0-9_].*">[]

The s-attribute in question is text_source, so I ran the following in CQP:

<text_source=".*[^a-zA-Z0-9_].*">[]

And it produced 0 hits. Same happens with this:

<text_source=".*[^a-z0-9_].*">[]

This would seem to indicate that all the values of text_source are licit, but CQPweb disagrees.


> and then tabulate match whichever_att ?

This just gives me an error:

tabulate match source_text ?;
CQP Error:
            CQP Syntax Error: syntax error, unexpected FIELD, expecting ID or NQRID
            tabulate match  <--
Synchronizing to end of line ...

Cheers,
Scott


From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 24 September 2016 04:10
To: Open source development of the Corpus WorkBench
Cc: Open source development of the Corpus WorkBench
Subject: [CWB] Finding bad non-category-handle values

I'm attempting to import a corpus into CQPweb, and when I try to change one of the s-attributes from "free text" to "classification", I get the following error:

The datatype of text_source cannot be changed to [classification], because there are non-category-handle values in the CWB index.

I understand this to mean that in one or more values of text_source, there's a character that's not a-z or _. My question is simply how do I get a list of these values in order to figure out which one is causing the problem and then fix it?

Thanks in advance!
Scott

_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb
