[CWB] Difference in token number between CQP and CQPweb

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri Feb 21 16:03:38 CET 2014


Neither intentional nor fixed: bug still present.

I'll remove it when I lift the limit of 50 elsewhere.

But this doesn't affect the issue Hannah observed since her text ids were < 20 chars. 

Safe to upgrade.... soonish.

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Stefan Evert
Sent: 21 February 2014 12:15
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Difference in token number between CQP and CQPweb

I'm also still having problems, but different ones.  After shortening the text IDs to < 50 chars and completely re-installing the corpus in CQPweb, I get the correct corpus size and match counts.

However, frequency distributions still omit some matches.  Some digging revealed a simple cause: in the freq distribution MySQL tables, the ID column is declared as VARCHAR(40)!
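
To check whether a given corpus is affected, a minimal Python sketch along these lines may help. It is not part of CQPweb: the function names and the sample IDs are hypothetical, and the width of 40 is simply the VARCHAR(40) declaration mentioned above. It lists text IDs that would not fit in such a column and groups IDs that would become indistinguishable if they were cut to that width.

```python
# Hypothetical helper, not part of CQPweb: inspect text IDs against a
# fixed column width such as the VARCHAR(40) mentioned above.
ID_COLUMN_WIDTH = 40  # assumption: taken from the reported column declaration


def ids_exceeding_width(text_ids, width=ID_COLUMN_WIDTH):
    """Return the IDs that are longer than `width` characters."""
    return [tid for tid in text_ids if len(tid) > width]


def truncated_collisions(text_ids, width=ID_COLUMN_WIDTH):
    """Group IDs that become identical once cut to `width` characters."""
    groups = {}
    for tid in text_ids:
        groups.setdefault(tid[:width], []).append(tid)
    # Keep only prefixes shared by more than one original ID.
    return {prefix: ids for prefix, ids in groups.items() if len(ids) > 1}


if __name__ == "__main__":
    # Made-up IDs for illustration only.
    sample = ["short_id", "x" * 40 + "_part1", "x" * 40 + "_part2"]
    print(ids_exceeding_width(sample))
    print(truncated_collisions(sample))
```

Any ID reported by the first function is a candidate for the omitted matches described above; the second function only matters if over-long IDs are silently truncated rather than rejected.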

Is this intentional, or a bug that has been fixed in the meantime?

On a related note: When is it safe to upgrade to CQPweb 3.1?

Best,
Stefan

On 19 Feb 2014, at 11:10, Hannah Kermes <h.kermes at mx.uni-saarland.de> wrote:

> I hate to spoil the party, but I shortened the text_ids (to a max of
> 20 chars) of one of the problematic corpora (in the metadata table and
> in the CQP corpus), re-installed the corpus, and the problem stayed the
> same: still the same wrong token numbers.

_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb

