[CWB] Can't generate text-by-text freq lists?

Arthur Wang arthur0421 at gmail.com
Sun Jul 9 23:15:58 CEST 2017


Thanks Andrew! Over the past few days I've also installed some other corpora,
all successfully; only this particular corpus has the problem, so I'm
curious what's special about it...

best
Jiayue

On 09/07/17 22:04, Hardie, Andrew wrote:
> Hi Jiayue,
> 
> The current checkin of the trunk is known-broken (as it's stranded between the previous version and the next while I find time to finish the next version!). Suggest you switch to checkin #924, then see if these issues persist.
> 
> BTW: Category handles with just digits are perfectly fine!
> 
> best
> 
> Andrew.
> 
> -----Original Message-----
> From: Arthur Wang [mailto:arthur0421 at gmail.com]
> Sent: 09 July 2017 22:02
> To: Hardie, Andrew
> Subject: Re: [CWB] Can't generate text-by-text freq lists?
> 
> Hi Andrew,
> 
> Mine is a learner corpus. If I click "Manage text metadata", I see two
> file handles, "major" and "year", both of the Classification datatype.
> 
> "Manage text categories": I see the usual forms asking me to insert or
> update text category descriptions... By the way, both the classification
> schemes "major" and "year" have categories that contain only digits, no
> alphabetical letters (is this a problem?).
> 
> My checkout is the trunk...
> svn co http://svn.code.sf.net/p/cwb/code/gui/cqpweb/trunk cqpweb
> 
> Best,
> Jiayue
> 
> On 09/07/17 21:36, Hardie, Andrew wrote:
>> Hi Jiayue,
>>
>> On some further thought, looking back at your original report, it sounds as if the frequency-table setup is not actually the problem. It's to do with the distribution function and the metadata setup, I think.
>>
>> Can you check the following things.
>>
>> - What checkout is your code? Especially, file distribution.inc.php - this is currently broken, if you have anything later than commit # 924 ...
>>
>> - What appears when you go to the corpus menu and click "Manage text metadata" / "Manage text categories"?
>>
>> (Especially - the datatypes in the former.)
>>
>> best
>>
>> Andrew.
>>
>> -----Original Message-----
>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Arthur Wang
>> Sent: 08 July 2017 14:10
>> To: Open source development of the Corpus WorkBench
>> Subject: Re: [CWB] Can't generate text-by-text freq lists?
>>
>> Hi Andrew
>>
>> Thanks for the reply. But none of these are missing. My corpus is called
>> "gxun_grad" (Tree Tagger tagged), and in MySQL I have all the following
>> tables:
>>
>> text_metadata_for_gxun_grad
>> freq_text_index_gxun_grad
>> freq_corpus_gxun_grad_lemma
>> freq_corpus_gxun_grad_pos
>> freq_corpus_gxun_grad_word
>>
>> The CWB folders are in my home folder. In "index" there are:
>>
>> gxun_grad
>> gxun_grad__freq
>>
>> In "registry" there are:
>>
>> gxun_grad
>> gxun_grad__freq
>>
>> I have installed the corpus quite a few times, but the problems remain. What
>> else should I look at?
>>
>> Best
>> Jiayue
>>
>> On 08/07/17 12:26, Hardie, Andrew wrote:
>>> I suggest you check in MySQL which tables actually exist.
>>>
>>> You should have the following tables :
>>>
>>> text_metadata_for_CORPUS
>>> freq_text_index_CORPUS
>>> freq_corpus_CORPUS_word
>>>         .... plus one more like the above for every additional p-attribute.
>>>
>>> You should also have a CWB corpus called "CORPUS__freq" in your index data directory and a corresponding registry file in the CQPweb registry directory.
>>>
>>> If you can identify which of these pieces of data is missing, it will be easier to identify what has gone wrong.
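The table checklist above can be sketched in Python as follows. This is a minimal illustration, not part of CQPweb: the function names `expected_tables` and `missing_tables` are hypothetical, and the naming pattern is assumed to be exactly the one given in this message, with one `freq_corpus_CORPUS_ATT` table per p-attribute.

```python
def expected_tables(corpus, p_attributes=("word",)):
    """Return the MySQL table names listed in this message for one corpus.

    `corpus` is the corpus handle; `p_attributes` are the p-attribute
    handles, each of which is assumed to get its own freq_corpus_* table.
    """
    tables = [
        f"text_metadata_for_{corpus}",
        f"freq_text_index_{corpus}",
    ]
    tables += [f"freq_corpus_{corpus}_{att}" for att in p_attributes]
    return tables


def missing_tables(existing, corpus, p_attributes=("word",)):
    """Return the expected tables that do not appear in `existing`."""
    present = set(existing)
    return [t for t in expected_tables(corpus, p_attributes) if t not in present]
```

To use it, copy the table names reported by MySQL's SHOW TABLES into a list and pass it as `existing`; an empty result means none of the tables named here is missing, so the cause has to be looked for elsewhere.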
>>>
>>> best
>>>
>>> Andrew.
>>>
>>> -----Original Message-----
>>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Arthur Wang
>>> Sent: 07 July 2017 09:26
>>> To: Open source development of the Corpus WorkBench
>>> Subject: [CWB] Can't generate text-by-text freq lists?
>>>
>>> Hi,
>>>
>>> Over the past few days I installed a 1-million-word corpus in CQPweb (v3.2.26),
>>> along with its metadata (tsv), and then told CQPweb to auto-generate the freq
>>> lists. Everything looked fine.
>>>
>>> But then I found that the text freq lists were not actually generated -
>>> "Distribution" shows zero for "Hits in category", "Dispersion" and
>>> "Frequency", and I can't search by category at all. I checked my metadata
>>> file, and it's perfectly OK.
>>>
>>> Then I tried generating the text/category freq lists manually, no luck
>>> either.
>>>
>>> What are the possible reasons for text freq lists to fail to be
>>> generated? Thanks for any clue.
>>>
>>> Jiayue
>>> _______________________________________________
>>> CWB mailing list
>>> CWB at sslmit.unibo.it
>>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>>>
>>
> 

-- 
Jiayue Wang
College of Foreign Studies
Guangxi University for Nationalities
Nanning, China 530006

