[CWB] Zero matches in BNC
Hardie, Andrew
a.hardie at lancaster.ac.uk
Fri May 3 20:21:44 CEST 2019
Hi Aleks,
The language variable is not really relevant; it not being set means nothing. The size would seem to be wrong, though, as 26 million is nowhere near enough tokens for the BNC, which runs to roughly 100 million words. Something may have gone wrong in the encoding process at that point that has left the lexicon and/or the index unfinished (thus the search failure).
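As a rough cross-check on the size, here is a minimal Python sketch. It rests on two assumptions that are not stated in this thread: that the uncompressed token stream for an attribute is stored as `<attribute>.corpus` with one 4-byte integer per corpus position, and that the lexicon components are named `<attribute>.lexicon`, `<attribute>.lexicon.idx` and `<attribute>.lexicon.srt`. The function names are made up for illustration.

```python
import os

# Assumption: each corpus position is stored as one 32-bit lexicon ID.
BYTES_PER_TOKEN = 4


def token_count(data_dir, attribute="word"):
    """Estimate the number of tokens from the size of <attribute>.corpus."""
    path = os.path.join(data_dir, attribute + ".corpus")
    return os.path.getsize(path) // BYTES_PER_TOKEN


def missing_lexicon_files(data_dir, attribute="word"):
    """Return the assumed lexicon component files that are absent from data_dir."""
    expected = [attribute + ext for ext in (".lexicon", ".lexicon.idx", ".lexicon.srt")]
    return [name for name in expected
            if not os.path.isfile(os.path.join(data_dir, name))]
```

Calling `token_count("/home/corp/tma")` should give a figure close to the `Size:` line that `info` prints, and a non-empty result from `missing_lexicon_files` would be consistent with an encoding run that stopped early. If the data has since been compressed and the `.corpus` file removed, the first function will raise an error instead.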
Also, is your BNC data directory actually /home/corp/tma/ or is it a subdirectory of that? The latter would indicate something amiss if CQP is looking for the .info file (which usually doesn’t exist) in the parent directory. You might check what paths are given in the registry file, perhaps.
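To compare the paths in the registry file, something along these lines might do. It is only a sketch: it assumes the registry entry declares the data directory and the info file on lines beginning with `HOME` and `INFO`, and the sample text (including the `bnc` subdirectory) is hypothetical, not your actual registry file.

```python
from pathlib import Path

# Hypothetical registry excerpt; the paths are illustrative only.
SAMPLE_REGISTRY = """\
# registry entry for corpus BNC
ID   bnc
HOME /home/corp/tma/bnc
INFO /home/corp/tma/.info
"""


def registry_paths(registry_text):
    """Collect the HOME and INFO paths from the text of a registry file."""
    paths = {}
    for raw in registry_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0] in ("HOME", "INFO"):
            paths[parts[0]] = parts[1].strip().strip('"')
    return paths


def info_outside_home(paths):
    """True if the INFO file does not sit directly inside the HOME directory."""
    return Path(paths["INFO"]).parent != Path(paths["HOME"])


paths = registry_paths(SAMPLE_REGISTRY)
print(paths, info_outside_home(paths))
```

With the sample above the check comes out true, which is the kind of mismatch between the data directory and the info file location that would be worth correcting by hand in the registry file.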
Hope that helps
best
Andrew.
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Aleksandar Trklja
Sent: 03 May 2019 15:21
To: cwb at sslmit.unibo.it
Subject: [CWB] Zero matches in BNC
Importance: High
Dear all,
I've re-encoded the BNC with 'EncodeBNC.perl', and 'cqp' now returns zero matches. Both positional and structural attributes appear to have been encoded properly (see below), but the language variable does not seem to have been assigned properly. This is what 'info' shows:
BNC> info
Warning:
Can't open info file /home/corp/tma/.info for reading
Size: 26142145
Charset: latin1
Properties:
language = '??'
charset = 'latin1'
BNC> "the"
0 matches.
BNC> show cd
===Context Descriptor=======================================
left context: 25 characters
right context: 25 characters
corpus position: shown
target anchors: not shown
Positional Attributes:  * word
                          pos
                          lemma
                          hw
                          class
                          type
                          flags_before
                          space_after
                          offset
Structural Attributes:    text
                          text_id [A]
                          text_title [A]
                          text_n_words [A]
                          text_n_tokens [A]
                          text_n_w [A]
                          text_n_c [A]
                          text_n_s [A]
                          text_publication_date [A]
                          text_text_type [A]
                          text_context [A]
                          text_respondent_age [A]
                          text_respondent_class [A]
                          text_respondent_sex [A]
                          text_interaction_type [A]
                          text_region [A]
                          text_author_age [A]
                          text_author_domicile [A]
                          text_author_sex [A]
                          text_author_type [A]
                          text_audience_age [A]
                          text_domain [A]
                          text_difficulty [A]
                          text_medium [A]
...
Any suggestions? Thank you.
Best
Aleks
--
Dr Aleksandar Trklja
Senior Lecturer
Department of Translation Studies
University of Vienna