[CWB] Zero matches in BNC

Hardie, Andrew a.hardie at lancaster.ac.uk
Fri May 3 20:21:44 CEST 2019


Hi Aleks,

The language variable is not really relevant; its not being set means nothing. The size does seem to be wrong, though: 26 million tokens is nowhere near enough for the BNC. Something may have gone wrong in the encoding process at that point, leaving the lexicon and/or the index unfinished (hence the search failure).
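
One rough way to see whether the lexicon and index files were actually written is to look at the component files of the word attribute in the data directory. The following is only a sketch under assumptions: the suffix list reflects my understanding of an uncompressed, fully indexed attribute (compressed corpora use different files), the token count relies on the .corpus file holding one 4-byte integer per token, and the function names are made up for illustration.

```python
from pathlib import Path

# Assumed component files of an uncompressed, indexed positional attribute.
# This list is an assumption; a compressed corpus will have other files.
EXPECTED_SUFFIXES = [
    ".corpus", ".lexicon", ".lexicon.idx", ".lexicon.srt",
    ".corpus.cnt", ".corpus.rev", ".corpus.rdx",
]

def check_attribute(data_dir, attribute="word"):
    """Map each expected suffix to its file size in bytes, or None if absent."""
    data_dir = Path(data_dir)
    report = {}
    for suffix in EXPECTED_SUFFIXES:
        path = data_dir / (attribute + suffix)
        report[suffix] = path.stat().st_size if path.is_file() else None
    return report

def token_count(report):
    """Estimate the token count, assuming 4 bytes per token in .corpus."""
    size = report.get(".corpus")
    return None if size is None else size // 4

def missing_components(report):
    """List the suffixes whose files are absent or empty."""
    return [suffix for suffix, size in report.items() if not size]
```

Calling check_attribute() on the data directory and passing the result to missing_components() should show at a glance whether, say, the reverse index was never built; token_count() can be compared against the Size line that 'info' reports.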

Also, is your BNC data directory actually /home/corp/tma/, or is it a subdirectory of that? If it is a subdirectory, then CQP looking for the .info file (which usually doesn’t exist) in the parent directory would suggest that something is amiss. It may be worth checking which paths are given in the registry file.
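
To compare the registry paths with what is on disk, something like the sketch below could be used. It assumes the registry file declares the data directory and info file on lines beginning with HOME and INFO, optionally with the path in double quotes; the function name and the sample paths in the comment are hypothetical.

```python
def registry_paths(registry_text):
    """Extract the HOME and INFO paths from the text of a registry file.

    Assumes lines of the form 'HOME /path/to/data' and
    'INFO /path/to/data/.info' (the paths here are placeholders).
    """
    paths = {}
    for line in registry_text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        fields = line.split(None, 1)
        if len(fields) == 2 and fields[0] in ("HOME", "INFO"):
            # Drop surrounding double quotes if the path was quoted.
            paths[fields[0]] = fields[1].strip().strip('"')
    return paths
```

If the HOME path returned here is not the directory that actually holds the encoded data files, that mismatch would be the first thing to fix.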

Hope that helps

best

Andrew.

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Aleksandar Trklja
Sent: 03 May 2019 15:21
To: cwb at sslmit.unibo.it
Subject: [CWB] Zero matches in BNC
Importance: High

Dear all,

I've re-encoded the BNC with 'EncodeBNC.perl' and 'cqp' now returns zero matches. Both positional and structural attributes appear to have been encoded properly (see below), but the language variable does not seem to have been assigned properly. This is what 'info' shows:

BNC> info
Warning:
    Can't open info file /home/corp/tma/.info for reading
Size:    26142145
Charset: latin1
Properties:
        language = '??'
        charset = 'latin1'


BNC> "the"
0 matches.


BNC> show cd
===Context Descriptor=======================================

left context:     25 characters
right context:    25 characters
corpus position:  shown
target anchors:   not shown

Positional Attributes:  * word
                          pos
                          lemma
                          hw
                          class
                          type
                          flags_before
                          space_after
                          offset

Structural Attributes:    text
                          text_id              [A]
                          text_title           [A]
                          text_n_words         [A]
                          text_n_tokens        [A]
                          text_n_w             [A]
                          text_n_c             [A]
                          text_n_s             [A]
                          text_publication_date [A]
                          text_text_type       [A]
                          text_context         [A]
                          text_respondent_age  [A]
                          text_respondent_class [A]
                          text_respondent_sex  [A]
                          text_interaction_type [A]
                          text_region          [A]
                          text_author_age      [A]
                          text_author_domicile [A]
                          text_author_sex      [A]
                          text_author_type     [A]
                          text_audience_age    [A]
                          text_domain          [A]
                          text_difficulty      [A]
                          text_medium          [A]
                        ...
Any suggestions? Thank you.

Best
Aleks
--
Dr Aleksandar Trklja
Senior Lecturer
Department of Translation Studies
University of Vienna