[CWB] WebInABox: Can't import existing corpora from host

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Aug 1 09:26:15 CEST 2016


Hi Scott & anyone else interested,

The bug here was the use of “text” as a sub-text XML element. “text” is special and has to be treated as such, otherwise things go wrong.

What was going wrong in this case was that the query form was requesting a restriction on text, but as a sub-text XML element, not using the actual text-metadata backend. The system tried to fulfil the request using the text-metadata backend, but couldn’t because it wasn’t set up. Result: big honking crash.
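To make the failure mode concrete, here is a minimal sketch in Python. It is not CQPweb's actual PHP code: `build_where` and `size_query` are hypothetical names, and the column quoting is an assumption. It shows how a query builder that finds no usable text-metadata conditions ends up with a bare WHERE, which is the shape of the query in the error reports further down this thread, and how a simple guard would turn that into a readable error.

```python
def build_where(conditions):
    """Join (field, value) pairs into the body of a WHERE clause.

    Hypothetical helper for illustration only; real code should use
    parameterised queries rather than string interpolation.
    """
    return " AND ".join(f"`{field}` = '{value}'" for field, value in conditions)


def size_query(table, conditions):
    """Build the text-count/word-count query, refusing to emit a dangling WHERE."""
    where = build_where(conditions)
    if not where:
        # Without this guard the result would be "... WHERE " with nothing
        # after it, which MySQL rejects with syntax error 1064.
        raise ValueError("no text-metadata conditions available for this restriction")
    return f"SELECT count(*), sum(words) FROM {table} WHERE {where}"
```

With an empty condition list, `build_where` returns an empty string, so an unguarded builder would send `... WHERE ` followed by nothing, just as in the reported query.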

I have just now committed to the repo a fix that stops the restriction block from showing text as a sub-text XML element. This means that what was previously a non-obviously incorrect way of doing things will simply no longer work.

The correct way in this case is to generate a text-metadata table from the data stored in the “text_*” XML attributes, using the variant of the install-text-metadata table accessed via the “” button.
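For readers unsure what "a text-metadata table derived from the text_* attributes" amounts to, the sketch below shows the relationship between the two in Python. CQPweb does this itself through its admin interface; the function names here (`text_metadata_rows`, `as_table`) are hypothetical and the code only illustrates the idea of one row per text, one column per attribute of the `<text>` start tag.

```python
import re

# A <text ...> start tag with at least one attribute, and name="value" pairs.
TEXT_TAG = re.compile(r"<text\s+([^>]*)>")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def text_metadata_rows(lines):
    """Yield one dict of attribute values per <text ...> start tag."""
    for line in lines:
        match = TEXT_TAG.match(line.strip())
        if match:
            yield dict(ATTRIBUTE.findall(match.group(1)))


def as_table(rows, columns):
    """Render rows as tab-delimited lines, one per text, in the given column order."""
    return ["\t".join(row.get(col, "") for col in columns) for row in rows]
```

Run over a corpus source file, `as_table(text_metadata_rows(lines), ["id", "location", "sex"])` would give one tab-delimited line per text, with the text ID in the first column.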

best

Andrew.

From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: 28 July 2016 16:25
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] WebInABox: Can't import existing corpora from host


Hi Scott,

This is a bug. It arises, as best as I can tell, from confusion between a text-level restriction and a restriction on the <text> XML object (equivalent, but distinct due to the fact that text metadata exists separately from the XML metadata from which it derives, which in turn is a consequence of the special status of “text” as an entity in CQPweb). Can you send me (off list) screenshots of the search form with the tickboxes you selected that led to this error? I will then investigate.

best

Andrew.

From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 26 July 2016 17:53
To: Open source development of the Corpus WorkBench
Cc: Open source development of the Corpus WorkBench
Subject: Re: [CWB] WebInABox: Can't import existing corpora from host

On Tue, Jul 26, 2016 at 12:18 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:


>>> But how do I restrict searches using the s-attributes (say, speaker sex)? When I do a query and then select "Distribution", for example, I'm told that "This corpus has no text-classification metadata, so the distribution cannot be shown".

•         Go to Restricted query

•         You should see options to restrict your query to XML segments where the given attribute has a particular category handle for any s-att that you set to datatype “Classification”
Thanks. That makes sense.

When I run one of these queries, though, CQPweb throws an SQL error (pasted below).


•         OR, go to “Create / edit subcorpora” and define subcorpora using the same control, then use those SCs as restriction criteria.
This also throws an error (also pasted below).


Note that non-text-based corpus restrictions and subcorpora aren’t currently supported in the Distribution display. I know this is a pain, and it’s high on my feature list (but it’s quite a big job, so it can’t be done quickly!).
I can only imagine!

Thanks again,
Scott



===== ERROR 1 =====
CQPweb encountered an error and could not continue.
A MySQL query did not run successfully!

Original query: SELECT count(*), sum(words) FROM text_metadata_for_test_coscach WHERE /* from User: user | Function: do_append_mysql_comment() | 2016-Jul-26 16:42:47 */

Error # 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 2

PHP debugging backtrace

array(7) {
  [1]=>
  array(4) {
    ["file"]=>
    string(40) "/var/www/html/cqpweb/lib/library.inc.php"
    ["line"]=>
    int(282)
    ["function"]=>
    string(20) "exiterror_mysqlquery"
    ["args"]=>
    array(3) {
      [0]=>
      &int(1064)
      [1]=>
      &string(146) "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 2"
      [2]=>
      &string(156) "SELECT count(*), sum(words) FROM text_metadata_for_test_coscach WHERE
        /* from User: user | Function: do_append_mysql_comment() | 2016-Jul-26 16:42:47 */"
    }
  }
  [2]=>
  array(4) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
    ["line"]=>
    int(1556)
    ["function"]=>
    string(14) "do_mysql_query"
    ["args"]=>
    array(1) {
      [0]=>
      &string(71) "SELECT count(*), sum(words) FROM text_metadata_for_test_coscach WHERE  "
    }
  }
  [3]=>
  array(7) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
    ["line"]=>
    int(1214)
    ["function"]=>
    string(15) "initialise_size"
    ["class"]=>
    string(11) "Restriction"
    ["object"]=>
    object(Restriction)#14 (15) {
      ["serialised":"Restriction":private]=>
      string(26) "$^text|location~concepcion"
      ["parsed_conditions":"Restriction":private]=>
      array(1) {
        ["text"]=>
        array(1) {
          [0]=>
          string(19) "location~concepcion"
        }
      }
      ["stored_text_metadata_where":"Restriction":private]=>
      NULL
      ["stored_idlink_where":"Restriction":private]=>
      NULL
      ["cpos_collection":"Restriction":private]=>
      NULL
      ["corpus":"Restriction":private]=>
      string(12) "test_coscach"
      ["item_type":"Restriction":private]=>
      string(4) "text"
      ["n_items":"Restriction":private]=>
      NULL
      ["n_tokens":"Restriction":private]=>
      NULL
      ["freqtable_record":"Restriction":private]=>
      NULL
      ["hasrun_initialise_text_metadata_where":"Restriction":private]=>
      bool(false)
      ["hasrun_initialise_idlink_where":"Restriction":private]=>
      bool(false)
      ["hasrun_initialise_cpos_collection":"Restriction":private]=>
      bool(false)
      ["hasrun_initialise_size":"Restriction":private]=>
      bool(false)
      ["needs_to_be_added_to_cache":"Restriction":private]=>
      bool(false)
    }
    ["type"]=>
    string(2) "->"
    ["args"]=>
    array(0) {
    }
  }
  [4]=>
  array(6) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
    ["line"]=>
    int(670)
    ["function"]=>
    string(12) "new_from_url"
    ["class"]=>
    string(11) "Restriction"
    ["type"]=>
    string(2) "::"
    ["args"]=>
    array(2) {
      [0]=>
      &string(85) "theData=gente&qmode=sq_nocase&pp=50&del=begin&t=text|location~concepcion&del=end&uT=y"
      [1]=>
      &bool(true)
    }
  }
  [5]=>
  array(7) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
    ["line"]=>
    int(589)
    ["function"]=>
    string(14) "parse_from_url"
    ["class"]=>
    string(10) "QueryScope"
    ["object"]=>
    object(QueryScope)#15 (4) {
      ["type"]=>
      int(0)
      ["restriction":"QueryScope":private]=>
      NULL
      ["subcorpus":"QueryScope":private]=>
      NULL
      ["serialised":"QueryScope":private]=>
      string(0) ""
    }
    ["type"]=>
    string(2) "->"
    ["args"]=>
    array(2) {
      [0]=>
      &string(89) "theData=gente&qmode=sq_nocase&pp=50&del=begin&t=text%7Clocation%7Econcepcion&del=end&uT=y"
      [1]=>
      &bool(true)
    }
  }
  [6]=>
  array(6) {
    ["file"]=>
    string(44) "/var/www/html/cqpweb/lib/concordance.inc.php"
    ["line"]=>
    int(156)
    ["function"]=>
    string(12) "new_from_url"
    ["class"]=>
    string(10) "QueryScope"
    ["type"]=>
    string(2) "::"
    ["args"]=>
    array(2) {
      [0]=>
      &string(89) "theData=gente&qmode=sq_nocase&pp=50&del=begin&t=text%7Clocation%7Econcepcion&del=end&uT=y"
      [1]=>
      &bool(true)
    }
  }
  [7]=>
  array(4) {
    ["file"]=>
    string(40) "/var/www/html/cqpweb/exe/concordance.php"
    ["line"]=>
    int(1)
    ["args"]=>
    array(1) {
      [0]=>
      string(44) "/var/www/html/cqpweb/lib/concordance.inc.php"
    }
    ["function"]=>
    string(7) "require"
  }
}
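For anyone reading the backtrace above, the restriction travels in the URL as the `t` parameter, with `%7C` and `%7E` being the URL-encoded forms of `|` and `~`. The Python sketch below decodes the query string from the trace and groups the conditions by element name. The format (`element|attribute~category`) is inferred only from the values visible in this backtrace, and `restriction_conditions` is a hypothetical name, not a CQPweb function.

```python
from urllib.parse import parse_qs


def restriction_conditions(query_string):
    """Group the 't' parameters of a query string by XML element name."""
    conditions = {}
    for item in parse_qs(query_string).get("t", []):
        # "text|location~concepcion" -> element "text", condition "location~concepcion"
        element, _, condition = item.partition("|")
        conditions.setdefault(element, []).append(condition)
    return conditions
```

Applied to the query string in frame [6], this yields a mapping from `text` to `location~concepcion`, the same structure shown under `parsed_conditions` in frame [3].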

===== ERROR 2 =====

CQPweb encountered an error and could not continue.
A MySQL query did not run successfully!

Original query: SELECT count(*), sum(words) FROM text_metadata_for_test_coscach WHERE /* from User: user | Function: do_append_mysql_comment() | 2016-Jul-26 16:49:17 */

Error # 1064: You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 2


PHP debugging backtrace

array(5) {
  [1]=>
  array(4) {
    ["file"]=>
    string(40) "/var/www/html/cqpweb/lib/library.inc.php"
    ["line"]=>
    int(282)
    ["function"]=>
    string(20) "exiterror_mysqlquery"
    ["args"]=>
    array(3) {
      [0]=>
      &int(1064)
      [1]=>
      &string(146) "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '' at line 2"
      [2]=>
      &string(156) "SELECT count(*), sum(words) FROM text_metadata_for_test_coscach WHERE
        /* from User: user | Function: do_append_mysql_comment() | 2016-Jul-26 16:49:17 */"
    }
  }
  [2]=>
  array(4) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
    ["line"]=>
    int(1556)
    ["function"]=>
    string(14) "do_mysql_query"
    ["args"]=>
    array(1) {
      [0]=>
      &string(71) "SELECT count(*), sum(words) FROM text_metadata_for_test_coscach WHERE  "
    }
  }
  [3]=>
  array(7) {
    ["file"]=>
    string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
    ["line"]=>
    int(1214)
    ["function"]=>
    string(15) "initialise_size"
    ["class"]=>
    string(11) "Restriction"
    ["object"]=>
    object(Restriction)#16 (15) {
      ["serialised":"Restriction":private]=>
      string(26) "$^text|location~concepcion"
      ["parsed_conditions":"Restriction":private]=>
      array(1) {
        ["text"]=>
        array(1) {
          [0]=>
          string(19) "location~concepcion"
        }
      }
      ["stored_text_metadata_where":"Restriction":private]=>
      NULL
      ["stored_idlink_where":"Restriction":private]=>
      NULL
      ["cpos_collection":"Restriction":private]=>
      NULL
      ["corpus":"Restriction":private]=>
      string(12) "test_coscach"
      ["item_type":"Restriction":private]=>
      string(4) "text"
      ["n_items":"Restriction":private]=>
      NULL
      ["n_tokens":"Restriction":private]=>
      NULL
      ["freqtable_record":"Restriction":private]=>
      NULL
      ["hasrun_initialise_text_metadata_where":"Restriction":private]=>
      bool(false)
      ["hasrun_initialise_idlink_where":"Restriction":private]=>
      bool(false)
      ["hasrun_initialise_cpos_collection":"Restriction":private]=>
      bool(false)
      ["hasrun_initialise_size":"Restriction":private]=>
      bool(false)
      ["needs_to_be_added_to_cache":"Restriction":private]=>
      bool(false)
    }
    ["type"]=>
    string(2) "->"
    ["args"]=>
    array(0) {
    }
  }
  [4]=>
  array(6) {
    ["file"]=>
    string(48) "/var/www/html/cqpweb/lib/subcorpus-admin.inc.php"
    ["line"]=>
    int(128)
    ["function"]=>
    string(12) "new_from_url"
    ["class"]=>
    string(11) "Restriction"
    ["type"]=>
    string(2) "::"
    ["args"]=>
    array(1) {
      [0]=>
      &string(178) "subcorpusNewName=concepcion&action=Create+subcorpus+from+selected+categories&scriptMode=create_from_metadata&thisQ=subcorpus&del=begin&t=text%7Clocation%7Econcepcion&del=end&uT=y"
    }
  }
  [5]=>
  array(4) {
    ["file"]=>
    string(44) "/var/www/html/cqpweb/exe/subcorpus-admin.php"
    ["line"]=>
    int(1)
    ["args"]=>
    array(1) {
      [0]=>
      string(48) "/var/www/html/cqpweb/lib/subcorpus-admin.inc.php"
    }
    ["function"]=>
    string(7) "require"
  }
}


From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 26 July 2016 17:12

To: Open source development of the Corpus WorkBench
Cc: Open source development of the Corpus WorkBench
Subject: Re: [CWB] WebInABox: Can't import existing corpora from host



On Tue, Jul 26, 2016 at 7:25 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:



Hi Andrew,

I have had a dig, and found the bug (it was a regex glitch parsing the inserted registry file). Update the code to rev 880 and you should find that the system will obediently detect your s-attributes. (You will still, naturally, need to go through the first step that I mentioned, of making sure all data from earlier passes is properly scrubbed.)

Eureka - with this new rev CQPweb now imports my XML metadata! Thanks so much for hunting this down and fixing it!



I've now done the following:



1. I went through the "Manage Corpus XML" page and set descriptions and data types, defining the attributes I want to be able to search on in queries, subqueries, sub-corpora, etc. to "classification" (e.g. speaker sex and location).



2. I went through the "Manage Annotation" page and linked the "Annotation setup for CEQL queries" fields to the various annotation data in my corpus.



3. On the "Manage frequency lists" page I (re)generated everything (I've attached the metadata table from mysql below).



I can now perform queries, and my metadata is recognized. But how do I restrict searches using the s-attributes (say, speaker sex)? When I do a query and then select "Distribution", for example, I'm told that "This corpus has no text-classification metadata, so the distribution cannot be shown".



Thanks!

Scott





mysql> select * from xml_metadata;
+----+--------------+-----------------+------------+-----------------------------------+----------+
| id | corpus       | handle          | att_family | description                       | datatype |
+----+--------------+-----------------+------------+-----------------------------------+----------+
|  1 | bncsampler   | s               | s          | s                                 |        0 |
|  2 | bncsampler   | text            | text       | text                              |        0 |
|  3 | bncsampler   | text_id         | text       | text_id                           |        3 |
|  4 | lcmc         | s               | s          | s                                 |        0 |
|  5 | lcmc         | text            | text       | text                              |        0 |
|  6 | lcmc         | text_id         | text       | text_id                           |        3 |
|  7 | test_coscach | s               | s          | Sentence                          |        0 |
|  8 | test_coscach | text            | text       | Text                              |        0 |
|  9 | test_coscach | text_id         | text       | Unique Text ID                    |        3 |
| 10 | test_coscach | text_corpus     | text       | Corpus name                       |        2 |
| 11 | test_coscach | text_tagger     | text       | Corpus tagger                     |        2 |
| 12 | test_coscach | text_language   | text       | Text language                     |        1 |
| 13 | test_coscach | text_channel    | text       | Spoken or written?                |        2 |
| 14 | test_coscach | text_instrument | text       | Elicitation instrument            |        1 |
| 15 | test_coscach | text_lingualism | text       | Speaker monolingual or bilingual? |        1 |
| 16 | test_coscach | text_location   | text       | Speaker location                  |        1 |
| 17 | test_coscach | text_sex        | text       | Speaker sex                       |        1 |
| 18 | test_coscach | text_generation | text       | Speaker generation                |        1 |
| 19 | test_coscach | text_sel        | text       | Speaker SEL                       |        1 |
+----+--------------+-----------------+------------+-----------------------------------+----------+
19 rows in set (0.00 sec)

mysql>





From: Hardie, Andrew
Sent: 25 July 2016 23:48
To: Open source development of the Corpus WorkBench
Subject: RE: [CWB] WebInABox: Can't import existing corpora from host



OK, 2 things:



First – the result of the MySQL query shows that none of the XML of your corpus has been detected.



Second – the other error you report is clearly referring to your earlier index data. The check on text ID validity is done at point of extraction from the index to CQPweb’s internal data structures. So, it is reading the index and getting bad values. This implies that your earlier index files still exist and are being read by CQPweb.



So, the overall picture would seem to be that you have data hanging around from previous incarnations of the corpus, and your reinstallation did not work properly. Your best bet might be to make doubly sure everything is wiped from that corpus, then start over again. This will probably not fix all the problems but it should make the issues that remain clearer.



best



Andrew.



From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 25 July 2016 17:15

To: Open source development of the Corpus WorkBench
Cc: Open source development of the Corpus WorkBench
Subject: Re: [CWB] WebInABox: Can't import existing corpora from host



On Mon, Jul 25, 2016 at 5:48 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:



Try running



          select * from xml_metadata;



in the MySQL command line client, and see what you get.



This is what I get:



$ mysql -u root -p cqpweb
Enter password:
Reading table information for completion of table and column names
[...]
mysql> select * from xml_metadata;
+----+------------+---------+------------+-------------+----------+
| id | corpus     | handle  | att_family | description | datatype |
+----+------------+---------+------------+-------------+----------+
|  1 | bncsampler | s       | s          | s           |        0 |
|  2 | bncsampler | text    | text       | text        |        0 |
|  3 | bncsampler | text_id | text       | text_id     |        3 |
|  4 | lcmc       | s       | s          | s           |        0 |
|  5 | lcmc       | text    | text       | text        |        0 |
|  6 | lcmc       | text_id | text       | text_id     |        3 |
+----+------------+---------+------------+-------------+----------+
6 rows in set (0.00 sec)

mysql>





I have noted something anomalous on another front which may be relevant. When I go to the "Manage Metadata" page of the corpus I'm trying to get set up, and hit the "Create minimalist metadata table" button, I get an error which has nothing to do with my current corpus:



The data source you specified for the text metadata contains badly-formatted text ID codes, as follows: <strong> '<no annotation>'; 'CCN-F2-01_Ca_St.ortografica.txt'; 'CCN-F2-02_D_StB.ortografica.txt'; 'CCN-F2-03_Ca_St.ortografica.txt'; 'CCN-F2-04_Cb_St.ortografica.txt';[...]</strong> (text ids can only contain unaccented letters, numbers, and underscore).



None of these values are present in my current corpus, though they were in an earlier version. However, I removed them from the tagged texts after you explained that these values had to be handles. Here's what my metadata currently looks like:



<text id="CCN_F2_27_B" corpus="coscach" tagger="freeling_xml" language="spanish" channel="oral" instrument="interview" lingualism="monolingual" location="concepcion" sex="f" generation="G2" sel="B">



So values like 'CCN-F2-01_Ca_St.ortografica.txt' are not in my corpus any more (and I recompiled it from these files, of course), but they seem to be cached somewhere by CQPweb, and they are not getting updated by newer corpora I try to import. (Note that I've used different names, e.g. test_corpus, test_corpus_two, in order to try to get around this, but it hasn't worked).



Cheers,
Scott







best



Andrew.







From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 24 July 2016 17:17
To: Open source development of the Corpus WorkBench
Cc: Open source development of the Corpus WorkBench
Subject: Re: [CWB] WebInABox: Can't import existing corpora from host



On Sun, Jul 24, 2016 at 11:29 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:



First point – your text ID codes won’t work, they need to be handles, i.e. just ASCII letters, numbers, and underscore – no hyphens/full stops.



Now corrected!
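The handle rule stated above (ASCII letters, digits and underscore only) is easy to check before encoding a corpus. The following is a minimal Python sketch; `is_handle` and `to_handle` are hypothetical helper names, and replacing each disallowed character with an underscore is just one possible clean-up policy, not something CQPweb prescribes.

```python
import re

# A handle consists solely of unaccented ASCII letters, digits and underscores.
HANDLE = re.compile(r"[A-Za-z0-9_]+")


def is_handle(text_id):
    """Return True if text_id is a valid handle."""
    return HANDLE.fullmatch(text_id) is not None


def to_handle(text_id):
    """Replace every disallowed character with an underscore (one possible policy)."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text_id)
```

Note that a clean-up like `to_handle` can map two different IDs to the same handle (for example `a-b` and `a.b`), so the results should still be checked for uniqueness.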



Second point – the various s-attributes text_corpus , text_tagger etc. need (a) to exist in the registry – did your correction fix this? (b) CQPweb needs to have logged their existence – if it’s saying “No XML annotations found” that suggests it hasn’t, which could be a consequence of (a), or could be a bug.



Unless I'm mistaken about what attributes are what, they are indeed in the registry. I've pasted it at the end of this e-mail, along with a single tagged source text sentence.



There was in fact a bug with s-attributes in the registry failing to be detected which I fixed a few months back: I cannot recall if that was before or after the version of the code in the VM image. If you want to rule this out, connect the VM’s networking, upgrade CQPweb to the latest version from SVN (don’t forget to do the database upgrade!), and try again: if that fixes it, it was the old bug.



I've been using revision 879 (3.2.20) the whole time, so it shouldn't be the old bug.





Once CQPweb is aware of your XML attributes you should be able to use them to derive text metadata.



Thanks for your patience!



Cheers,

Scott





<text id="CCN_F2_25_Ca" corpus="test_two" tagger="freeling_xml" language="spanish" channel="oral" instrument="interview" lingualism="monolingual" location="concepcion" sex="f" generation="G2" sel="Ca">

<s>

¿       ¿       Fia     Fia     punctuation     questionmark

todavía todavía RG      RG      adverb  general

está    estar   VAIP3S0 VAI     verb    auxiliary

grabando        grabar  VMG0000 VMG     verb    main

?       ?       Fit     Fit     punctuation     questionmark

</s>

</text>







##

## registry entry for corpus TEST_TWO

##



# long descriptive name for the corpus

NAME ""

# corpus ID (must be lowercase in registry!)

ID   test_two

# path to binary data files

HOME /var/cqpweb/index/test_two

# optional info file (displayed by "info;" command in CQP)

INFO /var/cqpweb/index/test_two/.info



# corpus properties provide additional information about the corpus:

##:: charset  = "utf8" # character encoding of corpus data

##:: language = "es"     # insert ISO code for language (de, en, fr, ...)





##

## p-attributes (token annotations)

##



ATTRIBUTE word

ATTRIBUTE lemma

ATTRIBUTE tag

ATTRIBUTE ctag

ATTRIBUTE pos

ATTRIBUTE type





##

## s-attributes (structural markup)

##



# <s> ... </s>

# (no recursive embedding allowed)

STRUCTURE s



# <text id=".." corpus=".." tagger=".." file=".." language=".." channel=".." instrument=".." lingualism=".." location=".." sex=".." generation=".." sel=".."> ... </text>

# (no recursive embedding allowed)

STRUCTURE text

STRUCTURE text_id              # [annotations]

STRUCTURE text_corpus          # [annotations]

STRUCTURE text_tagger          # [annotations]

STRUCTURE text_file            # [annotations]

STRUCTURE text_language        # [annotations]

STRUCTURE text_channel         # [annotations]

STRUCTURE text_instrument      # [annotations]

STRUCTURE text_lingualism      # [annotations]

STRUCTURE text_location        # [annotations]

STRUCTURE text_sex             # [annotations]

STRUCTURE text_generation      # [annotations]

STRUCTURE text_sel             # [annotations]





# Yours sincerely, the Encode tool.
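To check point (a) from earlier in this message, i.e. which s-attributes the registry actually declares, the STRUCTURE lines of a registry file like the one above can be listed with a few lines of Python. This is only a sketch: `registry_s_attributes` is a hypothetical name, and it assumes the simple `STRUCTURE name  # comment` layout seen in this file.

```python
def registry_s_attributes(registry_text):
    """Return the names declared on STRUCTURE lines of a CWB registry file."""
    names = []
    for line in registry_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop trailing comments
        if len(fields) == 2 and fields[0] == "STRUCTURE":
            names.append(fields[1])
    return names
```

The resulting list can then be compared by eye with the `handle` column that CQPweb reports in its `xml_metadata` table for the same corpus.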







From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Scott Sadowsky
Sent: 24 July 2016 15:52
To: CWBdev Mailing List

Subject: [CWB] WebInABox: Can't import existing corpora from host



On Sun, Jul 24, 2016 at 10:19 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:



CQPweb requires all corpora to have at least one <text> element, and every text element has to have an id i.e. everything within the corpus has to be contained within a sequence of one or more



<text id="somethinghere">

…

</text>
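The requirement just described can be checked on a source file before encoding. The Python sketch below assumes the one-tag-per-line vertical format used elsewhere in this thread; `check_text_wrapping` is a hypothetical helper, not part of CWB or CQPweb. It reports `<text>` tags that lack an id and any content that falls outside a `<text>` element.

```python
import re

TEXT_OPEN = re.compile(r"<text\b[^>]*>")
HAS_ID = re.compile(r'\bid="[^"]+"')


def check_text_wrapping(lines):
    """Return a list of problems: texts without an id, or content outside <text>."""
    problems = []
    inside = False
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines carry no content
        if TEXT_OPEN.fullmatch(line):
            if not HAS_ID.search(line):
                problems.append(f"line {number}: <text> without id")
            inside = True
        elif line == "</text>":
            inside = False
        elif not inside:
            problems.append(f"line {number}: content outside <text>")
    return problems
```

An empty result means every non-blank line sits inside a `<text>` element that carries an id; it does not check that the ids are unique or valid handles.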



Thanks, Andrew. It turns out the problem was that I had been using the name "id" instead of "text" for the element. Now that I've changed that, I was able to successfully create the corpus in CQPweb.



My source files have quite a bit of metadata, which I've encoded as follows:



<text id="CCN-F2-02_D_StB.ortografica.txt" corpus="test" tagger="freeling-xml" language="spanish" location="concepcion" sex="f">

...

</text>

I'm now at the CQPweb "Design and insert a text-metadata table for the corpus" page, but it tells me that "No XML annotations found for this corpus". Is there something wrong with how I did the encoding above? I can use all of these XML elements in cqp searches directly, but here they aren't recognized.



(I've checked chapter 6 of the manual, to no avail).



_______________________________________________
CWB mailing list
CWB at liste.sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb


