[CWB] WebInABox: Can't import existing corpora from host

Scott Sadowsky ssadowsky at gmail.com
Mon Aug 1 19:08:22 CEST 2016


Thanks so much, Andrew -- that did the trick, and everything's working
splendidly now.

Best,
Scott

On Mon, Aug 1, 2016 at 10:10 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
wrote:

> Sure.
>
>
>
> 1.    Scrub your existing text metadata table: go to Manage Metadata, find
> the form headed “Reset the metadata table for this corpus”, tick the “are
> you sure” box, then click the button “Delete metadata table for this corpus”.
>
> 2.    You’ll see the install metadata form.
>
> 3.    Go down to the link “Click here to install metadata from
> within-corpus XML annotation”
>
> 4.    See a form listing your text metadata XML attributes
>
> 5.    Select the ones you wish to use for text metadata, and supply extra
> details as needed.
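
Conceptually, steps 3 to 5 derive one row of text metadata from the attributes on each <text> start tag. A rough Python sketch of that idea, run over the vertical-format source file rather than inside CQPweb; the function name and the regexes are illustrative assumptions, and this is not how CQPweb itself implements the form:

```python
import re

# Assumption: start tags look like <text id="..." language="..." ...>,
# possibly wrapped over several lines, with double-quoted values.
TEXT_TAG_RE = re.compile(r"<text\s+([^>]*)>")
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def text_metadata_rows(vrt: str) -> list[dict[str, str]]:
    """Return one dict of attribute values per <text ...> start tag."""
    return [dict(ATTR_RE.findall(m.group(1))) for m in TEXT_TAG_RE.finditer(vrt)]


if __name__ == "__main__":
    sample = '<text id="001" language="spanish"\nlocation="santiago">\n</text>\n'
    for row in text_metadata_rows(sample):
        print(row)
```

Each dict corresponds to one text; the keys are the attribute names that show up as text_* s-attributes once the corpus is encoded.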
>
>
>
>
>
> best
>
>
>
> Andrew.
>
>
>
> *From:* cwb-bounces at liste.sslmit.unibo.it [mailto:
> cwb-bounces at liste.sslmit.unibo.it] *On Behalf Of *Scott Sadowsky
> *Sent:* 01 August 2016 10:23
> *To:* Open source development of the Corpus WorkBench
> *Cc:* Open source development of the Corpus WorkBench
> *Subject:* Re: [CWB] WebInABox: Can't import existing corpora from host
>
>
>
> On Mon, Aug 1, 2016 at 3:26 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
>
>
> Hi Andrew,
>
>
>
> Thanks for dealing with this.
>
> I have just now committed to the repo a fix that stops the restriction
> block from showing text as a sub-text XML element. This means that what was
> previously a non-obviously incorrect way of doing things will simply no
> longer work.
>
> Now, when I try to run a restricted query or create a subcorpus, I'm told
> that "There are no text classification schemes set up for this corpus". I
> assume this is the intended behavior, as all of my metadata is encoded as
> sub-keys of the <text> element (e.g. <text id="001" language="spanish"
> location="santiago">), and it shows up in the "Manage corpus XML" page.
>
>
>
> The *correct* way in this case is to generate a text-metadata table from
> the data stored in the “text_*” XML attributes, using the variant of the
> install-text-metadata table accessed via the “” button.
>
> Could you expound on this a bit, please? I simply can't make heads or
> tails of it, I'm afraid!
>
>
>
> Cheers,
>
> Scott
>
>
>
>
>
>
>
> best
>
>
>
> Andrew.
>
>
>
> *From:* cwb-bounces at liste.sslmit.unibo.it [mailto:
> cwb-bounces at liste.sslmit.unibo.it] *On Behalf Of *Hardie, Andrew
> *Sent:* 28 July 2016 16:25
> *To:* Open source development of the Corpus WorkBench
> *Subject:* Re: [CWB] WebInABox: Can't import existing corpora from host
>
>
>
>
>
> Hi Scott,
>
>
>
> This is a bug. It arises, as best as I can tell, from confusion between a
> text-level restriction and a restriction on the <text> XML object
> (equivalent, but distinct due to the fact that text metadata exists
> separately from the XML metadata from which it derives, which in turn is a
> consequence of the special status of “text” as an entity in CQPweb). Can
> you send me (off list) screenshots of the search form with the tickboxes
> you selected that led to this error? I will then investigate.
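
The backtraces quoted further down this thread show the failing query ending in a bare WHERE, which suggests that no conditions were generated for the restriction. A minimal sketch of that failure mode and one way to guard against it; `build_count_query` is a hypothetical helper written for illustration and is not CQPweb's actual code:

```python
def build_count_query(table: str, conditions: list[str]) -> str:
    """Build the count query, omitting WHERE when there are no conditions.

    Appending " WHERE " unconditionally yields invalid SQL whenever the
    condition list is empty, which matches the 1064 syntax error in the
    backtraces. The conditions are assumed to be pre-escaped SQL fragments;
    real code should use parameterised queries.
    """
    query = f"SELECT count(*), sum(words) FROM {table}"
    if conditions:
        query += " WHERE " + " AND ".join(conditions)
    return query


if __name__ == "__main__":
    print(build_count_query("text_metadata_for_test_coscach", []))
    print(build_count_query("text_metadata_for_test_coscach", ["location = 'concepcion'"]))
```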
>
>
>
> best
>
>
>
> Andrew.
>
>
>
> *From:* cwb-bounces at liste.sslmit.unibo.it [mailto:
> cwb-bounces at liste.sslmit.unibo.it] *On Behalf Of *Scott Sadowsky
> *Sent:* 26 July 2016 17:53
> *To:* Open source development of the Corpus WorkBench
> *Cc:* Open source development of the Corpus WorkBench
> *Subject:* Re: [CWB] WebInABox: Can't import existing corpora from host
>
>
>
> On Tue, Jul 26, 2016 at 12:18 PM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
>
>
> >>> But how do I restrict searches using the s-attributes (say, speaker
> sex)? When I do a query and then select "Distribution", for example, I'm
> told that "This corpus has no text-classification metadata, so the
> distribution cannot be shown".
>
> ·         Go to Restricted query
>
> ·         You should see options to restrict your query to XML segments
> where the given attribute has a particular category handle for any s-att
> that you set to datatype “Classification”.
>
> Thanks. That makes sense.
>
>
>
> When I run one of these queries, though, CQPweb throws an SQL error
> (pasted below).
>
>
>
> ·         OR, go to “Create / edit subcorpora” and define subcorpora
> using the same control, then use those SCs as restriction criteria.
>
> This also throws an error (also pasted below).
>
>
>
> Note that non-text-based corpus restrictions and subcorpora aren’t
> currently supported in the Distribution display. I know this is a pain, and
> it’s high on my feature list (but it is quite a big job, so it can’t be done
> quickly!).
>
> I can only imagine!
>
>
>
> Thanks again,
>
> Scott
>
>
>
>
>
>
>
> ===== ERROR 1 =====
>
> CQPweb encountered an error and could not continue.
>
> A MySQL query did not run successfully!
>
>
>
> Original query: SELECT count(*), sum(words) FROM
> text_metadata_for_test_coscach WHERE /* from User: user | Function:
> do_append_mysql_comment() | 2016-Jul-26 16:42:47 */
>
>
>
> Error # 1064: You have an error in your SQL syntax; check the manual that
> corresponds to your MySQL server version for the right syntax to use near
> '' at line 2
>
>
>
> PHP debugging backtrace
>
>
>
> array(7) {
>
>   [1]=>
>
>   array(4) {
>
>     ["file"]=>
>
>     string(40) "/var/www/html/cqpweb/lib/library.inc.php"
>
>     ["line"]=>
>
>     int(282)
>
>     ["function"]=>
>
>     string(20) "exiterror_mysqlquery"
>
>     ["args"]=>
>
>     array(3) {
>
>       [0]=>
>
>       &int(1064)
>
>       [1]=>
>
>       &string(146) "You have an error in your SQL syntax; check the manual
> that corresponds to your MySQL server version for the right syntax to use
> near '' at line 2"
>
>       [2]=>
>
>       &string(156) "SELECT count(*), sum(words) FROM
> text_metadata_for_test_coscach WHERE
>
>         /* from User: user | Function: do_append_mysql_comment() |
> 2016-Jul-26 16:42:47 */"
>
>     }
>
>   }
>
>   [2]=>
>
>   array(4) {
>
>     ["file"]=>
>
>     string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
>
>     ["line"]=>
>
>     int(1556)
>
>     ["function"]=>
>
>     string(14) "do_mysql_query"
>
>     ["args"]=>
>
>     array(1) {
>
>       [0]=>
>
>       &string(71) "SELECT count(*), sum(words) FROM
> text_metadata_for_test_coscach WHERE  "
>
>     }
>
>   }
>
>   [3]=>
>
>   array(7) {
>
>     ["file"]=>
>
>     string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
>
>     ["line"]=>
>
>     int(1214)
>
>     ["function"]=>
>
>     string(15) "initialise_size"
>
>     ["class"]=>
>
>     string(11) "Restriction"
>
>     ["object"]=>
>
>     object(Restriction)#14 (15) {
>
>       ["serialised":"Restriction":private]=>
>
>       string(26) "$^text|location~concepcion"
>
>       ["parsed_conditions":"Restriction":private]=>
>
>       array(1) {
>
>         ["text"]=>
>
>         array(1) {
>
>           [0]=>
>
>           string(19) "location~concepcion"
>
>         }
>
>       }
>
>       ["stored_text_metadata_where":"Restriction":private]=>
>
>       NULL
>
>       ["stored_idlink_where":"Restriction":private]=>
>
>       NULL
>
>       ["cpos_collection":"Restriction":private]=>
>
>       NULL
>
>       ["corpus":"Restriction":private]=>
>
>       string(12) "test_coscach"
>
>       ["item_type":"Restriction":private]=>
>
>       string(4) "text"
>
>       ["n_items":"Restriction":private]=>
>
>       NULL
>
>       ["n_tokens":"Restriction":private]=>
>
>       NULL
>
>       ["freqtable_record":"Restriction":private]=>
>
>       NULL
>
>       ["hasrun_initialise_text_metadata_where":"Restriction":private]=>
>
>       bool(false)
>
>       ["hasrun_initialise_idlink_where":"Restriction":private]=>
>
>       bool(false)
>
>       ["hasrun_initialise_cpos_collection":"Restriction":private]=>
>
>       bool(false)
>
>       ["hasrun_initialise_size":"Restriction":private]=>
>
>       bool(false)
>
>       ["needs_to_be_added_to_cache":"Restriction":private]=>
>
>       bool(false)
>
>     }
>
>     ["type"]=>
>
>     string(2) "->"
>
>     ["args"]=>
>
>     array(0) {
>
>     }
>
>   }
>
>   [4]=>
>
>   array(6) {
>
>     ["file"]=>
>
>     string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
>
>     ["line"]=>
>
>     int(670)
>
>     ["function"]=>
>
>     string(12) "new_from_url"
>
>     ["class"]=>
>
>     string(11) "Restriction"
>
>     ["type"]=>
>
>     string(2) "::"
>
>     ["args"]=>
>
>     array(2) {
>
>       [0]=>
>
>       &string(85)
> "theData=gente&qmode=sq_nocase&pp=50&del=begin&t=text|location~concepcion&del=end&uT=y"
>
>       [1]=>
>
>       &bool(true)
>
>     }
>
>   }
>
>   [5]=>
>
>   array(7) {
>
>     ["file"]=>
>
>     string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
>
>     ["line"]=>
>
>     int(589)
>
>     ["function"]=>
>
>     string(14) "parse_from_url"
>
>     ["class"]=>
>
>     string(10) "QueryScope"
>
>     ["object"]=>
>
>     object(QueryScope)#15 (4) {
>
>       ["type"]=>
>
>       int(0)
>
>       ["restriction":"QueryScope":private]=>
>
>       NULL
>
>       ["subcorpus":"QueryScope":private]=>
>
>       NULL
>
>       ["serialised":"QueryScope":private]=>
>
>       string(0) ""
>
>     }
>
>     ["type"]=>
>
>     string(2) "->"
>
>     ["args"]=>
>
>     array(2) {
>
>       [0]=>
>
>       &string(89)
> "theData=gente&qmode=sq_nocase&pp=50&del=begin&t=text%7Clocation%7Econcepcion&del=end&uT=y"
>
>       [1]=>
>
>       &bool(true)
>
>     }
>
>   }
>
>   [6]=>
>
>   array(6) {
>
>     ["file"]=>
>
>     string(44) "/var/www/html/cqpweb/lib/concordance.inc.php"
>
>     ["line"]=>
>
>     int(156)
>
>     ["function"]=>
>
>     string(12) "new_from_url"
>
>     ["class"]=>
>
>     string(10) "QueryScope"
>
>     ["type"]=>
>
>     string(2) "::"
>
>     ["args"]=>
>
>     array(2) {
>
>       [0]=>
>
>       &string(89)
> "theData=gente&qmode=sq_nocase&pp=50&del=begin&t=text%7Clocation%7Econcepcion&del=end&uT=y"
>
>       [1]=>
>
>       &bool(true)
>
>     }
>
>   }
>
>   [7]=>
>
>   array(4) {
>
>     ["file"]=>
>
>     string(40) "/var/www/html/cqpweb/exe/concordance.php"
>
>     ["line"]=>
>
>     int(1)
>
>     ["args"]=>
>
>     array(1) {
>
>       [0]=>
>
>       string(44) "/var/www/html/cqpweb/lib/concordance.inc.php"
>
>     }
>
>     ["function"]=>
>
>     string(7) "require"
>
>   }
>
> }
>
>
>
> ===== ERROR 2 =====
>
>
>
> CQPweb encountered an error and could not continue.
>
> A MySQL query did not run successfully!
>
>
>
> Original query: SELECT count(*), sum(words) FROM
> text_metadata_for_test_coscach WHERE /* from User: user | Function:
> do_append_mysql_comment() | 2016-Jul-26 16:49:17 */
>
>
>
> Error # 1064: You have an error in your SQL syntax; check the manual that
> corresponds to your MySQL server version for the right syntax to use near
> '' at line 2
>
>
>
>
>
> PHP debugging backtrace
>
>
>
> array(5) {
>
>   [1]=>
>
>   array(4) {
>
>     ["file"]=>
>
>     string(40) "/var/www/html/cqpweb/lib/library.inc.php"
>
>     ["line"]=>
>
>     int(282)
>
>     ["function"]=>
>
>     string(20) "exiterror_mysqlquery"
>
>     ["args"]=>
>
>     array(3) {
>
>       [0]=>
>
>       &int(1064)
>
>       [1]=>
>
>       &string(146) "You have an error in your SQL syntax; check the manual
> that corresponds to your MySQL server version for the right syntax to use
> near '' at line 2"
>
>       [2]=>
>
>       &string(156) "SELECT count(*), sum(words) FROM
> text_metadata_for_test_coscach WHERE
>
>         /* from User: user | Function: do_append_mysql_comment() |
> 2016-Jul-26 16:49:17 */"
>
>     }
>
>   }
>
>   [2]=>
>
>   array(4) {
>
>     ["file"]=>
>
>     string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
>
>     ["line"]=>
>
>     int(1556)
>
>     ["function"]=>
>
>     string(14) "do_mysql_query"
>
>     ["args"]=>
>
>     array(1) {
>
>       [0]=>
>
>       &string(71) "SELECT count(*), sum(words) FROM
> text_metadata_for_test_coscach WHERE  "
>
>     }
>
>   }
>
>   [3]=>
>
>   array(7) {
>
>     ["file"]=>
>
>     string(42) "/var/www/html/cqpweb/lib/subcorpus.inc.php"
>
>     ["line"]=>
>
>     int(1214)
>
>     ["function"]=>
>
>     string(15) "initialise_size"
>
>     ["class"]=>
>
>     string(11) "Restriction"
>
>     ["object"]=>
>
>     object(Restriction)#16 (15) {
>
>       ["serialised":"Restriction":private]=>
>
>       string(26) "$^text|location~concepcion"
>
>       ["parsed_conditions":"Restriction":private]=>
>
>       array(1) {
>
>         ["text"]=>
>
>         array(1) {
>
>           [0]=>
>
>           string(19) "location~concepcion"
>
>         }
>
>       }
>
>       ["stored_text_metadata_where":"Restriction":private]=>
>
>       NULL
>
>       ["stored_idlink_where":"Restriction":private]=>
>
>       NULL
>
>       ["cpos_collection":"Restriction":private]=>
>
>       NULL
>
>       ["corpus":"Restriction":private]=>
>
>       string(12) "test_coscach"
>
>       ["item_type":"Restriction":private]=>
>
>       string(4) "text"
>
>       ["n_items":"Restriction":private]=>
>
>       NULL
>
>       ["n_tokens":"Restriction":private]=>
>
>       NULL
>
>       ["freqtable_record":"Restriction":private]=>
>
>       NULL
>
>       ["hasrun_initialise_text_metadata_where":"Restriction":private]=>
>
>       bool(false)
>
>       ["hasrun_initialise_idlink_where":"Restriction":private]=>
>
>       bool(false)
>
>       ["hasrun_initialise_cpos_collection":"Restriction":private]=>
>
>       bool(false)
>
>       ["hasrun_initialise_size":"Restriction":private]=>
>
>       bool(false)
>
>       ["needs_to_be_added_to_cache":"Restriction":private]=>
>
>       bool(false)
>
>     }
>
>     ["type"]=>
>
>     string(2) "->"
>
>     ["args"]=>
>
>     array(0) {
>
>     }
>
>   }
>
>   [4]=>
>
>   array(6) {
>
>     ["file"]=>
>
>     string(48) "/var/www/html/cqpweb/lib/subcorpus-admin.inc.php"
>
>     ["line"]=>
>
>     int(128)
>
>     ["function"]=>
>
>     string(12) "new_from_url"
>
>     ["class"]=>
>
>     string(11) "Restriction"
>
>     ["type"]=>
>
>     string(2) "::"
>
>     ["args"]=>
>
>     array(1) {
>
>       [0]=>
>
>       &string(178)
> "subcorpusNewName=concepcion&action=Create+subcorpus+from+selected+categories&scriptMode=create_from_metadata&thisQ=subcorpus&del=begin&t=text%7Clocation%7Econcepcion&del=end&uT=y"
>
>     }
>
>   }
>
>   [5]=>
>
>   array(4) {
>
>     ["file"]=>
>
>     string(44) "/var/www/html/cqpweb/exe/subcorpus-admin.php"
>
>     ["line"]=>
>
>     int(1)
>
>     ["args"]=>
>
>     array(1) {
>
>       [0]=>
>
>       string(48) "/var/www/html/cqpweb/lib/subcorpus-admin.inc.php"
>
>     }
>
>     ["function"]=>
>
>     string(7) "require"
>
>   }
>
> }
>
>
>
> *From:* cwb-bounces at liste.sslmit.unibo.it [mailto:
> cwb-bounces at liste.sslmit.unibo.it] *On Behalf Of *Scott Sadowsky
> *Sent:* 26 July 2016 17:12
>
>
> *To:* Open source development of the Corpus WorkBench
> *Cc:* Open source development of the Corpus WorkBench
> *Subject:* Re: [CWB] WebInABox: Can't import existing corpora from host
>
>
>
> On Tue, Jul 26, 2016 at 7:25 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
>
>
> Hi Andrew,
>
> I have had a dig, and found the bug (it was a regex glitch parsing the
> inserted registry file). Update the code to rev 880 and you should find
> that the system will obediently detect your s-attributes. (You will still,
> naturally, need to go through the first step that I mentioned, of making
> sure all data from earlier passes is properly scrubbed.)
>
> Eureka - with this new rev CQPweb now imports my XML metadata! Thanks so
> much for hunting this down and fixing it!
>
>
>
> I've now done the following:
>
>
>
> 1. I went through the "Manage Corpus XML" page and set descriptions and
> data types, setting the data type to "classification" for the attributes I
> want to be able to search on in queries, subqueries, sub-corpora, etc.
> (e.g. speaker sex and location).
>
>
>
> 2. I went through the "Manage Annotation" page and linked the "Annotation
> setup for CEQL queries" fields to the various annotation data in my corpus.
>
>
>
> 3. On the "Manage frequency lists" page I (re)generated everything (I've
> attached the metadata table from mysql below).
>
>
>
> I can now perform queries, and my metadata is recognized. But how do I
> restrict searches using the s-attributes (say, speaker sex)? When I do a
> query and then select "Distribution", for example, I'm told that "This
> corpus has no text-classification metadata, so the distribution cannot be
> shown".
>
>
>
> Thanks!
>
> Scott
>
>
>
>
>
> mysql> select * from xml_metadata;
> +----+--------------+-----------------+------------+-----------------------------------+----------+
> | id | corpus       | handle          | att_family | description                       | datatype |
> +----+--------------+-----------------+------------+-----------------------------------+----------+
> |  1 | bncsampler   | s               | s          | s                                 |        0 |
> |  2 | bncsampler   | text            | text       | text                              |        0 |
> |  3 | bncsampler   | text_id         | text       | text_id                           |        3 |
> |  4 | lcmc         | s               | s          | s                                 |        0 |
> |  5 | lcmc         | text            | text       | text                              |        0 |
> |  6 | lcmc         | text_id         | text       | text_id                           |        3 |
> |  7 | test_coscach | s               | s          | Sentence                          |        0 |
> |  8 | test_coscach | text            | text       | Text                              |        0 |
> |  9 | test_coscach | text_id         | text       | Unique Text ID                    |        3 |
> | 10 | test_coscach | text_corpus     | text       | Corpus name                       |        2 |
> | 11 | test_coscach | text_tagger     | text       | Corpus tagger                     |        2 |
> | 12 | test_coscach | text_language   | text       | Text language                     |        1 |
> | 13 | test_coscach | text_channel    | text       | Spoken or written?                |        2 |
> | 14 | test_coscach | text_instrument | text       | Elicitation instrument            |        1 |
> | 15 | test_coscach | text_lingualism | text       | Speaker monolingual or bilingual? |        1 |
> | 16 | test_coscach | text_location   | text       | Speaker location                  |        1 |
> | 17 | test_coscach | text_sex        | text       | Speaker sex                       |        1 |
> | 18 | test_coscach | text_generation | text       | Speaker generation                |        1 |
> | 19 | test_coscach | text_sel        | text       | Speaker SEL                       |        1 |
> +----+--------------+-----------------+------------+-----------------------------------+----------+
> 19 rows in set (0.00 sec)
>
>
>
> mysql>
>
>
>
>
>
> *From:* Hardie, Andrew
> *Sent:* 25 July 2016 23:48
> *To:* Open source development of the Corpus WorkBench
> *Subject:* RE: [CWB] WebInABox: Can't import existing corpora from host
>
>
>
> OK, 2 things:
>
>
>
> First – the result of the MySQL query shows that none of the XML of your
> corpus has been detected.
>
>
>
> Second – the other error you report is clearly referring to your earlier
> index data. The check on text ID validity is done at point of extraction
> *from* the index *to* CQPweb’s internal data structures. So, it is
> reading the index and getting bad values. This implies that your earlier
> index files still exist and are being read by CQPweb.
>
>
>
> So, the overall picture would seem to be that you have data hanging around
> from previous incarnations of the corpus, and your reinstallation did not
> work properly. Your best bet might be to make doubly sure everything is
> wiped from that corpus, then start over again. This will probably not fix
> all the problems but it *should* make the issues that remain clearer.
>
>
>
> best
>
>
>
> Andrew.
>
>
>
> *From:* cwb-bounces at liste.sslmit.unibo.it [mailto:
> cwb-bounces at liste.sslmit.unibo.it] *On Behalf Of *Scott Sadowsky
> *Sent:* 25 July 2016 17:15
>
> *To:* Open source development of the Corpus WorkBench
> *Cc:* Open source development of the Corpus WorkBench
> *Subject:* Re: [CWB] WebInABox: Can't import existing corpora from host
>
>
>
> On Mon, Jul 25, 2016 at 5:48 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
>
>
> Try running
>
>
>
>           select * from xml_metadata;
>
>
>
> in the MySQL command line client, and see what you get.
>
>
>
> This is what I get:
>
>
>
> $ mysql -u root -p cqpweb
>
> Enter password:
>
> Reading table information for completion of table and column names
>
> [...]
>
> mysql> select * from xml_metadata;
>
> +----+------------+---------+------------+-------------+----------+
>
> | id | corpus     | handle  | att_family | description | datatype |
>
> +----+------------+---------+------------+-------------+----------+
>
> |  1 | bncsampler | s       | s          | s           |        0 |
>
> |  2 | bncsampler | text    | text       | text        |        0 |
>
> |  3 | bncsampler | text_id | text       | text_id     |        3 |
>
> |  4 | lcmc       | s       | s          | s           |        0 |
>
> |  5 | lcmc       | text    | text       | text        |        0 |
>
> |  6 | lcmc       | text_id | text       | text_id     |        3 |
>
> +----+------------+---------+------------+-------------+----------+
>
> 6 rows in set (0.00 sec)
>
>
>
> mysql>
>
>
>
>
>
> I have noted something anomalous on another front which may be relevant.
> When I go to the "Manage Metadata" page of the corpus I'm trying to get set
> up, and hit the "Create minimalist metadata table" button, I get an error
> which has nothing to do with my current corpus:
>
>
>
> The data source you specified for the text metadata contains
> badly-formatted text ID codes, as follows: <strong> '<no annotation>';
> 'CCN-F2-01_Ca_St.ortografica.txt'; 'CCN-F2-02_D_StB.ortografica.txt';
> 'CCN-F2-03_Ca_St.ortografica.txt';
> 'CCN-F2-04_Cb_St.ortografica.txt';[...]</strong> (text ids can only contain
> unaccented letters, numbers, and underscore).
>
>
>
> None of these values are present in my current corpus, though they *were*
> in an earlier version. However, I removed them from the tagged texts after
> you explained that these values had to be handles. Here's what my metadata
> currently looks like:
>
>
>
> <text id="CCN_F2_27_B" corpus="coscach" tagger="freeling_xml"
> language="spanish" channel="oral" instrument="interview"
> lingualism="monolingual" location="concepcion" sex="f" generation="G2"
> sel="B">
>
>
>
> So values like 'CCN-F2-01_Ca_St.ortografica.txt' are not in my corpus any
> more (and I recompiled it from these files, of course), but they seem to be
> cached somewhere by CQPweb, and they are not getting updated by newer
> corpora I try to import. (Note that I've used different names, e.g.
> test_corpus, test_corpus_two, in order to try to get around this, but it
> hasn't worked).
>
>
>
> Cheers,
> Scott
>
>
>
>
>
>
>
> best
>
>
>
> Andrew.
>
>
>
>
>
>
>
> *From:* cwb-bounces at liste.sslmit.unibo.it [mailto:
> cwb-bounces at liste.sslmit.unibo.it] *On Behalf Of *Scott Sadowsky
> *Sent:* 24 July 2016 17:17
> *To:* Open source development of the Corpus WorkBench
> *Cc:* Open source development of the Corpus WorkBench
> *Subject:* Re: [CWB] WebInABox: Can't import existing corpora from host
>
>
>
> On Sun, Jul 24, 2016 at 11:29 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
>
>
> First point – your text ID codes won’t work, they need to be *handles*,
> i.e. just ASCII letters, numbers, and underscore – no hyphens/full stops.
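
A minimal sketch of the handle rule described above, assuming a handle is one or more ASCII letters, digits, or underscores. The names `is_handle` and `to_handle` are hypothetical helpers for preparing source files, not part of CWB or CQPweb:

```python
import re

# Assumption: a "handle" is one or more ASCII letters, digits, or underscores.
HANDLE_RE = re.compile(r"[A-Za-z0-9_]+")


def is_handle(text_id: str) -> bool:
    """Return True if text_id uses only ASCII letters, digits, and underscore."""
    return HANDLE_RE.fullmatch(text_id) is not None


def to_handle(text_id: str) -> str:
    """Replace every disallowed character with an underscore."""
    return re.sub(r"[^A-Za-z0-9_]", "_", text_id)


if __name__ == "__main__":
    old_id = "CCN-F2-02_D_StB.ortografica.txt"
    print(is_handle(old_id), to_handle(old_id))
```

Because different IDs can collapse to the same handle after replacement, it is worth checking that the converted IDs are still unique before re-encoding the corpus.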
>
>
>
> Now corrected!
>
>
>
> Second point – the various s-attributes (text_corpus, text_tagger, etc.)
> need (a) to exist in the registry – did your correction fix this? – and (b)
> to have been logged by CQPweb. If it’s saying “No XML annotations found”,
> that suggests they haven’t been, which could be a consequence of (a), or
> could be a bug.
>
>
>
> Unless I'm mistaken about what attributes are what, they are indeed in the
> registry. I've pasted it at the end of this e-mail, along with a single
> tagged source text sentence.
>
>
>
> There was in fact a bug with s-attributes in the registry failing to be
> detected which I fixed a few months back: I cannot recall if that was
> before or after the version of the code in the VM image. If you want to
> rule this out, connect the VM’s networking, upgrade CQPweb to the latest
> version from SVN (don’t forget to do the database upgrade!), and try again:
> if that fixes it, it was the old bug.
>
>
>
> I've been using revision 879 (3.2.20) the whole time, so it shouldn't be
> the old bug.
>
>
>
>
>
> Once CQPweb is aware of your XML attributes you should be able to use them
> to derive text metadata.
>
>
>
> Thanks for your patience!
>
>
>
> Cheers,
>
> Scott
>
>
>
>
>
> <text id="CCN_F2_25_Ca" corpus="test_two" tagger="freeling_xml"
> language="spanish" channel="oral" instrument="interview"
> lingualism="monolingual" location="concepcion" sex="f" generation="G2"
> sel="Ca">
>
> <s>
>
> ¿       ¿       Fia     Fia     punctuation     questionmark
>
> todavía todavía RG      RG      adverb  general
>
> está    estar   VAIP3S0 VAI     verb    auxiliary
>
> grabando        grabar  VMG0000 VMG     verb    main
>
> ?       ?       Fit     Fit     punctuation     questionmark
>
> </s>
>
> </text>
>
>
>
>
>
>
>
> ##
>
> ## registry entry for corpus TEST_TWO
>
> ##
>
>
>
> # long descriptive name for the corpus
>
> NAME ""
>
> # corpus ID (must be lowercase in registry!)
>
> ID   test_two
>
> # path to binary data files
>
> HOME /var/cqpweb/index/test_two
>
> # optional info file (displayed by "info;" command in CQP)
>
> INFO /var/cqpweb/index/test_two/.info
>
>
>
> # corpus properties provide additional information about the corpus:
>
> ##:: charset  = "utf8" # character encoding of corpus data
>
> ##:: language = "es"     # insert ISO code for language (de, en, fr, ...)
>
>
>
>
>
> ##
>
> ## p-attributes (token annotations)
>
> ##
>
>
>
> ATTRIBUTE word
>
> ATTRIBUTE lemma
>
> ATTRIBUTE tag
>
> ATTRIBUTE ctag
>
> ATTRIBUTE pos
>
> ATTRIBUTE type
>
>
>
>
>
> ##
>
> ## s-attributes (structural markup)
>
> ##
>
>
>
> # <s> ... </s>
>
> # (no recursive embedding allowed)
>
> STRUCTURE s
>
>
>
> # <text id=".." corpus=".." tagger=".." file=".." language=".."
> channel=".." instrument=".." lingualism=".." location=".." sex=".."
> generation=".." sel=".."> ... </text>
>
> # (no recursive embedding allowed)
>
> STRUCTURE text
>
> STRUCTURE text_id              # [annotations]
>
> STRUCTURE text_corpus          # [annotations]
>
> STRUCTURE text_tagger          # [annotations]
>
> STRUCTURE text_file            # [annotations]
>
> STRUCTURE text_language        # [annotations]
>
> STRUCTURE text_channel         # [annotations]
>
> STRUCTURE text_instrument      # [annotations]
>
> STRUCTURE text_lingualism      # [annotations]
>
> STRUCTURE text_location        # [annotations]
>
> STRUCTURE text_sex             # [annotations]
>
> STRUCTURE text_generation      # [annotations]
>
> STRUCTURE text_sel             # [annotations]
>
>
>
>
>
> # Yours sincerely, the Encode tool.
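
For checking which s-attributes a registry file like the one above actually declares, a short Python sketch can list the STRUCTURE lines. It assumes each declaration sits on its own line as "STRUCTURE name", optionally followed by a "#" comment; the function name is made up for illustration, and this is not the parser CQPweb uses:

```python
import re

# Assumption: one declaration per line, "STRUCTURE <name>", optional comment.
# Lines starting with "#" are comments and are never matched.
STRUCTURE_RE = re.compile(r"^[ \t]*STRUCTURE[ \t]+(\w+)", re.MULTILINE)


def s_attributes(registry_text: str) -> list[str]:
    """Return the s-attribute names declared in a CWB registry file's text."""
    return STRUCTURE_RE.findall(registry_text)


if __name__ == "__main__":
    sample = (
        "# <s> ... </s>\n"
        "STRUCTURE s\n"
        "\n"
        "STRUCTURE text\n"
        "STRUCTURE text_id              # [annotations]\n"
    )
    print(s_attributes(sample))
```

Comparing this list with the handles in the xml_metadata table shows quickly whether anything declared in the registry has gone undetected.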
>
>
>
>
>
>
>
> *From:* cwb-bounces at liste.sslmit.unibo.it [mailto:
> cwb-bounces at liste.sslmit.unibo.it] *On Behalf Of *Scott Sadowsky
> *Sent:* 24 July 2016 15:52
> *To:* CWBdev Mailing List
>
>
> *Subject:* [CWB] WebInABox: Can't import existing corpora from host
>
>
>
> On Sun, Jul 24, 2016 at 10:19 AM, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
>
>
>
> CQPweb requires all corpora to have at least one <text> element, and every
> text element has to have an id, i.e. everything within the corpus has to be
> contained within a sequence of one or more
>
>
>
> <text id="somethinghere">
>
>>
> </text>
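
The requirement above can be checked on the input before encoding. A minimal sketch, assuming vertical-format input with each XML tag on a single line of its own; `check_text_wrapping` is a hypothetical helper, not a CWB or CQPweb tool:

```python
import re

# Assumption: every tag occupies exactly one line of the input.
TEXT_OPEN_RE = re.compile(r"<text(\s[^>]*)?>")
ID_ATTR_RE = re.compile(r'\sid="[^"]+"')


def check_text_wrapping(lines):
    """Return a list of problems: <text> tags without an id, and any
    content that is not enclosed in a <text> element."""
    problems = []
    inside_text = False
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        if not stripped:
            continue
        opening = TEXT_OPEN_RE.fullmatch(stripped)
        if opening:
            if not ID_ATTR_RE.search(opening.group(1) or ""):
                problems.append(f"line {number}: <text> has no id attribute")
            inside_text = True
        elif stripped == "</text>":
            inside_text = False
        elif not inside_text:
            problems.append(f"line {number}: content outside any <text> element")
    return problems


if __name__ == "__main__":
    print(check_text_wrapping(['<id id="x">', "hola", "</id>"]))
```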
>
>
>
> Thanks, Andrew. It turns out the problem was that I had been using the
> name "id" instead of "text" for the element. Now that I've changed that, I
> was able to successfully create the corpus in CQPweb.
>
>
>
> My source files have quite a bit of metadata, which I've encoded as
> follows:
>
>
>
> <text id="CCN-F2-02_D_StB.ortografica.txt" corpus="test" tagger="freeling-xml"
> language="spanish" location="concepcion" sex="f">
>
> ...
>
> </text>
>
>
> I'm now at the CQPweb "Design and insert a text-metadata table for the
> corpus" page, but it tells me that "No XML annotations found for this
> corpus". Is there something wrong with how I did the encoding above? I can
> use all of these XML elements in CQP searches directly, but here they
> aren't recognized.
>
>
>
> (I've checked chapter 6 of the manual, to no avail).
>
>
>
>
> _______________________________________________
> CWB mailing list
> CWB at liste.sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>
>
>
>
>
>
>
>
>
>
>
> --
>
> Dr. Scott Sadowsky
> Profesor Asistente de Lingüística
>
> Pontificia Universidad Católica de Chile
>
>
>
> ssadowsky gmail com
>
> scsadowsky uc cl
> http://sadowsky.cl/
>
>
>
>
>


-- 
Dr. Scott Sadowsky
Profesor Asistente de Lingüística
Pontificia Universidad Católica de Chile

ssadowsky gmail com
scsadowsky uc cl
http://sadowsky.cl/