[CWB] Can't create metadata

Jiayue Wang arthur0421 at gmail.com
Mon Nov 14 16:51:48 CET 2016


Thanks, Andrew and Hannah. I think I know what to do now.

Best,
Jiayue

On 14/11/16 15:46, Hannah Kermes wrote:
> As Andrew said, you can't nest <text> elements. Where we labelled
> smaller units as <text>, the larger units were not enclosed in <text>
> elements; in these cases we used an attribute to mark the elements
> belonging to the larger unit.
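The flat, attribute-based layout Hannah describes can be sketched in Python as follows. It assumes chapters as the smaller unit and books as the larger one. The function name, the `<book>ch<n>` id scheme and the `book` attribute name are hypothetical choices for illustration, not anything CQPweb prescribes.

```python
from __future__ import annotations


def chapters_as_texts(books: dict[str, list[list[str]]]) -> list[str]:
    """Emit one flat <text> element per chapter, tagged with its book.

    `books` maps a book id to its chapters; each chapter is a list of
    vertical-format token lines (word, POS and lemma separated by tabs).
    """
    lines: list[str] = []
    for book_id, chapters in books.items():
        for n, chapter in enumerate(chapters, start=1):
            # If book_id is letters and digits only, the id stays a valid handle.
            lines.append(f'<text id="{book_id}ch{n}" book="{book_id}">')
            lines.extend(chapter)
            # Each chapter is closed before the next opens: <text> is never nested.
            lines.append("</text>")
    return lines
```

The larger unit survives only as an attribute value, so it can still be used to group texts, while every token sits inside exactly one <text>.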
>
> But for the beginning it is easier to stick to "Texts" as <text> elements.
>
> Ciao, ciao
>
> Hannah
>
>
> On 14.11.2016 at 16:33, Hardie, Andrew wrote:
>> You can't nest <text> elements!
>>
>> If you want to delineate sub-text units, use some other tag: e.g.
>> <section type="XXX"> or something like that.
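Because nesting is not allowed, it can help to check a vertical file before indexing it. The following is a minimal Python sketch. It assumes one XML tag per line, as in CWB vertical files, and the function name is hypothetical. Other structural tags, such as a <section type="..."> used for sub-text units, are simply ignored.

```python
from __future__ import annotations


def check_text_nesting(lines: list[str]) -> list[str]:
    """Report nested, unclosed or unmatched <text> tags in a vertical file."""
    problems: list[str] = []
    open_at = None  # line number of the <text> currently open, if any
    for lineno, raw in enumerate(lines, start=1):
        tag = raw.strip()
        if tag == "<text>" or tag.startswith("<text "):
            if open_at is not None:
                problems.append(f"line {lineno}: <text> nested inside <text> from line {open_at}")
            open_at = lineno
        elif tag == "</text>":
            if open_at is None:
                problems.append(f"line {lineno}: </text> without an open <text>")
            open_at = None
    if open_at is not None:
        problems.append(f"line {open_at}: <text> is never closed")
    return problems
```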
>>
>> Andrew.
>>
>> -----Original Message-----
>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it]
>> On Behalf Of Jiayue Wang
>> Sent: 14 November 2016 15:27
>> To: Open source development of the Corpus WorkBench
>> Subject: Re: [CWB] Can't create metadata
>>
>> Thanks Hannah. Do you mean in those corpora both whole texts and their
>> sections were enclosed between tags, something like
>> <text><text>...</text><text>...</text></text>?
>>
>> On 14/11/16 14:20, Hannah Kermes wrote:
>>> Hi Jiayue,
>>>
>>> the <text> elements are used for the built-in distribution of CQPweb,
>>> so it makes sense to ask yourself what is most usefully enclosed in
>>> these elements.
>>>
>>> Usually, you will enclose every text in your corpus in a separate
>>> <text> element; depending on what your corpus consists of, these could
>>> be articles, essays or whole books. But we have also had corpora where
>>> we enclosed smaller units, e.g. chapters of a book or utterances, in
>>> <text> elements so that we could use the built-in distribution.
>>>
>>> The metadata allow you to group the texts into different subcorpora
>>> (e.g. author_sex, year, register, genre). Each column (in the
>>> tab-delimited file) or each attribute on the <text> element stands for
>>> a different set of subcorpora (author_sex: male, female; register:
>>> academic, news, ...).
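To show how each metadata column defines its own set of subcorpora, here is a small Python sketch over a tab-delimited metadata file. The layout (text id in the first column, then one column per field, no header row) and the `FIELDS` list are assumptions for illustration. The sketch only mimics the kind of grouping the metadata make possible inside CQPweb.

```python
from __future__ import annotations

import csv
import io
from collections import defaultdict

# Hypothetical column order: text id first, then one column per metadata field.
FIELDS = ["author_sex", "register"]


def subcorpora(tsv_text: str, field: str) -> dict[str, list[str]]:
    """Group text ids by the value they have in one metadata column."""
    col = FIELDS.index(field) + 1  # +1 because column 0 holds the text id
    groups: dict[str, list[str]] = defaultdict(list)
    reader = csv.reader(io.StringIO(tsv_text), delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if row:  # ignore blank lines
            groups[row[col]].append(row[0])
    return dict(groups)
```

For example, with rows `T1 male academic`, `T2 female news` and `T3 female academic`, grouping on `author_sex` yields the subcorpora male = [T1] and female = [T2, T3], while grouping on `register` yields a different split of the same texts.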
>>>
>>> Best
>>>
>>> Hannah
>>>
>>>
>>> On 14.11.2016 at 15:08, Hardie, Andrew wrote:
>>>> Daniel's sample of a datafile exemplifies one of the two methods for
>>>> more-than-minimal text metadata. This can either be loaded from a
>>>> tab-delimited file, or deduced from XML. The latter method is the one
>>>> Daniel exemplifies.
>>>>
>>>> For /minimal/ metadata you only require <text> elements with an id
>>>> attribute (whose values must be /handles/, i.e. only letters and
>>>> numbers, with no spaces or punctuation).
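A quick way to test candidate text ids against the rule above is a regular expression. This is a minimal Python sketch. The function name is hypothetical, and restricting "letters" to ASCII is an assumption of the sketch.

```python
import re

# Letters and digits only, following the rule stated above.
# Restricting this to ASCII is an assumption of this sketch.
HANDLE = re.compile(r"[A-Za-z0-9]+")


def is_valid_handle(text_id: str) -> bool:
    """True if text_id is non-empty and made only of ASCII letters and digits."""
    return HANDLE.fullmatch(text_id) is not None
```

`fullmatch` rejects the empty string. This matters because an empty id is exactly what shows up in the failing MySQL query quoted further down.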
>>>>
>>>> It is a rule of CQPweb corpora that the whole corpus needs to occur
>>>> within <text> elements, each of which must have an id, and there
>>>> can't be any words that are not inside a <text> element. If you don't
>>>> care about text boundaries, you can just wrap the whole corpus in one
>>>> <text id="CORPUS"> ... </text> element.
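The following is a minimal Python sketch of that wrapping. It assumes a plain vertical file that has no <text> tags yet, such as the TreeTagger output quoted further down. The function name and the file names in the comment are hypothetical.

```python
from typing import Iterable, Iterator


def wrap_in_single_text(lines: Iterable[str], text_id: str = "CORPUS") -> Iterator[str]:
    """Yield the lines of a vertical file enclosed in one <text id="..."> element."""
    yield f'<text id="{text_id}">'
    for line in lines:
        yield line.rstrip("\n")  # keep token lines as they are, minus the newline
    yield "</text>"


# Example use on a file (paths are placeholders):
# with open("corpus.vrt", encoding="utf-8") as src, \
#         open("corpus_wrapped.vrt", "w", encoding="utf-8") as dst:
#     for out_line in wrap_in_single_text(src):
#         dst.write(out_line + "\n")
```

Using a generator means even a large file (Jiayue's is 44 MB) is streamed line by line rather than loaded into memory.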
>>>>
>>>> This is explained in my paper:
>>>>
>>>>    * Hardie, Andrew (2012). CQPweb - combining power, flexibility and
>>>>      usability in a corpus analysis tool. /International Journal of
>>>>      Corpus Linguistics/ 17 (3): 380-409.
>>>>      <http://www.ingentaconnect.com/content/jbp/ijcl/2012/00000017/00000003/art00004>
>>>>      [alternative link: <http://www.lancs.ac.uk/staff/hardiea/cqpweb-paper.pdf>]
>>>>
>>>> Sorry it's not written up in the manual yet; there are only so many
>>>> hours in a day, alas.
>>>>
>>>> best
>>>>
>>>> Andrew.
>>>>
>>>> *From:* cwb-bounces at sslmit.unibo.it
>>>> [mailto:cwb-bounces at sslmit.unibo.it] *On Behalf Of *Daniel Renau
>>>> *Sent:* 14 November 2016 13:46
>>>> *To:* Open source development of the Corpus WorkBench
>>>> *Subject:* Re: [CWB] Can't create metadata
>>>>
>>>> Hi Jiayue,
>>>>
>>>> My team works with verticalized texts like this:
>>>>
>>>> <text id="ST1" title="namewithoutspaces" author="name">
>>>> <s>
>>>> word	pos	lemma
>>>> word	pos	lemma
>>>> word	pos	lemma
>>>> word	pos	lemma
>>>> </s>
>>>> </text>
>>>>
>>>> <text id="ST2" title="anothertextname" author="otherperson">
>>>> <s>
>>>> word	pos	lemma
>>>> word	pos	lemma
>>>> word	pos	lemma
>>>> </s>
>>>> </text>
>>>>
>>>> You can add more attributes to the <text> tag, such as author_sex,
>>>> language, year, translator...
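As Andrew notes in his reply, CQPweb can deduce metadata from XML like this. If you would rather load a tab-delimited metadata file instead, the attributes can be pulled out of the <text> tags with a sketch like the following. The row layout (id first, then the fields in the order you list them), the function name and the regular expressions are assumptions for illustration.

```python
from __future__ import annotations

import re

TEXT_START = re.compile(r"<text\b([^>]*)>")
ATTRIBUTE = re.compile(r'(\w+)="([^"]*)"')


def metadata_rows(vertical: str, fields: list[str]) -> list[str]:
    """Build one tab-delimited row per <text>: its id, then the requested fields."""
    rows = []
    for match in TEXT_START.finditer(vertical):
        attrs = dict(ATTRIBUTE.findall(match.group(1)))
        # An attribute missing from a <text> tag becomes an empty cell.
        cells = [attrs.get("id", "")] + [attrs.get(name, "") for name in fields]
        rows.append("\t".join(cells))
    return rows
```

Applied to Daniel's two texts with the fields `title` and `author`, this gives the rows `ST1 namewithoutspaces name` and `ST2 anothertextname otherperson`, with the values separated by tabs.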
>>>>
>>>> On 14 Nov 2016 at 2:37 pm, "Jiayue Wang" <arthur0421 at gmail.com
>>>> <mailto:arthur0421 at gmail.com>> wrote:
>>>>
>>>> Thanks Andrew. I still don't understand where the tags <text id="">
>>>> and </text> should be added. Should they enclose a corpus file? I
>>>> notice that section 7.6 "Metadata template" of the CQPwebAdminManual
>>>> is empty. Could you show me a template?
>>>>
>>>> Best,
>>>> Jiayue
>>>>
>>>> On 14/11/16 09:38, Hardie, Andrew wrote:
>>>>
>>>> Well it looks rather as if you don't have any text tags at all
>>>> there... which would be part of the problem. Try again with <text
>>>> id="...">...</text> tags added to the file, as required.
>>>>
>>>> As for why indexing is taking so long, it's very difficult for me to
>>>> diagnose at a distance. You should keep an eye on your process list
>>>> (e.g. via top) to see if anything is actually happening. As long as a
>>>> cwb-*** process is running, something productive is happening, and
>>>> you shouldn't abort.
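On Linux, the check Andrew suggests doing by eye with top can also be scripted by scanning /proc for processes whose name begins with cwb-. The following is a minimal sketch. It assumes a Linux-style /proc and returns an empty list on systems without one. The function name is hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def running_cwb_processes(proc: Path = Path("/proc")) -> list[tuple[int, str]]:
    """List (pid, name) for processes whose name starts with "cwb-" (Linux only)."""
    found: list[tuple[int, str]] = []
    if not proc.is_dir():
        return found  # no /proc (e.g. macOS, Windows): nothing to report
    for entry in proc.iterdir():
        if not entry.name.isdigit():
            continue  # only numeric entries are process ids
        try:
            # "comm" holds the process name, truncated by the kernel to 15 characters.
            name = (entry / "comm").read_text().strip()
        except OSError:
            continue  # the process exited meanwhile, or is not readable
        if name.startswith("cwb-"):
            found.append((int(entry.name), name))
    return sorted(found)
```

An empty result while the Install page still seems busy suggests that no CWB indexing tool is running at that moment. That is only a hint, not proof that nothing else is happening.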
>>>>
>>>> best
>>>>
>>>> Andrew.
>>>>
>>>> -----Original Message-----
>>>> From: cwb-bounces at sslmit.unibo.it
>>>> <mailto:cwb-bounces at sslmit.unibo.it>
>>>> [mailto:cwb-bounces at sslmit.unibo.it
>>>> <mailto:cwb-bounces at sslmit.unibo.it>] On Behalf Of Jiayue Wang
>>>> Sent: 13 November 2016 11:06
>>>> To: Open source development of the Corpus WorkBench
>>>> Subject: Re: [CWB] Can't create metadata
>>>>
>>>> Hi Andrew,
>>>>
>>>> Thanks a lot.
>>>> I deleted the us_rhodeisland corpus and tried again to install it.
>>>> The corpus file looks like this:
>>>>
>>>> If      IN      if
>>>> you     PP      you
>>>> have    VBP     have
>>>> any     DT      any
>>>> questions       NNS     question
>>>> or      CC      or
>>>> suggestions     NNS     suggestion
>>>> how     WRB     how
>>>> this    DT      this
>>>> website NN      website
>>>> might   MD      might
>>>> be      VB      be
>>>> improved        VBN     improve
>>>> ,       ,       ,
>>>> please  VB      please
>>>> feel    VB      feel
>>>> free    JJ      free
>>>> to      TO      to
>>>> contact VB      contact
>>>> us      PP      us
>>>> .       SENT    .
>>>>
>>>> The corpus contains only this file (44.0 MB). For P-attributes I
>>>> selected the "POS and lemma (TreeTagger format)" option. Then I clicked
>>>> Install; 31 files were created in the index/us_rhodeisland folder,
>>>> but the process goes on endlessly. I interrupted the process and
>>>> tried again, but the same thing happened. Approximately how long should
>>>> this take on my laptop, which has 8 GB of RAM and an Intel i5
>>>> quad-core CPU?
>>>>
>>>> Best
>>>> Jiayue
>>>>
>>>> On 13/11/16 06:19, Hardie, Andrew wrote:
>>>>
>>>> This error message suggests that your <text> elements lack valid ID
>>>> codes.
>>>>
>>>> The most likely reason for [UNREADABLE] is that you have declared a
>>>> primary annotation, e.g. a part of speech tag, but the annotation in
>>>> question does not exist. This can happen if you use a template that
>>>> your data does not match, for instance.
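One way to test for the mismatch Andrew describes is to confirm that every token line has as many tab-separated columns as the annotations you declared. For the TreeTagger setting Jiayue chose, that is three (word, POS, lemma). The following is a minimal sketch. It treats any line starting with < as an XML tag, and the function name and default column count are assumptions.

```python
from __future__ import annotations


def check_columns(lines: list[str], expected: int = 3) -> list[int]:
    """Return line numbers of token lines that do not have `expected` columns.

    Blank lines and lines starting with "<" (XML tags) are skipped.
    """
    bad = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if not line or line.startswith("<"):
            continue
        if len(line.split("\t")) != expected:
            bad.append(lineno)
    return bad
```

A raw, untagged text run through this check reports almost every line, because it has only one column where three were declared.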
>>>>
>>>> best
>>>>
>>>> Andrew.
>>>>
>>>> -----Original Message-----
>>>> From: cwb-bounces at sslmit.unibo.it
>>>> <mailto:cwb-bounces at sslmit.unibo.it>
>>>> [mailto:cwb-bounces at sslmit.unibo.it
>>>> <mailto:cwb-bounces at sslmit.unibo.it>] On Behalf Of Jiayue Wang
>>>> Sent: 11 November 2016 20:17
>>>> To: Open source development of the Corpus WorkBench
>>>> Subject: [CWB] Can't create metadata
>>>>
>>>> Hi,
>>>>
>>>> After a full installation of CQPweb I installed a corpus called
>>>> "us_rhodeisland" (two files: a raw text and a TreeTagger-tagged
>>>> text) without metadata. Since I have no idea what a metadata
>>>> file looks like, I selected "No thanks, I'll run this myself (safer
>>>> for very large corpora)", clicked "Create minimalist metadata
>>>> table" and saw the following error message:
>>>>
>>>>
>>>> A MySQL query did not run successfully!
>>>>
>>>>
>>>> Original query: insert into
>>>> ___temp_cqp_text_positions_for_us_rhodeisland (text_id, cqp_begin,
>>>> cqp_end) VALUES ('', 0, 55858),('', 55859, 3058358) /* from User:
>>>> admin | Function: do_append_mysql_comment() | 2016-Nov-11 20:04:20 */
>>>>
>>>>
>>>> Error # 1062: Duplicate entry '' for key 'PRIMARY'
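The failing query inserts two rows whose text_id is '', one for each region (0-55858 and 55859-3058358). Because text_id is evidently the table's primary key, the second row collides with the first. The following Python sketch flags this situation in a vertical file before installing. It assumes a plain-text vertical file, and the function name and messages are hypothetical.

```python
from __future__ import annotations

import re
from collections import Counter

TEXT_START = re.compile(r"<text\b[^>]*>")
ID_VALUE = re.compile(r'\bid="([^"]*)"')


def text_id_problems(vertical: str) -> list[str]:
    """List problems with <text> ids that would break the metadata table."""
    tags = TEXT_START.findall(vertical)
    if not tags:
        return ["no <text> elements found"]
    ids = []
    for tag in tags:
        match = ID_VALUE.search(tag)
        # A missing id attribute is treated the same as id="" here.
        ids.append(match.group(1) if match else "")
    problems = []
    empty = ids.count("")
    if empty:
        problems.append(f"{empty} <text> element(s) without an id")
    counts = Counter(i for i in ids if i)
    problems += [f'id "{i}" used {n} times' for i, n in sorted(counts.items()) if n > 1]
    return problems
```

Run on the TreeTagger file quoted earlier in the thread, which has no <text> tags at all, this reports "no <text> elements found", consistent with Andrew's diagnosis.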
>>>>
>>>>
>>>> BTW, when I try a standard query, each concordance line begins with
>>>> "[UNREADABLE] [UNREADABLE] [UNREADABLE]". What is the most likely
>>>> reason?
>>>>
>>>> Any help is appreciated, thanks!
>>>>
>>>> Jiayue Wang
>>>> _______________________________________________
>>>> CWB mailing list
>>>> CWB at sslmit.unibo.it <mailto:CWB at sslmit.unibo.it>
>>>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>>>>

