[CWB] Can't create metadata

Daniel Renau alphak87 at gmail.com
Mon Nov 14 14:45:38 CET 2016


Hi Jiayue,

My team works with verticalized texts like this:

<text id="ST1" title="namewithoutspaces" author="name">
<s>
word pos lemma
word pos lemma
word pos lemma
word pos lemma
</s>
</text>

<text id="ST2" title="anothertextname" author="otherperson">
<s>
word pos lemma
word pos lemma
word pos lemma
</s>
</text>
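
A minimal sketch of how a plain TreeTagger output file could be wrapped into this structure, assuming tab-separated word/pos/lemma lines and that the tagger marks sentence ends with the tag SENT (as in the English parameter file). The function name wrap_as_text and its parameters are made up for illustration; they are not part of CWB or CQPweb.

```python
def wrap_as_text(tagged_lines, text_id, sent_tag="SENT", **attrs):
    """Wrap TreeTagger lines (word<TAB>pos<TAB>lemma) in <text> and <s> tags.

    Hypothetical helper: attribute values are assumed to contain no
    double quotes or angle brackets, so they are written out unescaped.
    """
    all_attrs = {"id": text_id, **attrs}
    attr_str = " ".join(f'{key}="{value}"' for key, value in all_attrs.items())
    out = [f"<text {attr_str}>"]

    sentence = []

    def flush():
        # Emit the buffered tokens as one <s>...</s> region.
        if sentence:
            out.append("<s>")
            out.extend(sentence)
            out.append("</s>")
            sentence.clear()

    for line in tagged_lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip empty lines
        sentence.append(line)
        fields = line.split("\t")
        # Close the sentence when the POS column holds the sentence-end tag.
        if len(fields) >= 2 and fields[1] == sent_tag:
            flush()

    flush()  # tokens after the last sentence-end tag, if any
    out.append("</text>")
    return out
```

Calling wrap_as_text(lines, "ST1", title="namewithoutspaces", author="name") on the token lines of one document gives the lines of one <text> block; a corpus file is then simply several such blocks one after another, each with its own id.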

You can add more attributes to the <text> tag, such as author_sex, language, year, translator...
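
Since the MySQL error further down in this thread comes from empty, duplicated text IDs, it can help to check the <text> tags before installing. The sketch below collects the attributes of every <text ...> start tag, reports missing or repeated ids, and writes the attributes out as tab-separated rows with the id in the first column. Two things here are assumptions rather than facts taken from this thread: that IDs should consist only of letters, digits and underscores, and that a tab-separated table with the ID first is what your CQPweb version expects as a metadata file. Please check both against the manual. All function names are made up for illustration.

```python
import re
from collections import Counter

TEXT_TAG = re.compile(r"<text\b([^>]*)>")    # a <text ...> start tag
ATTR = re.compile(r'(\w+)="([^"]*)"')        # name="value" pairs
VALID_ID = re.compile(r"[A-Za-z0-9_]+")      # assumed ID alphabet


def read_text_attributes(lines):
    """Return one dict of attributes per <text ...> start tag, in file order."""
    found = []
    for line in lines:
        match = TEXT_TAG.match(line.strip())
        if match:
            found.append(dict(ATTR.findall(match.group(1))))
    return found


def find_id_problems(texts):
    """List human-readable problems with the id attributes."""
    problems = []
    counts = Counter(text.get("id", "") for text in texts)
    for text_id, n in counts.items():
        if not VALID_ID.fullmatch(text_id):
            problems.append(f"missing or invalid id: {text_id!r}")
        elif n > 1:
            problems.append(f"duplicate id: {text_id!r} ({n} times)")
    return problems


def metadata_rows(texts, fields):
    """Build tab-separated rows: id first, then the requested fields."""
    return [
        "\t".join([text.get("id", "")] + [text.get(f, "") for f in fields])
        for text in texts
    ]
```

For the two example texts above, metadata_rows(texts, ["title", "author"]) yields one row per text, starting with ST1 and ST2 respectively, and find_id_problems(texts) returns an empty list.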

On 14 Nov 2016 at 2:37 p.m., "Jiayue Wang" <arthur0421 at gmail.com> wrote:

> Thanks Andrew. I still don't understand where the tags <text id=""> and
> </text> should be added. Should they enclose a corpus file? I notice that
> section 7.6 "Metadata template" of the CQPwebAdminManual is empty. Could
> you show me a template?
>
> Best,
> Jiayue
>
> On 14/11/16 09:38, Hardie, Andrew wrote:
>
>> Well it looks rather as if you don't have any text tags at all there...
>> which would be part of the problem. Try again with <text
>> id="...">...</text> tags added to the file, as required.
>>
>> As for why indexing is taking so long, it's very difficult for me to
>> diagnose at a distance. You should keep an eye on your process list (e.g.
>> via top) to see if anything is actually happening. As long as a cwb-***
>> process is running, something productive is happening, and you shouldn't
>> abort.
>>
>> best
>>
>> Andrew.
>>
>> -----Original Message-----
>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it]
>> On Behalf Of Jiayue Wang
>> Sent: 13 November 2016 11:06
>> To: Open source development of the Corpus WorkBench
>> Subject: Re: [CWB] Can't create metadata
>>
>> Hi Andrew,
>>
>> Thanks a lot.
>> I deleted the us_rhodeisland corpus and tried again to install it. The
>> corpus file looks like this:
>>
>> If      IN      if
>> you     PP      you
>> have    VBP     have
>> any     DT      any
>> questions       NNS     question
>> or      CC      or
>> suggestions     NNS     suggestion
>> how     WRB     how
>> this    DT      this
>> website NN      website
>> might   MD      might
>> be      VB      be
>> improved        VBN     improve
>> ,       ,       ,
>> please  VB      please
>> feel    VB      feel
>> free    JJ      free
>> to      TO      to
>> contact VB      contact
>> us      PP      us
>> .       SENT    .
>>
>> The corpus contains only this file (44.0 MB). For P-attribute I selected
>> the POS and lemma (TreeTagger format) option. Then I clicked Install, 31
>> files were created in the index/us_rhodeisland folder, but the process
>> goes on endlessly. I interrupted this process and tried again but the
>> same happened. I'm wondering roughly how long this should take
>> on my laptop, which has 8 GB of RAM and an Intel i5 quad-core CPU?
>>
>> Best
>> Jiayue
>>
>> On 13/11/16 06:19, Hardie, Andrew wrote:
>>
>>> This error message suggests that your <text> elements lack valid ID
>>> codes.
>>>
>>> The most likely reason for [UNREADABLE] is that you have declared a
>>> primary annotation, e.g. a part of speech tag, but the annotation in
>>> question does not exist. This can happen if you use a template that
>>> your data does not match, for instance.
>>>
>>> best
>>>
>>> Andrew.
>>>
>>> -----Original Message-----
>>> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it]
>>> On Behalf Of Jiayue Wang
>>> Sent: 11 November 2016 20:17
>>> To: Open source development of the Corpus WorkBench
>>> Subject: [CWB] Can't create metadata
>>>
>>> Hi,
>>>
>>> After a full installation of CQPweb I installed a corpus called
>>> "us_rhodeisland" (including 2 files, a raw text, and a TreeTagger
>>> tagged text) without metadata. Since I have no idea what a metadata
>>> file looks like, I selected "No thanks, I'll run this myself (safer
>>> for very large corpora)" and clicked "Create minimalist metadata
>>> table" and saw the following error message:
>>>
>>>
>>> A MySQL query did not run successfully!
>>>
>>>
>>> Original query: insert into
>>> ___temp_cqp_text_positions_for_us_rhodeisland (text_id, cqp_begin,
>>> cqp_end) VALUES ('', 0, 55858),('', 55859, 3058358) /* from User:
>>> admin | Function: do_append_mysql_comment() | 2016-Nov-11 20:04:20
>>> */
>>>
>>>
>>> Error # 1062: Duplicate entry '' for key 'PRIMARY'
>>>
>>>
>>> BTW, when I try a standard query, each concordance line begins with
>>> "[UNREADABLE] [UNREADABLE] [UNREADABLE]". What is the most likely
>>> reason?
>>>
>>> Any help is appreciated, thanks!
>>>
>>> Jiayue Wang
>>> _______________________________________________
>>> CWB mailing list
>>> CWB at sslmit.unibo.it
>>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb