[CWB] Can't create metadata
Hardie, Andrew
a.hardie at lancaster.ac.uk
Mon Nov 14 15:08:14 CET 2016
Daniel’s sample of a datafile exemplifies one of the two methods for more-than-minimal text metadata. This can either be loaded from a tab-delimited file, or deduced from XML. The latter method is the one Daniel exemplifies.
For minimal metadata you only need <text> elements with an id attribute (whose values must be handles, i.e. just letters and numbers, with no spaces or punctuation).
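A minimal sketch of checking the id rule in Python before indexing. It assumes a "handle" means ASCII letters and digits only; whether CQPweb accepts any other characters is not stated in this thread, so the pattern is an assumption. The function name find_bad_text_ids is hypothetical.

```python
import re

# Assumption: a valid handle consists of ASCII letters and digits only.
HANDLE = re.compile(r"[A-Za-z0-9]+")
# An opening <text ...> tag; the group captures its attribute string.
TEXT_TAG = re.compile(r"<text\b([^>]*)>")
ID_ATTR = re.compile(r'\bid="([^"]*)"')

def find_bad_text_ids(lines):
    """Return (line_number, problem) pairs for missing, invalid or duplicate ids."""
    problems = []
    seen = set()
    for n, line in enumerate(lines, start=1):
        tag = TEXT_TAG.match(line.strip())
        if not tag:
            continue  # not an opening <text> tag
        attr = ID_ATTR.search(tag.group(1))
        if attr is None:
            problems.append((n, "missing id"))
            continue
        text_id = attr.group(1)
        if not HANDLE.fullmatch(text_id):
            problems.append((n, f"invalid id {text_id!r}"))
        elif text_id in seen:
            problems.append((n, f"duplicate id {text_id!r}"))
        seen.add(text_id)
    return problems
```

An empty result means every <text> tag that was found carries a unique, well-formed id.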
It is a rule of CQPweb corpora that the whole corpus needs to occur within <text> elements, each of which must have an id, and there can’t be any words that are not inside a <text> element. If you don’t care about text boundaries, you can just wrap the whole corpus in one <text id="CORPUS"> … </text>
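If text boundaries do not matter, the wrapping can be scripted. The following is a small sketch, assuming the input is an already-tagged vertical file with no <text> tags of its own; wrap_in_single_text is a hypothetical helper name.

```python
def wrap_in_single_text(lines, text_id="CORPUS"):
    """Yield the input lines enclosed in a single <text id="..."> element."""
    yield f'<text id="{text_id}">'
    for line in lines:
        # Drop the trailing newline so the caller controls line endings.
        yield line.rstrip("\n")
    yield "</text>"

# Usage on an in-memory sample (tab-delimited word, POS, lemma):
sample = ["If\tIN\tif\n", "you\tPP\tyou\n"]
wrapped = list(wrap_in_single_text(sample))
```

To process a real file, iterate over the open file object and write each yielded line followed by a newline.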
These requirements are explained in my paper:
* Hardie, Andrew (2012). CQPweb – combining power, flexibility and usability in a corpus analysis tool <http://www.ingentaconnect.com/content/jbp/ijcl/2012/00000017/00000003/art00004>. International Journal of Corpus Linguistics 17 (3): 380-409. [alternative link] <http://www.lancs.ac.uk/staff/hardiea/cqpweb-paper.pdf>
Sorry it’s not written up in the manual yet, only so many hours in a day alas…
best
Andrew.
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Daniel Renau
Sent: 14 November 2016 13:46
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Can't create metadata
Hi Jiayue,
My team works with verticalized texts like this:
<text id="ST1" title="namewithoutspaces" author="name">
<s>
word pos lemma
word pos lemma
word pos lemma
word pos lemma
</s>
</text>
<text id="ST2" title="anothertextname" author="otherperson">
<s>
word pos lemma
word pos lemma
word pos lemma
</s>
</text>
You can add more attributes to the text tag, such as: author_sex, language, year, translator...
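For the other method (a tab-delimited metadata file), the same attributes can be pulled out of the <text> tags with a short script. This is only a sketch: the column layout used here (text id first, then one column per requested attribute) is an assumption and is not confirmed anywhere in this thread, and text_metadata_rows is a hypothetical name.

```python
import re

# Matches name="value" pairs inside a tag.
ATTR = re.compile(r'(\w+)="([^"]*)"')

def text_metadata_rows(lines, fields):
    """Return one tab-delimited row per <text> tag: id, then the given fields."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line.startswith("<text "):
            continue  # skip tokens, <s> tags and closing tags
        attrs = dict(ATTR.findall(line))
        # Assumed layout: id in the first column; missing attributes become "".
        columns = [attrs.get("id", "")] + [attrs.get(f, "") for f in fields]
        rows.append("\t".join(columns))
    return rows

sample = [
    '<text id="ST1" title="namewithoutspaces" author="name">',
    "<s>",
    "word\tpos\tlemma",
    "</s>",
    "</text>",
    '<text id="ST2" title="anothertextname" author="otherperson">',
    "</text>",
]
rows = text_metadata_rows(sample, ["title", "author"])
```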
On 14 Nov 2016 at 2:37 p.m., "Jiayue Wang" <arthur0421 at gmail.com> wrote:
Thanks Andrew. I still don't understand where the tags <text id=""> and </text> should be added. Should they enclose a corpus file? I notice that section 7.6 "Metadata template" of the CQPwebAdminManual is empty. Could you show me a template?
Best,
Jiayue
On 14/11/16 09:38, Hardie, Andrew wrote:
Well it looks rather as if you don't have any text tags at all there... which would be part of the problem. Try again with <text id="...">...</text> tags added to the file, as required.
As for why indexing is taking so long, it's very difficult for me to diagnose at a distance. You should keep an eye on your process list (e.g. via top) to see if anything is actually happening. As long as a cwb-*** process is running, something productive is happening, and you shouldn't abort.
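Watching the process list can also be scripted. The sketch below assumes a Unix-like system whose ps accepts the POSIX options -e and -o; the function names are hypothetical. The parsing is kept separate from the ps call so it can be tried on sample text.

```python
import subprocess

def cwb_processes(ps_output):
    """Return (pid, name) pairs for commands whose name starts with 'cwb-'."""
    found = []
    for line in ps_output.splitlines():
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        pid, command = parts
        # Some systems print a full path; compare on the basename only.
        name = command.strip().rsplit("/", 1)[-1]
        if name.startswith("cwb-"):
            found.append((int(pid), name))
    return found

def list_cwb_processes():
    """Run ps and return the cwb-* processes that are currently alive."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "comm="],  # "=" suppresses the headers
        capture_output=True, text=True, check=True,
    )
    return cwb_processes(result.stdout)
```

An empty list from list_cwb_processes() while the web page is still waiting would suggest that no cwb-* process is running any more.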
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Jiayue Wang
Sent: 13 November 2016 11:06
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Can't create metadata
Hi Andrew,
Thanks a lot.
I deleted the us_rhodeisland corpus and tried again to install it. The
corpus file looks like this:
If IN if
you PP you
have VBP have
any DT any
questions NNS question
or CC or
suggestions NNS suggestion
how WRB how
this DT this
website NN website
might MD might
be VB be
improved VBN improve
, , ,
please VB please
feel VB feel
free JJ free
to TO to
contact VB contact
us PP us
. SENT .
The corpus contains only this file (44.0 MB). For P-attribute I selected
the POS and lemma (TreeTagger format) option. Then I clicked Install, 31
files were created in the index/us_rhodeisland folder, but the process
goes on endlessly. I interrupted this process and tried again but the
same happened. I'm wondering roughly how long this should take
on my laptop, which has 8 GB of RAM and an Intel i5 quad-core CPU?
Best
Jiayue
On 13/11/16 06:19, Hardie, Andrew wrote:
This error message suggests that your <text> elements lack valid ID
codes.
The most likely reason for [UNREADABLE] is that you have declared a
primary annotation, e.g. a part of speech tag, but the annotation in
question does not exist. This can happen if you use a template that
your data does not match, for instance.
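One way to check whether the data matches the declared annotations is to count the columns on each token line. A minimal sketch, assuming a tab-delimited vertical file with three columns (word, POS, lemma); the helper name is hypothetical.

```python
def lines_with_wrong_column_count(lines, expected=3):
    """Return 1-based numbers of token lines that lack the expected columns."""
    bad = []
    for n, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        # Treat a tab-free line of the form <...> as an XML tag, not a token.
        if line.startswith("<") and line.endswith(">") and "\t" not in line:
            continue
        if len(line.split("\t")) != expected:
            bad.append(n)
    return bad
```

Any line number returned points at a token with a missing or extra annotation column.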
best
Andrew.
-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Jiayue Wang
Sent: 11 November 2016 20:17
To: Open source development of the Corpus WorkBench
Subject: [CWB] Can't create metadata
Hi,
After a full installation of CQPweb I installed a corpus called
"us_rhodeisland" (including 2 files, a raw text, and a TreeTagger
tagged text) without metadata. Since I have no idea what a metadata
file looks like, I selected "No thanks, I'll run this myself (safer
for very large corpora)" and clicked "Create minimalist metadata
table" and saw the following error message:
A MySQL query did not run successfully!
Original query: insert into
___temp_cqp_text_positions_for_us_rhodeisland (text_id, cqp_begin,
cqp_end) VALUES ('', 0, 55858),('', 55859, 3058358) /* from User:
admin | Function: do_append_mysql_comment() | 2016-Nov-11 20:04:20
*/
Error # 1062: Duplicate entry '' for key 'PRIMARY'
BTW, when I try a standard query, each concordance line begins with
"[UNREADABLE] [UNREADABLE] [UNREADABLE]". What is the most likely
reason?
Any help is appreciated, thanks!
Jiayue Wang
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb