<html>
  <head>
    <meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
  </head>
  <body bgcolor="#FFFFFF" text="#000000">
    <p>Hi Jiayue,
    </p>
    <p>The &lt;text&gt; elements are used by the built-in distribution
      feature of CQPweb, so it makes sense to ask yourself what is most
      usefully enclosed in these elements.</p>
    <p>Usually, you will enclose every text in your corpus in a separate
      &lt;text&gt; element. Depending on what your corpus consists of,
      a text could be an article, an essay, or a whole book. But we have
      also had corpora where we enclosed smaller units, e.g. chapters of
      a book or utterances, in &lt;text&gt; elements so that we could use
      the built-in distribution feature.</p>
    <p>The metadata allow you to group the texts into different
      subcorpora (e.g. by author_sex, year, register, genre). Each column
      in the tab-delimited file, or each attribute in the &lt;text&gt;
      element, stands for a different set of subcorpora (author_sex:
      male, female; register: academic, news, ...).</p>
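    <p>As a rough illustration of the tab-delimited variant, here is a
      minimal Python sketch. The example texts, the field names and the
      helper names are made up for illustration, and the layout (text ID
      in the first column, one column per metadata field, no header row)
      is an assumption that you should check against your CQPweb
      version.</p>

```python
import csv
import io

# Hypothetical example data: one entry per text, keyed by its id.
TEXTS = [
    {"id": "ST1", "author_sex": "female", "year": "2014", "register": "academic"},
    {"id": "ST2", "author_sex": "male", "year": "2015", "register": "news"},
]

# Assumed column order: text id first, then one column per metadata field.
FIELDS = ["id", "author_sex", "year", "register"]


def metadata_table(texts, fields=FIELDS):
    """Return the texts as tab-delimited lines, one text per line, no header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    for text in texts:
        writer.writerow([text[field] for field in fields])
    return buffer.getvalue()


def subcorpora(texts, field):
    """Group text ids by the value of one metadata field."""
    groups = {}
    for text in texts:
        groups.setdefault(text[field], []).append(text["id"])
    return groups


if __name__ == "__main__":
    print(metadata_table(TEXTS), end="")
    print(subcorpora(TEXTS, "register"))
```

    <p>The second helper only mirrors the idea that each column defines
      one set of subcorpora; CQPweb does this grouping for you once the
      metadata are loaded.</p>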
    <p>Best</p>
    <p>Hannah<br>
    </p>
    <br>
    <div class="moz-cite-prefix">On 14.11.2016 at 15:08, Hardie, Andrew
      wrote:<br>
    </div>
    <blockquote
      cite="mid:28078EC3FBF1B940A3EF3D0D19BE351D7FBCF373@EX-1-MB2.lancs.local"
      type="cite">
      <div class="WordSection1">
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Daniel’s
            sample of a datafile exemplifies one of the two methods for
            more-than-minimal text metadata. This can either be loaded
            from a tab-delimited file, or deduced from XML. The latter
            method is the one Daniel exemplifies.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">For
            <i>minimal</i> metadata you only require &lt;text&gt;
            elements with an id attribute (whose values must be
            <i>handles</i>, i.e. just letters and numbers, with no spaces
            or punctuation).<o:p></o:p></span></p>
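        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">A
            minimal Python sketch of checking IDs before indexing. It
            takes the description of a handle above literally (ASCII
            letters and digits only, which is an assumption on the
            strict side), and the function names are hypothetical, not
            part of CQPweb.<o:p></o:p></span></p>

```python
import re

# Letters and digits only, following the description of a handle literally.
# Whether other characters (e.g. underscores) are also accepted is not
# stated, so this check is deliberately strict.
HANDLE = re.compile(r"[A-Za-z0-9]+")


def is_handle(value):
    """True if value consists solely of ASCII letters and digits."""
    return HANDLE.fullmatch(value) is not None


def invalid_ids(ids):
    """Return ids that are empty, malformed, or repeated (in input order)."""
    seen = set()
    bad = []
    for text_id in ids:
        if not is_handle(text_id) or text_id in seen:
            bad.append(text_id)
        seen.add(text_id)
    return bad


if __name__ == "__main__":
    print(invalid_ids(["ST1", "ST 2", "ST1", ""]))
```

        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Empty
            or repeated IDs are flagged as well, since each text needs
            its own ID.<o:p></o:p></span></p>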
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">It
            is a rule of CQPweb corpora that the whole corpus needs to
            occur within &lt;text&gt; elements, each of which must have
            an id, and there can’t be any words that are not inside a
            &lt;text&gt; element. If you don’t care about text
            boundaries, you can just wrap the whole corpus in one
            &lt;text id="CORPUS"&gt; … &lt;/text&gt;
            <o:p></o:p></span></p>
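        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">A
            minimal Python sketch of that wrapping step, assuming the
            corpus is a single vertical file that contains no
            &lt;text&gt; tags yet. The function name is hypothetical,
            and the angle brackets are built with chr() only so that the
            snippet contains no literal markup.<o:p></o:p></span></p>

```python
# Angle brackets, spelled via chr() so the snippet contains no literal markup.
LT, GT = chr(60), chr(62)


def wrap_in_single_text(vertical, text_id="CORPUS"):
    """Wrap untagged vertical text in one text element with the given id."""
    body = vertical.strip("\n")
    open_tag = f'{LT}text id="{text_id}"{GT}'
    close_tag = f"{LT}/text{GT}"
    return "\n".join([open_tag, body, close_tag]) + "\n"


if __name__ == "__main__":
    sample = "If\tIN\tif\nyou\tPP\tyou\n"
    print(wrap_in_single_text(sample), end="")
```

        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">For
            a large file you would stream the lines rather than hold the
            whole corpus in memory; the sketch keeps it simple.<o:p></o:p></span></p>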
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">This
            is explained in my paper:
            <o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <ul style="margin-top:0cm" type="square">
          <li class="MsoNormal" style="color:#1F497D;mso-list:l0 level1
            lfo1"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;">Hardie,
              Andrew (2012). <a moz-do-not-send="true"
href="http://www.ingentaconnect.com/content/jbp/ijcl/2012/00000017/00000003/art00004"
                target="_blank">CQPweb – combining power, flexibility
                and usability in a corpus analysis tool</a>. <i>International
                Journal of Corpus Linguistics</i> 17 (3): 380-409. <a
                moz-do-not-send="true"
                href="http://www.lancs.ac.uk/staff/hardiea/cqpweb-paper.pdf"
                target="_blank">[alternative link]</a><o:p></o:p></span></li>
        </ul>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Sorry
            it’s not written up in the manual yet, only so many hours in
            a day alas…<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">best<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D">Andrew.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,&quot;sans-serif&quot;;color:#1F497D"><o:p> </o:p></span></p>
        <p class="MsoNormal"><b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"
              lang="EN-US">From:</span></b><span
style="font-size:10.0pt;font-family:&quot;Tahoma&quot;,&quot;sans-serif&quot;"
            lang="EN-US"> <a class="moz-txt-link-abbreviated" href="mailto:cwb-bounces@sslmit.unibo.it">cwb-bounces@sslmit.unibo.it</a>
            [<a class="moz-txt-link-freetext" href="mailto:cwb-bounces@sslmit.unibo.it">mailto:cwb-bounces@sslmit.unibo.it</a>]
            <b>On Behalf Of </b>Daniel Renau<br>
            <b>Sent:</b> 14 November 2016 13:46<br>
            <b>To:</b> Open source development of the Corpus WorkBench<br>
            <b>Subject:</b> Re: [CWB] Can't create metadata<o:p></o:p></span></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p>Hi Jiayue,<o:p></o:p></p>
        <p>My team works with verticalized texts like this:<o:p></o:p></p>
        <p>&lt;text id="ST1" title="namewithoutspaces" author="name"&gt;<br>
          &lt;s&gt;<br>
          word pos lemma<br>
          word pos lemma<br>
          word pos lemma<br>
          word pos lemma<br>
          &lt;/s&gt;<br>
          &lt;/text&gt;<o:p></o:p></p>
        <p>&lt;text id="ST2" title="anothertextname"
          author="otherperson"&gt;<br>
          &lt;s&gt;<br>
          word pos lemma<br>
          word pos lemma<br>
          word pos lemma<br>
          &lt;/s&gt;<br>
          &lt;/text&gt;<o:p></o:p></p>
        <p>You can add more attributes to the text tag, such as author_sex,
          language, year, translator...<o:p></o:p></p>
        <div>
          <p class="MsoNormal"><o:p> </o:p></p>
          <div>
            <p class="MsoNormal">On 14 Nov 2016 at 2:37 p.m., "Jiayue
              Wang" &lt;<a moz-do-not-send="true"
                href="mailto:arthur0421@gmail.com">arthur0421@gmail.com</a>&gt;
              wrote:<o:p></o:p></p>
            <p class="MsoNormal">Thanks Andrew. I still don't understand
              where the tags &lt;text id=""&gt; and &lt;/text&gt; should
              be added. Should they enclose a corpus file? I notice that
              section 7.6 "Metadata template" of the CQPwebAdminManual
              is empty. Could you show me a template?<br>
              <br>
              Best,<br>
              Jiayue<br>
              <br>
              On 14/11/16 09:38, Hardie, Andrew wrote:<o:p></o:p></p>
            <p class="MsoNormal">Well it looks rather as if you don't
              have any text tags at all there... which would be part of
              the problem. Try again with &lt;text
              id="..."&gt;...&lt;/text&gt; tags added to the file, as
              required.<br>
              <br>
              As for why indexing is taking so long, it's very difficult
              for me to diagnose at a distance. You should keep an eye
              on your process list (e.g. via top) to see if anything is
              actually happening. As long as a cwb-*** process is
              running, something productive is happening, and you
              shouldn't abort.<br>
              <br>
              best<br>
              <br>
              Andrew.<br>
              <br>
              -----Original Message-----<br>
              From: <a moz-do-not-send="true"
                href="mailto:cwb-bounces@sslmit.unibo.it"
                target="_blank">cwb-bounces@sslmit.unibo.it</a> [mailto:<a
                moz-do-not-send="true"
                href="mailto:cwb-bounces@sslmit.unibo.it"
                target="_blank">cwb-bounces@sslmit.unibo.it</a>] On
              Behalf Of Jiayue Wang<br>
              Sent: 13 November 2016 11:06<br>
              To: Open source development of the Corpus WorkBench<br>
              Subject: Re: [CWB] Can't create metadata<br>
              <br>
              Hi Andrew,<br>
              <br>
              Thanks a lot.<br>
              I deleted the us_rhodeisland corpus and tried to install it
              again. The corpus file looks like this:<br>
              <br>
              If      IN      if<br>
              you     PP      you<br>
              have    VBP     have<br>
              any     DT      any<br>
              questions       NNS     question<br>
              or      CC      or<br>
              suggestions     NNS     suggestion<br>
              how     WRB     how<br>
              this    DT      this<br>
              website NN      website<br>
              might   MD      might<br>
              be      VB      be<br>
              improved        VBN     improve<br>
              ,       ,       ,<br>
              please  VB      please<br>
              feel    VB      feel<br>
              free    JJ      free<br>
              to      TO      to<br>
              contact VB      contact<br>
              us      PP      us<br>
              .       SENT    .<br>
              <br>
              The corpus contains only this file (44.0 MB). For the
              P-attributes I selected the POS and lemma (TreeTagger
              format) option. Then I clicked Install, and 31 files were
              created in the index/us_rhodeisland folder, but the process
              goes on endlessly. I interrupted the process and tried
              again, but the same thing happened. Roughly how long should
              this take on my laptop, which has 8 GB of RAM and an Intel
              i5 quad-core CPU?<br>
              <br>
              Best<br>
              Jiayue<br>
              <br>
              On 13/11/16 06:19, Hardie, Andrew wrote:<o:p></o:p></p>
            <p class="MsoNormal" style="margin-bottom:12.0pt">This error
              message suggests that your &lt;text&gt; elements lack
              valid ID codes.<br>
              <br>
              The most likely reason for [UNREADABLE] is that you have
              declared a primary annotation, e.g. a part of speech tag,
              but the annotation in question does not exist. This can
              happen if you use a template that your data does not match,
              for instance.<br>
              <br>
              best<br>
              <br>
              Andrew.<br>
              <br>
              -----Original Message-----<br>
              From: <a moz-do-not-send="true"
                href="mailto:cwb-bounces@sslmit.unibo.it"
                target="_blank">cwb-bounces@sslmit.unibo.it</a> [mailto:<a
                moz-do-not-send="true"
                href="mailto:cwb-bounces@sslmit.unibo.it"
                target="_blank">cwb-bounces@sslmit.unibo.it</a>] On
              Behalf Of Jiayue Wang<br>
              Sent: 11 November 2016 20:17<br>
              To: Open source development of the Corpus WorkBench<br>
              Subject: [CWB] Can't create metadata<br>
              <br>
              Hi,<br>
              <br>
              After a full installation of CQPweb I installed a corpus
              called "us_rhodeisland" (including 2 files, a raw text and
              a TreeTagger-tagged text) without metadata. Since I have no
              idea what a metadata file looks like, I selected "No
              thanks, I'll run this myself (safer for very large
              corpora)", clicked "Create minimalist metadata table", and
              saw the following error message:<br>
              <br>
              <br>
              A MySQL query did not run successfully!<br>
              <br>
              <br>
              Original query: insert into
              ___temp_cqp_text_positions_for_us_rhodeisland (text_id,
              cqp_begin, cqp_end) VALUES ('', 0, 55858),('', 55859,
              3058358) /* from User: admin | Function:
              do_append_mysql_comment() | 2016-Nov-11 20:04:20 */<br>
              <br>
              <br>
              Error # 1062: Duplicate entry '' for key 'PRIMARY'<br>
              <br>
              <br>
              BTW, when I try a standard query, each concordance line
              begins with "[UNREADABLE] [UNREADABLE] [UNREADABLE]". What
              is the most likely reason?<br>
              <br>
              Any help is appreciated, thanks!<br>
              <br>
              Jiayue Wang<br>
              _______________________________________________<br>
              CWB mailing list<br>
              <a moz-do-not-send="true"
                href="mailto:CWB@sslmit.unibo.it" target="_blank">CWB@sslmit.unibo.it</a><br>
              <a moz-do-not-send="true"
                href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb"
                target="_blank">http://liste.sslmit.unibo.it/mailman/listinfo/cwb</a><o:p></o:p></p>
          </div>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>