<html>
  <head>
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
  </head>
  <body text="#000000" bgcolor="#FFFFFF">
    Many, many thanks for all this, Andrew, as ever... A fully
    signposted route, which I hope will get me where I want to go!<br>
    Best,<br>
    Graham.<br>
    <br>
    <div class="moz-cite-prefix">Le 07/05/2019 à 16:53, Hardie, Andrew a
      écrit :<br>
    </div>
    <blockquote type="cite"
cite="mid:LO2P265MB1117B6BF853C91465EB64AB1CB310@LO2P265MB1117.GBRP265.PROD.OUTLOOK.COM">
      <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
      <meta name="Generator" content="Microsoft Word 15 (filtered
        medium)">
      <style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
        {font-family:Verdana;
        panose-1:2 11 6 4 3 5 4 4 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0cm;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
        {mso-style-priority:99;
        color:purple;
        text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
        {mso-style-priority:34;
        margin-top:0cm;
        margin-right:0cm;
        margin-bottom:0cm;
        margin-left:36.0pt;
        margin-bottom:.0001pt;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black;}
p.msonormal0, li.msonormal0, div.msonormal0
        {mso-style-name:msonormal;
        mso-margin-top-alt:auto;
        margin-right:0cm;
        mso-margin-bottom-alt:auto;
        margin-left:0cm;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;
        color:black;}
span.EmailStyle18
        {mso-style-type:personal-reply;
        font-family:"Verdana",sans-serif;
        color:#1F497D;
        font-weight:normal;
        font-style:normal;
        text-decoration:none none;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:612.0pt 792.0pt;
        margin:72.0pt 72.0pt 72.0pt 72.0pt;}
div.WordSection1
        {page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
      <div class="WordSection1">
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Hi
            Graham,<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&gt;&gt;</span>
          I have not been able to find the English and German Holmes
          files used in Stefan Evert's tutorial<span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">They
            aren’t currently in the release package (which I believe is
            probably in need of an update!) but they are in the SVN
            tree:
          </span><a
href="https://sourceforge.net/p/cwb/code/HEAD/tree/doc/corpora/encoding_tutorial_data/"
            moz-do-not-send="true">https://sourceforge.net/p/cwb/code/HEAD/tree/doc/corpora/encoding_tutorial_data/</a><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&gt;&gt;</span>
          what exactly is the required input format for the cwb-align
          command?<span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">cwb-align
            is an aligner, you don’t need to use it if you are dealing
            with pre-aligned data. Its
            <b>output</b> format is what you need to generate (to then
            input into cwb-align-encode, the program that actually
            creates the a-attribute). This means in the tutorial, you
            can skip to sec 8.4.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">The
            format necessary (called “.align file”) is described in the
            section “OUTPUT FORMAT” of
            <b>man cwb-align</b>. Pasted below.  <o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">You
            would need to output the begin/end points of the s
            attributes in the first corpus (using cwb-s-decode), and
            then combine that with the begin/end points of the s
            attribute in the second corpus, to get the 4-tuples needed
            by cwb-align-encode. And then you’d need to add
            <b>1:1</b> at the end of each line. That 5 column file
            defines the alignment.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&gt;&gt;</span>
          If I have .vrt files created in two languages with treetagger,
          and if I have prealigned these, in such a way that the first
          sentence of one file corresponds to the first sentence of the
          other, the second sentence to the second, etc. then is that
          enough?<span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Yes,
            that is enough if you use the method above to create the
            input file for cwb-align-encode.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">&gt;&gt;</span>
          Or should my files also including numerical information with
          all sentences numbered?<span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">If
            your ranges do actually have unique ID attributes &lt;s
            id=”..”&gt; or maybe &lt;s n=”..”&gt;, you can optionally
            use
            <b>cwb-align-import</b> instead of cwb-align-encode: see
            section 8.5 . This has a different input format than
            cwb-align-encode: it identifies matching segments by some ID
            attribute, rather than by spelling out the corpus token
            position ranges, as in cwb-align-encode. This is called a <b>bead
              file</b> whereas the other is called an <b>.align file</b>
            (not the clearest terminology I know).<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Your
            sentences would have to be numbered through your whole
            corpus. In which case, your bead file could look like this:<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">CORP1   
            CORP2    s     {n}<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">1      
            1<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">2      
            2<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">3      
            3<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">[…]
            <o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"> which
            might be easier to auto-generate than the cwb-align-encode
            format based on raw corpus position numbers.
            <o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Finally,
            whatever method you use, don’t forget the need to update the
            registry file with the new a-attribute.
            <o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">best<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">Andrew.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">=====================<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US">From
            man cwb-align:<o:p></o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">OUTPUT
            FORMAT<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            cwb-align's output file uses CWB's ".align" file format.
            ".align" files are ASCII text files<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            (although they may contain characters from another encoding
            if the corpus IDs include non-ASCII<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            characters), formatted as follows.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            The first line is a header line, which contains the
            following four elements, separated by tabs:<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">     
             ·   The ID of the source corpus<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The ID of the aligned s-attribute (the grid attribute -
            see above)<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The ID of the target corpus<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The ID of the aligned s-attribute (repeated)<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            Following the header, each individual line represents a
            single pair of aligned regions in the<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            corpus.  This is specified by six fields of information,
            separated by tabs. The six fields are<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            as follows:<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The beginning of the range in the source corpus
            (expressed as a cpos, i.e. a token number)<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The end of the range in the source corpus (expressed as
            a cpos)<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The beginning of the range in the target corpus
            (expressed as a cpos)<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The end of the range in the target corpus (expressed as
            a cpos)<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The type of alignment: 1:1, 2:1, 1:2 or 2:2<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            ·   The quality of the alignment: a score calculated by the
            alignment engine<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            For example,<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">        
            140    169    137    180    1:2<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            means that corpus position ranges [140,169] and [137,180]
            form a 1:2 alignment pair.<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            (The final field, the quality, is optional in this file
            format, and is absent in the example<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US">      
            above; however, cwb-align will always provide it.)<o:p></o:p></span></p>
        <p class="MsoNormal"><span
            style="font-size:10.0pt;font-family:&quot;Courier
            New&quot;;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <p class="MsoNormal"><span
style="font-size:10.0pt;font-family:&quot;Verdana&quot;,sans-serif;color:#1F497D;mso-fareast-language:EN-US"><o:p> </o:p></span></p>
        <div>
          <div style="border:none;border-top:solid #E1E1E1
            1.0pt;padding:3.0pt 0cm 0cm 0cm">
            <p class="MsoNormal"><b><span style="color:windowtext"
                  lang="EN-US">From:</span></b><span
                style="color:windowtext" lang="EN-US">
                <a class="moz-txt-link-abbreviated" href="mailto:cwb-bounces@sslmit.unibo.it">cwb-bounces@sslmit.unibo.it</a>
                <a class="moz-txt-link-rfc2396E" href="mailto:cwb-bounces@sslmit.unibo.it">&lt;cwb-bounces@sslmit.unibo.it&gt;</a>
                <b>On Behalf Of </b>Graham Ranger -- UAPV<br>
                <b>Sent:</b> 07 May 2019 13:57<br>
                <b>To:</b> Open source development of the Corpus
                WorkBench <a class="moz-txt-link-rfc2396E" href="mailto:cwb@sslmit.unibo.it">&lt;cwb@sslmit.unibo.it&gt;</a><br>
                <b>Subject:</b> [CWB] Aligning parallel corpora<o:p></o:p></span></p>
          </div>
        </div>
        <p class="MsoNormal"><o:p> </o:p></p>
        <p class="MsoNormal">Hello to all,<o:p></o:p></p>
        <p class="MsoNormal"><o:p> </o:p></p>
        <div>
          <p class="MsoNormal">I have set up a parallel corpus on cqpweb
            using s-attributes for the visualisation of translations but
            I would like to be able to do the same thing more cleanly,
            using alignment attributes. However, try as I might, I
            cannot seem to follow the instructions in the encoding
            tutorial. I have not been able to find the English and
            German Holmes files used in Stefan Evert's tutorial for
            illustration. Now, what I would like to know is: what
            exactly is the required input format for the cwb-align
            command? If I have .vrt files created in two languages with
            treetagger, and if I have prealigned these, in such a way
            that the first sentence of one file corresponds to the first
            sentence of the other, the second sentence to the second,
            etc. then is that enough? Or should my files also including
            numerical information with all sentences numbered? I suspect
            this is a very naive question, but it's one that I do not
            seem to be able to find my way around without help!<br>
            <br>
            <o:p></o:p></p>
          <p class="MsoNormal" style="text-align:right" align="right">Best,<br>
            <br>
            <o:p></o:p></p>
          <p class="MsoNormal">Graham.<o:p></o:p></p>
        </div>
      </div>
      <br>
      <fieldset class="mimeAttachmentHeader"></fieldset>
      <br>
      <pre wrap="">_______________________________________________
CWB mailing list
<a class="moz-txt-link-abbreviated" href="mailto:CWB@sslmit.unibo.it">CWB@sslmit.unibo.it</a>
<a class="moz-txt-link-freetext" href="http://liste.sslmit.unibo.it/mailman/listinfo/cwb">http://liste.sslmit.unibo.it/mailman/listinfo/cwb</a>
</pre>
    </blockquote>
    <br>
  </body>
</html>