<!DOCTYPE html>
<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
Hello to all,<br>
I'm currently trying to set up a parallel corpus including a source
text and four different translations.<br>
The method I use to set up a parallel corpus is the following (copied and
adapted from the CQP/CWB manuals):<br>
<br>
To set up parallel corpora:<br>
<br>
1) Install each corpus in CQPweb with its XML tags declared, etc.<br>
2) Use cwb-align to generate an alignment file with the suffix .align, e.g.:<br>
cwb-align -r /var/cqpweb/registry/ -o test.align TEST_EN TEST_FR s<br>
The -r option gives the registry directory explicitly, and the final
argument, s, names the structural attribute (here, sentences) on which
the two texts are aligned.<br>
3) Edit the registry files (e.g. with nano) to declare the other
aligned corpus, i.e. append the line ALIGNED "other_corpus" to
/var/cqpweb/registry/"my_corpus".<br>
4) Use cwb-align-encode to encode the alignment file into the corpus
data. This needs to be done as admin, i.e. with su, using the -d and
-r options to point to the data and registry directories. The second
command does the same thing backwards, i.e. it reads the alignment the
other way round, using the -R switch.<br>
cwb-align-encode -d /var/cqpweb/index/test_en/ -r /var/cqpweb/registry/ test.align<br>
cwb-align-encode -d /var/cqpweb/index/test_fr/ -r /var/cqpweb/registry/ -R test.align<br>
5) Test it out in CQPweb.<br>
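<br>
Repeated for each translation, steps 2 to 4 could be scripted roughly
as in the shell sketch below. It is only an illustration under
assumptions: the corpus IDs TEST_DE, TEST_ES and TEST_IT, the output
file names, and the use of lowercase IDs for registry files, data
directories and ALIGNED lines are hypothetical, and appending with
&gt;&gt; stands in for editing the registry with nano. By default
(DRY_RUN=1) it only prints the commands; with DRY_RUN=0 it would have
to be run as admin, like step 4.<br>

```shell
#!/bin/sh
# Sketch: repeat steps 2-4 for each source-target pair.
# ASSUMPTION: TEST_DE, TEST_ES and TEST_IT are hypothetical corpus IDs;
# registry files, data directories and ALIGNED values use lowercase IDs.
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 runs them (as admin).

REGISTRY=/var/cqpweb/registry
INDEX=/var/cqpweb/index
SOURCE=TEST_EN
TARGETS="TEST_FR TEST_DE TEST_ES TEST_IT"
DRY_RUN=${DRY_RUN:-1}

# Print a command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Convert a corpus ID to lowercase (e.g. TEST_FR -> test_fr).
lower() {
  echo "$1" | tr '[:upper:]' '[:lower:]'
}

# Step 3 without nano: append "ALIGNED <corpus>" to a registry file.
# Not idempotent: running it twice adds the line twice.
declare_aligned() {
  if [ "$DRY_RUN" = 1 ]; then
    echo "append 'ALIGNED $2' to $1"
  else
    echo "ALIGNED $2" >> "$1"
  fi
}

main() {
  src=$(lower "$SOURCE")
  for target in $TARGETS; do
    tgt=$(lower "$target")
    file="$src-$tgt.align"
    # Step 2: compute the alignment on the s attribute.
    run cwb-align -r "$REGISTRY" -o "$file" "$SOURCE" "$target" s
    # Step 3: declare the alignment in both registry files.
    declare_aligned "$REGISTRY/$src" "$tgt"
    declare_aligned "$REGISTRY/$tgt" "$src"
    # Step 4: encode source->target, then target->source with -R.
    run cwb-align-encode -d "$INDEX/$src/" -r "$REGISTRY" "$file"
    run cwb-align-encode -d "$INDEX/$tgt/" -r "$REGISTRY" -R "$file"
  done
}

main
```
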
<br>
Now, my question is: can I set up a parallel corpus in such a way
that a search in the source displays all the aligned translations
simultaneously?<br>
If so, is it just a matter of following this procedure for each
source-target pair and then declaring the multiple alignments in
CQPweb, or do I have to align all the texts from the command line?<br>
I hope the question is clear. Thank you in advance for any
guidance.<br>
Best,<br>
Graham.
</body>
</html>