[CWB] cwb-align-import help.
Serge Sharoff
S.Sharoff at leeds.ac.uk
Mon Feb 8 18:32:13 CET 2010
The way I do this is by adding ids to the <tu> tags:
<tu id="1">
..
</tu>
<tu id="2">
..
</tu>
and then running the alignment:
cwb-align -V tu -o $1-$2.align $1 $2 tu
cwb-align-encode -D $1-$2.align
I think it's a bit of overkill (it tries to align the ids, which are identical in any case), but it works.
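Because both sides contain the same number of <tu> regions in the same order, the ids described above can be generated mechanically. The following is a minimal sketch, assuming a verticalized (one token per line) input file in which every bare <tu> start tag sits alone on its own line. The function name number_tu is hypothetical and is not part of CWB.

```shell
#!/bin/sh
# Hypothetical helper (not a CWB tool): number_tu
# Reads a verticalized corpus file on stdin and rewrites each bare "<tu>"
# start tag as <tu id="N">, counting from 1. All other lines, including
# tags that already carry attributes, are copied through unchanged.
# Assumption: every bare <tu> tag is on a line of its own.
number_tu() {
  awk '
    /^<tu>$/ { n++; printf "<tu id=\"%d\">\n", n; next }
    { print }
  '
}
```

Running it on both the source and the target file, e.g. number_tu < source.vrt > source.ids.vrt (file names are placeholders), should give matching ids on both sides as long as the translation units appear in the same order.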
Serge
________________________________________
From: cwb-bounces at sslmit.unibo.it [cwb-bounces at sslmit.unibo.it] On Behalf Of Alberto Simões [ambs at di.uminho.pt]
Sent: 08 February 2010 16:19
To: Open source development of the Corpus WorkBench
Subject: [CWB] cwb-align-import help.
Hi
Supposedly cwb-align-import can be used to import pre-aligned corpora.
Unfortunately there is not much documentation, and I can't figure out how to
work with it.
As far as I've got, I know I need to import the source and target languages
as separate corpora.
I used the <tu> tag to separate translation units on each side.
Therefore, I have the same number of translation units in each side.
I just do not understand how to write the alignment_beads.txt file.
Supposedly I will need some simple file, like
1:1
2:2
3:3 (or whatever syntax).
Also, I am not sure whether I need to add attributes to my <tu> tags in order
to have a number associated with each.
And, by the way, does the -inverse option import a pair of alignments
(source-target and target-source), or just the second?
I would say I need to use:
cwb-align-import -l1 sourceCorpus -l2 targetCorpus -s tu
but I have no idea what to use for -k (is it needed?) or what the
alignment_beads.txt file should contain.
Thanks
Alberto
--
Alberto Simões
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb