[CWB] cwb-align-import help.

Serge Sharoff S.Sharoff at leeds.ac.uk
Mon Feb 8 18:32:13 CET 2010


The way I do this is by creating ids for the <tu> tags:
<tu id="1">
..
</tu>
<tu id="2">
..
</tu>

and running the align process: 
cwb-align -V tu -o $1-$2.align $1 $2 tu 
cwb-align-encode -D $1-$2.align 
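For reference, here is a minimal shell sketch of the same two steps wrapped in a function. The name align_tu is hypothetical. It runs exactly the commands above, taking the source and target corpus IDs as arguments in place of $1 and $2:

```shell
# Hypothetical wrapper around the two commands above (the name is illustrative).
# $1 and $2 are the CWB IDs of the source and target corpora, both encoded
# with a <tu> structural attribute whose id values match across the two sides.
align_tu() {
  if [ "$#" -ne 2 ]; then
    echo "usage: align_tu SOURCE_CORPUS TARGET_CORPUS" >&2
    return 1
  fi
  alignfile="$1-$2.align"
  # Compute the tu-level alignment and write it to SOURCE-TARGET.align
  cwb-align -V tu -o "$alignfile" "$1" "$2" tu || return 1
  # Encode that alignment, with the same -D option as in the command above
  cwb-align-encode -D "$alignfile"
}
```

Usage: align_tu SOURCE TARGET, with your two corpus IDs in place of the placeholders.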

I think it's a bit of overkill (it tries to align the ids, which are identical in any case), but it works.
Serge

________________________________________
From: cwb-bounces at sslmit.unibo.it [cwb-bounces at sslmit.unibo.it] On Behalf Of Alberto Simões [ambs at di.uminho.pt]
Sent: 08 February 2010 16:19
To: Open source development of the Corpus WorkBench
Subject: [CWB] cwb-align-import help.

Hi

Supposedly cwb-align-import can be used to import pre-aligned corpora.
Unfortunately there is not much documentation, and I can't figure out how
to use it.

So far, I know that I need to import the source and target languages
as separate corpora.

I used the <tu> tag to separate translation units on each side,
so I have the same number of translation units on both sides.
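Before encoding, it is worth checking that the counts really match. Below is a minimal shell sketch that counts the <tu> start tags in each input file. It assumes one-tag-per-line VRT input, and check_tu_counts is a hypothetical name. On success it prints the shared count; on a mismatch it reports both numbers and fails:

```shell
# Hypothetical pre-encoding check (the name is illustrative): both sides must
# contain the same number of <tu> regions for a 1:1 alignment to make sense.
# Assumes one start tag per line; '^<tu[ >]' matches <tu> and <tu id="...">
# but not </tu> or other tags such as <tuv>.
check_tu_counts() {
  src_n=$(grep -c '^<tu[ >]' "$1")
  tgt_n=$(grep -c '^<tu[ >]' "$2")
  if [ "$src_n" -ne "$tgt_n" ]; then
    echo "mismatch: $1 has $src_n <tu> regions, $2 has $tgt_n" >&2
    return 1
  fi
  echo "$src_n"
}
```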

I just do not understand how to write the alignment_beads.txt file.
Supposedly I will need some simple file, like
  1:1
  2:2
  3:3  (or whatever syntax).
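If the bead file is really just one line per translation unit, you can generate it instead of writing it by hand. The sketch below writes lines in the provisional "N:N" syntax from the question. That syntax is only a guess, not the confirmed cwb-align-import input format, and write_beads is a hypothetical helper:

```shell
# Hypothetical generator (the name is illustrative) for a 1:1 bead list
# in the provisional "1:1", "2:2", ... syntax guessed above.
# NOTE: check the real cwb-align-import input format before relying on this.
write_beads() {
  n=$1
  i=1
  while [ "$i" -le "$n" ]; do
    printf '%d:%d\n' "$i" "$i"
    i=$((i + 1))
  done
}
```

For example, write_beads 3 > alignment_beads.txt writes the three lines listed above.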

Also, I am not sure whether I need to add attributes to my <tu> tags in order
to have a number associated with each one.
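Numbering the tags does not have to be done by hand. Below is a minimal awk sketch, wrapped in a shell function. It assumes one-tag-per-line VRT input where each bare <tu> start tag sits on its own line, and number_tus is a hypothetical name. It rewrites every such tag as <tu id="1">, <tu id="2">, and so on, and leaves all other lines untouched:

```shell
# Hypothetical helper (the name is illustrative): give every bare <tu> start
# tag a running id attribute. Closing </tu> tags and token lines are printed
# unchanged. Assumes each <tu> tag is alone on its line (trailing blanks allowed).
number_tus() {
  awk '
    /^<tu>[ \t]*$/ { n++; printf "<tu id=\"%d\">\n", n; next }
    { print }
  ' "$1"
}
```

Run it on each side before encoding, e.g. number_tus source.vrt > source-ids.vrt (filenames are illustrative).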

And, by the way, does the -inverse option import a pair of alignments
(source-target and target-source), or just the second one?

I would say I need to use:

 cwb-align-import -l1 sourceCorpus -l2 targetCorpus -s tu

but I have no idea what to use for -k (is it needed?) or what the
alignment_beads.txt file should contain.

Thanks
Alberto

--
Alberto Simões
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb

