[CWB] Alignment format

Heiden Serge slh at ens-lsh.fr
Fri Feb 5 08:14:13 CET 2010


Alberto,

In CQP, the corpora don't need to know about each other at first
in order to be aligned, so there is no specific format for that.
First you have to produce two independent CQP corpora with
at least one structural attribute, like 's', in each of them.
Then you have to declare, in the registry file of each corpus,
that a specific attribute is aligned to one in the other corpus.
In the "Corpus Administrator’s Manual", this is described in section
"4.2.6 Alignment attributes", page 35.
To use that declaration when querying your corpus, you have to
tell CQP which aligned corpus to display alongside the results.
See the command "show +hansard-f;" in:
http://www.ims.uni-stuttgart.de/projekte/CorpusWorkbench/CQPUserManual/HTML/node9.html
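
As a rough sketch of the two steps above: the corpus names HANSARD-E and
hansard-f are borrowed from the manual's example, and the path and the
attribute list are placeholders I am assuming, not taken from a real
installation. The registry file of the English corpus would contain
something like:

```
## registry entry for corpus HANSARD-E (illustrative values only)
NAME "English side of the bitext"
ID   hansard-e
HOME /path/to/data/hansard-e

ATTRIBUTE word     # positional attribute (the tokens)
STRUCTURE s        # structural attribute used as the alignment unit
ALIGNED hansard-f  # this corpus is aligned to corpus hansard-f
```

with a matching "ALIGNED hansard-e" line in the registry file of the
French corpus. Note that the declaration alone only tells CQP that the
alignment exists; as far as I know the alignment data itself still has
to be encoded separately (the cwb-align-encode tool is meant for that).
A query session would then look roughly like this:

```
HANSARD-E;
show +hansard-f;
"house";
```

Each match in HANSARD-E should then be displayed together with the
aligned 's' region from hansard-f.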

Best,
Serge

Alberto Simões wrote on 04/02/2010 21:47:
> Hello, Serge.
>
> Thanks for the answer.
>
> I know how to use easyalign and align a bitext.
> In this case I have the sentence aligned bitext and want to add it to
> CQP. Was hoping there was a simple textual file format I could use for
> the alignment.
>
> Unfortunately the document you suggested seems not to include that
> information. Especially because the section on alignment attributes is
> empty ;)
>
> Thanks,
> Alberto
>
> On 04/02/2010 19:38, Heiden Serge wrote:
>    
>> Alberto,
>>
>> the "Corpus Administrator’s Manual"
>> that you can find here:
>> http://bulba.sdsu.edu/technical-manual.ps
>> gives you instructions on how to align two different
>> CWB corpora on a specific structural attribute
>> (not a positional one) like your 's'.
>>
>> Regards,
>> Serge
>>
>> Alberto Simões wrote on 04/02/2010 16:46:
>>      
>>> Hello
>>>
>>> I was looking at the encoding tutorial, but it is missing the
>>> alignment part :) I would like to know how alignment is encoded.
>>> Is it encoded as a common attribute?
>>>
>>> Let's say:
>>>
>>> <s>
>>> I    1
>>> saw  1
>>> a    1
>>> cat  1
>>> </s>
>>> <s>
>>> The   2
>>> house 2
>>> is    2
>>> blue  2
>>> </s>
>>>
>>> and
>>>
>>> <s>
>>> Eu    1
>>> vi    1
>>> um    1
>>> gato  1
>>> </s>
>>> <s>
>>> A     2
>>> casa  2
>>> é     2
>>> azul  2
>>> </s>
>>>
>>> Is this the case?
>>> If so, can identifiers be used in multiple sentences?
>>>
>>> <s>
>>> Yes 1
>>> !   1
>>> </s>
>>> <s>
>>> Sure 1
>>> !    1
>>> </s>
>>>
>>> and
>>>
>>> <s>
>>> Sim   1
>>> ,     1
>>> claro 1
>>> !     1
>>> </s>
>>>
>>> Thanks
>>> Alberto
>>>