[CWB] A question about the aligning using cwb-encoding

Munich LEE leemh at yonsei.ac.kr
Mon Jan 27 00:15:25 CET 2014


Hi,

 

I am building an English-Korean bilingual corpus using cwb-align-encode.

 

So I encoded both corpora and aligned them.

 

At first it seemed to work.

 

However, I found a problem when I checked the search results.

 

The first few sentences were aligned as correct pairs.

But the others were not.

This seems to be related to the statistical alignment process.

 

Actually, I prepared the two corpora so that the two sentences of each pair have the same sentence ID, in order to avoid failures of the statistical alignment.

I am working with 60,000 sentences. I aligned all of them manually and put the information into the XML tag "s_id".
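
To make the intended pairing concrete, here is a minimal Python sketch of what I expect the alignment to look like. It is only an illustration: the region lists, the token positions, and the function name pair_by_id are invented for this example and are not part of CWB.

```python
# Hypothetical sketch: pair sentence regions from two corpora by a shared s_id.
# Each region is (s_id, first_token, last_token); all values below are invented.

def pair_by_id(source_regions, target_regions):
    """Return (s_id, source_span, target_span) for every id found in both lists."""
    # Index the target regions by their id for direct lookup.
    target_by_id = {sid: (start, end) for sid, start, end in target_regions}
    pairs = []
    for sid, start, end in source_regions:
        # Sentences whose id has no counterpart are skipped, not guessed.
        if sid in target_by_id:
            pairs.append((sid, (start, end), target_by_id[sid]))
    return pairs


english = [("1", 0, 5), ("2", 6, 14), ("3", 15, 20)]
korean = [("1", 0, 3), ("2", 4, 9), ("3", 10, 13)]

for sid, en_span, ko_span in pair_by_id(english, korean):
    print(sid, en_span, ko_span)
```

In other words, the pairs should be determined only by the matching IDs, without any statistical guessing.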

 

My question is: how can I make use of the manually created XML tag "s_id" for the alignment?

 

Could anyone help me?

 

I would appreciate your support.

 

Thanks.

 

Munich