[CWB] CQPweb now supports parallel corpora

Giorgina Cerutti Benitez Giorgina.Cerutti at unige.ch
Wed Sep 21 16:58:40 CEST 2016


Thank you for your quick reply and for the clarification. Don’t worry. I just wanted to make sure I understood correctly.

Best,

Giorgina

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: Wednesday, 21 September 2016 16:54
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CQPweb now supports parallel corpora

Hi Giorgina,

No, it’s not possible.

I can attempt to rework things to make it possible, but I am afraid I won’t have time for this in the immediate future.

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Giorgina Cerutti Benitez
Sent: 21 September 2016 15:49
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] CQPweb now supports parallel corpora

Dear Andrew,

I am sorry to repeat the question, but I am trying to build trilingual corpora for CQPweb and need to revisit this issue. You explained that it is possible to create an interlinked trilingual dataset with one-way links, but my question was actually whether it is possible to create and upload a dataset with two-way links, so that the corpus is displayed in all three languages simultaneously in the same window, as shown here:

I’m sorry for not being sufficiently clear before.

Thank you very much,

Giorgina


From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: Tuesday, 2 August 2016 11:41
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CQPweb now supports parallel corpora

Yes, this is because each alignment link always runs from one language to one other language. But by having multiple such links, you can accommodate as many parallel languages as you have installed. E.g., if you have English/French/German, you could create 6 of these one-way links to have a fully interlinked trilingual dataset:

en->fr
en->de
fr->en
fr->de
de->en
de->fr

In Europarl, there are 6 languages so the complete network involves 30 different language-A-to-language-B links.
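
The counting above can be checked with a short sketch. This is only an illustration in plain Python, not CQPweb code; the function name `one_way_links` and the placeholder codes `l0` to `l5` are made up for the example.

```python
from itertools import permutations


def one_way_links(languages):
    """Return every ordered source->target pair of distinct languages.

    Illustrative helper only; it is not part of CQPweb or CWB.
    """
    return [f"{src}->{tgt}" for src, tgt in permutations(languages, 2)]


# Three languages give 3 * 2 = 6 one-way links.
for link in one_way_links(["en", "fr", "de"]):
    print(link)

# Six languages (placeholder codes) give 6 * 5 = 30 one-way links.
print(len(one_way_links([f"l{i}" for i in range(6)])))
```

In general, n languages need n * (n - 1) one-way links for a fully interlinked set.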

Hope that makes sense.

best

Andrew.

From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Giorgina Cerutti Benitez
Sent: 02 August 2016 10:04
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] CQPweb now supports parallel corpora


Yes, sorry, my question was whether it is currently possible to upload tri- or quadrilingual corpora to CQPweb, as the manual mainly refers to corpora in language A and B.

Best,

Giorgina

From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: Tuesday, 2 August 2016 11:00
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CQPweb now supports parallel corpora

Can you expand on the question? The Europarl corpora are not only tri-lingual, they are six-lingual*, so they already demonstrate how the system can go beyond just 2 languages.

(*Hexilingual? I’m not sure that’s a word.)

best

Andrew.

From: Giorgina Cerutti Benitez [mailto:Giorgina.Cerutti at unige.ch]
Sent: 02 August 2016 09:57
To: Open source development of the Corpus WorkBench
Cc: Hardie, Andrew
Subject: TR: CQPweb now supports parallel corpora

Dear Andrew,

Thank you very much for this very good news. I’ve been going through the Europarl corpora as well as the manual, and I wonder whether trilingual data is supported or whether it should be treated as you did, by uploading it by language pair.

Best regards,

Giorgina

From: cwb-bounces at liste.sslmit.unibo.it [mailto:cwb-bounces at liste.sslmit.unibo.it] On Behalf Of Hardie, Andrew
Sent: Monday, 1 August 2016 00:41
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] CQPweb now supports parallel corpora

Hi everyone,

CQPweb v 3.2.22 is now in the SVN repo. It adds support for parallel corpora. Since this is a much requested feature that has been on-the-list for several years, I thought it was worth sending a note to the list to let everyone know it has appeared.

Documentation for setup is in chapter 8 of the manual (also in SVN, also here: https://cqpweb.lancs.ac.uk/doc/CQPwebAdminManual.pdf).

I am working on adding the Europarl corpora on the Lancaster server, so people who don’t have their own server but are interested in parallel corpora can try it out. This should be done by Mon a.m. UK time.

Bug reports are, as ever, most appreciated.

Known issue: display of parallel data works when in categorisation mode, but there is currently no widget in the interface to switch it on (it can be switched on by manually entering the right attribute handle in the URL, but that is not then preserved across subsequent sessions in the categorisation UI). This will be fixed in a subsequent version.

best

Andrew.