[CWB] Parallel Corpora

Philippe Baudrion Philippe.Baudrion at unige.ch
Fri Jun 10 15:41:07 CEST 2016


Thank you Susanne for your quick answer.
Until now I have only tried automatic indexing through CQPweb.
I guess I will need to dig a bit deeper into the CQP encoding options to 
get it working.
Thank you for putting me on the right track, Philippe
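
For reference, a rough sketch of what the indexing commands quoted below might look like with the :0 declaration Susanne mentions. It has not been tested against this corpus, and nothing in this thread confirms that it fixes the empty files. The paths and attribute names are copied from the original command; the `run` helper and the `RUN` variable are hypothetical conveniences added for a dry run; reading :0 as "this element does not nest inside itself" is an assumption based on the encoding tutorial.

```shell
# Paths copied from the original command; adjust them for your installation.
DATA_DIR=/export/data/CQPweb_data/corpus/test_pb_fr_es
VRT_FILE=/export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt
REGISTRY=/export/data/CQPweb_data/registry

# Same s-attributes as in the original command, each declared with :0
# (assumption: :0 means the element is not nested inside itself).
S_TEXT="text:0+id+organisation+country+type+year+signature"
S_S="s:0+id"
S_SEG="seg:0+lang"

# Hypothetical helper: prints the command by default, executes it when RUN=1.
run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "$*"
    fi
}

run cwb-encode -xsB -c utf8 -d "$DATA_DIR" -f "$VRT_FILE" \
    -R "$REGISTRY/test_pb_fr_es" -S "$S_TEXT" -S "$S_S" -S "$S_SEG"
run cwb-makeall -r "$REGISTRY" -V TEST_PB_FR_ES
```

Running the script as is only prints the two commands; setting RUN=1 in the environment executes them, which requires the CWB tools to be installed.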

On 06/10/2016 02:54 PM, Susanne Flach wrote:
> Dear Philippe,
>
> Have you tried declaring nested XML elements with :0 as described in 
> Sec 4?
> http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial/node5.html
>
> I’ve never had your problem, but I have always used the :0.
>
> Best,
> Susanne
>
> --
> Susanne Flach, M.A.
> Arbeitsbereich Linguistik
> Institut für Englische Philologie
> Freie Universität Berlin
> Habelschwerdter Allee 45
> 14195 Berlin
>
> NEW! Corpus tutorial with CQP 
> <http://userpage.fu-berlin.de/%7Eflach/corpling/>
>
> http://userpage.fu-berlin.de/~flach/
>
> Raum JK29/223
> Telefon +49 30 838 72311
>
>> On 10 Jun 2016, at 14:39, Philippe Baudrion 
>> <Philippe.Baudrion at unige.ch> wrote:
>>
>> Dear all,
>> I am trying to index the following corpus structure but it is not 
>> working. Here is an extract of the corpus:
>>
>> <text id="FR_DI_2000_1" organisation="CERD" country="Francia" 
>> type="Documento informativo" year="2000" signature="CERD/C/SR.1373">
>>     <s id="1">
>>         <seg lang="fr">
>> La
>> séance
>> est
>> ouverte
>> à
>> 10h05
>> .
>> </seg>
>>         <seg lang="es">
>> Se
>> declara
>> abierta
>> la
>> sesión
>> a
>> las
>> 10.05
>> horas
>> .
>>         </seg>
>>     </s>
>> ...
>> </text>
>>
>> The corresponding files on the disk drive remain empty:
>> > ll /export/data/CQPweb_data/corpus/test_pb_fr_es/
>>            total 120
>>            drwxr-xr-x  2 www-data www-data 4096 Jun  6 12:18 ./
>>            drwxrwxr-x 58 www-data letrint  4096 Jun  6 12:18 ../
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg_lang.avs
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg_lang.avx
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg_lang.rng
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg.rng
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s_id.avs
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s_id.avx
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s_id.rng
>>            -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s.rng
>>            -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_country.avs
>>            -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_country.avx
>>            -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_country.rng
>>            -rw-r--r--  1 www-data www-data   13 Jun  6 12:18 text_id.avs
>>            -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_id.avx
>>            -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_id.rng
>>            ...
>>
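
A listing like the one above can be narrowed to just the empty component files with a short check. This is a minimal sketch: the default directory is taken from the listing, while the function name `empty_components` and the `DATA_DIR` override are made up for illustration.

```shell
# Directory from the listing above; override DATA_DIR to check another corpus.
DATA_DIR="${DATA_DIR:-/export/data/CQPweb_data/corpus/test_pb_fr_es}"

# Print every zero-byte regular file below the given directory, sorted by name.
empty_components() {
    # -type f: regular files only; -size 0: files occupying zero blocks (empty)
    find "$1" -type f -size 0 | sort
}

# Only run the check if the directory exists on this machine.
if [ -d "$DATA_DIR" ]; then
    empty_components "$DATA_DIR"
fi
```

An empty result after re-encoding would suggest that all s-attribute files received data; it does not by itself prove that the regions were indexed as intended.
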
>> The indexing command is as follows:
>> > cwb-encode -xsB -c utf8 -d /export/data/CQPweb_data/corpus/test_pb_fr_es -f /export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt -R "/export/data/CQPweb_data/registry/test_pb_fr_es"  -S text+id+organisation+country+type+year+signature -S s+id -S seg+lang 2>&1
>> > cwb-makeall -r "/export/data/CQPweb_data/registry" -V TEST_PB_FR_ES 2>&1
>>
>> I guess that the repetition of the <seg> element makes it impossible 
>> to index this corpus correctly, but I would like your opinion on 
>> that. If it is possible, what would the correct indexing command 
>> be? Thank you for your help. Greetings,
>> -- 
>> Baudrion Philippe
>> Correspondant Informatique
>>
>> UNIVERSITE DE GENEVE
>> Faculté de traduction et d'interprétation
>> 40, bd. du Pont d'Arve
>> 1211 GENEVE 4
>>
>> Tél +41 22 379 94 95
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>

-- 
Baudrion Philippe
Correspondant Informatique

UNIVERSITE DE GENEVE
Faculté de traduction et d'interprétation
40, bd. du Pont d'Arve
1211 GENEVE 4

Tél +41 22 379 94 95
