[CWB] Parallel Corpora

Susanne Flach susanne.flach at fu-berlin.de
Fri Jun 10 14:54:56 CEST 2016


Dear Philippe,

Have you tried declaring the nested XML elements with :0, as described in Sec. 4 of the CWB Encoding Tutorial?
http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial/node5.html

I’ve never had your problem, but I have always used the :0.
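
Applied to the indexing command quoted further down, the suggestion might look like the sketch below. It is untested against this corpus and only shows where the :0 suffix would go: paths and options are copied from the original command, and each -S declaration gets :0 between the element name and its XML attributes. The shell variable names are made up for readability, and whether this actually resolves the empty s and seg files is an assumption, not something confirmed in this thread.

```shell
# Sketch only: same paths and options as in the quoted command; the only
# change is the ":0" suffix on each -S declaration.
DATA=/export/data/CQPweb_data/corpus/test_pb_fr_es
VRT=/export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt
REG=/export/data/CQPweb_data/registry/test_pb_fr_es

# Declaration pattern: element:0+xmlattr1+xmlattr2...
S_TEXT='text:0+id+organisation+country+type+year+signature'
S_S='s:0+id'
S_SEG='seg:0+lang'

# Assemble the argument list and print it for inspection.
set -- cwb-encode -xsB -c utf8 -d "$DATA" -f "$VRT" -R "$REG" \
    -S "$S_TEXT" -S "$S_S" -S "$S_SEG"
echo "$*"

# Run the command only if cwb-encode is installed and the input file exists.
if command -v cwb-encode >/dev/null 2>&1 && [ -f "$VRT" ]; then
    "$@"
fi
```

The cwb-makeall call from the original message would then be run unchanged afterwards.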

Best,
Susanne

--
Susanne Flach, M.A.
Arbeitsbereich Linguistik
Institut für Englische Philologie
Freie Universität Berlin
Habelschwerdter Allee 45
14195 Berlin

NEW! Corpus tutorial with CQP <http://userpage.fu-berlin.de/~flach/corpling/>

http://userpage.fu-berlin.de/~flach/

Raum JK29/223
Telefon +49 30 838 72311

> On 10 Jun 2016, at 14:39, Philippe Baudrion <Philippe.Baudrion at unige.ch> wrote:
> 
> Dear all,
> I am trying to index the following corpus structure but it is not working. Here is an extract of the corpus:
> 
> <text id="FR_DI_2000_1" organisation="CERD" country="Francia" type="Documento informativo" year="2000" signature="CERD/C/SR.1373">
>     <s id="1">
>         <seg lang="fr">
> La
> séance
> est
> ouverte
> à
> 10h05
> .
> </seg>
>         <seg lang="es">
> Se
> declara
> abierta
> la
> sesión
> a
> las
> 10.05
> horas
> .
>         </seg>
>     </s>
> ...
> </text>
> 
> The corresponding files on the disk drive remain empty:
> > ll /export/data/CQPweb_data/corpus/test_pb_fr_es/
>           total 120
>           drwxr-xr-x  2 www-data www-data 4096 Jun  6 12:18 ./
>           drwxrwxr-x 58 www-data letrint  4096 Jun  6 12:18 ../
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg_lang.avs
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg_lang.avx
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg_lang.rng
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 seg.rng
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s_id.avs
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s_id.avx
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s_id.rng
>           -rw-r--r--  1 www-data www-data    0 Jun  6 12:18 s.rng
>           -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_country.avs
>           -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_country.avx
>           -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_country.rng
>           -rw-r--r--  1 www-data www-data   13 Jun  6 12:18 text_id.avs
>           -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_id.avx
>           -rw-r--r--  1 www-data www-data    8 Jun  6 12:18 text_id.rng
>           ...
> 
> The indexing command is as follows:
> > cwb-encode -xsB -c utf8 -d /export/data/CQPweb_data/corpus/test_pb_fr_es -f /export/data/CQPweb_data/upload/Test-PB-FR_ES.vrt -R "/export/data/CQPweb_data/registry/test_pb_fr_es"  -S text+id+organisation+country+type+year+signature -S s+id -S seg+lang 2>&1
> > cwb-makeall -r "/export/data/CQPweb_data/registry" -V TEST_PB_FR_ES 2>&1
> 
> I guess that, due to the redundancy of the <seg> element, it is impossible to index this corpus correctly, but I would like to have your opinion on that.
> If it is possible, what would the correct indexing command be?
> 
> Thank you for your help, greetings,
> -- 
> Baudrion Philippe
> Correspondant Informatique
> 
> UNIVERSITE DE GENEVE
> Faculté de traduction et d'interprétation
> 40, bd. du Pont d'Arve
> 1211 GENEVE 4
> 
> Tél +41 22 379 94 95
