[CWB] query parallel corpus from command line
"Andrés Chandía"
andres at chandia.net
Tue Nov 21 12:44:35 CET 2017
# cwb-describe-corpus -s BANCTRADDECA_CA
============================================================
Corpus: BANCTRADDECA_CA
============================================================
description:
registry file:  /usr/local/share/cwb/registry/banctraddeca_ca
home directory: /mnt/vmdata/iac/cqp/data/banctraddeca_ca/
info file:      /mnt/vmdata/iac/cqp/data/banctraddeca_ca/.info
size (tokens):  394668

  3 positional attributes
 27 structural attributes
  1 alignment attributes
p-ATT word                  394668 tokens,    28277 types
p-ATT lemma                 394668 tokens,    14391 types
p-ATT pos                   394668 tokens,       64 types
s-ATT text                      17 regions
s-ATT text_id                   17 regions (with annotations)
s-ATT text_lleng_tr             17 regions (with annotations)
s-ATT text_lleng_or             17 regions (with annotations)
s-ATT text_cpr                  17 regions (with annotations)
s-ATT text_for                  17 regions (with annotations)
s-ATT text_ftr                  17 regions (with annotations)
s-ATT text_indexador            17 regions (with annotations)
s-ATT text_dif                  17 regions (with annotations)
s-ATT text_reg                  17 regions (with annotations)
s-ATT text_esp                  17 regions (with annotations)
s-ATT text_tem                  17 regions (with annotations)
s-ATT text_tipus                17 regions (with annotations)
s-ATT text_data_or              17 regions (with annotations)
s-ATT text_data_tr              17 regions (with annotations)
s-ATT text_autor                17 regions (with annotations)
s-ATT text_traductor            17 regions (with annotations)
s-ATT text_titol_or             17 regions (with annotations)
s-ATT text_titol_tr             17 regions (with annotations)
s-ATT s                      26347 regions
s-ATT s_id                   26347 regions (with annotations)
s-ATT enty                    9957 regions
s-ATT contrac                 6766 regions
s-ATT contrac_forma           6766 regions (with annotations)
s-ATT abr                      209 regions
s-ATT date                      18 regions
s-ATT p                          0 regions
a-ATT banctraddeca_de        25170 alignment blocks (extended)
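For anyone who wants to check a listing like this automatically, here is a minimal Python sketch that reads attribute lines into a dictionary. It assumes each attribute sits on a single line, as `cwb-describe-corpus -s` prints it in a terminal. The names `SAMPLE`, `ATT_LINE` and `parse_attributes` are made up for illustration and are not part of CWB.

```python
import re

# A few lines copied from the listing above (whitespace normalised).
SAMPLE = """\
p-ATT word                394668 tokens,    28277 types
p-ATT lemma               394668 tokens,    14391 types
s-ATT s                    26347 regions
s-ATT s_id                 26347 regions (with annotations)
s-ATT p                        0 regions
a-ATT banctraddeca_de      25170 alignment blocks (extended)
"""

# Attribute kind, attribute name, count, and the unit that follows the count.
ATT_LINE = re.compile(
    r"^(p-ATT|s-ATT|a-ATT)\s+(\S+)\s+(\d+)\s+(tokens|regions|alignment blocks)"
)


def parse_attributes(text):
    """Return {(kind, name): count} for every attribute line in `text`."""
    result = {}
    for line in text.splitlines():
        match = ATT_LINE.match(line.strip())
        if match:
            kind, name, count, _unit = match.groups()
            result[(kind, name)] = int(count)
    return result


attrs = parse_attributes(SAMPLE)

# Structural attributes that are declared but contain no regions.
empty = sorted(
    name for (kind, name), count in attrs.items()
    if kind == "s-ATT" and count == 0
)
print(attrs[("a-ATT", "banctraddeca_de")], empty)
```

Run on the full listing, the same function would also show which declared s-attributes (such as `p`) hold no regions at all.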
# cwb-describe-corpus -s BANCTRADDECA_de
============================================================
Corpus: BANCTRADDECA_de
============================================================
description:
registry file:  /usr/local/share/cwb/registry/banctraddeca_de
home directory: /mnt/vmdata/iac/cqp/data/banctraddeca_de/
info file:      /mnt/vmdata/iac/cqp/data/banctraddeca_de/.info
size (tokens):  344966

  3 positional attributes
 27 structural attributes
  1 alignment attributes
p-ATT word                  344966 tokens,    35681 types
p-ATT lemma                 344966 tokens,    19332 types
p-ATT pos                   344966 tokens,       53 types
s-ATT text                      17 regions
s-ATT text_id                   17 regions (with annotations)
s-ATT text_lleng_tr             17 regions (with annotations)
s-ATT text_lleng_or             17 regions (with annotations)
s-ATT text_cpr                  17 regions (with annotations)
s-ATT text_for                  17 regions (with annotations)
s-ATT text_ftr                  17 regions (with annotations)
s-ATT text_indexador            17 regions (with annotations)
s-ATT text_dif                  17 regions (with annotations)
s-ATT text_reg                  17 regions (with annotations)
s-ATT text_esp                  17 regions (with annotations)
s-ATT text_tem                  17 regions (with annotations)
s-ATT text_tipus                17 regions (with annotations)
s-ATT text_data_or              17 regions (with annotations)
s-ATT text_data_tr              17 regions (with annotations)
s-ATT text_autor                17 regions (with annotations)
s-ATT text_traductor            17 regions (with annotations)
s-ATT text_titol_or             17 regions (with annotations)
s-ATT text_titol_tr             17 regions (with annotations)
s-ATT s                      26347 regions
s-ATT s_id                   26347 regions (with annotations)
s-ATT enty                       0 regions
s-ATT contrac                    0 regions
s-ATT contrac_forma              0 regions (with annotations)
s-ATT abr                        0 regions
s-ATT date                       0 regions
s-ATT p                          0 regions
a-ATT banctraddeca_ca        25170 alignment blocks (extended)
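The two listings differ only in a handful of structural attributes. As a rough illustration, the Python sketch below compares region counts copied by hand from both outputs. The dictionaries and the helper `only_populated_in` are hypothetical names for this example, not anything provided by CWB.

```python
# Region counts copied from the two listings above (s-attributes that differ
# or that matter for alignment; the text_* attributes are 17 in both corpora).
ca = {"s": 26347, "enty": 9957, "contrac": 6766, "contrac_forma": 6766,
      "abr": 209, "date": 18, "p": 0}
de = {"s": 26347, "enty": 0, "contrac": 0, "contrac_forma": 0,
      "abr": 0, "date": 0, "p": 0}


def only_populated_in(first, second):
    """Names of attributes with regions in `first` but none in `second`."""
    return sorted(
        name for name, count in first.items()
        if count > 0 and second.get(name, 0) == 0
    )


only_in_ca = only_populated_in(ca, de)
only_in_de = only_populated_in(de, ca)
same_sentence_count = ca["s"] == de["s"]

print("populated only in CA:", only_in_ca)
print("populated only in DE:", only_in_de)
print("same number of <s> regions:", same_sentence_count)
```

Both corpora declare the same attributes and report the same number of `s` regions and alignment blocks; only the Catalan side has any `enty`, `contrac`, `contrac_forma`, `abr` or `date` regions.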
So what does cwb-describe-corpus -s CORPUS_OL tell you?
_______________________
andrés chandía