[CWB] query parallel corpus from command line

"Andrés Chandía" andres at chandia.net
Tue Nov 21 12:44:35 CET 2017



# cwb-describe-corpus -s BANCTRADDECA_CA

============================================================
Corpus: BANCTRADDECA_CA
============================================================

description:
registry file:  /usr/local/share/cwb/registry/banctraddeca_ca
home directory: /mnt/vmdata/iac/cqp/data/banctraddeca_ca/
info file:      /mnt/vmdata/iac/cqp/data/banctraddeca_ca/.info
size (tokens):  394668

  3 positional attributes
 27 structural attributes
  1 alignment  attributes

p-ATT word                 394668 tokens,    28277 types
p-ATT lemma                394668 tokens,    14391 types
p-ATT pos                  394668 tokens,       64 types
s-ATT text                     17 regions
s-ATT text_id                  17 regions (with annotations)
s-ATT text_lleng_tr            17 regions (with annotations)
s-ATT text_lleng_or            17 regions (with annotations)
s-ATT text_cpr                 17 regions (with annotations)
s-ATT text_for                 17 regions (with annotations)
s-ATT text_ftr                 17 regions (with annotations)
s-ATT text_indexador           17 regions (with annotations)
s-ATT text_dif                 17 regions (with annotations)
s-ATT text_reg                 17 regions (with annotations)
s-ATT text_esp                 17 regions (with annotations)
s-ATT text_tem                 17 regions (with annotations)
s-ATT text_tipus               17 regions (with annotations)
s-ATT text_data_or             17 regions (with annotations)
s-ATT text_data_tr             17 regions (with annotations)
s-ATT text_autor               17 regions (with annotations)
s-ATT text_traductor           17 regions (with annotations)
s-ATT text_titol_or            17 regions (with annotations)
s-ATT text_titol_tr            17 regions (with annotations)
s-ATT s                     26347 regions
s-ATT s_id                  26347 regions (with annotations)
s-ATT enty                   9957 regions
s-ATT contrac                6766 regions
s-ATT contrac_forma          6766 regions (with annotations)
s-ATT abr                     209 regions
s-ATT date                     18 regions
s-ATT p                         0 regions
a-ATT banctraddeca_de       25170 alignment blocks (extended)



# cwb-describe-corpus -s BANCTRADDECA_de

============================================================
Corpus: BANCTRADDECA_de
============================================================

description:
registry file:  /usr/local/share/cwb/registry/banctraddeca_de
home directory: /mnt/vmdata/iac/cqp/data/banctraddeca_de/
info file:      /mnt/vmdata/iac/cqp/data/banctraddeca_de/.info
size (tokens):  344966

  3 positional attributes
 27 structural attributes
  1 alignment  attributes

p-ATT word                 344966 tokens,    35681 types
p-ATT lemma                344966 tokens,    19332 types
p-ATT pos                  344966 tokens,       53 types
s-ATT text                     17 regions
s-ATT text_id                  17 regions (with annotations)
s-ATT text_lleng_tr            17 regions (with annotations)
s-ATT text_lleng_or            17 regions (with annotations)
s-ATT text_cpr                 17 regions (with annotations)
s-ATT text_for                 17 regions (with annotations)
s-ATT text_ftr                 17 regions (with annotations)
s-ATT text_indexador           17 regions (with annotations)
s-ATT text_dif                 17 regions (with annotations)
s-ATT text_reg                 17 regions (with annotations)
s-ATT text_esp                 17 regions (with annotations)
s-ATT text_tem                 17 regions (with annotations)
s-ATT text_tipus               17 regions (with annotations)
s-ATT text_data_or             17 regions (with annotations)
s-ATT text_data_tr             17 regions (with annotations)
s-ATT text_autor               17 regions (with annotations)
s-ATT text_traductor           17 regions (with annotations)
s-ATT text_titol_or            17 regions (with annotations)
s-ATT text_titol_tr            17 regions (with annotations)
s-ATT s                     26347 regions
s-ATT s_id                  26347 regions (with annotations)
s-ATT enty                      0 regions
s-ATT contrac                   0 regions
s-ATT contrac_forma             0 regions (with annotations)
s-ATT abr                       0 regions
s-ATT date                      0 regions
s-ATT p                         0 regions
a-ATT banctraddeca_ca       25170 alignment blocks (extended)
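
For comparing two listings like the ones above without reading them line by line, a small script can pull out the attribute counts. This is only a sketch: the function names (`parse_describe`, `empty_s_attributes`, `alignment_blocks`) are made up for illustration and are not part of CWB, and the regular expression assumes the `X-ATT name count unit` layout seen in this output. The sample strings are short excerpts of the figures posted here.

```python
import re

# Matches "p-ATT word 394668 tokens", "s-ATT s 26347 regions",
# "a-ATT banctraddeca_de 25170 alignment ...". \s+ also spans line breaks,
# so output that was hard-wrapped by a mail client still parses.
ATT_RE = re.compile(r"\b([psa])-ATT\s+(\S+)\s+(\d+)\s+(tokens|regions|alignment)")


def parse_describe(text):
    """Return {(kind, name): count} for every attribute found in the text."""
    return {(m.group(1), m.group(2)): int(m.group(3)) for m in ATT_RE.finditer(text)}


def empty_s_attributes(atts):
    """Names of structural attributes that have 0 regions."""
    return sorted(name for (kind, name), n in atts.items() if kind == "s" and n == 0)


def alignment_blocks(atts):
    """Map each alignment attribute to its number of alignment blocks."""
    return {name: n for (kind, name), n in atts.items() if kind == "a"}


# Excerpts of the two listings in this message (not the full output).
CA_SAMPLE = """
p-ATT word                 394668 tokens,    28277 types
s-ATT s                     26347 regions
s-ATT enty                   9957 regions
s-ATT p                         0 regions
a-ATT banctraddeca_de       25170 alignment blocks (extended)
"""

DE_SAMPLE = """
p-ATT word                 344966 tokens,    35681 types
s-ATT s                     26347 regions
s-ATT enty                      0 regions
s-ATT p                         0 regions
a-ATT banctraddeca_ca       25170 alignment blocks (extended)
"""

if __name__ == "__main__":
    ca = parse_describe(CA_SAMPLE)
    de = parse_describe(DE_SAMPLE)
    print("same number of <s> regions:", ca[("s", "s")] == de[("s", "s")])
    print("alignment blocks CA:", alignment_blocks(ca))
    print("alignment blocks DE:", alignment_blocks(de))
    print("empty s-attributes in DE:", empty_s_attributes(de))
```

To run it on real data, pass the saved output of each `cwb-describe-corpus -s` call to `parse_describe` in place of the sample strings.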



So what does cwb-describe-corpus -s CORPUS_OL tell you?


_______________________

Andrés Chandía
