[CWB] Install corpus: is there a way to "select all files"?

Hardie, Andrew a.hardie at lancaster.ac.uk
Sun Nov 20 14:33:39 CET 2016


CQPweb (like CWB generally) doesn't care whether the data comes in one file or many. It cares about the <text> elements. It doesn't matter whether you have many input files, each with a <text id="XXX"> ... </text> covering the whole file, or a single input file with many <text id="XXX"> ... </text> spans lined up one after another. The outcome is exactly the same.
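
As a rough sketch of the single-file layout: the shell function below wraps each .txt file in the current directory in its own <text> element and appends it to one output file. The function name wrap_texts and the use of the file's base name as the text ID are assumptions made for illustration, and it does no tokenisation or XML escaping, so it assumes the files are already in whatever form you would otherwise upload.

```shell
#!/bin/sh
# wrap_texts OUTPUT_FILE
# Hypothetical helper: wraps every *.txt file in the current directory in
# <text id="BASENAME"> ... </text> and concatenates the results.
# Assumes each base name is acceptable as a text ID, and that OUTPUT_FILE
# does not itself end in .txt (otherwise the glob would pick it up).
wrap_texts() {
    out=$1
    : > "$out"                        # start from an empty output file
    for f in *.txt; do
        [ -e "$f" ] || continue       # no .txt files at all: nothing to do
        id=$(basename "$f" .txt)
        {
            printf '<text id="%s">\n' "$id"
            awk 1 "$f"                # copy the file, ensuring a final newline
            printf '</text>\n'
        } >> "$out"
    done
}
```

Run from the directory that holds the essays, e.g. wrap_texts MyBigInputFile, this gives one file to tick in the install form.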

If you have metadata relating to the texts (i.e. your Name, Age, Major fields) then you can import it into the system in one of two ways. (1) Put the metadata in a tab-delimited text file with the text IDs in the first column; upload this file; use it to set up text metadata. Or (2) have the metadata as additional attributes on the <text> element; declare these when indexing; then import the text metadata from the resulting s-attributes.
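
For option (1), it may be worth checking before upload that the IDs in the first column of the metadata file match the <text id="..."> values in the input. A minimal sketch follows; the function name check_text_ids is made up, and it assumes each <text> start tag sits on its own line with id as its first attribute, and that the metadata file has no header row.

```shell
#!/bin/sh
# check_text_ids CORPUS_FILE METADATA_FILE
# Hypothetical helper: prints every ID found in only one of the two files.
# IDs only in the corpus are printed flush left; IDs only in the metadata
# file are printed after a tab. No output means the two sets agree.
check_text_ids() {
    tmp1=$(mktemp) && tmp2=$(mktemp) || return 2
    # IDs from lines of the form <text id="..." ...>
    sed -n 's/^<text id="\([^"]*\)".*/\1/p' "$1" | sort -u > "$tmp1"
    # IDs from the first tab-delimited column of the metadata file
    cut -f 1 "$2" | sort -u > "$tmp2"
    # comm -3 suppresses the lines common to both sorted lists
    comm -3 "$tmp1" "$tmp2"
    rm -f "$tmp1" "$tmp2"
}
```

For example, check_text_ids MyBigInputFile metadata.txt (both file names are placeholders) should print nothing if every text has exactly one metadata row.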

So, in short, it's not impossible at all!

best

Andrew.

-----Original Message-----
From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Jiayue Wang
Sent: 20 November 2016 13:27
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] Install corpus: is there a way to "select all files"?

Thanks Andrew, that's what I did in the end. In fact the 1000+ files are
student essays, each of which has info such as Name, Age, Major and so
on, so I had hoped to keep them as separate files. With just one or a few
concatenated files, I guess annotating such individual properties would
be impossible? (The few corpus files I finally used contain none of
those properties.)

Best
Jiayue

On 20/11/16 11:02, Hardie, Andrew wrote:
>
>
> -----Original Message-----
> From: Hardie, Andrew
> Sent: 18 November 2016 12:55
> To: Open source development of the Corpus WorkBench
> Subject: RE: [CWB] Install corpus: is there a way to "select all files"?
>
> Easiest solution: concatenate the files together (on the command line using cat). Then you only have to tick one little checkbox.
>
> e.g.
>
> cat *.txt > MyBigInputFile
>
> or whatever.
>
> best
>
> Andrew.
>
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Jiayue Wang
> Sent: 18 November 2016 10:00
> To: Open source development of the Corpus WorkBench
> Subject: Re: [CWB] Install corpus: is there a way to "select all files"?
>
> Sorry I forgot to mention that I was working on CQPweb.
>
> After selecting all the files and clicking Install, CQPweb told me that my request exceeded the max URL length. So what to do if I want to install such corpora?
>
> Any help will be much appreciated.
>
> Jiayue
>
> On 18/11/16 09:37, Jiayue Wang wrote:
>> I'm trying to install a corpus of more than a thousand files. Is there
>> a way in which I select all files listed, without having to click the
>> little checkboxes one by one?
>>
>> Jiayue
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb