[CWB] Format for metadata files?

Graham Ranger -- UAPV graham.ranger at univ-avignon.fr
Sun Dec 4 11:19:05 CET 2016


Thanks for your answers... No progress, unfortunately. I really can't 
see what is happening. I've created basic .meta files in the past, with 
no hitches, and the regex search-and-replace routines in Geany and 
similar haven't turned up anything. I've tried ANSI encoding, CR and LF line endings, 
etc. but no go. Will keep trying, but otherwise I may just end up 
putting this metadata into each of the texts -- an option I was trying 
to avoid.
Best,
Graham.
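
For anyone chasing the same error: the invisible character reported in the
quoted reply below can be made visible with a short script. This is only a
minimal sketch, not part of CWB or CQPweb. It assumes the .meta file is
UTF-8 and tab-separated with the text id in the first column (both are
assumptions, not confirmed in this thread), and the function names
suspicious_chars, check_metadata and check_file are made up for illustration.
The allowed character set is taken from the error message itself (unaccented
letters, numbers, underscore).

```python
import re
import unicodedata

# Allowed characters, as stated in the quoted error message:
# unaccented letters, numbers, and underscore.
ALLOWED = re.compile(r"[A-Za-z0-9_]")


def suspicious_chars(text_id):
    """Return (position, code point, Unicode name) for each disallowed character."""
    return [
        (pos, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for pos, ch in enumerate(text_id)
        if not ALLOWED.fullmatch(ch)
    ]


def check_metadata(lines):
    """Yield (line number, text id, offending characters) for each bad id.

    Assumption: columns are separated by tabs and the id comes first.
    """
    for lineno, line in enumerate(lines, start=1):
        line = line.rstrip("\r\n")
        if not line:
            continue  # ignore empty lines
        text_id = line.split("\t", 1)[0]
        bad = suspicious_chars(text_id)
        if bad:
            yield lineno, text_id, bad


def check_file(path):
    """Check a metadata file on disk (hypothetical helper).

    The file is opened as plain "utf-8" rather than "utf-8-sig", so a
    leading byte-order mark is kept and gets reported instead of being
    silently dropped.
    """
    with open(path, encoding="utf-8") as fh:
        return list(check_metadata(fh))
```

Calling check_file("jeunesse.meta") returns one entry per offending line,
giving the position and code point of each character that is not allowed.
Whatever the hidden character turns out to be (a byte-order mark, a
zero-width space, or something else), its code point should show up there.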

On 03/12/2016 22:17, Jiayue Wang wrote:
> Hi,
> I pasted the error message in Geany and found that each text id is led 
> by an invisible character (between the ' and the first visible letter).
>
> Jiayue
>
> On 03/12/16 17:19, Graham Ranger -- UAPV wrote:
>> Hello,
>> I'm getting the following error message when I try to load the metadata
>> file for a corpus:
>>
>> The data source you specified for the text metadata contains
>> badly-formatted text ID codes, as follows: <strong>
>> 'assollant_rose_d_amour'; 'bruno_le_tour_de_la_france';
>> 'bruyere_l_epee_de_charlemagne'; 'daudet_lettres_de_mon_moulin';
>> 'malot_sans_famille'; 'marcel_les_petits_vagabonds';
>> 'robida_les_assieges_de_compiegne'; 'segur_malheurs_de_sophie';
>> 'segur_un_bon_petit_diable'; 'verne_cinq_semaines_en_ballon';
>> 'verne_le_tour_du_monde'; 'zola_nouveaux_contes_a_ninon';</strong>
>> (text ids can only contain unaccented letters, numbers, and underscore).
>>
>> The metadata is in a file called jeunesse.meta in which each line begins
>> with the text id of the texts in the corpus.
>> Inside the metadata file, the lines read as follows:
>>
>> assollant_rose_d_amour    alfred_assollant    rose_d_amour 1889
>> 1850_1899    roman    avance
>> bruno_le_tour_de_la_france    bruno    le_tour_de_la_france 1877
>> 1850-1899    manuel_scolaire    elementaire
>> etc.
>>
>> with text id, author, title, date, period, genre and level.
>>
>> I can't see what is wrong with the file: the error message suggests that
>> it's formatted as <strong>, but it's just plain text!
>> Thanks as always for any help.
>> Best,
>> Graham.
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb


