[CWB] Format for metadata files?

Graham Ranger -- UAPV graham.ranger at univ-avignon.fr
Sun Dec 4 18:07:37 CET 2016


I couldn't get my head round hexedit via the command line on Ubuntu, but I
have resolved the problem the unintelligent way: by opening a new file,
copying the content of one to the other, testing the .meta file
line by line, and re-typing any characters that didn't make the grade. I
wonder whether there might be more than one way of getting an
underscore, and whether this was the source of the trouble...
Thanks for your help, and for taking the time to answer, Andrew and Jiayue.
Best,
Graham.
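
For anyone hitting the same error, the line-by-line check described above could be automated. What follows is only a minimal sketch, not part of CWB or CQPweb: the function name find_bad_ids is made up for illustration, the fields are assumed to be tab-separated, and the rule applied is the one quoted in the error message (unaccented letters, numbers and underscore). It reports the code point of every offending character in the first column, so an invisible character or a look-alike underscore would show up by name.

```python
import re
import unicodedata

# Rule quoted in the error message: unaccented letters, numbers and underscore only.
VALID_ID = re.compile(r"[A-Za-z0-9_]+")


def find_bad_ids(lines):
    """Return (line_number, text_id, offenders) for each text ID that breaks the rule.

    offenders is a list of (position, code point, Unicode name) tuples.
    Hypothetical helper for illustration; assumes tab-separated fields.
    """
    problems = []
    for number, line in enumerate(lines, start=1):
        # The text ID is everything before the first tab on the line.
        text_id = line.rstrip("\r\n").split("\t", 1)[0]
        if VALID_ID.fullmatch(text_id):
            continue
        offenders = [
            (pos, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
            for pos, ch in enumerate(text_id)
            if not VALID_ID.fullmatch(ch)
        ]
        problems.append((number, text_id, offenders))
    return problems


if __name__ == "__main__":
    # Made-up sample data: the first ID starts with an invisible U+FEFF,
    # the second is clean, the third uses a full-width underscore (U+FF3F).
    sample = [
        "\ufeffassollant_rose_d_amour\talfred_assollant\n",
        "bruno_le_tour_de_la_france\tbruno\n",
        "malot\uff3fsans_famille\tmalot\n",
    ]
    for number, text_id, offenders in find_bad_ids(sample):
        print(number, repr(text_id), offenders)
```

To run it on a real file, something like find_bad_ids(open("jeunesse.meta", encoding="utf-8", newline="")) should work, assuming the file is UTF-8. Plain "utf-8" is used here rather than "utf-8-sig" so that a leading byte order mark, if there is one, stays visible instead of being silently stripped. Blank lines are reported too, since an empty ID does not satisfy the rule.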


On 04/12/2016 12:40, Hardie, Andrew wrote:
> Try hexedit? If there *is* something invisible that will definitely show it.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Graham Ranger -- UAPV
> Sent: 04 December 2016 10:19
> To: cwb at sslmit.unibo.it
> Subject: Re: [CWB] Format for metadata files?
>
> Thanks for your answers... No progress, unfortunately. I really can't
> see what is happening. I've created basic .meta files in the past, with
> no hitches, and the regex search-and-replace routines in Geany and
> similar haven't yielded anything. I've tried ANSI encoding, CR and LF
> line endings, etc. but no go. Will keep trying, but otherwise I may just
> end up putting this metadata into each of the texts -- an option I was
> trying to avoid.
> Best,
> Graham.
>
> On 03/12/2016 22:17, Jiayue Wang wrote:
>> Hi,
>> I pasted the error message in Geany and found that each text id is led
>> by an invisible character (between the ' and the first visible letter).
>>
>> Jiayue
>>
>> On 03/12/16 17:19, Graham Ranger -- UAPV wrote:
>>> Hello,
>>> I'm getting the following error message when I try to load the metadata
>>> file for a corpus:
>>>
>>> The data source you specified for the text metadata contains
>>> badly-formatted text ID codes, as follows: <strong>
>>> 'assollant_rose_d_amour'; 'bruno_le_tour_de_la_france';
>>> 'bruyere_l_epee_de_charlemagne'; 'daudet_lettres_de_mon_moulin';
>>> 'malot_sans_famille'; 'marcel_les_petits_vagabonds';
>>> 'robida_les_assieges_de_compiegne'; 'segur_malheurs_de_sophie';
>>> 'segur_un_bon_petit_diable'; 'verne_cinq_semaines_en_ballon';
>>> 'verne_le_tour_du_monde'; 'zola_nouveaux_contes_a_ninon';</strong>
>>> (text ids can only contain unaccented letters, numbers, and underscore).
>>>
>>> The metadata is in a file called jeunesse.meta in which each line begins
>>> with the text id of the texts in the corpus.
>>> Inside the metadata file, the lines read as follows:
>>>
>>> assollant_rose_d_amour    alfred_assollant    rose_d_amour    1889    1850_1899    roman    avance
>>> bruno_le_tour_de_la_france    bruno    le_tour_de_la_france    1877    1850-1899    manuel_scolaire    elementaire
>>> etc.
>>>
>>> with text id, author, title, date, period, genre and level.
>>>
>>> I can't see what is wrong with the file: the error message suggests that
>>> it's formatted as <strong>, but it's just plain text!
>>> Thanks as always for any help.
>>> Best,
>>> Graham.
>>> _______________________________________________
>>> CWB mailing list
>>> CWB at sslmit.unibo.it
>>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb


