[CWB] Format for metadata files?
Graham Ranger -- UAPV
graham.ranger at univ-avignon.fr
Sun Dec 4 18:07:37 CET 2016
I couldn't get my head round hexedit via the command line on Ubuntu, but
have resolved the problem, the unintelligent way, by opening a new file
and copying the content of one to the other, testing the .meta file
line by line, and re-typing any characters that didn't make the grade. I
wonder whether there might be more than one way of getting an
underscore, and whether this was the source of the trouble...
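
A line-by-line check of this kind can also be scripted. The following is only a minimal sketch in Python, not anything provided by CWB or CQPweb: the function name find_bad_ids is made up for illustration, the allowed character set is taken from the wording of the error message quoted below (unaccented letters, numbers, underscore), and it assumes that the fields are tab-separated with the text id first. It reports the code point of every character in a text id that falls outside that set, which would show up an invisible leading character or a look-alike underscore.

```python
import re
import unicodedata

# Allowed characters, as described by the error message in this thread:
# unaccented letters, numbers and underscore.
ALLOWED = re.compile(r"[A-Za-z0-9_]")


def find_bad_ids(lines):
    """Return (line_number, text_id, offenders) for each suspect text id.

    Assumption: fields are tab-separated and the text id is the first
    field. Each offender is (position, code point, Unicode name).
    """
    report = []
    for line_number, line in enumerate(lines, start=1):
        # Strip the line ending only, so stray characters stay visible.
        text_id = line.rstrip("\r\n").split("\t", 1)[0]
        offenders = [
            (pos, f"U+{ord(ch):04X}", unicodedata.name(ch, "UNKNOWN"))
            for pos, ch in enumerate(text_id)
            if not ALLOWED.fullmatch(ch)
        ]
        if offenders:
            report.append((line_number, text_id, offenders))
    return report
```

To run it on a real file, the open file object can be passed directly, for example find_bad_ids(open("jeunesse.meta", encoding="utf-8")); the UTF-8 encoding is also an assumption and may need changing for files saved differently.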
Thanks for your help, and for taking the time to answer, Andrew and Jiayue.
Best,
Graham.
On 04/12/2016 12:40, Hardie, Andrew wrote:
> Try hexedit? If there *is* something invisible that will definitely show it.
>
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Graham Ranger -- UAPV
> Sent: 04 December 2016 10:19
> To: cwb at sslmit.unibo.it
> Subject: Re: [CWB] Format for metadata files?
>
> Thanks for your answers... No progress, unfortunately. I really can't
> see what is happening. I've created basic .meta files in the past, with
> no hitches, and the regex search-and-replace routines on geany and
> similar haven't yielded anything. I've tried ANSI, CR and LF line ends,
> etc. but no go. Will keep trying, but otherwise I may just end up
> putting this metadata into each of the texts -- an option I was trying
> to avoid.
> Best,
> Graham.
>
> On 03/12/2016 22:17, Jiayue Wang wrote:
>> Hi,
>> I pasted the error message in Geany and found that each text id is led
>> by an invisible character (between the ' and the first visible letter).
>>
>> Jiayue
>>
>> On 03/12/16 17:19, Graham Ranger -- UAPV wrote:
>>> Hello,
>>> I'm getting the following error message when I try to load the metadata
>>> file for a corpus:
>>>
>>> The data source you specified for the text metadata contains
>>> badly-formatted text ID codes, as follows: <strong>
>>> 'assollant_rose_d_amour'; 'bruno_le_tour_de_la_france';
>>> 'bruyere_l_epee_de_charlemagne'; 'daudet_lettres_de_mon_moulin';
>>> 'malot_sans_famille'; 'marcel_les_petits_vagabonds';
>>> 'robida_les_assieges_de_compiegne'; 'segur_malheurs_de_sophie';
>>> 'segur_un_bon_petit_diable'; 'verne_cinq_semaines_en_ballon';
>>> 'verne_le_tour_du_monde'; 'zola_nouveaux_contes_a_ninon';</strong>
>>> (text ids can only contain unaccented letters, numbers, and underscore).
>>>
>>> The metadata is in a file called jeunesse.meta in which each line begins
>>> with the text id of the texts in the corpus.
>>> Inside the metadata file, the lines read as follows:
>>>
>>> assollant_rose_d_amour alfred_assollant rose_d_amour 1889 1850_1899 roman avance
>>> bruno_le_tour_de_la_france bruno le_tour_de_la_france 1877 1850-1899 manuel_scolaire elementaire
>>> etc.
>>>
>>> with text id, author, title, date, period, genre and level.
>>>
>>> I can't see what is wrong with the file: the error message suggests that
>>> it's formatted as <strong>, but it's just plain text!
>>> Thanks as always for any help.
>>> Best,
>>> Graham.
>>> _______________________________________________
>>> CWB mailing list
>>> CWB at sslmit.unibo.it
>>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb