[CWB] Bad metadata value on input file

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu May 19 12:54:50 CEST 2022


Re line breaks: you might want to install the 3.4.x version of the CWB core from the repo. It supports Windows line breaks on Unix (and vice versa), and it is also proof against problems arising from invisible byte-order marks.
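If upgrading is not an option straight away, it can help to check first whether an input file actually contains Windows line breaks or a byte-order mark. The following is a minimal Python sketch. It assumes the input is UTF-8. The function name and the example file name are illustrative only and are not part of CWB.

```python
import codecs
import sys


def inspect_line_endings(data: bytes) -> dict:
    """Report whether the data starts with a UTF-8 byte-order mark and
    count Windows (CRLF) versus Unix (LF-only) line breaks."""
    has_bom = data.startswith(codecs.BOM_UTF8)
    crlf = data.count(b"\r\n")
    # Every CRLF also contains an LF, so subtract to get the LF-only count.
    lf_only = data.count(b"\n") - crlf
    return {"utf8_bom": has_bom, "crlf": crlf, "lf": lf_only}


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file name): python inspect_endings.py corpus.vrt
    with open(sys.argv[1], "rb") as fh:
        print(inspect_line_endings(fh.read()))
```

A non-zero `crlf` count or `utf8_bom: True` would point to the kind of problem described here.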

best

Andrew.

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Peric Bojan (perc)
Sent: 18 May 2022 22:04
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] Bad metadata value on input file

I figured it out 😊 We had formatted the text tags differently because we had collected the data from different sources. After “standardizing” the tags, everything works. The problems with searching by lemma were due to Windows-style line breaks; as soon as we changed them to Linux line breaks, it worked again.
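The line-break conversion described above can be scripted. The following is a minimal Python sketch, assuming UTF-8 input; the function names and file names are hypothetical, and a standard utility such as `dos2unix` does the same job. The sketch also drops a leading UTF-8 byte-order mark, which Andrew's reply in this thread mentions as another source of problems. It writes a new file rather than overwriting the original.

```python
import codecs
import sys


def normalise_to_unix(data: bytes) -> bytes:
    """Drop a leading UTF-8 byte-order mark and turn Windows (CRLF)
    line breaks into Unix (LF) ones; all other bytes are left unchanged."""
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    return data.replace(b"\r\n", b"\n")


def convert_file(src: str, dst: str) -> None:
    """Write a normalised copy of `src` to `dst`. Binary I/O is used so
    that Python performs no newline translation of its own."""
    with open(src, "rb") as fin:
        data = fin.read()
    with open(dst, "wb") as fout:
        fout.write(normalise_to_unix(data))


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical file names):
    #   python to_unix.py corpus_windows.vrt corpus_unix.vrt
    convert_file(sys.argv[1], sys.argv[2])
```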

Thank you for your help, and best regards
Bojan


------------------------------------------------------------

lic. phil. Bojan Peric
Research Associate

ZHAW School of Management and Law
Gertrudstrasse 15
CH-8400 Winterthur

perc at zhaw.ch

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Peric Bojan (perc)
Sent: Wednesday, 18 May 2022 15:51
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: [CWB] Bad metadata value on input file

Hi Andrew

Thank you very much for your reply. I’m not sure whether “blanks” means “space characters” or “empty attributes”, so I removed the space characters and filled in dummy values wherever an attribute was empty. Now there are neither space characters nor empty attributes; however, the problem persists. By the way, I can’t change the data type to “classification” under “Manage text metadata”: all handles are automatically set to free text and cannot be changed. No matter which boxes I check, I get the same error.
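Filling in dummy values for empty attributes, as described here, could be done with a small script. The following is a sketch. It assumes that each `<text ...>` opening tag sits on its own line and that attribute values are double-quoted. The placeholder `unknown` and the function name are hypothetical choices, not CQPweb conventions.

```python
import re
import sys

# Matches an attribute whose value is empty, e.g. author="".
EMPTY_ATTR = re.compile(r'(\b[\w-]+)=""')


def fill_empty_attributes(line: str, placeholder: str = "unknown") -> str:
    """Replace empty attribute values in a <text ...> opening tag with a
    placeholder; every other line is returned unchanged."""
    if not line.lstrip().startswith("<text "):
        return line
    return EMPTY_ATTR.sub(lambda m: f'{m.group(1)}="{placeholder}"', line)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage (hypothetical file names):
    #   python fill_blanks.py corpus.vrt > corpus_filled.vrt
    with open(sys.argv[1], encoding="utf-8") as fh:
        for line in fh:
            sys.stdout.write(fill_empty_attributes(line))
```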

Funnily enough, the corpus works in CWB, at least more or less: searching by lemma does not work.

What am I missing?

Many thanks
Bojan



From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Hardie, Andrew
Sent: Wednesday, 18 May 2022 13:07
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] Bad metadata value on input file

It is the blanks. If you specify a metadata field as being of datatype “classification”, then every text needs a value for that field.
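A quick way to find the texts that break this rule is to list the `<text>` tags where each intended classification field is missing or empty. The following is a sketch. It assumes one `<text ...>` tag per line with double-quoted values. The function name is hypothetical, and whitespace-only values are counted as blank, which is a conservative assumption.

```python
import re
import sys
from collections import defaultdict

# Captures name="value" pairs inside a tag.
ATTR = re.compile(r'([\w-]+)="([^"]*)"')


def find_blank_values(lines, fields):
    """Map each field name to the (1-based) line numbers of <text> tags
    in which that field is missing, empty, or whitespace-only."""
    problems = defaultdict(list)
    for lineno, line in enumerate(lines, start=1):
        if not line.lstrip().startswith("<text "):
            continue
        attrs = dict(ATTR.findall(line))
        for field in fields:
            if not attrs.get(field, "").strip():
                problems[field].append(lineno)
    return dict(problems)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical): python check_blanks.py corpus.vrt type decade
    with open(sys.argv[1], encoding="utf-8") as fh:
        print(find_blank_values(fh, sys.argv[2:]))
```

An empty result means every checked field has a value in every text.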

best

Andrew.

From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of Peric Bojan (perc)
Sent: 17 May 2022 15:20
To: cwb at sslmit.unibo.it
Subject: [CWB] Bad metadata value on input file

Hi

When I try to import a corpus in CQPweb, I get a whole lot of “bad metadata value” errors:

Bad metadata value on input file line 13915 in column 0: n``'' .

I can’t figure out where the problem is. I thought it might be the blanks in the text tag attributes, but the problem persists even after removing them. Any idea how to pinpoint the issue?

Here’s what a typical text tag looks like:

<text id="DEB5780" author="" title="" source="BA" page="1-24" topics="" subtopics="" language="de" date="1891" description="" type="Parlamentsdebatten" file="1891_001(AB1891N1-24).tetml" year="1891" decade="1890" url="debatten_data/debatten_tetml/1891_001(AB1891N1-24).tetml">
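To pinpoint an error that names a line and a column, one option is to print the fields of that line with Python's `repr()`. This makes invisible characters visible, such as a trailing `\r` left by Windows line breaks or a byte-order mark (`\ufeff`). The following is a sketch. It assumes the input file named in the error is tab-delimited and that columns are counted from 0, as the error message suggests. The function name and file name are hypothetical.

```python
import sys


def fields_of_line(data: bytes, lineno: int, sep: str = "\t") -> list:
    """Return the fields of line `lineno` (1-based), split on `sep`.
    Lines are split on LF only, so a Windows CR stays attached to the
    last field instead of being silently discarded."""
    lines = data.split(b"\n")
    if not 1 <= lineno <= len(lines):
        raise ValueError(f"no line {lineno}; input has {len(lines)} lines")
    text = lines[lineno - 1].decode("utf-8", errors="replace")
    return text.split(sep)


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage (hypothetical file name): python show_line.py metadata.tsv 13915
    with open(sys.argv[1], "rb") as fh:
        fields = fields_of_line(fh.read(), int(sys.argv[2]))
    for column, value in enumerate(fields):
        print(column, repr(value))
```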

Any help is greatly appreciated.

Best
Bojan


