<html>

<head>

<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

<style type="text/css" style="display:none;"><!-- P {margin-top:0;margin-bottom:0;} --></style>

</head>

<body dir="ltr">

<div id="divtagdefaultwrapper" dir="ltr" style="font-size:12pt; color:rgb(0,0,0); font-family:Calibri,Helvetica,sans-serif,EmojiFont,&quot;Apple Color Emoji&quot;,&quot;Segoe UI Emoji&quot;,NotoColorEmoji,&quot;Segoe UI Symbol&quot;,&quot;Android Emoji&quot;,EmojiSymbols">

<p>Dear All,</p>

<p><br>

</p>

<p>I realise this question may not be a perfect fit for this mailing list, but I'm not sure who or where else to ask, so here goes: <span style="font-size:12pt">Have any of you ever worked with components from the <a href="http://ice-corpora.net/ice/index.html" class="OWAAutoLink">International Corpus of English</a>? The XML-like annotations in the original files seem to be broken in many ways (e.g., inconsistent or unclosed tags, invalid overlaps, reserved characters in content), so preparing them for CQP turned out to be quite challenging (at least for me). It's not that I am stuck on one specific problem; rather, I'm curious whether you have any general advice for correcting such ill-formed texts, perhaps from experience. I feel like regular expressions can only go so far (though I may very well just not be sufficiently knowledgeable). There is an International Corpus of Learner English on the Lancaster CQPweb page. Is that corpus similar by any chance?</span></p>

<p><span style="font-size:12pt"><br>

</span></p>

<p><span style="font-size:12pt">Best,</span></p>

<p><span style="font-size:12pt">Florian</span></p>

</div>

</body>

</html>