<html>
<head>
<meta content="text/html; charset=utf-8" http-equiv="Content-Type">
</head>
<body bgcolor="#FFFFFF" text="#000000">
<div class="moz-cite-prefix">Dear Simon,<br>
<br>
</div>
<blockquote
cite="mid:42293D5C-E17F-4138-8B25-AD3B0F183B15@tu-berlin.de"
type="cite">
<pre wrap="">Sorry for posting a question not concerning CQP in the first place but the TreeTagger for Spanish texts:
Using the script „tree-tagger-spanish“ a list of multiword expressions is included in the tagging procedure, e.g. printing
</pre>
<blockquote type="cite" style="color: #000000;">
<pre wrap="">Por el contrario        ADV        por~el~contrario
</pre>
</blockquote>
</blockquote>
<br>
I think that this phenomenon is referred to as a "secondary
preposition", i.e., a multiword expression with prepositional
semantics -- I believe it exists in Italian as well.<br>
<br>
As far as I know, the Spanish language model is the only TreeTagger
model that expects/allows secondary prepositions to be tokenized like
that. I am not sure whether a native speaker of Spanish without any
corpus linguistics background would have any special expectations
regarding tokenization. This is why we decided simply to "switch off"
this feature for our Araneum Hispanicum corpus :-)<br>
<br>
Best, <br>
<p>Vlado B, 12:20<br>
</p>
<div class="moz-signature">-- <br>
<font color="navy">Vladimír Benko</font>
<p>
Université Comenius de Bratislava<br>
Chaire UNESCO de communication<br>
plurilingue et multiculturelle</p>
<p>
Šafárikovo námestie 6, SK-81499 Bratislava</p>
<p>
<a class="moz-txt-link-freetext" href="http://unesco.uniba.sk/guest/">http://unesco.uniba.sk/guest/</a><br>
<a class="moz-txt-link-freetext" href="https://www.facebook.com/araneawebcorpora/">https://www.facebook.com/araneawebcorpora/</a><br>
<a class="moz-txt-link-freetext" href="https://vk.com/araneawebcorpora">https://vk.com/araneawebcorpora</a>
</p>
</div>
</body>
</html>