[CWB] Multi-word units

BOFÍAS ALBERCH, EVA eva.bofias at upf.edu
Thu Feb 14 19:53:25 CET 2013


Hi,

I don't know whether this is possible at all, but it doesn't hurt to ask.
OK, here's the problem. We are developing a corpus to be exploited
via CQP, and we would like future users to be able to access the information
in different ways. This is a diachronic corpus, and sometimes it is important
to know which parts a given multi-word expression consists of. For instance,
in Old Spanish we have expressions such as "apressurada mientre" ('mientre' is
the equivalent of English -ly), which clearly function like their
contemporary Spanish equivalent, "apresuradamente". It is important to encode
such an expression as a single word tagged as 'adverb'. However, some
potential users might be interested in studying the evolution of these
forms and might want to distinguish the forms that the scribes wrote as
a single word (the same texts also contain these adverbs with "mente" written
as a single word) from those written as two separate words. The
idea would be to find some way of encoding the corpus so that multi-word
expressions like these are tagged as a single word, while a user who wanted
to find all the instances of 'mientre', whether or not it is attached to the
preceding word, would still be able to do so. Any suggestions? Or are we
asking for something that is not possible?
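To make the requirement concrete, here is a minimal sketch of one possible encoding, written in Python and resting on assumptions rather than on any established CWB recipe: the corpus is prepared in CWB's one-token-per-line (vertical) format, a hypothetical feature-set attribute `parts` lists the written components of each token, and a hypothetical attribute `layout` records whether the scribe wrote the unit as one word or two. The attribute names and the "_" joining convention are illustrative only.

```python
# Minimal sketch (hypothetical attribute names): write CWB vertical input in
# which a multi-word adverb is ONE token that still records its written parts.
# Columns: word, pos, parts (feature set), layout ("joined" or "split").
from typing import List, NamedTuple


class Token(NamedTuple):
    parts: List[str]  # components as written, e.g. ["apressurada", "mientre"]
    pos: str          # tag for the whole unit, e.g. "ADV"
    joined: bool      # True if the scribe wrote the parts as a single word


def feature_set(values: List[str]) -> str:
    """Format values as a CWB-style feature set such as |a|b| (| if empty)."""
    return "|" + "".join(v + "|" for v in sorted(set(values)))


def to_vertical(tokens: List[Token]) -> str:
    """Return one tab-separated line per token (vertical format)."""
    lines = []
    for tok in tokens:
        # Surface form of the unit: glued if written as one word,
        # otherwise joined with "_" (an assumed convention, not a CWB rule).
        word = "".join(tok.parts) if tok.joined else "_".join(tok.parts)
        layout = "joined" if tok.joined else "split"
        lines.append(f"{word}\t{tok.pos}\t{feature_set(tok.parts)}\t{layout}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    sample = [
        Token(["apressurada", "mientre"], "ADV", joined=False),
        Token(["apressurada", "mente"], "ADV", joined=True),
    ]
    print(to_vertical(sample), end="")
```

If `parts` were declared as a feature-set attribute when encoding (e.g. `-P parts/` for cwb-encode), a query such as `[parts contains "mientre"]` should match both spellings, while `[parts contains "mientre" & layout="split"]` would restrict the results to the two-word form.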

Eva

