[CWB] [Corpora-List] Formal definition of "Collocation strength"

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Nov 5 17:55:54 CET 2015


>>>
-----Original Message-----
From: corpora-bounces at uib.no [mailto:corpora-bounces at uib.no] On Behalf Of John F Sowa

I believe that Matías was asking for the underlying reason or
insight that explains *why* two terms have a strong collocation:

Matías Guzmán Naranjo wrote:
> By 'attraction' do we mean the ranking, the actual mathematical
> definition of the test used, some sort of property of the words tested?
<<<

Just so, and I did in fact note:

"(Or, if you want to put it psychologically: the strength of an activation link between A and B within the language system such that when A is perceived or produced, B is more likely to be produced.)"

It's increasingly less controversial that psychological activation links between two or more things (words, syntactic categories, whatever), of some kind or another, are the underlying reason why pairs/groups of items tend to be produced more often together than chance would predict, leading to the existence of collocation, colligation, etc.

So that's the real-world property that collocation measures are getting at, and that's what is meant by the shorthand term "attraction". In other words, the strength of the psychological activation link is approximated by the increase in probability of occurrence based on textual evidence.
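To make "the increase in probability of occurrence" concrete, one common operationalisation compares P(B|A) with the baseline P(B). The log of that ratio is pointwise mutual information (PMI). What follows is only a minimal sketch under stated assumptions: co-occurrence means B immediately follows A (adjacent bigrams), probabilities are raw relative frequencies with no smoothing, and the function name `collocation_strength` is hypothetical. It is not the specific measure used by CWB or proposed in this thread.

```python
import math
from collections import Counter

def collocation_strength(tokens, a, b):
    """Hypothetical helper: return (P(b|a), P(b), PMI) for b directly after a.

    Assumes adjacent bigrams as the co-occurrence window and unsmoothed
    relative frequencies. Raises ZeroDivisionError if a never occurs and
    ValueError (from log2) if b never follows a.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    # P(b | a): share of occurrences of a that are followed by b
    p_b_given_a = bigrams[(a, b)] / unigrams[a]
    # P(b): baseline probability of b at any position in the text
    p_b = unigrams[b] / n
    # PMI: log2 of how much seeing a raises the probability of b
    pmi = math.log2(p_b_given_a / p_b)
    return p_b_given_a, p_b, pmi
```

For example, with the tokens of "strong tea and strong tea and weak coffee", `collocation_strength(tokens, "strong", "tea")` gives P(tea|strong) = 1.0 against a baseline P(tea) = 0.25, i.e. a fourfold increase and a PMI of 2.0 bits. Note that a number like this only records that the pair co-occurs more than chance predicts. It says nothing about why.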

Unfortunately, statistical co-occurrence measures are too blunt a tool to distinguish words linked by grammar from words linked by meaning or words linked by pragmatic function. That is the source of what you refer to as "noise".

One day, we will perhaps be able to measure the psychological links directly via neural evidence rather than using probability estimates based on text...

best

Andrew.

