[CWB] Impossible query

Stefan Evert stefanML at collocations.de
Mon Feb 29 15:22:38 CET 2016


> On 28 Feb 2016, at 08:32, Ruprecht von Waldenfels <ruprecht.waldenfels at gmx.net> wrote:
> 
> (ii) Are you really sure you want to do that?  What you get isn't a random sample in any sense that would allow you to draw statistical inferences.
> 
> Thanks for the comment. The goal is to restrict the influence of high-frequency lemmata in the next step, which consists of observing the overall behaviour of the translated word forms. It seems to me that I could also give multiple occurrences of the same lemma less weight, but I didn't go for that (I don't quite remember why now, but the general point seemed to be that it didn't make a difference). I need to normalize for frequency in some way. Any other ideas?

This is a very interesting methodological issue and I'd be happy to discuss it in greater detail.  I think I need to know more about what exactly you intend to do with the sample – i.e. what inferences you want to be able to draw – in order to make reliable recommendations.  Perhaps we should take this off-list?

Best,
Stefan

