[CWB] { EXTERN } Re: CQPweb 3.3.18 rev. 1883: Set collocation window from and collocation window to to ten tokens
Jörn Stegmeier
stegmeier at uni-trier.de
Wed Dec 4 17:45:40 CET 2024
Dear list,
I just realized that I did not reply to the list but to Andrew directly.
Here's what I wrote after he answered my initial question:
Hi Andrew,
Thank you for the explanation, and sorry that I still don't understand
it completely. Just to make sure: do the "Collocation window from" and
the "Collocation window to" constitute the span by adding the value of
the first to the value of the other? As in, "5 to the Left" and "5 to
the Right" make for a window span of 10?
With the default span for calculating collocations (user settings) set
to 10 L and 10 R, and the maximum window span in the "Choose settings
for proximity-based calculations" set to "+/- 10" I would then expect
the initial window span to amount to 20 and the value for the
"Collocation window from" to be "10 to the Left" and the value for the
"Collocation window to" to be "10 to the Right". However, at least in
our installation, the maximum value for both is always "5 to the Left" /
"5 to the Right".
After some digging in the code I think I identified the reason for the
fixed values in the Collocation controls: In defaults.php there is a
variable called "default_colloc_range" which is set to 5 in the
original. After setting it to "10" the options in the Collocation
controls now reach to "10 to the Left" and "10 to the Right" (which I
understand means a window span of 20). Some tests lead me to believe
that it is working as expected. The "Distance" column in the window
which is displayed after clicking on a collocate now shows the positions
from -10 to 10, and following the links to the concordances in the
Display column shows sensible results as well.
So to sum up: I think with the default_colloc_range set to 10 it works
as expected. However, I think I am still a little wobbly in my
understanding of "range" vs. "window span". In my understanding, a range
of 10 translates to a window span of 20 because the "10" is used for "to
the Left" and "to the Right".
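
To make that arithmetic concrete, here is a minimal Python sketch of how I
currently read the two terms. It is not CQPweb code: the function names
window_positions and window_span are made up for illustration, and I am
assuming that the node itself (distance 0) does not count as a window
position.

```python
def window_positions(colloc_range):
    """Distances covered by a symmetric range around the node.

    Assumption: the node itself (distance 0) is not a window position.
    """
    return [d for d in range(-colloc_range, colloc_range + 1) if d != 0]


def window_span(left, right):
    """Window span as the sum of the left and right extents."""
    return left + right


positions = window_positions(10)
print(positions[0], positions[-1])  # leftmost and rightmost distance
print(window_span(10, 10))          # span for "10 to the Left" / "10 to the Right"
print(len(positions))               # number of token positions in the window
```

Under these assumptions, a range of 10 gives distances from -10 to 10 and
a window span of 20, which is what I see in the "Distance" column.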
Please let me know if I somehow misunderstood the results of my tests.
And if you have time, I would of course appreciate it if you could
comment on my terminology problem.
Best
Jörn
On 04.12.24 17:35, Stephanie Evert wrote:
>
>> Incidentally, this UI was inherited from BNCweb, which was designed
>> with hardware limitations of 20+ years ago in mind. (Thus the need to
>> set the collocation data just once when entering the collocation
>> screen; thus the avoidance of compiling data for attributes/spans
>> that weren't needed, in order to keep things fast.)
>>
>> On today's systems it is probably safe to recalculate the data
>> between displays of the collocation screen if necessary. That would
>> allow all the options to be moved into the Collocations screen,
>> without the separate little popup. (I'd probably then segment the
>> Collocation controls into "Basic" - the span and stats - and
>> "Detailed" - minima, p-attribute, etc, only appearing when invoked.)
>
> My impression is that building the collocation database can still take
> a substantial amount of time for a node with tens or hundreds of
> thousands of occurrences, but I haven't actually checked this in a
> current version of CQPweb running on current hardware. If we could
> avoid MySQL and do computations in memory with something like NumPy,
> this might indeed be faster in the end.
>
>> The TLDR is that your users should set the max span to L10 R10 in
>> order to get the flexibility to use large spans once they are on the
>> collocations screen.
>
> Given that hardware limitations aren't as tight as 20 years ago,
> perhaps a very simple change would be to make this the default setting
> for the collocation database? I.e. maximal supported span + include
> all annotations. Users would still be able to change these options to
> more conservative settings when running a very large collocation analysis.
>
> Best,
> Stephanie
>
--
Dr. Jörn Stegmeier
DFG-Projekt "Kontroverse Diskurse"
Teilprojekt 7 "Methodologie & Reflexion"