<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Dear list,</p>
<p>I just realized that I did not reply to the list but to Andrew
directly. Here's what I wrote after he answered my initial
question:</p>
<p>Hi Andrew,
<br>
<br>
Thank you for the explanation, and sorry that I still don't
understand it completely. Just to make sure: do the "Collocation
window from" and "Collocation window to" values constitute the
span when added together? That is, do "5 to the Left" and "5 to
the Right" make for a window span of 10?
<br>
<br>
With the default span for calculating collocations (user settings)
set to 10 L and 10 R, and the maximum window span under "Choose
settings for proximity-based calculations" set to "+/- 10", I would
then expect the initial window span to amount to 20, with
"Collocation window from" at "10 to the Left" and "Collocation
window to" at "10 to the Right". However, at least in our
installation, the maximum value for both is always "5 to the Left"
/ "5 to the Right".
<br>
<br>
After some digging in the code, I think I have identified the
reason for the fixed values in the Collocation controls:
defaults.php contains a variable called "default_colloc_range",
which is set to 5 in the original. After I set it to 10, the
options in the Collocation controls now extend to "10 to the Left"
and "10 to the Right" (which I understand to mean a window span of
20). Some tests make me believe that it is working as expected:
the "Distance" column in the window displayed after clicking on a
collocate now shows the positions from -10 to 10, and following
the links to the concordances in the Display column shows sensible
results as well.
<br>
<br>
So to sum up: I think with the default_colloc_range set to 10 it
works as expected. However, I think I am still a little wobbly in
my understanding of "range" vs. "window span". In my
understanding, a range of 10 translates to a window span of 20
because the "10" is used for "to the Left" and "to the Right".
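</p>
<p>To make the arithmetic concrete, here is a minimal Python sketch of
how I currently read the two terms. It is not CQPweb code: the function
name window_offsets is hypothetical, and I am assuming that the node
itself (position 0) does not count towards the span.</p>
<pre>
```python
def window_offsets(colloc_range):
    """Return the token offsets of a symmetric window around the node.

    Assumption (mine, not taken from CQPweb): the node at offset 0 is
    excluded, so only positions to the left and right are counted.
    """
    left = list(range(-colloc_range, 0))       # -colloc_range .. -1
    right = list(range(1, colloc_range + 1))   # 1 .. colloc_range
    return left + right


offsets = window_offsets(10)
print(offsets[0], offsets[-1], len(offsets))   # -10 10 20
```
</pre>
<p>Under that reading, a range of 5 would likewise give offsets from -5
to 5 and a window span of 10.</p>
<p>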
<br>
<br>
Please let me know if I somehow misunderstood the results of my
tests. And if you have time, I would of course appreciate it if
you could comment on my terminology problem.
<br>
<br>
Best
<br>
<br>
Jörn
<br>
</p>
<div class="moz-cite-prefix">On 04.12.24 17:35, Stephanie Evert
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:5043DAA5-76AE-42C6-B5F8-0DD477CBE35B@collocations.de">
<div>
<div><br>
</div>
<blockquote type="cite">
<div><span>Incidentally,
this UI was inherited from BNCweb, which was designed with
hardware limitations of 20+ years ago in mind. (Thus the
need to set the collocation data just once when entering
the collocation screen; thus the avoidance of compiling
data for attributes/spans that weren't needed, in order to
keep things fast.)</span><br>
<br>
<span>On
today's systems it is probably safe to recalculate the
data between displays of the collocation screen if
necessary. That would allow all the options to be moved
into the Collocations screen, without the separate little
popup. (I'd probably then segment the Collocation controls
into "Basic" - the span and stats - and "Detailed" -
minima, p-attribute, etc, only appearing when invoked.)</span><br>
</div>
</blockquote>
</div>
<br>
<div>My impression is that building the collocation database can
still take a substantial amount of time for a node with tens or
hundreds of thousands of occurrences, but I haven't actually
checked this in a current version of CQPweb running on current
hardware. If we could avoid MySQL and do computations in memory
with something like NumPy, this might indeed be faster in the
end.</div>
<div><br style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0);">
<blockquote type="cite"><span style="font-family: ArialMT;">The
TLDR is that your users should set the max span to L10 R10
in order to get the flexibility to use large spans once they
are on the collocations screen.</span></blockquote>
</div>
<div><span style="font-family: ArialMT;"><br>
</span></div>
<div><span style="font-family: ArialMT;">Given that hardware
limitations aren't as tight as 20 years ago, perhaps a very
simple change would be to make this the default setting for
the collocation database? I.e. maximal supported span +
include all annotations. Users would still be able to change
these options to more conservative settings when running a
very large collocation analysis.</span></div>
<div><span style="font-family: ArialMT;"><br>
</span></div>
<div><span style="font-family: ArialMT;">Best,</span></div>
<div><span style="font-family: ArialMT;">Stephanie</span></div>
<br>
</blockquote>
<pre class="moz-signature" cols="72">--
Dr. Jörn Stegmeier
DFG-Projekt "Kontroverse Diskurse"
Teilprojekt 7 "Methodologie &amp; Reflexion"</pre>
</body>
</html>