[CWB] Suggestion: user intervention in constructing an index

Ruprecht von Waldenfels ruprecht.waldenfels at gmx.net
Sat Mar 31 08:22:38 CEST 2018


Dear Vlado,
interesting, that explains why the spaces are still there in my corpus. 
Where do I turn them off?

I think it's a really useful function for users who need to copy 
text.

Best!
Ruprecht

On 30.03.2018 at 18:25, Vladimír Benko wrote:
> Dear All,
>
>> Just a small rectification re: |<g/>| in Manatee/Bonito: turns out it 
>> /is/ an opt-in configuration after all, cf. 
>> https://groups.google.com/a/sketchengine.co.uk/d/msg/noske/lYHa3WSb4L8/6ycvtxCYAwAJ. 
>> Sorry if I misled anyone earlier, I don’t use the feature myself, so 
>> I only had a vague recollection it was somehow there. And apologies 
>> to any (No)SkE devs who might be subscribed to the CWB list — this is 
>> actually a nice and clean way to do it :)
>
> The <g/> feature in (No)SkE is opt-in at two levels. Firstly, by 
> including the <g/> structure in the source vertical (this must be 
> done during tokenization) and defining it in the respective 
> corpus configuration file, the corpus designer decides that the 
> original spacing is preserved. Secondly, any corpus user can decide 
> whether the <g/> structures are to be interpreted (which is a bit 
> misleadingly called "displayed").
>
> In our (No)SkE installations, we prefer preserving information about 
> spaces for the text displayed on the screen, as the two main groups of 
> our corpus users (lexicographers and students of foreign languages) 
> typically need to copy longer text fragments, which would otherwise 
> require manual editing.
>
> I admit, however, that the use of <g/> may also confuse corpus users, as 
> some token boundaries become "hidden" and the tokenization policy is 
> less apparent :-)
>
> Best regards,
>
> Vlado B, 18:20
>
>
> -- 
> Vladimír Benko
>
> Université Comenius de Bratislava
> Chaire UNESCO de communication
> plurilingue et multiculturelle
>
> Šafárikovo námestie 6, SK-81499 Bratislava
>
> http://unesco.uniba.sk/guest/
> https://www.facebook.com/araneawebcorpora/
> https://vk.com/araneawebcorpora

