[CWB] Suggestion: user intervention in constructing an index
Ciarán Ó Duibhín
coduibhin at btinternet.com
Wed Mar 21 11:38:19 CET 2018
Thanks again Andrew.
>> I am not comfortable with the idea of storing two columns to hold things which (unlike with normal lemmatisation) can be automatically generated from one column — during the indexing process, if access by a user-supplied script were usable there, acting on the text shown in column 1 to produce what is shown in column 2.
> But as I’ve explained, there is already a way to do that if you don’t want a permanent multi-column file – just put your user script into a pipeline with cwb-encode on the end, i.e.:
>     cat one-col-file | column-transform-script | cwb-encode [options]
OK, I had thought your pipeline suggestion applied only to your first answer (transforming "word"), but I see now that it can apply to the second answer too (transform "word" and add "lemma"). Pipelining is not something I have worked with in Windows/DOS, but I assume it will be feasible.
It will avoid having a permanent multi-column file outside the corpus, but won't the multiple columns still exist internally in some form within the corpus? :-(
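For concreteness, the column-transform step in the quoted pipeline could be sketched roughly as below. This is only an illustration under assumptions: the input is taken to be one token per line, lines beginning with "<" are taken to be XML tag lines that pass through untouched, and derive_second_column is a hypothetical stand-in (plain lowercasing here) for whatever rule actually generates the second column.

```python
def derive_second_column(word):
    """Hypothetical placeholder for the user's own transformation rule."""
    return word.lower()


def transform_line(line):
    """Turn one line of one-column input into one line of two-column output."""
    text = line.rstrip("\r\n")
    # Assumption: blank lines and XML tag lines are passed through unchanged.
    if not text or text.startswith("<"):
        return text
    # Token line: append the derived value as a second, tab-separated column.
    return text + "\t" + derive_second_column(text)


def main(in_stream, out_stream):
    """Copy in_stream to out_stream, adding the derived column to token lines."""
    for line in in_stream:
        out_stream.write(transform_line(line) + "\n")

# To use this as a pipeline filter, add "import sys" at the top and finish
# the script with: main(sys.stdin, sys.stdout)
```

Saved under a hypothetical name such as transform.py, this would take the place of column-transform-script in the pipeline. The Windows command prompt also supports the | operator, with "type" in place of "cat", so something like "type one-col-file | python transform.py | cwb-encode [options]" should be workable there, though character encoding on the console may need attention.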
> Some display systems like BNCweb remove non-original orthographic spaces from the CQP concordance. (BNCweb does this by having an additional binary p-attribute storing the “orthographic-space-after” data) ...
> ... you can address the second point (of rendering) by writing a display program which lays things out to your liking using one of the interface libraries i.e. the CWB-Perl modules or the cqp.inc.php module from CQPweb. Or, if you prefer, just write your rendering script to pipe text in and out of a cqp slave instance (which is what the Perl and PHP libraries do behind the scenes).
I'm not sure whether these two things — the additional binary attribute, and CWB-Perl — are two independent suggestions, or two aspects of the same suggestion.
I'm definitely interested in copying the BNCweb idea. Where can I get info about binary p-attributes? Where should I look to find out about reading this attribute from a script or program?
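As I understand the BNCweb idea, once a display script has each token together with its space-after flag, the rendering step itself is small. The sketch below assumes the script has already obtained (word, space_after) pairs by some means; how they are read from CQP is not shown, and the function name render and the boolean flag format are hypothetical.

```python
def render(pairs):
    """Join tokens, inserting a space only where the flag says one followed.

    pairs: iterable of (word, space_after) tuples, where space_after is a
    boolean (assumed format; the real attribute may be stored differently).
    """
    pieces = []
    for word, space_after in pairs:
        pieces.append(word)
        if space_after:
            pieces.append(" ")
    # Drop a trailing space left by the final token, if any.
    return "".join(pieces).rstrip()


sample = [("It", False), ("'s", True), ("here", False), (".", False)]
print(render(sample))
```

The point of the design is that the corpus keeps the tokenised form for searching, while the flag lets the display restore the original spacing.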
If I need to use CWB-Perl, or if using it would make things easier, I notice that the README in CWB-Perl 2.2.102 mentions "cwb-config", but https://github.com/cran/rcqp/blob/master/src/cwb/man/cwb-config.pod says that cwb-config is not yet available for Windows.
Regards,
Ciarán.