[CWB] Suggestion: user intervention in constructing an index

Ciarán Ó Duibhín coduibhin at btinternet.com
Wed Mar 21 11:38:19 CET 2018


Thanks again Andrew.
  >> I am not comfortable with the idea of storing two columns to hold things which (unlike with normal lemmatisation) can be automatically generated from one column — during the indexing process, if access by a user-supplied script were usable there, acting on the text shown in column 1 to produce what is shown in column 2.

   

  But as I’ve explained, there is already a way to do that if you don’t want a permanent multi-column file: just put your user script into a pipeline with cwb-encode on the end, i.e.:

   

    cat one-col-file | column-transform-script | cwb-encode [options]

OK, I had thought your pipeline suggestion applied only to your first answer (transforming "word"), but I see now that it can apply to the second answer too (transform "word" and add "lemma"). Pipelining is not something I have worked with in Windows/DOS, but I assume it will be feasible.
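
To make sure I have understood the pipeline idea, here is a minimal sketch in Python of what the column-transform step might look like. It assumes the usual verticalised input (one token per line, XML tags on lines of their own) and emits the token plus a tab-separated second column. The derive() rule is a hypothetical placeholder (plain lowercasing), not my real transformation, and treating every line that starts with "<" as a tag is a simplifying assumption.

```python
def derive(word):
    """Hypothetical stand-in for the real rule that turns column 1 into column 2."""
    return word.lower()


def transform(lines):
    """Yield two-column lines: tag lines unchanged, tokens as word<TAB>derived."""
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            # Assumption: blank lines carry no information and can be dropped.
            continue
        if line.startswith("<"):
            # Assumption: anything starting with "<" is an XML tag line.
            yield line
        else:
            yield line + "\t" + derive(line)


# Small in-memory demonstration; in a real pipeline the input would be sys.stdin.
sample = ["<s>\n", "Hello\n", "\n", "World\r\n", "</s>\n"]
for out in transform(sample):
    print(out)
```

With sample replaced by sys.stdin, I would expect something like "type one-col-file | python transform.py | cwb-encode [options]" to be the Windows equivalent of the pipeline above, with the second column declared to cwb-encode as an additional p-attribute, though I have not tried this yet.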



It will avoid having a permanent multi-column file outside the corpus, but won't the multiple columns still exist internally in some form within the corpus?  :-(


  Some display systems like BNCweb remove non-original orthographic spaces from the CQP concordance. (BNCweb does this by having an additional binary p-attribute storing the “orthographic-space-after” data) ... 





  ...  you can address the second point (of rendering) by writing a display program which lays things out to your liking using one of the interface libraries i.e. the CWB-Perl modules or the cqp.inc.php module from CQPweb.  Or, if you prefer, just write your rendering script to pipe text in and out of a cqp slave instance (which is what the Perl and PHP libraries do behind the scenes). 

I'm not sure whether these two things — the additional binary attribute, and CWB-Perl — are two independent suggestions, or two aspects of the same suggestion.



I'm definitely interested in copying the BNCweb idea.  Where can I get info about binary p-attributes?  Where should I look to find out about reading this attribute from a script or program?
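
As I understand the BNCweb idea, the rendering side would amount to something like the sketch below. It assumes that a script has already retrieved each token together with its "space after" flag as a pair; how those pairs are read from the corpus is exactly what I am asking about and is not shown. The pair layout and the name render() are my own assumptions, not anything taken from BNCweb.

```python
def render(tokens):
    """Join (word, space_after) pairs, adding a space only where the flag is set."""
    parts = []
    for word, space_after in tokens:
        parts.append(word)
        if space_after:
            parts.append(" ")
    # Drop a trailing space left by a flag on the final token.
    return "".join(parts).rstrip()


# Hypothetical tokenisation in which "don't" was split into two tokens.
sample = [("don", False), ("'t", True), ("go", False), (".", False)]
print(render(sample))
```

The point of the design is that the corpus keeps its tokenisation for searching, while the display restores the original spacing from the extra attribute.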



If I need to use CWB-Perl, or if using it would make things easier, I notice that the README in CWB-Perl 2.2.102 mentions "cwb-config", but https://github.com/cran/rcqp/blob/master/src/cwb/man/cwb-config.pod says that cwb-config is not yet available for Windows.



Regards,

Ciarán.