[CWB] Suggestion: user intervention in constructing an index

Hardie, Andrew a.hardie at lancaster.ac.uk
Thu Mar 29 17:14:59 CEST 2018


It’s true you can’t have empty attributes, but if you ignore the close tags, then you’ve effectively got one!

The easiest way is just to auto-close: the empty tags are then interpreted as open tags, and a “tacit” close tag is added when it is time to start a new range.

Then in the concordance, deal with the <g> and ignore the </g>.
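
To make the idea concrete, here is a minimal sketch in Python. It is a toy model of the auto-close behaviour described above, not the actual cwb-encode code; the function names (auto_close_ranges, render) and the one-item-per-line input are assumptions made purely for illustration.

```python
def auto_close_ranges(vertical_lines, tag="g"):
    """Toy model: treat <g/> (or <g>) as an open tag and close the
    previous range tacitly when the next one starts (or at end of input)."""
    tokens = []
    ranges = []          # list of (start_cpos, end_cpos), inclusive
    open_start = None    # cpos at which the currently open range began
    open_marks = {f"<{tag}>", f"<{tag}/>"}
    for line in vertical_lines:
        line = line.strip()
        if not line:
            continue
        if line in open_marks:
            # Tacit close of the previous range, skipping empty ranges.
            if open_start is not None and open_start < len(tokens):
                ranges.append((open_start, len(tokens) - 1))
            open_start = len(tokens)
        elif line == f"</{tag}>":
            continue     # explicit close tags are simply ignored
        else:
            tokens.append(line)
    if open_start is not None and open_start < len(tokens):
        ranges.append((open_start, len(tokens) - 1))
    return tokens, ranges


def render(tokens, ranges):
    """Use only the range starts (the <g>): glue that token to its predecessor."""
    glue_before = {start for start, _ in ranges}
    pieces = []
    for cpos, tok in enumerate(tokens):
        if cpos > 0 and cpos not in glue_before:
            pieces.append(" ")
        pieces.append(tok)
    return "".join(pieces)


tokens, ranges = auto_close_ranges(["Hello", "<g/>", ",", "world", "<g/>", "!"])
print(ranges)                  # [(1, 2), (3, 3)]
print(render(tokens, ranges))  # Hello, world!
```

Only the start of each range matters for the output; where the range ends is irrelevant, which is why the close tags can be ignored.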

best

Andrew.

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Blätte, Andreas
Sent: 29 March 2018 15:10
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] Suggestion: user intervention in constructing an index

Dear colleagues,

Following the discussion on glue, my understanding is that using an s-attribute (<g/>) is not technically possible, because self-closing elements are not supported: a structural attribute must wrap at least one token. Please correct me if I am wrong!

So a binary p-attribute may be an option. Huffman coding should make a p-attribute that indicates where tokens are to be glued fairly memory-efficient, and the implementation of the corpus library (lookups via cl_cpos2id) should keep the computational cost fairly modest.
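
As a rough illustration of this option, here is a minimal Python sketch. It assumes a hypothetical p-attribute whose value is "1" when a token should be attached to the preceding token and "0" otherwise; the attribute values and function names are invented for the example, and plain lists stand in for the corpus library lookups.

```python
import math


def glue_concordance(tokens, glue):
    """Join tokens, omitting the space before any token flagged with "1"."""
    if len(tokens) != len(glue):
        raise ValueError("tokens and glue flags must have the same length")
    pieces = []
    for i, (tok, flag) in enumerate(zip(tokens, glue)):
        if i > 0 and flag != "1":
            pieces.append(" ")
        pieces.append(tok)
    return "".join(pieces)


def two_symbol_huffman_bytes(n_tokens):
    """Rough lower bound: a Huffman code over two symbols spends one bit
    per token. File headers and index overhead are ignored here."""
    return math.ceil(n_tokens / 8)


print(glue_concordance(["Hello", ",", "world", "!"], ["0", "1", "0", "1"]))
print(two_symbol_huffman_bytes(1_000_000))
```

With only two possible values, a Huffman code cannot do better or worse than one bit per token, so the estimate is about 125 kB per million tokens before any overhead.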

Andrew’s argument was that it would contravene the logic of CQP to have it implemented there. You may know the R layer (package ‘polmineR’) I use for working with CQP. Implementing p-attribute-based gluing of concordance output there would take very little time. But does anybody know or use polmineR (on CRAN, and at github.com/PolMine/polmineR)?

Kind regards, and happy Easter
Andreas


From: <cwb-bounces at sslmit.unibo.it> on behalf of Ciarán Ó Duibhín <coduibhin at btinternet.com>
Reply-To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Date: Thursday, 29 March 2018 at 15:31
To: Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
Subject: Re: [CWB] Suggestion: user intervention in constructing an index
