[CWB] Problem with corpora on CQPweb

Katia Karanasiou katia.kar6 at gmail.com
Fri Nov 20 16:14:52 CET 2015


Hello,

Thank you very much for your help.
I changed the permissions and now it creates the page for the corpus
queries.
When I run a query on a specific corpus, it throws the following error:

Base class package "CWB::CEQL" is empty.
    (Perhaps you need to 'use' the module which defines that package first,
    or make that module available in @INC (@INC contains: /etc/perl
    /usr/local/lib/x86_64-linux-gnu/perl/5.20.2 /usr/local/share/perl/5.20.2
    /usr/lib/x86_64-linux-gnu/perl5/5.20 /usr/share/perl5
    /usr/lib/x86_64-linux-gnu/perl/5.20 /usr/share/perl/5.20
    /usr/local/lib/site_perl .).
 at ../lib/perl/cqpwebCEQL.pm line 27.
BEGIN failed--compilation aborted at ../lib/perl/cqpwebCEQL.pm line 27.
Compilation failed in require at - line 2.

I've already installed Perl-CWB, and I changed @INC so that Perl can find
the module (using export
PERL5LIB=/var/www/CQPweb-3.2.1/lib/perl/cqpwebCEQL.pm).
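
For reference, the lookup that the error message describes can be sketched as follows. Perl treats each PERL5LIB or @INC entry as a directory and, for a package such as CWB::CEQL, looks for the relative file CWB/CEQL.pm beneath it, so an entry that names a .pm file directly cannot match. The C++17 sketch below only mimics that search; the function names are hypothetical and are not part of Perl, CWB, or CQPweb.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <system_error>
#include <vector>

namespace fs = std::filesystem;

// Turn a package name such as "CWB::CEQL" into the relative path "CWB/CEQL.pm".
std::string module_to_relative_path(const std::string& package) {
    std::string path;
    for (std::size_t i = 0; i < package.size(); ++i) {
        if (package.compare(i, 2, "::") == 0) {
            path += '/';
            ++i;  // skip the second ':' of the separator
        } else {
            path += package[i];
        }
    }
    return path + ".pm";
}

// Return the first search directory that holds the module file, if any.
// Every entry is treated as a directory, so an entry that is itself a
// .pm file never matches.
std::optional<fs::path> find_module(const std::string& package,
                                    const std::vector<std::string>& search_dirs) {
    const std::string relative = module_to_relative_path(package);
    for (const std::string& dir : search_dirs) {
        std::error_code ec;  // non-throwing overload: errors count as "not found"
        if (fs::is_regular_file(fs::path(dir) / relative, ec)) {
            return fs::path(dir);
        }
    }
    return std::nullopt;
}
```

Calling find_module("CWB::CEQL", dirs) with the directories listed in the error message would show which one, if any, holds CWB/CEQL.pm. It may also matter that a variable exported in an interactive shell is not necessarily visible to the web server process that runs CQPweb.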

The CQPweb version is 3.2.1, and I installed Perl-CWB-2.2.102.

Any idea what the problem could be?
Thank you in advance.

Best regards,
Katia.



On Thu, Nov 19, 2015 at 3:39 PM, <cwb-request at sslmit.unibo.it> wrote:

> Send CWB mailing list submissions to
>         cwb at sslmit.unibo.it
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://devel.sslmit.unibo.it/mailman/listinfo/cwb
> or, via email, send a message with subject or body 'help' to
>         cwb-request at sslmit.unibo.it
>
> You can reach the person managing the list at
>         cwb-owner at sslmit.unibo.it
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of CWB digest..."
>
>
> Today's Topics:
>
>    1. Problem with corpora on CQPweb (Katia Karanasiou)
>    2. Re: Problem with corpora on CQPweb (Hardie, Andrew)
>    3. Re: Problem with corpora on CQPweb (Hannah Kermes)
>    4. Re: TEITOK (Maarten Janssen)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 19 Nov 2015 13:20:20 +0100
> From: Katia Karanasiou <katia.kar6 at gmail.com>
> To: cwb at sslmit.unibo.it
> Subject: [CWB] Problem with corpora on CQPweb
> Message-ID:
>         <CAN8HmPAK+miztjKA3CfsGo=
> yEWGiDPiHWkL_zZivZvFSpQ6SNA at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I used the "CQPweb Admin Control Panel" -> "Install new Corpus" option to
> upload a new corpus to CQPweb. Although it encodes the input corpus and
> creates the index files, the corpus does not appear on the CQPweb site.
> Does anyone know what the problem could be?
> Thank you.
>
> Best regards,
> Katia
>
> ------------------------------
>
> Message: 2
> Date: Thu, 19 Nov 2015 13:04:00 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
>         <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] Problem with corpora on CQPweb
> Message-ID:
>         <28078EC3FBF1B940A3EF3D0D19BE351D70C9A27F at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> Have you checked whether the username that the web server runs under has
> permissions to create folders and symlinks in the main folder of CQPweb?
>
> best
>
> Andrew.
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 19 Nov 2015 14:49:06 +0100
> From: Hannah Kermes <h.kermes at mx.uni-saarland.de>
> To: cwb at sslmit.unibo.it
> Subject: Re: [CWB] Problem with corpora on CQPweb
> Message-ID: <564DD352.5040906 at mx.uni-saarland.de>
> Content-Type: text/plain; charset="windows-1252"; Format="flowed"
>
> I once forgot to set permissions or to make it visible.
>
> Best
> Hannah
>
>
> ------------------------------
>
> Message: 4
> Date: Thu, 19 Nov 2015 15:39:40 +0100
> From: Maarten Janssen <maartenpt at gmail.com>
> To: cwb at sslmit.unibo.it
> Subject: Re: [CWB] TEITOK
> Message-ID: <EF9EC9F5-81F9-4650-868D-786E68E0CDE6 at gmail.com>
> Content-Type: text/plain; charset=utf-8
>
> Hi Stefan and Andrew,
>
> thanks for the answers! Here are some responses:
>
> > TEITOK looks like an excellent tool - can we put a link to the server on
> > the CWB homepage?
>
> Of course you can; I would be pleased if you did - the people that are
> using it seem quite pleased with it, so there is definitely a "market" for
> it.
>
> > Also, having a mostly automated TEI converter program would be really
> > useful.
>
> TEITOK is not really a TEI converter, and depending on what you want to
> convert you have to follow a different path:
>
> - The internal structure TEITOK uses is not really TEI, although it is
> TEI compliant; there are too many options in TEI to really work with
> it directly, and what is specifically not used is the P4+ style <w>
> elements where annotation is modeled as text-nodes under child nodes.
> Instead, it uses the "older" style of <w> where annotations are attributes
> (to make sure they are always strings), and calls them <tok> rather than
> <w> to avoid confusion (and since <w> typically excludes punctuation marks,
> while tokens do not). So to use TEITOK, you either have to start from a TEI
> file that is not tokenized (TEITOK has an XML tokenizer to create
> TEITOK-style tokenized TEI), or convert the TEI file to TEITOK style (in
> Ljubljana they wrote an XSLT that does exactly that), after which
> tt-cwb-encode will directly create a CQP corpus for you.
>
> - tt-cwb-encode can be used to directly convert most TEI flavours to a CQP
> corpus (I should provide an example settings file with it to show how to
> convert a typical <w> style TEI file to CQP), but tt-cwb-encode does not
> tokenize, so for doing that, you would need a file that IS already
> tokenized (and annotated), and specify exactly which information can be
> found where in your TEI file.
>
> >>>> - the technical manual quite explicitly states that structures cannot
> >>>> embed or overlap; however, the logic of .rng files does not seem to
> >>>> invalidate that in any way.
> >>
> >> *Different* attributes can embed and overlap. But instances of one
> >> attribute can't embed with, or overlap with, other instances of the same
> >> attribute. And yes, it is not the structure of the binary files but rather
> >> the way they are used that prevents that.
> >
> > Well, the unpublished file format specification - which I assume you
> > mean by the "logic of .rng files" - mandates that regions don't nest or
> > overlap: the integer values in a .rng file must form an increasing
> > sequence.  If you violate the file format, bad things will happen (i.e.
> > undefined behaviour of CQP and the other CWB tools).
>
> I have by now fully implemented it, and I can confirm that this is indeed a
> hard requirement: if you create two overlapping ranges, one from tokens 4-6
> with error_type="agreement" and one from 5-7 with error_type="collocation"
> (generated in the example I tried from stand-off annotation files where
> ranges can overlap), then only token 7 will be a "collocation" error, while
> 4-6 are only "agreement" errors. However, at least from simple tests, it
> does not in any way seem to break CWB - it just ignores any token inside a
> range <x> that was already inside another range <x>.
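
The ordering rule quoted above can be checked before any region is written. The following is a minimal C++ sketch, assuming that "increasing sequence" means start <= end within a region and that each region starts after the previous one ends; the Region type and the function name are hypothetical and are not taken from CWB or tt-cwb-encode.

```cpp
#include <cstddef>
#include <vector>

// A region of corpus positions, inclusive on both ends (hypothetical type).
struct Region {
    int start;
    int end;
};

// Return the index of the first region that breaks the ordering rule,
// or -1 if the whole list is acceptable.
// Assumed rule: 0 <= start <= end, and each start lies after the previous end.
int first_invalid_region(const std::vector<Region>& regions) {
    int previous_end = -1;  // corpus positions are assumed to be non-negative
    for (std::size_t i = 0; i < regions.size(); ++i) {
        const Region& r = regions[i];
        if (r.start < 0 || r.start > r.end || r.start <= previous_end) {
            return static_cast<int>(i);
        }
        previous_end = r.end;
    }
    return -1;
}
```

With the two ranges from the example (4-6 and 5-7), the check flags the second region, because it starts before the first one ends.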
>
> >> For that reason, TEITOK since this week uses a custom C++ application
> >> to directly build the files needed by cwb-makeall from the XML files.
> >
> > Does that mean you actually create the binary data files (in
> > uncompressed form) from your application, without going through the
> > appropriate CWB tools?  You shouldn't do that, and I can't think of any
> > good reason for doing it.[*]  One of the obvious consequences is that any
> > file format changes, such as those envisioned for CWB 4, will completely
> > break your program, and it will be much harder to adapt than if you were
> > using the CWB encoder tools.
> >
> > If you create .rng files with the appropriate cwb-s-encode
> > utility, it will stop you from generating overlapping or nested regions.
> >
> > [*] Ok, there's one fairly good reason if you're dealing with very large
> > corpora: it may be more efficient to write files directly than to open
> > pipes to a large number of cwb-encode and cwb-s-encode backends.  But I'm
> > really not sure that this makes up for the loss in maintainability and
> > reliability.
>
> Yes - tt-cwb-encode directly writes binary files; I initially wanted to
> use cwb-atoi (and hence later cwb-s-encode), but when opening up the code
> in that, I saw the conversion is so trivial that there was simply no need
> for the overhead (which would also involve making sure the application can
> be found, etc.). It is a simple function, which can easily be changed into a
> call to cwb-atoi on a major overhaul, or just implemented slightly
> differently (a direct copy would not really work since tt-cwb-encode is C++
> and not C):
>
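> #include <arpa/inet.h>  // htonl
> #include <cstdint>      // uint32_t
> #include <cstdio>       // FILE, fwrite
>
> // Write a 32-bit integer CWB network style (big-endian byte order)
> void write_network_number ( int towrite, FILE *stream ) {
>         uint32_t i = htonl(towrite);
>         fwrite(&i, sizeof i, 1, stream);
> }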
>
> The same holds for ranges, although that is of course slightly more
> complicated. However, most of the work is in finding out what range to
> write in the first place; the 10 lines for
> void write_range ( int pos1, int pos2, string formkey )
> do not really add to the complexity and can also be modified in the future
> when needed.
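
The body of write_range is not shown in the message, so the following is only a guess at its core step: a minimal sketch assuming a region is stored as a (start, end) pair of 32-bit integers in network byte order, and assuming a POSIX system for htonl. The name write_region and the FILE* parameter (used here in place of the formkey string) are hypothetical.

```cpp
#include <arpa/inet.h>  // htonl (POSIX)
#include <cstdint>
#include <cstdio>

// Write one 32-bit integer in network (big-endian) byte order.
// Returns false if the write fails.
bool write_network_number(std::int32_t value, std::FILE* stream) {
    const std::uint32_t be = htonl(static_cast<std::uint32_t>(value));
    return std::fwrite(&be, sizeof be, 1, stream) == 1;
}

// Hypothetical: append one region as a (start, end) pair of positions.
// Rejects negative or reversed ranges instead of writing them.
bool write_region(std::int32_t start, std::int32_t end, std::FILE* stream) {
    if (start < 0 || end < start) {
        return false;
    }
    return write_network_number(start, stream) &&
           write_network_number(end, stream);
}
```

Returning a bool lets the caller notice a failed write or a rejected range; whether the real write_range does any such checking is not stated in the message.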
>
> Also - I would hope that if CWB gets a major overhaul, the implementation
> of ranges could be rethought as well, which would probably mean that even
> cwb-s-encode would break. Here is a "suggestion":
>
> Apart from allowing overlaps and/or nestings, the application of
> sattributes is hampered by the fact that they are so very different from
> pattributes, which means many of the nice functions on pattributes are not
> applicable to sattributes (I think even regex is not available for
> sattributes). In my opinion, the language would become much more expressive
> by blurring the distinction between p and s, and adopting a notation à la
> XPath where before the brackets you can indicate the range type (with
> nothing meaning a token), to allow for queries like
>
> np[case="nominative|ergative"] [pos="V.*"]
>
> and since these are ranges, they can of course be nested:
>
> mwe[type="name" [pos="CC"]]
>
> which seems not only more elegant to me than [pos="CC"] :: mwe_type="name"
> but also should be more expressive...
>
> The difference with the current search style is not that big (and it
> should not affect backward compatibility), and since a new file format
> would require looking up data completely differently anyway, it might be
> worthwhile to profit from that to treat sattributes more like
> pattributes. In the current set-up they are very similar behind the
> scenes: the lexicon.idx file is largely the same as the .avx file and the
> .lexicon mimics the .avs file, the only real difference being that of
> course .corpus indicates positions and .rng ranges. However, internally
> they are treated very differently, and there is no range-based variant of
> .rvs for instance. But from the looks of it, there is little preventing
> sattributes from being treated mostly like pattributes - and of course,
> there would be major implications if you tried to implement that in
> the current CWB, but when making dramatic changes anyway, would it not be
> possible to look into that?
>
> >>>> - ideally, the CQP tokens would directly point to indexes in the XML
> >>>> files to make it possible to efficiently extract the matching data directly
> >>>> from the XML files. An inelegant method would be to add two pattributes for
> >>>> this, but would there be any more elegant way to link tokens in CQP to
> >>>> ranges in external files?
> >>
> >> Not any that I can think of.
> >
> > Nor I.  But that's not surprising, given that XML itself doesn't have an
> > elegant way of linking to external files and is forced to use XPointers or
> > other verbose and horrible concoctions.
> >
> > You could store XML IDs of the relevant elements as p-attributes, or
> > byte offsets into the XML files (for better efficiency and flexibility).
> > None of these solutions is efficient in CWB 3 - they'll be much better in
> > CWB 4 with "raw string" and "integer" attribute types.
>
> Keeping the IDs is what TEITOK (and CorpusWiki) have done from the start,
> and is why results from CQL queries link directly to their result in the
> XML file; however, when showing long lists of results, it would be very
> nice to be able to show the initial XML context rather than the CQP output,
> since CQP does not do spacing, nor does it do typesetting. And every
> implementation I tried (including writing a dedicated app) still ends up
> being too slow for internet use: a list of 100 results takes several seconds
> to load, which is not acceptable. So what I was/am looking for is indeed a
> way to store byte-offsets. But I'll just either put these in a CQP
> pattribute then or in an external index (potentially using the CWB format
> for coherence).
>
> End of CWB Digest, Vol 106, Issue 18
> ************************************
>