[CWB] cwb-huffcode issues warning during cwb-encode

Martí Quixal marti.quixal at gmail.com
Thu Nov 1 20:17:04 CET 2012


Hi Andrew, hi all,

could you suggest possible sources of error? I have checked for spaces that
should be tabs, and that does not seem to be the problem. There are tokens
for which the tense attribute is empty, but that usually resulted in
__UNDEF__ or something like that. Isn't that the case any more?
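
For reference, the check described above (tabs versus spaces, empty tense values) can be sketched roughly as follows. This is only a sketch under assumptions: the helper name check_columns is made up for illustration, the input is taken to be one token per line with tab-separated columns, and lines starting with "<" are taken to be XML tags.

```shell
#!/bin/sh
# Hypothetical helper (not part of CWB): report token lines whose number of
# tab-separated columns differs from the expected count, and count empty
# values in one column. XML tag lines and blank lines are skipped.
# Usage: check_columns FILE EXPECTED_COLUMNS COLUMN_TO_INSPECT
check_columns() {
    awk -F '\t' -v want="$2" -v col="$3" '
        /^</ || /^[ \t]*$/ { next }   # skip XML tags and blank lines
        { tokens++ }
        # show only the first five offending lines, but count all of them
        NF != want { bad++; if (bad <= 5) printf "line %d: %d columns\n", NR, NF }
        # an empty field is two consecutive tabs (or a trailing tab)
        NF >= col && $col == "" { empty++ }
        END { printf "tokens=%d bad_columns=%d empty_col%d=%d\n", tokens, bad, col, empty }
    ' "$1"
}

# Example call for the file discussed in this thread:
#   check_columns /WebCorpora/upload/spintxFall4web.cpr 11 7
```

The values 11 and 7 assume that the file has the word form in column 1 followed by the ten -P attributes in the order given to cwb-encode, which would put tense in column 7. A line containing spaces where tabs were intended shows up as a line with too few columns.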

Best,
Marti


On Thu, Nov 1, 2012 at 1:56 PM, <cwb-request at sslmit.unibo.it> wrote:

> Today's Topics:
>
>    1. Re: Illegal division by zero (Stefan Evert)
>    2. Re: Illegal division by zero (Trevor Jenkins)
>    3. cwb-huffcode issues warning during cwb-encode (Martí Quixal)
>    4. Re: cwb-huffcode issues warning during cwb-encode (Hardie, Andrew)
>    5. Re: cwb-huffcode issues warning during cwb-encode (Martí Quixal)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 1 Nov 2012 15:30:53 +0100
> From: Stefan Evert <stefanML at collocations.de>
> To: Open source development of the Corpus WorkBench
>         <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] Illegal division by zero
> Message-ID: <CDFFCA68-5182-4367-97A1-CFCEC83ADC00 at collocations.de>
> Content-Type: text/plain; charset=iso-8859-1
>
>
> > The Perl scripts that do BNCweb installation gather data from the BNC
> > file headers to build this database (among other things). So, alas, there
> > is only one solution: You go back to setup and repeat whatever steps got
> > missed or went wrong, so that the headerinfo table gets filled in!
>
> I've experienced similar errors -- e.g. distribution analysis shows all
> zeroes -- trying to get BNCweb installed on a university Web server I
> didn't have any access to.
>
> A likely cause is that the file exchange between the MySQL server and
> BNCweb doesn't work properly.  Keep in mind that
>
>   - BNCweb and the MySQL server must be running on the same machine
>   - the exchange directory must be readable and writable both by the Web
> server running BNCweb and by the MySQL server
>   - the bncweb account in MySQL needs the necessary permissions to access
> disk files (that's one of the highest permissions in the system and the
> MySQL manual might have advised you not to grant it to users ...)
>
> Some versions of MySQL, depending on their configuration settings, may
> have additional limitations, such as disabling file access entirely or
> requiring that the exchange directory is world-readable and world-writable
> (i.e. _all_ users have full access there).
>
> Hope this helps,
> Stefan
>
> ------------------------------
>
> Message: 2
> Date: Thu, 1 Nov 2012 14:58:33 +0000
> From: Trevor Jenkins <trevor.jenkins at suneidesis.com>
> To: Open source development of the Corpus WorkBench
>         <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] Illegal division by zero
> Message-ID: <601BF14C-3317-4043-97C1-2B6F83C80AA2 at suneidesis.com>
> Content-Type: text/plain; charset="iso-8859-1"
>
> On 1 Nov 2012, at 14:04, "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> wrote:
>
> > The Perl scripts that do BNCweb installation gather data from the BNC
> > file headers to build this database (among other things). So, alas, there
> > is only one solution: You go back to setup and repeat whatever steps got
> > missed or went wrong, so that the headerinfo table gets filled in!
>
> Whilst the user should ensure that their data is correct, perhaps a sanity
> check in those Perl scripts is needed; "oy mate, you've not set the number
> of words per sentence properly".
>
> Regards, Trevor.
>
> <>< Re: deemed!
>
>
> ------------------------------
>
> Message: 3
> Date: Thu, 1 Nov 2012 13:13:30 -0500
> From: Martí Quixal <marti.quixal at gmail.com>
> To: cwb at sslmit.unibo.it
> Subject: [CWB] cwb-huffcode issues warning during cwb-encode
> Message-ID:
>         <CAMtTwm_KFkFzecTFHkCOKsbhG-8r4EYZi_77YKG6uZfrmdaY=
> g at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> I am compiling a corpus with the following bash file:
>
> echo "Deleting older version of CWB-formatted files in data directory"
> echo ""
> rm /WebCorpora/data/fall/*
>
> echo "Encoding corpus with CWB tools"
> echo ""
>
> cwb-encode -d /WebCorpora/data/fall/ -R /WebCorpora/registry/fall -f
> /WebCorpora/upload/spintxFall4web.cpr -xsB -P lemma -P pos -P punct -P
> start -P end -P tense -P mood -P numb -P pers -P gend -V text:0+id -V
> speaker:0+type -V lang:0+code
>
> cwb-make -M 256 -r /WebCorpora/registry/ FALL
> echo "DONE!"
> date
>
> And I get the following message:
>
> WARNING (SHELL CMD '/usr/local/bin/cwb-huffcode -r '/WebCorpora/registry/'
> -T -P tense FALL'):
> -> Warning on stderr:
> -> Problem: No output generated -- no items?
> CWB::Indexer: Creation of component tense/CIS
> (/WebCorpora/data/fall/tense.huf) failed (aborted).
>  at /opt/local/bin/cwb-make line 76
>
>
> However, it seems that the corpus has been encoded. A couple of basic
> queries I ran did work. Should I be worried about future problems, or
> can I just ignore this message?
>
> Best regards,
> Marti
>
> ------------------------------
>
> Message: 4
> Date: Thu, 1 Nov 2012 18:27:50 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
>         <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] cwb-huffcode issues warning during cwb-encode
> Message-ID:
>         <28078EC3FBF1B940A3EF3D0D19BE351D0ECBA1 at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> That message indicates that there was no data for the tense attribute.
> That may mean that errors won't show up till you query that attribute.
>
> Usually this means that something is wrong in the original file.
>
> best
>
> Andrew.
>
>
> ------------------------------
>
> Message: 5
> Date: Thu, 1 Nov 2012 13:55:36 -0500
> From: Martí Quixal <marti.quixal at gmail.com>
> To: cwb at sslmit.unibo.it
> Subject: Re: [CWB] cwb-huffcode issues warning during cwb-encode
> Message-ID:
>         <
> CAMtTwm8kUHxki0tffingU4trcEzkpU2jCHmpDGCkRtWuLGco4g at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi,
>
> I just realised that my queries using the p-attribute word result in no
> matches at all.
>
> For instance (using cqp in command line):
>
> FALL> "a";
> 0 matches.
>
> Could that be related to my error mentioned in the previous message?
>
> Best,
> Marti
>
>
>
> ------------------------------
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>
>
> End of CWB Digest, Vol 71, Issue 3
> **********************************
>