[CWB] cwb-huffcode issues warning during cwb-encode

Martí Quixal marti.quixal at gmail.com
Thu Nov 1 22:24:34 CET 2012


Hi,

I found it... it was much simpler (and sillier) than I thought.
First, there was one p-attribute I was not declaring, and second, I was
simply pointing to the wrong input file (an older version of the corpus
with less data...).
:o|
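
A mismatch of this kind can be caught before encoding by counting the
tab-separated columns in the input file. The sketch below is only an
illustration and not part of CWB: count_columns and the sample data are
made up, and it assumes one token per line, tab-separated annotations, and
XML tags on lines starting with "<". For the cwb-encode call quoted further
down, which declares ten -P attributes, token lines would need eleven
columns (word plus ten).

```shell
#!/bin/sh
# count_columns (hypothetical helper): print "<columns> <lines>" for every
# distinct column count found among the token lines of a vertical-format file.
# Lines starting with "<" are assumed to be XML tags; empty lines are skipped.
count_columns() {
    awk -F '\t' '!/^</ && NF > 0 { seen[NF]++ }
                 END { for (n in seen) print n, seen[n] }' "$1" | sort -n
}

# Made-up sample with three intended columns (word, lemma, pos);
# the last token line is missing its pos column.
sample=$(mktemp)
printf '<text id="t1">\nla\tel\tDET\ncasa\tcasa\tNOUN\ngrande\tgrande\n</text>\n' > "$sample"

count_columns "$sample"
rm -f "$sample"
```

More than one line of output means the token lines do not all have the same
number of columns. A single count that differs from the number of declared
attributes plus one suggests a missing or extra -P declaration.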

Maybe there was something else, because I was like a madman changing and
trying and re-typing. Bottom line is that I finally solved it.
:-))

Thanks for your patience!
Marti


On Thu, Nov 1, 2012 at 3:08 PM, <cwb-request at sslmit.unibo.it> wrote:

> Send CWB mailing list submissions to
>         cwb at sslmit.unibo.it
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://devel.sslmit.unibo.it/mailman/listinfo/cwb
> or, via email, send a message with subject or body 'help' to
>         cwb-request at sslmit.unibo.it
>
> You can reach the person managing the list at
>         cwb-owner at sslmit.unibo.it
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of CWB digest..."
>
>
> Today's Topics:
>
>    1. cwb-huffcode issues warning during cwb-encode (Martí Quixal)
>    2. Re: cwb-huffcode issues warning during cwb-encode (Hardie, Andrew)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Thu, 1 Nov 2012 14:17:04 -0500
> From: Martí Quixal <marti.quixal at gmail.com>
> To: cwb at sslmit.unibo.it
> Subject: [CWB] cwb-huffcode issues warning during cwb-encode
> Message-ID:
>         <CAMtTwm-wsWYFo5hViUFpoP0QzG40Uaki6gEXRSU7s3hjMRs=
> rw at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hi Andrew, hi all,
>
> could you suggest possible sources of error? I have checked for spaces that
> should be tabs, and that does not seem to be the problem. There are tokens
> for which the tense attribute is empty, but that usually resulted in
> __UNDEF__ or something like that. Isn't that the case any more?
>
> Best,
> Marti
>
>
> On Thu, Nov 1, 2012 at 1:56 PM, <cwb-request at sslmit.unibo.it> wrote:
>
> >
> > Today's Topics:
> >
> >    1. Re: Illegal division by zero (Stefan Evert)
> >    2. Re: Illegal division by zero (Trevor Jenkins)
> >    3. cwb-huffcode issues warning during cwb-encode (Martí Quixal)
> >    4. Re: cwb-huffcode issues warning during cwb-encode (Hardie, Andrew)
> >    5. Re: cwb-huffcode issues warning during cwb-encode (Martí Quixal)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Thu, 1 Nov 2012 15:30:53 +0100
> > From: Stefan Evert <stefanML at collocations.de>
> > To: Open source development of the Corpus WorkBench
> >         <cwb at sslmit.unibo.it>
> > Subject: Re: [CWB] Illegal division by zero
> > Message-ID: <CDFFCA68-5182-4367-97A1-CFCEC83ADC00 at collocations.de>
> > Content-Type: text/plain; charset=iso-8859-1
> >
> >
> > > The Perl scripts that do BNCweb installation gather data from the BNC
> > > file headers to build this database (among other things). So, alas, there
> > > is only one solution: You go back to setup and repeat whatever steps got
> > > missed or went wrong, so that the headerinfo table gets filled in!
> >
> > I've experienced similar errors -- e.g. distribution analysis shows all
> > zeroes -- trying to get BNCweb installed on a university Web server I
> > didn't have any access to.
> >
> > A likely cause is that the file exchange between the MySQL server and
> > BNCweb doesn't work properly.  Keep in mind that
> >
> >   - BNCweb and the MySQL server must be running on the same machine
> >   - the exchange directory must be readable and writable both by the Web
> > server running BNCweb and by the MySQL server
> >   - the bncweb account in MySQL needs the necessary permissions to access
> > disk files (that's one of the highest permissions in the system, and the
> > MySQL manual might have advised you not to grant it to users ...)
> >
> > Some versions of MySQL, depending on their configuration settings, may
> > have additional limitations, such as disabling file access entirely or
> > requiring that the exchange directory is world-readable and world-writable
> > (i.e. _all_ users have full access there).
> >
> > Hope this helps,
> > Stefan
> >
> > ------------------------------
> >
> > Message: 2
> > Date: Thu, 1 Nov 2012 14:58:33 +0000
> > From: Trevor Jenkins <trevor.jenkins at suneidesis.com>
> > To: Open source development of the Corpus WorkBench
> >         <cwb at sslmit.unibo.it>
> > Subject: Re: [CWB] Illegal division by zero
> > Message-ID: <601BF14C-3317-4043-97C1-2B6F83C80AA2 at suneidesis.com>
> > Content-Type: text/plain; charset="iso-8859-1"
> >
> > On 1 Nov 2012, at 14:04, "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> > wrote:
> >
> > > The Perl scripts that do BNCweb installation gather data from the BNC
> > > file headers to build this database (among other things). So, alas, there
> > > is only one solution: You go back to setup and repeat whatever steps got
> > > missed or went wrong, so that the headerinfo table gets filled in!
> >
> > Whilst the user should ensure that their data is correct, perhaps a sanity
> > check in those Perl scripts is needed: "oy mate, you've not set the number
> > of words per sentence properly".
> >
> > Regards, Trevor.
> >
> > <>< Re: deemed!
> >
> >
> > ------------------------------
> >
> > Message: 3
> > Date: Thu, 1 Nov 2012 13:13:30 -0500
> > From: Martí Quixal <marti.quixal at gmail.com>
> > To: cwb at sslmit.unibo.it
> > Subject: [CWB] cwb-huffcode issues warning during cwb-encode
> > Message-ID:
> >         <CAMtTwm_KFkFzecTFHkCOKsbhG-8r4EYZi_77YKG6uZfrmdaY=
> > g at mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi,
> >
> > I am compiling a corpus with the following bash file:
> >
> > echo "Deleting older version of CWB-formatted files in data directory"
> > echo ""
> > rm /WebCorpora/data/fall/*
> >
> > echo "Encoding corpus with CWB tools"
> > echo ""
> >
> > cwb-encode -d /WebCorpora/data/fall/ -R /WebCorpora/registry/fall \
> >   -f /WebCorpora/upload/spintxFall4web.cpr -xsB \
> >   -P lemma -P pos -P punct -P start -P end \
> >   -P tense -P mood -P numb -P pers -P gend \
> >   -V text:0+id -V speaker:0+type -V lang:0+code
> >
> > cwb-make -M 256 -r /WebCorpora/registry/ FALL
> > echo "DONE!"
> > date
> >
> > And I get the following message:
> >
> > WARNING (SHELL CMD '/usr/local/bin/cwb-huffcode -r '/WebCorpora/registry/' -T -P tense FALL'):
> > -> Warning on stderr:
> > -> Problem: No output generated -- no items?
> > CWB::Indexer: Creation of component tense/CIS (/WebCorpora/data/fall/tense.huf) failed (aborted).
> >  at /opt/local/bin/cwb-make line 76
> >
> >
> > However, it seems that the corpus has been encoded: a couple of basic
> > queries I ran did work. Should I be worried about future problems, or
> > can I just ignore this message?
> >
> > Best regards,
> > Marti
> >
> > ------------------------------
> >
> > Message: 4
> > Date: Thu, 1 Nov 2012 18:27:50 +0000
> > From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> > To: Open source development of the Corpus WorkBench
> >         <cwb at sslmit.unibo.it>
> > Subject: Re: [CWB] cwb-huffcode issues warning during cwb-encode
> > Message-ID:
> >         <28078EC3FBF1B940A3EF3D0D19BE351D0ECBA1 at EX-0-MB1.lancs.local>
> > Content-Type: text/plain; charset="utf-8"
> >
> > That message indicates that there was no data for the tense attribute.
> > That may mean that errors won't show up till you query that attribute.
> >
> > Usually this means that something is wrong in the original file.
> >
> > best
> >
> > Andrew.
> >
> >
> > ------------------------------
> >
> > Message: 5
> > Date: Thu, 1 Nov 2012 13:55:36 -0500
> > From: Martí Quixal <marti.quixal at gmail.com>
> > To: cwb at sslmit.unibo.it
> > Subject: Re: [CWB] cwb-huffcode issues warning during cwb-encode
> > Message-ID:
> >         <
> > CAMtTwm8kUHxki0tffingU4trcEzkpU2jCHmpDGCkRtWuLGco4g at mail.gmail.com>
> > Content-Type: text/plain; charset="utf-8"
> >
> > Hi,
> >
> > I just realised that my queries using the p-attribute word result in no
> > matches at all.
> >
> > For instance (using cqp in command line):
> >
> > FALL> "a";
> > 0 matches.
> >
> > Could that be related to my error mentioned in the previous message?
> >
> > Best,
> > Marti
> >
> >
> >
> > ------------------------------
> >
> > _______________________________________________
> > CWB mailing list
> > CWB at sslmit.unibo.it
> > http://devel.sslmit.unibo.it/mailman/listinfo/cwb
> >
> >
> > End of CWB Digest, Vol 71, Issue 3
> > **********************************
> >
>
> ------------------------------
>
> Message: 2
> Date: Thu, 1 Nov 2012 20:08:12 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
>         <cwb at sslmit.unibo.it>
> Subject: Re: [CWB] cwb-huffcode issues warning during cwb-encode
> Message-ID:
>         <28078EC3FBF1B940A3EF3D0D19BE351D0ECC0A at EX-0-MB1.lancs.local>
> Content-Type: text/plain; charset="utf-8"
>
> It's actually rather difficult to make suggestions without more
> information to go on. Could you perhaps experiment with cwb-decode, and see
> whether the data contained within your p-attributes is what you expect it
> to be?
>
> best
>
> Andrew.
>
>
> ------------------------------
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>
>
> End of CWB Digest, Vol 71, Issue 4
> **********************************
>



-- 
Martí Quixal
Computational Linguist & Educational Technologist
http://www.iqubo.org/quixal