[CWB] Empty collocation result lists and other issues after upgrade

Thilo Wiertz thilo.wiertz at geographie.uni-freiburg.de
Wed Jan 25 19:03:53 CET 2017


Dear Andrew, dear all,

Since I upgraded to CQPweb v3.2.26, some functions appear to be broken. For example, after I create a collocation database, the results page is shown and tells me "There are 1,434 different words in your collocation database for...", but the table below it is empty. Also, in the distribution pane I could previously click on a category to retrieve the relevant text passages; now the categories are no longer linked.

I tried running the database upgrade script, which succeeded, but the problem persists. Since other changes were made to the server OS at the same time (package upgrades, security settings, etc.), I can't fully reconstruct the previous state. Any ideas?

Thanks,
Thilo


On 25.01.2017 at 17:54, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
> 
> Hi Nikolche
> 
> If I read the tutorial by JMMM correctly, the texts2corpus.py script he supplies is purely for the purpose of merging multiple .vrt files into one.
> 
> So (a) you don't need to do this if you only have one file, you can just go ahead and index; (b) even if you have more than one file, this can be accomplished just as easily with Unix "cat".
> 
> (e.g. "cat folder-with-files/*.vrt > merged-input.vrt")
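Where "cat" is not available, the same merge can be sketched in Python. This is only an illustration: the function name merge_vrt and the file names are assumptions, and unlike "cat" it appends a newline to any file that does not already end with one, so that the last token of one file cannot run into the first line of the next.

```python
from pathlib import Path


def merge_vrt(folder, output):
    """Concatenate every .vrt file in `folder` (sorted by name) into `output`.

    Returns the number of files merged. Hypothetical helper, not part of CWB.
    """
    out_path = Path(output)
    merged = 0
    with out_path.open("wb") as out:
        for part in sorted(Path(folder).glob("*.vrt")):
            # Skip the output file itself if it lives in the same folder.
            if part.resolve() == out_path.resolve():
                continue
            data = part.read_bytes()
            out.write(data)
            # Unlike plain `cat`, make sure each file ends with a newline.
            if data and not data.endswith(b"\n"):
                out.write(b"\n")
            merged += 1
    return merged
```

Calling merge_vrt("folder-with-files", "merged-input.vrt") would then produce a single input file for indexing.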
> 
> As for this error: "No execution mode was defined for this document type: text/plain."
> 
> I really cannot comment on this one without more info. Can you tell me EXACTLY what you did to get this error message? (full list of steps including what you entered on the command line etc.)
> 
> And a final note: indexing via the CWB commandline programs vs. indexing via the CQPweb interface is either/or: you don't need to do both.
> 
> Thanks
> 
> best
> 
> Andrew.
> 
> PS sorry for the slight delay in replying.
> 
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Nikolche Mickoski
> Sent: 22 January 2017 20:41
> To: cwb at sslmit.unibo.it
> Subject: Re: [CWB] Creating and importing Cyrillic corpus in CQPWeb
> 
> Hi Andrew,
> 
> Thank you for the explanation, but unfortunately, I wasn't able to create
> the corpus :(
> 
> I created a single column file in Unix format and inserted it in the test
> folder, but nothing happens when I click texts2corpus.py. 
> 
> I also followed the Corpus Encoding Tutorial, but I got the following error:
> No execution mode was defined for this document type: text/plain.
> 
> It looks like I need help converting a plain text file into a single-column
> file with the required sentence tags, which can then be used with CWB.
> 
> Thank you,
> Nikolche
> 
> -----Original Message-----
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On
> Behalf Of cwb-request at sslmit.unibo.it
> Sent: Wednesday, January 18, 2017 5:56 PM
> To: cwb at sslmit.unibo.it
> Subject: CWB Digest, Vol 120, Issue 11
> 
> Today's Topics:
> 
>   1. Re: Creating and importing Cyrillic corpus in CQPWeb
>      (Hardie, Andrew)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Wed, 18 Jan 2017 16:55:46 +0000
> From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
> To: Open source development of the Corpus WorkBench
> 	<cwb at sslmit.unibo.it>
> Subject: Re: [CWB] Creating and importing Cyrillic corpus in CQPWeb
> Message-ID:
> 	<28078EC3FBF1B940A3EF3D0D19BE351D7FC07688 at EX-1-MB2.lancs.local>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi Nikolche,
> 
> OK, first, some notes on the steps you've taken so far.
> 
> 
> -  First, the Martinez tutorial: I didn't actually know this existed!
> If I remember, I will write to the author at some point to ask if I can
> borrow bits of the text for the official CQPweb manual. It's an excellent
> introductory guide.
> 
> 
> -  Second, I have a reasonably complete set of TreeTagger parameter
> sets, but none for Macedonian as it is not available via Schmid's site, so I
> cannot attempt to diagnose the problems you have been having with it, sorry!
> Also, I'm not familiar myself with MorphAdorner.
> 
> 
> -  Third, while the Multext East lexicon will give you the language
> resources to build into a POS tagger, I am not sure if it comes with
> software to generate POS tagged / lemmatised output, or what output format
> it generates if it does?
> 
> With that out of the way: the critical point for CQPweb indexing is that the
> data must be in the correct input format.
> 
> The general CWB input format is described on pg 2 of the encoding tutorial:
> http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial.pdf
> 
> The additional requirements for CQPweb are described in my paper on the
> matter in IJCL; see
> here<http://www.ingentaconnect.com/content/jbp/ijcl/2012/00000017/00000003/art00004>
> (canonical link) or
> here<http://www.lancs.ac.uk/staff/hardiea/cqpweb-paper.pdf> (open link),
> especially the example on pg 390.
> 
> Basically you need to get your text into the correct columnar format with
> one word per line, with the raw token from the text in col 1, and other
> annotations (tag, lemma etc.) delimited by tabs. Then you need to make sure
> that texts have the correct <text id="ID_CODE"> tags before them and </text>
> at the end (with the XML tags on separate lines).
> 
> All other XML is optional.
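To make the layout concrete, here is a minimal Python sketch that writes one text in this shape. It assumes tab-separated columns as in the CWB encoding tutorial; the function name to_vrt, the ID "text01" and the sample tags are made up for illustration and are not part of CWB or CQPweb.

```python
from typing import Iterable, Sequence


def to_vrt(text_id: str, rows: Iterable[Sequence[str]]) -> str:
    """Render one text as columnar input: one token per line, columns tab-separated.

    Hypothetical helper for illustration only.
    """
    lines = [f'<text id="{text_id}">']  # opening tag on its own line
    for row in rows:
        # Column 1 is the raw token; any further columns are annotations.
        lines.append("\t".join(row))
    lines.append("</text>")  # closing tag on its own line
    return "\n".join(lines) + "\n"  # Unix line endings throughout


# Example rows: (word, POS tag, lemma); the tag values are invented.
sample = [("The", "DT", "the"), ("cats", "NNS", "cat"), ("sleep", "VBP", "sleep")]
vrt = to_vrt("text01", sample)
```

Here vrt starts with the <text id="text01"> line, continues with one tab-separated line per token, and ends with </text>.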
> 
> Some taggers will produce the correct columnar format (TreeTagger does) but
> they may not guarantee the correct <text> tags. Other taggers will require
> you to manipulate their output into columnar format.
> 
> For your first experiment in indexing, can I recommend that you try indexing
> a file just with a single "word" column, and make sure that works properly
> before going on to more complex formats with tags, lemmas, etc? To create
> such a file it is merely necessary to get every word onto a separate line,
> with no whitespace except the line delimiters (in Unix format!). You can do
> this effectively with regular expression global search and replace.
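As a rough sketch of that regular-expression approach in Python: the pattern below is a deliberately naive assumption (runs of word characters, plus each punctuation mark as its own token), not the tokenisation used by CWB or by any tagger, and it will split hyphenated or apostrophised words. The names tokenize and write_word_column are hypothetical. The file is written as UTF-8 with Unix line endings, wrapped in the <text> tags mentioned above.

```python
import re

# One token per match: a run of word characters, or a single punctuation mark.
# In Python 3, \w covers Unicode letters, so Cyrillic text is handled.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def tokenize(text):
    """Split raw text into a list of tokens (naive, for illustration only)."""
    return TOKEN_RE.findall(text)


def write_word_column(path, text_id, text):
    """Write `text` as a words-only file: one token per line inside <text> tags.

    Returns the number of tokens written.
    """
    tokens = tokenize(text)
    # newline="\n" forces Unix line endings even when run on Windows.
    with open(path, "w", encoding="utf-8", newline="\n") as out:
        out.write(f'<text id="{text_id}">\n')
        for token in tokens:
            out.write(token + "\n")
        out.write("</text>\n")
    return len(tokens)
```

For example, write_word_column("words-only.vrt", "text01", raw_text) would produce a file suitable for a first words-only indexing test, where raw_text holds the plain text of one document.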
> 
> Once you have a proper input file, you should be able to follow the
> instructions in the "simple" method of indexing (as specified in the
> tutorial you
> referenced<http://chozelinek.github.io/sacoco/cqpwebsetup.html>) and get
> your corpus up and running. For a words-only corpus with no XML other than
> <text id="...">...</text>, you can leave the S-attribute and P-attribute
> specification forms empty.
> 
> Hope this helps, but feel free to ask the list again if you have further
> questions, and either I or another reader will answer!
> 
> best
> 
> Andrew.
> 
> 
> 
> 
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On
> Behalf Of Nikolche Mickoski
> Sent: 16 January 2017 17:34
> To: cwb at sslmit.unibo.it
> Subject: [CWB] Creating and importing Cyrillic corpus in CQPWeb
> 
> Hello,
> 
> I'm trying to create a corpus in the CQPWeb for Macedonian language and I
> would like to ask for your help.
> 
> I've installed CQPWeb in a box (Esmeralda). I tried to follow CQPweb Admin
> Manual, CWB Encoding Tutorial and Martínez tutorial
> (http://chozelinek.github.io/sacoco/cqpwebsetup.html) but in vain. I tried
> to annotate the corpus with TreeTagger but I failed. I was able to parse
> into sentences small texts with MorphAdorner but I still don't know how I
> can use them with CQPWeb.
> 
> I obtained MULTEXT-East non-commercial lexicon for Macedonian
> (https://www.clarin.si/repository/xmlui/handle/11356/1042) containing over 1
> million tagged lemmas. I've extracted Macedonian dump file of Wikipedia from
> dumps.wikimedia.org with Wikipedia Extractor. I did all the preparatory
> work, but I wasn't able to create the corpus in CQPWeb.
> 
> After I tried everything I could get my hands on, I decided to write to you
> and ask for your help. I really hope that you can spare some time to help me
> with this.
> 
> Thank you very much,
> Nikolche
> 
> Nikolche Mickoski
> Translator/Interpreter
> GSM +389 70 357 406
> nmickoski at gmail.com<mailto:nmickoski at gmail.com>
> 
> ------------------------------
> 
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
> 
> 
> End of CWB Digest, Vol 120, Issue 11
> ************************************
> 


