[CWB] Announcement: Another CWB/CQPweb setup in China

Josep M. Fontana josepm.fontana at upf.edu
Thu Oct 25 15:05:44 CEST 2012


OK. Thanks! Following what you said, I went to the corpus project's page
and found the tagset
(http://www.linguist.is/icelandic_treebank/Tagset). Now I'm a happy camper.
>
> Hi Josep,
>
> The issue here is that the Icelandic corpus on Ray's server has been 
> installed as if it had been tagged by the Lancaster tagger combination 
> of CLAWS + USAS (which uses the CLAWS7 tagset), whereas in fact it 
> hasn't been. It couldn't have been, in fact, since C7 is a tagset for 
> English, not Icelandic.
>
> This is my fault, indirectly. Way back when CQPweb was only used here 
> at Lancaster, corpus installation had to be done manually, which was a 
> very time-consuming process. To speed things up, I created the 
> indexing web-forms, which have two settings for p-attributes: 
> "default" i.e. assume it has been tagged by CLAWS and USAS, or 
> "custom" i.e. specify the p-attributes yourself. In retrospect this 
> was clearly the Wrong Thing, as nowhere else but Lancaster is 
> CLAWS+USAS the "default", making it too easy for superusers elsewhere 
> to do the wrong thing in the web forms. I /am/ going to replace this 
> system with something more site-neutral, when I get the time....
>
> Anyway, the upshot: if you leave "default" specified when indexing a 
> corpus, then CQPweb will believe it has CLAWS7 tags and USAS semantic 
> tags, even if it doesn't. The way to get around this is to ignore what 
> CQPweb says the tags are and to look at what they really are (e.g. by 
> going to the frequency list page and viewing a frequency list of the 
> part-of-speech tag attribute).
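The same check can also be done outside the web interface, on the corpus source files. The following is a minimal Python sketch. It assumes the corpus was encoded from CWB-style vertical text: one token per line, tab-separated p-attributes, and structural markup on lines starting with `<`. It also assumes the part-of-speech tag sits in the second column. That column position is a hypothetical default, so adjust `pos_column` to match how your corpus was indexed. The function name `pos_freq_list` is illustrative only.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def pos_freq_list(lines: Iterable[str], pos_column: int = 1) -> List[Tuple[str, int]]:
    """Count POS tags in CWB-style vertical text (hypothetical helper).

    Assumes one token per line with tab-separated p-attributes, and
    XML-style structural lines (e.g. <s>, </text>) that are skipped.
    pos_column=1 is an assumed layout; check your own corpus.
    """
    counts: Counter = Counter()
    for line in lines:
        line = line.rstrip("\n")
        # Skip blank lines and structural (s-attribute) markup.
        if not line or line.startswith("<"):
            continue
        fields = line.split("\t")
        # Ignore malformed lines that lack the POS column.
        if len(fields) > pos_column:
            counts[fields[pos_column]] += 1
    # Most frequent tags first, like a descending frequency list.
    return counts.most_common()


# Made-up sample lines with illustrative tags (word, pos, lemma).
sample = [
    "<s>",
    "Þá\tADV\tþá",
    "kom\tVBDI\tkoma",
    "hann\tPRO-N\thann",
    "þá\tADV\tþá",
    "</s>",
]
result = pos_freq_list(sample)
```

To list the tags actually present, most frequent first, pass it an open file, e.g. `pos_freq_list(open("corpus.vrt", encoding="utf-8"))`. The file name is hypothetical.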
>
> http://124.193.83.252/cqp/IcePaHC/freqlist.php?flTable=__entire_corpus&flAtt=pos&flFilterType=begin&flFilterString=&flFreqLimit1=&flFreqLimit2=&pp=50&flOrder=desc&uT=y
>
> Once you know what tags to use, the simple query syntax /will/ work. 
> (I just tried *_Q-A*, for instance, and it works. Not that I have any 
> idea what Q-A means in this tagset!)
>
> "show +pos" doesn't work because the interface only allows /queries/ 
> to be specified by the user. Other CQP commands are blocked. (In fact, 
> CQPweb /always/ uses show +pos or equivalent, but the tags are 
> rendered in the tooltip that pops over the central link of a 
> concordance, not in the main concordance itself.)
>
> best
>
> Andrew.
>
> *From:*cwb-bounces at sslmit.unibo.it 
> [mailto:cwb-bounces at sslmit.unibo.it] *On Behalf Of *Josep M. Fontana
> *Sent:* 25 October 2012 12:04
> *To:* cwb at sslmit.unibo.it
> *Subject:* Re: [CWB] Announcement: Another CWB/CQPweb setup in China
>
> Hi,
>
> I am a little (or quite) confused about the syntax of CQPweb queries 
> (simple query language). I went to the wonderful resource Ray Wu has 
> made available so that I could see how it works, since we are in the 
> process of installing CQPweb as an interface for our corpora. I wasn't 
> able to complete any search using the simple query language, though. 
> I'm sure it is something very simple that I am missing. From what I 
> understand reading the document 'simple query language syntax', I 
> should be able to do the following in the simple query mode:
>
> _JJ _NN1
>
> which would supposedly look for sequences of an adjective followed by 
> a noun according to the CLAWS tagset.
>
> OK, I'm conducting the searches in the Old Icelandic Corpus, which has 
> supposedly been tagged using the CLAWS7 tagset (according to the 
> information in "View corpus metadata"). When I do this, however, I get 
> a message saying "Your query had no results. There are no matches for 
> your query." This is very puzzling because you would imagine that 
> there would be occurrences of adjectives followed by nouns. Doing it 
> in the opposite order (_NN1 _JJ) gives me the same result. What is even 
> more puzzling is that I also get nothing using single POS labels such 
> as _NN1 or _JJ by themselves.
>
> Am I doing something wrong or is this due to the fact that this 
> particular corpus uses a completely different tagset? When you access 
> a CQPweb corpus, is there any way to retrieve the tags that have been 
> used in the corpus? The only relevant info I find in this corpus is 
> the link to the CLAWS7 tagset but, as I said, this doesn't seem to be 
> the right information. Going into the CQP syntax mode and doing "show 
> +pos" doesn't work.
>
>
> JM
>
>     Dear members,
>
>     We are pleased to announce another CWB/CQPweb setup in China and
>     we dub it BFSU CQPweb. It is closely modelled after Hardie's own
>     (sorry Andrew, we're badly in need of imagination) and currently
>     features more than 20 corpora, including two Brown family cousins
>     (CLOB and Crown) developed at Beijing Foreign Studies University by
>     Dr. Xu Jiajing and Professor Liang Maocheng.
>
>     You may access it from http://124.193.83.252/cqp/ using test/test
>     as username/password.
>
>     We'd like to take this opportunity to thank the CWB team for their
>     wonderful work and generosity. It is great fun to build our work
>     on their shoulders.
>
>     Best,
>     Ray
>
>     _______________________________________________
>
>     CWB mailing list
>
>     CWB at sslmit.unibo.it  <mailto:CWB at sslmit.unibo.it>
>
>     http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>
