[CWB] Announcement: Another CWB/CQPweb setup in China

Ray Wu liangpingwu at 126.com
Thu Oct 25 15:03:08 CEST 2012


Hi Andrew,

Thanks for your insight. Actually, the server is owned by Beijing Foreign Studies University, and so are the corpora. I'm just providing technical assistance. OK, we will deal with it accordingly.


Best,
Ray


At 2012-10-25 19:27:58, "Hardie, Andrew" <a.hardie at lancaster.ac.uk> wrote:


Hi Josep,

 

The issue here is that the Icelandic corpus on Ray’s server has been installed as if it had been tagged by the Lancaster tagger combination of CLAWS + USAS (which uses the CLAWS7 tagset) whereas in fact it hasn’t. It couldn’t have been, in fact, since C7 is a tagset for English, not Icelandic.

 

This is my fault, indirectly. Way back when CQPweb was only used here at Lancaster, corpus installation had to be done manually, which was a very time-consuming process. To speed things up, I created the indexing web-forms, which have two settings for p-attributes: “default” i.e. assume it has been tagged by CLAWS and USAS, or “custom” i.e. specify the p-attributes yourself. In retrospect this was clearly the Wrong Thing, as nowhere else but Lancaster is CLAWS+USAS the “default”, making it too easy for superusers elsewhere to do the wrong thing in the web forms. I am going to replace this system with something more site-neutral, when I get the time....

 

Anyway, the upshot: if you leave “default” specified when indexing a corpus, then CQPweb will believe it has CLAWS7 tags and USAS semantic tags, even if it doesn’t. The way to get around this is to ignore what CQPweb says the tags are and to look at what they really are (e.g. by going to the frequency list page and looking at a frequency list of the part-of-speech tag attribute).

http://124.193.83.252/cqp/IcePaHC/freqlist.php?flTable=__entire_corpus&flAtt=pos&flFilterType=begin&flFilterString=&flFreqLimit1=&flFreqLimit2=&pp=50&flOrder=desc&uT=y
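For anyone who wants to run the same check on another corpus or attribute, here is a minimal sketch that rebuilds the link above from its parts. The parameter names and values are simply copied from that URL; the helper name freqlist_url is made up for illustration, and the assumption that other corpora on the server follow the same URL layout is mine, not something documented here.

```python
from urllib.parse import urlencode


def freqlist_url(base, corpus, attribute="pos", per_page=50):
    """Build a frequency-list URL for one p-attribute of one corpus.

    Hypothetical helper: the query parameters are copied from the link
    quoted in this message, not taken from any CQPweb documentation.
    """
    params = {
        "flTable": "__entire_corpus",  # frequency list over the whole corpus
        "flAtt": attribute,            # the p-attribute to list, e.g. "pos"
        "flFilterType": "begin",
        "flFilterString": "",          # empty filter: list every value
        "flFreqLimit1": "",
        "flFreqLimit2": "",
        "pp": per_page,                # entries per page
        "flOrder": "desc",             # most frequent first
        "uT": "y",
    }
    return f"{base.rstrip('/')}/{corpus}/freqlist.php?{urlencode(params)}"


print(freqlist_url("http://124.193.83.252/cqp", "IcePaHC"))
```

Called with the server address and corpus name from this thread, it prints the same link as above; changing the attribute argument would list a different p-attribute, provided the corpus actually has one by that name.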

 

Once you know what tags to use, the simple query syntax will work. (I just tried _Q-A, for instance, and it works. Not that I have any idea what Q-A means in this tagset!)

 

“show +pos” doesn’t work because the interface only allows queries to be specified by the user. Other CQP commands are blocked. (In fact, CQPweb always uses show +pos or equivalent, but the tags are rendered in the tooltip that pops over the central link of a concordance, not in the main concordance itself.)

 

best

 

Andrew.

 

 

 

From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Josep M. Fontana
Sent: 25 October 2012 12:04
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] Announcement: Another CWB/CQPweb setup in China

 

Hi,

I am a little (or quite) confused about the syntax of CQPweb queries (simple query language). I went to the wonderful resource Ray Wu has made available so that I could see how it works since we are in the process of installing CQPweb as an interface for our corpora. I wasn't able to complete any search using the simple query language, though. I'm sure it is something very simple that I am missing.  From what I understand reading the document 'simple query language syntax', I should be able to do the following in the simple query mode:

_JJ _NN1

which would supposedly look for sequences of an adjective followed by a noun according to the CLAWS tag set.
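To make the intended meaning concrete, here is a rough sketch of what a tag-only simple query like this amounts to in CQP syntax. It assumes the tags live in a p-attribute named pos (the name that appears in "show +pos" and in flAtt=pos elsewhere in this thread) and that the tags contain no wildcard or regular-expression characters. The function tags_to_cqp is a hypothetical illustration, not the translator CQPweb itself uses, and it handles only bare _TAG tokens.

```python
def tags_to_cqp(simple_query, attribute="pos"):
    """Translate a tag-only simple query such as '_JJ _NN1' into CQP syntax.

    Illustrative only: each whitespace-separated token must be an
    underscore followed by a tag; anything else is rejected.
    """
    constraints = []
    for token in simple_query.split():
        if not token.startswith("_") or len(token) == 1:
            raise ValueError(f"not a tag-only token: {token!r}")
        tag = token[1:]  # drop the leading underscore
        constraints.append(f'[{attribute}="{tag}"]')
    # Consecutive token constraints match consecutive corpus positions.
    return " ".join(constraints)


print(tags_to_cqp("_JJ _NN1"))
```

The point of spelling it out is that the query can only match if the corpus really contains the values JJ and NN1 in that attribute; a corpus annotated with a different tagset will return no hits however the query is phrased.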

OK, I'm conducting the searches in the Old Icelandic Corpus, which has supposedly been tagged using the CLAWS7 tagset (according to the information in "View corpus metadata"). When I do this, however, I get a message saying "Your query had no results. There are no matches for your query." This is very puzzling because you would imagine that there would be occurrences of adjectives followed by nouns. Doing it in the opposite order (_NN1 _JJ) gives me the same results. What is even more puzzling is that I also get nothing using single POS labels such as _NN1 by itself or _JJ.

Am I doing something wrong or is this due to the fact that this particular corpus uses a completely different tagset? When you access a CQPweb corpus, is there any way to retrieve the tags that have been used in the corpus? The only relevant info I find in this corpus is the link to the CLAWS7 tagset but, as I said, this doesn't seem to be the right information. Going into the CQP syntax mode and doing "show +pos" doesn't work.


JM

Dear members,

We are pleased to announce another CWB/CQPweb setup in China, which we have dubbed BFSU CQPweb. It is closely modelled after Hardie's own (sorry Andrew, we're badly in need of imagination) and currently features more than 20 corpora, including two Brown family cousins (CLOB and Crown) developed at Beijing Foreign Studies University by Dr. Xu Jiajing and Professor Liang Maocheng.

You may access it from http://124.193.83.252/cqp/ using test/test as username/password.

We'd like to take this opportunity to thank the CWB team for their wonderful work and generosity. It is great fun to build our work on their shoulders.

Best,
Ray








_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://devel.sslmit.unibo.it/mailman/listinfo/cwb

 