[CWB] Announcement: Another CWB/CQPweb setup in China
Josep M. Fontana
josepm.fontana at upf.edu
Thu Oct 25 15:05:44 CEST 2012
OK. Thanks! With what you said I went to the page for the corpus project
and I found the tagset
(http://www.linguist.is/icelandic_treebank/Tagset). Now I'm a happy camper.
>
> Hi Josep,
>
> The issue here is that the Icelandic corpus on Ray's server has been
> installed as if it had been tagged by the Lancaster tagger combination
> of CLAWS + USAS (which uses the CLAWS7 tagset) whereas in fact it
> hasn't. Couldn't be, in fact, since C7 is a tagset for English not
> Icelandic.
>
> This is my fault, indirectly. Way back when CQPweb was only used here
> at Lancaster, corpus installation had to be done manually, which was a
> very time-consuming process. To speed things up, I created the
> indexing web-forms, which have two settings for p-attributes:
> "default" i.e. assume it has been tagged by CLAWS and USAS, or
> "custom" i.e. specify the p-attributes yourself. In retrospect this
> was clearly the Wrong Thing, as nowhere else but Lancaster is
> CLAWS+USAS the "default", making it too easy for superusers elsewhere
> to do the wrong thing in the web forms. I /am/ going to replace this
> system with something more site-neutral, when I get the time....
>
> Anyway, the upshot: if you leave "default" specified when indexing a
> corpus, then CQPweb will believe it has CLAWS7 tags and USAS semantic
> tags, even if it doesn't. The way to get around this is to ignore what
> CQPweb says the tags are and to look at what they really are (e.g. by
> going to frequency list and looking at a freq list of the
> part-of-speech tag attribute).
>
> http://124.193.83.252/cqp/IcePaHC/freqlist.php?flTable=__entire_corpus&flAtt=pos&flFilterType=begin&flFilterString=&flFreqLimit1=&flFreqLimit2=&pp=50&flOrder=desc&uT=y
>
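The frequency-list check described above can be scripted. The following is only a sketch: the parameter names and values are copied from the URL quoted above, while the function name `freqlist_url` and its arguments are hypothetical and not part of CQPweb. Other CQPweb installations or versions may expect different parameters.

```python
from urllib.parse import urlencode


def freqlist_url(base, corpus, attribute="pos", per_page=50):
    """Build a CQPweb frequency-list URL for one p-attribute.

    The query parameters mirror the URL quoted in this thread; their
    meaning on other CQPweb versions is an assumption.
    """
    params = {
        "flTable": "__entire_corpus",  # frequency list over the whole corpus
        "flAtt": attribute,            # p-attribute to count, e.g. "pos"
        "flFilterType": "begin",
        "flFilterString": "",          # empty string: no filtering
        "flFreqLimit1": "",
        "flFreqLimit2": "",
        "pp": per_page,                # entries per page
        "flOrder": "desc",             # most frequent tags first
        "uT": "y",
    }
    return f"{base.rstrip('/')}/{corpus}/freqlist.php?{urlencode(params)}"


if __name__ == "__main__":
    print(freqlist_url("http://124.193.83.252/cqp/", "IcePaHC"))
```

Opening the printed URL in a browser (after logging in) should list the tags that actually occur in the corpus, whatever tagset the metadata page claims.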
> Once you know what tags to use, the simple query syntax /will/ work.
> (I just tried *_Q-A*, for instance, and it works. Not that I have any
> idea what Q-A means in this tagset!)
>
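To make the relation between simple queries such as `_JJ _NN1` or `*_Q-A*` and the underlying tag attribute more concrete, here is a rough sketch of how a small subset of the simple query syntax could map onto CQP token expressions. This is not CQPweb's actual translator: it handles only `word_TAG` tokens with `*` and `?` wildcards, the function names are hypothetical, and the attribute name `pos` is assumed from the frequency-list URL above.

```python
# Characters that need escaping inside a CQP regular-expression string.
SPECIAL = set('.+()[]{}|^$\\"')


def wildcard_to_regex(pattern):
    """Turn simple-query wildcards (* and ?) into a regular expression."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")
        elif ch == "?":
            parts.append(".")
        elif ch in SPECIAL:
            parts.append("\\" + ch)
        else:
            parts.append(ch)
    return "".join(parts)


def token_to_cqp(token, pos_att="pos"):
    """Translate one word_TAG token into a CQP token expression."""
    word, sep, tag = token.partition("_")
    conds = []
    if word and word != "*":  # a bare * places no constraint on the word
        conds.append(f'word="{wildcard_to_regex(word)}"')
    if sep and tag:
        conds.append(f'{pos_att}="{wildcard_to_regex(tag)}"')
    return "[" + " & ".join(conds) + "]"


def simple_to_cqp(query, pos_att="pos"):
    """Translate a whitespace-separated sequence of simple-query tokens."""
    return " ".join(token_to_cqp(t, pos_att) for t in query.split())


if __name__ == "__main__":
    print(simple_to_cqp("_JJ _NN1"))
    print(simple_to_cqp("*_Q-A*"))
```

The point of the sketch is that the tag part of a simple query is matched literally against whatever values the tag attribute really holds, so `_JJ` can only match if the corpus contains the tag `JJ`.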
> "show +pos" doesn't work because the interface only allows /queries/
> to be specified by the user. Other CQP commands are blocked. (In fact,
> CQPweb /always/ uses show +pos or equivalent, but the tags are
> rendered in the tooltip that pops over the central link of a
> concordance, not in the main concordance itself.)
>
> best
>
> Andrew.
>
> *From:*cwb-bounces at sslmit.unibo.it
> [mailto:cwb-bounces at sslmit.unibo.it] *On Behalf Of *Josep M. Fontana
> *Sent:* 25 October 2012 12:04
> *To:* cwb at sslmit.unibo.it
> *Subject:* Re: [CWB] Announcement: Another CWB/CQPweb setup in China
>
> Hi,
>
> I am a little (or quite) confused about the syntax of CQPweb queries
> (simple query language). I went to the wonderful resource Ray Wu has
> made available so that I could see how it works since we are in the
> process of installing CQPweb as an interface for our corpora. I wasn't
> able to complete any search using the simple query language, though.
> I'm sure it is something very simple that I am missing. From what I
> understand reading the document 'simple query language syntax', I
> should be able to do the following in the simple query mode:
>
> _JJ _NN1
>
> which would supposedly look for sequences of an adjective followed by
> a noun according to the CLAWS tag set.
>
> OK, I'm conducting the searches in the Old Icelandic Corpus which has
> been supposedly tagged using the CLAWS7 tagset (according to the
> information in "View corpus metadata"). When I do this, however, I get
> a message saying "Your query had no results. There are no matches for
> your query." This is very puzzling because you would imagine that
> there would be occurrences of adjectives followed by nouns. Doing it
> in the opposite order (_NN1 _JJ) gives me the same results. What is even
> more puzzling is that I also get nothing using single POS labels such
> as _NN1 by itself or _JJ.
>
> Am I doing something wrong or is this due to the fact that this
> particular corpus uses a completely different tagset? When you access
> a CQPWeb corpus, is there any way to retrieve the tags that have been
> used in the corpus? The only relevant info I find in this corpus is
> the link to the CLAWS7 tagset but, as I said, this doesn't seem to be
> the right information. Going into the CQP syntax mode and doing "show
> +pos" doesn't work.
>
>
> JM
>
> Dear members,
>
> We are pleased to announce another CWB/CQPweb setup in China and
> we dub it BFSU CQPweb. It is closely modelled after Hardie's own
> (sorry Andrew, we're badly in need of imagination) and currently
> features more than 20 corpora, including two Brown family cousins
> (CLOB and Crown) developed at Beijing Foreign Studies University by
> Dr. Xu Jiajing and Professor Liang Maocheng.
>
> You may access it from http://124.193.83.252/cqp/ using test/test
> as username/password.
>
> We'd like to take this opportunity to thank the CWB team for their
> wonderful work and generosity. It is great fun to build our work
> on their shoulders.
>
> Best,
> Ray
>
> _______________________________________________
>
> CWB mailing list
>
> CWB at sslmit.unibo.it <mailto:CWB at sslmit.unibo.it>
>
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb