[CWB] Problem encoding corpus with POS tags

Albert Gatt albert.gatt at um.edu.mt
Mon Nov 5 17:17:25 CET 2012


It seems that POS is indeed set as primary annotation. When I go to "Manage
Annotation", I see "part of speech tag" as the primary (this is the
description I assigned when I set "pos" to be the primary annotation at the
installation stage).

Another observation: when I search for, say "word_NN", I get no results. I
do get results when I search for "word_*", and when I download the result
file, I see the POS. I don't know if that's at all useful to localise the
problem.
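To make the observation concrete: a CQPWeb simple query of the form word_TAG is turned into a CQP token query that constrains both the word and the primary annotation. The sketch below is a hypothetical Python approximation of that expansion. The function name simple_to_cqp and the mapping of * to the regex .* are my assumptions, not CQPWeb's actual code. If the expansion works roughly like this, "word_NN" requires pos to equal NN exactly, while "word_*" lets pos be anything. Hits for the wildcard form but none for the exact tag would then point at the exact tag comparison as the thing that fails.

```python
import re


def simple_to_cqp(query: str, primary: str = "pos") -> str:
    """Hypothetical sketch: expand 'word_TAG' simple-query tokens into CQP syntax.

    Assumption: '*' is the only wildcard and means "any string" (regex '.*').
    This illustrates the idea only; it is not CQPWeb's real implementation.
    """

    def to_regex(part: str) -> str:
        # Escape regex metacharacters, then turn the escaped '*' back into '.*'
        return re.escape(part).replace(r"\*", ".*")

    tokens = []
    for item in query.split():
        if "_" in item:
            # Split on the last underscore: left side is the word, right side the tag
            word, tag = item.rsplit("_", 1)
            tokens.append(f'[word="{to_regex(word)}" & {primary}="{to_regex(tag)}"]')
        else:
            # No tag given: constrain the word only
            tokens.append(f'[word="{to_regex(item)}"]')
    return " ".join(tokens)


print(simple_to_cqp("word_NN"))  # exact tag: pos must equal NN
print(simple_to_cqp("word_*"))   # wildcard tag: pos may be anything
```

One way to test the pos attribute directly is to type the generated string, e.g. [word="kien" & pos="VA"], into CQPWeb's CQP-syntax query mode, which bypasses the simple-query expansion altogether.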

albert


On 5 November 2012 15:34, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:

> Hi Albert,
>
> You may need to check whether pos has been configured properly as primary
> annotation.
>
> As a superuser, go to the main corpus search page, then on the menu select
> "Manage Annotation". See if the "Primary annotation" slot has POS selected.
> If not, change and update, then it should work.
>
> If, on the other hand, pos *IS* properly selected on that screen, let me
> know, and I'll look into what else might be causing the problem.
>
> (I am not sure why sometimes the primary annotation is not selected
> correctly at index time. A bug, of course, but not one I've managed to
> track down yet as it seems to be intermittent. I'll work it out eventually.)
>
> best
>
> Andrew.
>
> ==========================
> From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On
> Behalf Of Albert Gatt
> Sent: 05 November 2012 14:09
> To: cwb at sslmit.unibo.it
> Subject: [CWB] Problem encoding corpus with POS tags
>
> I'm trying to install a corpus which has word + POS, via CQPWeb. An
> example of the data is shown below:
>
> <text id="lh1">
> <s id="0">
> Anqas   MV
> għaraftek       VV
> ...     PUN
> </s>
> ...
> </text>
>
> When I install, I leave the s-attributes as default (since "s" is the only
> structural attribute I have, apart from "text") and specify "pos" as the
> primary p-attribute.
>
> The corpus installs without problems, and I can use CQPWeb's frequency
> list functionality to see a list of different parts of speech, as well as
> word tokens. I can successfully run queries for words. However, any query
> that involves POS gives me no results (e.g. "kien_VA" where "kien" is a
> word and "VA" is a tag).
>
> I'm not sure where the problem lies.
>
> thanks
> albert
>
>
> --
> -----------------------------------------------------------------
> Albert Gatt
> Institute of Linguistics
> Rm 22, Block A
> Car Park 6
> University of Malta
> Tal-Qroqq Msida MSD2080
> Malta
>
> tel: (+356) 2340 2150
> http://staff.um.edu.mt/albert.gatt/
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
>
>
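For anyone hitting the same symptom, it is cheap to sanity-check the input file before re-installing. Below is a minimal Python sketch, assuming CWB's one-token-per-line vertical format as in the sample above: lines starting with "<" are structural tags, and every other line holds the word and pos columns separated by a single tab. The function name check_vertical is hypothetical. It flags lines whose column count is wrong and lines carrying a stray carriage return or padding whitespace. Whether the encoder would keep such characters in the pos values is an assumption on my part, not something established in this thread.

```python
def check_vertical(lines, n_columns=2):
    """Hypothetical checker: report token lines that may not encode as word + pos.

    Assumes the CWB vertical format: lines starting with '<' are s-attribute
    tags (<text>, <s>, ...); all other lines are tab-separated p-attributes.
    Returns a list of (line_number, description) tuples.
    """
    problems = []
    for lineno, raw in enumerate(lines, start=1):
        line = raw.rstrip("\n")
        if line.startswith("<"):
            continue  # structural markup such as <s id="0"> or </text>
        if line.endswith("\r"):
            # A Windows line ending would otherwise stick to the last column (pos)
            problems.append((lineno, "Windows line ending (CR) left in last column"))
            line = line.rstrip("\r")
        fields = line.split("\t")
        if len(fields) != n_columns:
            problems.append(
                (lineno, f"expected {n_columns} tab-separated fields, got {len(fields)}")
            )
        elif any(f != f.strip() for f in fields):
            problems.append((lineno, "leading/trailing whitespace inside a field"))
    return problems


# Hypothetical input: line 4 has a CRLF ending, line 5 uses spaces instead of a tab
sample = [
    '<text id="lh1">\n',
    '<s id="0">\n',
    "Anqas\tMV\n",
    "għaraftek\tVV\r\n",
    "...   PUN\n",
    "</s>\n",
    "</text>\n",
]
for lineno, message in check_vertical(sample):
    print(lineno, message)
```

In real use you would pass the open file object (e.g. opened with newline="" so that any \r characters are preserved) instead of the hand-made list.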

