[CWB] News texts in CQPWeb

Hardie, Andrew a.hardie at lancaster.ac.uk
Mon Jan 28 12:31:43 CET 2013


As Martí says, that's quite right. The CQPweb form currently just passes these straight through to the CWB tools, so the cwb-encode formalism is needed; a more intuitive web user interface will be provided at some point. Sorry for not getting to these mails over the weekend!

best

Andrew.


From: cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] On Behalf Of Kurt Sultana
Sent: 27 January 2013 09:21
To: Open source development of the Corpus WorkBench
Subject: Re: [CWB] News texts in CQPWeb

I came across an interesting mailing-list post and have now entered news:0+title+source+date, s and p:0+id as s-attributes. It seems to be working now. Could anyone confirm I'm doing this right?
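
For anyone else reading this, here is a minimal Python sketch of how I understand those declaration strings to be put together, and which s-attribute names I expect to come out of them. The helper names s_attribute_spec and derived_names are made up for illustration and are not part of CWB or CQPweb; the element_attribute naming (news_title and so on) is my assumption based on the attribute names that show up in CQPweb.

```python
def s_attribute_spec(element, attributes=(), nesting=0):
    """Build a declaration string such as 'news:0+title+source+date'.

    Hypothetical helper: an element without XML attributes is declared
    by its bare name (e.g. 's'), as in the message above.
    """
    if not attributes:
        return element
    return f"{element}:{nesting}" + "".join(f"+{a}" for a in attributes)


def derived_names(element, attributes=()):
    """List the s-attribute names assumed to result from a declaration:
    the element itself plus one element_attribute name per XML attribute."""
    return [element] + [f"{element}_{a}" for a in attributes]


if __name__ == "__main__":
    print(s_attribute_spec("news", ["title", "source", "date"]))
    print(s_attribute_spec("s"))
    print(s_attribute_spec("p", ["id"]))
    print(derived_names("news", ["title", "source", "date"]))
```

If that is right, only the three declaration strings need to be typed into the form, and the names with the values (news_title, news_source, news_date) follow from them rather than being declared one by one.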

Thanks,
Kurt

On Sat, Jan 26, 2013 at 7:41 PM, Kurt Sultana <kurtanatlus at gmail.com> wrote:
Hi,

I've dug around a bit and learned that the attributes I mentioned are stored as s-attributes. So, I have this example text:

<news title="A Thrilling Experience" date="01/01/2013" source="www.timesofmalta.com">
<text id="4">
<p id="1">
<s>
Tick    NN    tick
.    SENT     .
</s>
<s>
A    DT     a
clock    NN    clock
.    SENT    .
</s>
<s>
Tick    VB    tick
,    ,    ,
tick    VB    tick
.    SENT    .
</s>
</p>
</text>
</news>

As s-attributes (XML elements) I entered p, p_id, news, news_title, news_source and news_date. When installing the corpus, I chose to install metadata from the XML annotation within the corpus and selected news_title, news_source and news_date. However, when I click on "Create metadata table from XML using settings above", I get this error:

Error message
**** CQP ERROR ****
CQP Error:
No annotated values for s-attribute ``news_title'' in named query c_M_F_xml

I'm not 100% confident about what I'm doing since it's my first time, so I might easily have misunderstood something. What am I doing wrong?
Many thanks in advance,
Kurt


On Thu, Jan 24, 2013 at 10:39 PM, Kurt Sultana <kurtanatlus at gmail.com> wrote:
Hi all,

I have a news corpus which I'd like to put in CQPWeb.

I'm currently representing a news text (in Maltese) like this:
<text id="1">
<s>
L NP
- PUN
armi VV
nxtraw VV
separatament MV
minn PRP
l- DDC
istess MJ
kollezzjonista NN
anonimu NN
minn PRP
Texas NP
. PUN
</s>
<s>
Dan PD
ifisser VV
li CMP
l- DDC
armi NN
anke CC
wara PRP
li CMP
nbiegħu VV
se PAF
jibqgħu VV
flimkien MV
. PUN
</s>
</text>

Apart from the body text, a news text usually has a title and a date of publication. How could I include this information in the example above? Would it take the form of attributes? And could I run queries against these new attributes?

Thanks in advance,
Kurt

