[CWB] CQP: shared collocates

Sylvain Loiseau sylvain.loiseau at wanadoo.fr
Sun Oct 7 21:38:01 CEST 2012


Hi,

You can use rcqp to filter form lists in this way. For instance:

> c <- corpus("DICKENS")
>
>
> # nearly:
> # (the @ marker sets the target on the token just before "nearly")
> nearly <- subcorpus(c, '@[] "nearly"')
> # extract the frequency list of the collocates:
> fl.nearly <- cqp_flist(nearly, "target", "word")
> # fl.nearly is a named vector: each name is a
> # word form, each value its frequency
> # in the subcorpus
>
>
> # almost:
>
> almost <- subcorpus(c, '@[] "almost"')
> fl.almost <- cqp_flist(almost, "target", "word")
>
> intersect(names(fl.nearly), names(fl.almost))
 [1] "Their"       "n't"         "that"        ","           "Servant"    
 [6] "with"        "There"       "as"          "the"         "both"       
[11] "shall"       "doubt"       "Marley"      "on"          "no"         
[16] "simile"      "of"          "in"          "long"        ";"          
[21] "have"        "each"        "to"          "my"          "sole"       
[26] "I"           "."           "it"          "It"          "But"        
[31] "Scrooge"     "readers"     "him"         "lips"        "Sometimes"  
[36] "a"           "relate"      "these"       "'s"          "years"      
[41] "he"          "?"           "door-nail"   "coffin-nail" "been"       
[46] "would"       "'"           "December"    "lay"         "You"        
[51] "his"         "said"        "felt"        "merry"       "me"         
[56] "be"         
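
If you want the frequencies as well as the shared forms, the two named vectors can be lined up on the intersected names. Here is a minimal sketch in plain R; the two toy frequency lists are made up for illustration (they are not taken from DICKENS) and simply have the same shape as the output of cqp_flist:

```r
# Toy frequency lists (hypothetical values), shaped like cqp_flist output:
# names are word forms, values are frequencies.
fl.nearly <- c(the = 12, as = 5, quite = 3, very = 2)
fl.almost <- c(the = 20, as = 1, always = 7, very = 4)

# Forms that occur with both node words
shared <- intersect(names(fl.nearly), names(fl.almost))

# Side-by-side frequencies for the shared forms
shared.freq <- data.frame(
  form   = shared,
  nearly = as.vector(fl.nearly[shared]),
  almost = as.vector(fl.almost[shared]),
  stringsAsFactors = FALSE
)

# Sort by the smaller of the two frequencies, so that forms
# well attested with both words come first
shared.freq <- shared.freq[order(-pmin(shared.freq$nearly, shared.freq$almost)), ]
print(shared.freq)
```

Sorting on the smaller frequency is only one possible choice; it pushes down forms that are frequent with one word but rare with the other.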

You can also select as collocates all the tokens found in a given span around "nearly" and "almost", using the left.context and right.context options of the cqp_flist function:

> fl.nearly <- cqp_flist(nearly, "target", "word", left.context=3, right.context=0)
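
To give an idea of what a span-based frequency list contains, here is a rough sketch in plain R on a toy token vector. The helper span_flist is hypothetical (it is not part of rcqp, and it does not claim to reproduce how cqp_flist works internally); it just counts every token found in a window around each occurrence of a node word, so a token covered by two overlapping windows is counted twice:

```r
# Hypothetical helper, not part of rcqp: frequency list of all tokens
# found within `left` tokens before and `right` tokens after each
# occurrence of `node` in the character vector `tokens`.
span_flist <- function(tokens, node, left = 3, right = 0) {
  hits <- which(tokens == node)
  idx <- unlist(lapply(hits, function(i) {
    span <- c(i - seq_len(left), i + seq_len(right))
    # drop positions that fall outside the text
    span[span >= 1 & span <= length(tokens)]
  }))
  fl <- table(tokens[idx])
  # return a plain named vector: name = form, value = frequency
  setNames(as.vector(fl), names(fl))
}

# Toy text (made up for illustration)
tokens <- c("it", "was", "nearly", "over", "and", "he", "was",
            "almost", "asleep", "it", "was", "almost", "done")

fl.nearly <- span_flist(tokens, "nearly", left = 2, right = 1)
fl.almost <- span_flist(tokens, "almost", left = 2, right = 1)

# Shared collocates within the span
print(intersect(names(fl.nearly), names(fl.almost)))
```

With real data you would build both lists with cqp_flist using the same left.context and right.context values, then intersect the names as before.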

Best,
Sylvain

Le 7 oct. 2012 à 15:00, Aleksandar Trklja a écrit :

> Hi Martí,
> 
> thank you for your reply. I'm sorry my question was unclear. 
> 
> What I mean by 'shared collocates' are the collocates that occur with both a word x and a word y. Say I want to find the collocates that 'almost' and 'nearly' share. The '|' operator will show the collocates that occur with either of the two, not only those that occur with both (e.g. 'certainly' occurs with 'almost' but not with 'nearly'). So I guess I'd need something like an 'AND' operator instead of 'OR'.
> 
> Cheers
> Alex
> 
> 
> 
> ________________________________________
> From: cwb-bounces at sslmit.unibo.it [cwb-bounces at sslmit.unibo.it] on behalf of Martí Quixal [marti.quixal at gmail.com]
> Sent: 07 October 2012 13:01
> To: cwb at sslmit.unibo.it
> Subject: Re: [CWB] CQP: shared collocates (Aleksandar Trklja)
> 
> Hi Aleksandar,
> 
> I don't know if I understand your question, but do you mean this?
> 
> DICKENS> ".*" "year|people";
> 2077 matches. Use 'cat' to show.
> DICKENS> group Last match lemma;
> #---------------------------------------------------------------------
> (none)                        the                                  355
>                              of                                   168
>                              a                                    150
>                              other                                100
>                              some                                  65
> (...)
> 
> For more on group check this:
> http://cwb.sourceforge.net/files/CQP_Tutorial/node20.html
> 
> Best
> mq
> On Sun, Oct 7, 2012 at 5:00 AM, <cwb-request at sslmit.unibo.it<mailto:cwb-request at sslmit.unibo.it>> wrote:
> 
> Today's Topics:
> 
>   1.  CQP: shared collocates (Aleksandar Trklja)
>   2. Installing CQPWeb (Martí Quixal)
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Sat, 6 Oct 2012 09:12:46 +0000
> From: Aleksandar Trklja <AXT899 at bham.ac.uk<mailto:AXT899 at bham.ac.uk>>
> To: Open source development of the Corpus WorkBench
>        <cwb at sslmit.unibo.it<mailto:cwb at sslmit.unibo.it>>
> Subject: [CWB]  CQP: shared collocates
> Message-ID:
>        <A584B1A99417C443AD247357ADAEEE050A863A80 at mbx01.adf.bham.ac.uk<mailto:A584B1A99417C443AD247357ADAEEE050A863A80 at mbx01.adf.bham.ac.uk>>
> Content-Type: text/plain; charset="us-ascii"
> 
> Dear all,
> 
> is it possible to produce with CQP a list that contains only shared collocates of two or more lexical items?
> 
> Many thanks for your help.
> 
> Best
> Alex
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Sat, 6 Oct 2012 21:48:22 -0500
> From: Martí Quixal <marti.quixal at gmail.com<mailto:marti.quixal at gmail.com>>
> To: cwb at sslmit.unibo.it<mailto:cwb at sslmit.unibo.it>
> Subject: [CWB] Installing CQPWeb
> Message-ID:
>        <CAMtTwm8Nb5H4eHvtRPfvf+6qc+QSZSqMzRM+5++RgvsSToGg9w at mail.gmail.com<mailto:CAMtTwm8Nb5H4eHvtRPfvf%2B6qc%2BQSZSqMzRM%2B5%2B%2BRgvsSToGg9w at mail.gmail.com>>
> Content-Type: text/plain; charset="utf-8"
> 
> Dear list members,
> 
> I am installing CQPWeb, and everything seemed to be ok until I typed the
> url in my browser:
> 
> http://localhost/spintx-web/adm
> 
> Then I got this message:
> 
> CQPweb encountered an error and could not continue.
> You do not have permission to use this program.
> 
> And then I looked into the apache error log and saw this other info:
> 
> [Sat Oct 06 21:34:32 2012] [error] [client ::1] File does not exist:
> /Library/WebServer/Documents/spintx-web/css/CQPweb.css, referer:
> http://localhost/spintx-web/adm/
> 
> My questions are:
> 
> 1) Could this missing css file be causing the problem? (sounds weird...)
> so...
> 
> 2) I used the automatic php configuration file and when the script asked
> for a user I gave a user that did not exist as a system user (I mistyped
> it, spintex-web instead of spintx-web). I added manually the system user I
> wanted to use, just in case, but this does not seem to improve anything.
> Then I had the impression that the user created with the automatic php
> config file is only a CQPWeb admin not a system user, am I wrong?
> 
> So, do you have any recommendations? What else should I be looking at?
> 
> I am running the whole thing (apache2, mysql, php, cwb tools, etc.) in the
> latest version (also CQPWeb from svn, not download link) on a Mac OSX
> 10.7.5.
> 
> - PHP 5.3.15 with Suhosin-Patch (cli) (built: Jul 31 2012 14:49:18)
> - Server version: 5.5.28 MySQL Community Server (GPL)
> - Server version: Apache/2.2.22 (Unix)
> -- Server built:   Jul 12 2012 15:11:26
> - CQP Version:   3.0.0
> (and latest versions of all perl modules as on the sourceforge page, except
> for CQPWeb, which is the svn version as recommended)
> 
> Thanks in advance!
> 
> --
> Martí Quixal
> Computational Linguist & Educational Technologist
> http://www.iqubo.org/quixal
> 
> ------------------------------
> 
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it<mailto:CWB at sslmit.unibo.it>
> http://devel.sslmit.unibo.it/mailman/listinfo/cwb
> 
> 
> End of CWB Digest, Vol 70, Issue 6
> **********************************
> 
> 
> 
> --
> Martí Quixal
> Computational Linguist & Educational Technologist
> http://www.iqubo.org/quixal


