[CWB] can I create a function that modifies the output?

Stefan Evert stefanML at collocations.de
Thu Apr 4 13:35:25 CEST 2013


On 2 Apr 2013, at 11:51, "BOFÍAS ALBERCH, EVA" <eva.bofias at upf.edu> wrote:

> I would like to make simple modifications of the output of the CQP. Is there an easy way of doing it? I have seen that def and define are reserved words but I have not been able to find how do they work. Can I create a function that can be called on the command line?

CQP isn't a general-purpose programming language, so it's difficult to do any non-trivial post-processing of CQP queries and/or output.  What people usually do is to run a short Perl or Python script on CQP output, or even post-process with Unix command-line tools such as sed (for regexp-based substitution) or awk (for working with tabular output formats).

If you're working on Linux or Mac OS X, you can apply the post-processing directly by sending CQP output to a suitable Unix pipe.  You could even write macros for common tasks.  As a simple example, say you want to obtain a type count for your query results.  The easiest approach is to make frequency counts with "group" or "count" and then count how many lines of output you get (with the Unix command "wc -l").  In a CQP session, this works as follows:

    Matches = ... some CQP query ...;
    count Matches by word > "| wc -l";

The ">" redirects CQP output to a file, here the frequency counts for each distinct match of your query.  If the filename starts with a vertical bar ("|"), it is interpreted as a Unix pipe through which the output is passed.  In this example, the number of lines, i.e. distinct types, will be displayed in your terminal window.

If you want to save the post-processed results to a disk file, you can use another redirection in the pipe, e.g.

    count Matches by word > "| wc -l > no_of_types.txt";

Note that the entire pipe, including the final redirection, has to be specified as a single string in (single or double) quotes.
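
The same mechanism works with a script of your own at the end of the pipe instead of "wc -l".  As a rough sketch (not part of CWB): the Python filter below reads frequency lines from standard input and reports the number of types and the total number of tokens.  It assumes that every non-empty line is one distinct type and that the first tab-separated field is the frequency; check this against the output your CQP version actually produces.  The function name summarize and the file name type_summary.py are made up for this example.

```python
import sys


def summarize(lines):
    """Return (types, tokens) for frequency-list lines.

    Assumption: each non-empty line is one distinct type, and its first
    tab-separated field is the frequency. Adjust the parsing if your
    CQP output is laid out differently.
    """
    types = 0
    tokens = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        types += 1
        first_field = line.split("\t")[0].strip()
        if first_field.isdigit():
            tokens += int(first_field)  # lines without a leading number only count as types
    return types, tokens


# Only read standard input when something is actually piped in.
if __name__ == "__main__" and not sys.stdin.isatty():
    n_types, n_tokens = summarize(sys.stdin)
    print(f"{n_types} types, {n_tokens} tokens")
```

Saved as type_summary.py, it could be called from a CQP session in the same way as "wc -l":

    count Matches by word > "| python3 type_summary.py";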

If you find you often need such post-processing tricks, you can create CQP macros for the most common cases.  See the CQP tutorial on how to write a macro definition file and automatically load it when you start CQP.

A macro for line counts might look as follows:

    define macro line_count(0) '> "| wc -l"';
    count Matches by word /line_count[];

In order to redirect the resulting output (i.e. the line count) to a file, you need to pass the filename as a parameter to the macro (because it has to be inserted into the Unix pipe, so you can't just append it after the macro invocation).  Define a second version of the macro that accepts a single parameter:

    define macro line_count(1) '> "| wc -l > $0"';
    count Matches by word /line_count["no_of_types.txt"];

Note that the macro argument has to be quoted (in most cases) and this isn't going to work if your filename contains blanks or other special characters (you can try to work around this problem, but the necessary escapes for nested quotation marks will soon get fairly ugly).
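
To give an idea of how quickly the quoting gets ugly, here is a small Python sketch that shows only the shell layer of the problem: how a filename would have to be written inside the pipe command so that the shell treats it as a single word.  It uses shlex.quote from the Python standard library; the helper name wc_pipe is made up for this example.  The result would still have to be embedded in a CQP string and in the macro definition, which adds further levels of quoting that this sketch does not attempt to handle.

```python
import shlex


def wc_pipe(filename):
    """Build the shell side of the pipe, with the filename quoted for sh."""
    return "wc -l > " + shlex.quote(filename)


# A plain filename needs no quoting at the shell level.
print(wc_pipe("no_of_types.txt"))   # wc -l > no_of_types.txt

# Blanks force the filename into single quotes.
print(wc_pipe("no of types.txt"))   # wc -l > 'no of types.txt'

# An apostrophe in the filename already looks unpleasant.
print(wc_pipe("it's.txt"))          # wc -l > 'it'"'"'s.txt'
```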

Hope this helps,
Stefan




