[CWB] Escape "<" and ">" symbols

mansur 6688000 at gmail.com
Thu Feb 22 18:24:09 CET 2018


Hi!

The error: Can't locate ../lib/perl/cqpwebCEQL.pm at - line 2.

I could only fix it by setting "$perl_extra_directories" and editing the file
"ceql.inc.php", changing
require "../lib/perl/cqpwebCEQL.pm";
to
require "cqpwebCEQL.pm";

I know this is an awful solution, but I have no choice at the moment.

Best,
Mansur

On 22 February 2018 at 12:55, mansur <6688000 at gmail.com> wrote:

> Hello, Stefan!
>
> Thank you so much for the answers and advice! They clarified many
> things for me.
>
> > You may also need to configure CQPweb and set appropriate paths there.
>
> Could you, please, explain how I can do that?
>
> Thank you!
> Best,
> Mansur
>
>
> On 22 February 2018 at 11:52, Stefan Evert <stefanML at collocations.de>
> wrote:
>
>> Dear Mansur,
>>
>> most of the remaining issues are related to CQPweb, so Andrew will be in
>> a much better position to answer them and help you with the debugging.
>> Some of them are clearly (mis-)configuration issues, e.g. the failure to
>> locate the CEQL backend that is part of CQPweb or the failure to run CQP.
>>
>> Are you working with an up-to-date version of CQPweb checked out from the
>> SVN repository?
>>
>>
>> > 3) After rebooting the computer, no search works at all:
>> > ERROR: CQP backend startup failed; the reported CQP version [] could
>> > not be parsed.
>> > But from the command line I can perform a search with 'cqp -e' and it
>> > seems to be working, at least I can see search results.
>>
>> This suggests that you have CQP installed, but in a "private" path that's
>> only visible to your user account and not to the Web server running
>> CQPweb.  You may also need to configure CQPweb and set appropriate paths
>> there.
>>
>> > 4) Is it possible to choose ranges of periods in search according to
>> > the 'date'?
>> > <text id="" date=?????>
>>
>> I think Andrew is working on support for date attributes in CQPweb.
>>
>> In plain CQP, there are two ways of doing date searches:
>>
>> a) The reasonable way: Store your dates in a simple standard format – I
>> prefer ISO YYYY-MM-DD, so alphabetical and chronological sort order are the
>> same – and then construct regular expressions for your suitable date
>> ranges, e.g. in the global constraint of a CQP query:
>>
>>         … :: match.text_date = "2011-03.*";  # anything in March 2011
>>
>>         … :: match.text_date = "1990-(01-(1[2-9]|[23]\d)|02-.*|03-([0-1]\d|2[0-4]))";  # 12 Jan 1990 .. 24 Mar 1990
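
A range regex like the one above is easy to get wrong, so it can be worth checking it outside CQP first. The following is only a sketch in Python, not part of CWB or CQPweb: the helper name matching_days is made up for illustration, and it assumes that CQP matches the regular expression against the whole attribute value, which re.fullmatch imitates here.

```python
import re
from datetime import date, timedelta

# Pattern copied from the query above. CQP is assumed to match against the
# complete attribute value, so re.fullmatch is used to imitate that.
PATTERN = re.compile(r"1990-(01-(1[2-9]|[23]\d)|02-.*|03-([0-1]\d|2[0-4]))")

def matching_days(year):
    """Return every ISO date (YYYY-MM-DD) in `year` accepted by PATTERN."""
    day = date(year, 1, 1)
    hits = []
    while day.year == year:
        iso = day.isoformat()
        if PATTERN.fullmatch(iso):
            hits.append(iso)
        day += timedelta(days=1)
    return hits

hits = matching_days(1990)
print(hits[0], "..", hits[-1])
```

If the first and last hit are the intended boundaries and there are no gaps in between, the regex covers exactly the desired range.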
>>
>> b) The "I'm a Unix hacker" way: convert your dates to 32-bit integers and
>> use numeric comparisons.  The obvious choice would be consecutive numbers
>> for days (or even seconds as in Unix timestamps), but conversion from/to
>> human-readable dates will be complicated.  However, you could encode the
>> ISO-format above _without_ the hyphens to get 8-digit numbers, e.g.
>>
>>         <text id="…" date="20180222">
>>
>> and then cast to integers for numerical comparisons:
>>
>>         … :: int(match.text_date) >= 19900112 & int(match.text_date) <= 19900324;
>>
>> Nice trick, isn't it?
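
The hyphen-free encoding can be produced while preparing the corpus input. Below is a minimal Python sketch of the idea, assuming the dates are already available as ISO YYYY-MM-DD strings; the function names iso_to_int and in_range are hypothetical and not part of any CWB tool.

```python
def iso_to_int(iso_date):
    """Turn 'YYYY-MM-DD' into the integer YYYYMMDD, e.g. '2018-02-22' -> 20180222."""
    year, month, day = iso_date.split("-")
    return int(year) * 10000 + int(month) * 100 + int(day)

def in_range(iso_date, first, last):
    """Numeric counterpart of the int(match.text_date) comparison above."""
    return iso_to_int(first) <= iso_to_int(iso_date) <= iso_to_int(last)

# Even the largest 8-digit date stays below the signed 32-bit limit.
assert iso_to_int("9999-12-31") < 2**31

print(iso_to_int("2018-02-22"))
print(in_range("1990-02-15", "1990-01-12", "1990-03-24"))
```

Because year, month and day keep their fixed width, numeric order and chronological order agree, which is what makes the comparison in the query meaningful.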
>>
>> > 5) When I press 'Show tags' button I get
>> > 2012_ нче_ елда_ республикада_ 55_ мең_ 839_ бала_ дөньяга_ килгән_ ._
>> > but no tags.
>>
>> That's because CQPweb failed to do proper HTML-escaping for the
>> annotation strings (which is not only inconvenient but also a security risk).
>>
>>         @Andrew: has this bug been fixed in the latest CQPweb code?
>>
>> I've been bitten by similar issues before and would recommend avoiding
>> HTML metacharacters (and other funny things) in annotation strings.  Better
>> recode to something like
>>
>>         n:sg:px3sp:nom
>>
>> or even
>>
>>         |n|sg|px3sp|nom|
>>
>> so you can use the "contains" operator in searches.
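
Such a recoding could be done in a small preprocessing step before running cwb-encode. The Python sketch below assumes the original tags are sequences of angle-bracketed features such as <n><sg><px3sp><nom>; the function name recode_tag and its parameters are invented for this illustration.

```python
import re

def recode_tag(tag, sep="|", wrap=True):
    """Rewrite '<n><sg><px3sp><nom>' without HTML metacharacters.

    With the defaults the result is '|n|sg|px3sp|nom|'; with sep=':' and
    wrap=False it is 'n:sg:px3sp:nom'. Strings that contain no
    angle-bracketed features are returned unchanged.
    """
    parts = re.findall(r"<([^<>]+)>", tag)
    if not parts:
        return tag
    joined = sep.join(parts)
    return sep + joined + sep if wrap else joined

print(recode_tag("<n><sg><px3sp><nom>"))
print(recode_tag("<n><sg><px3sp><nom>", sep=":", wrap=False))
```

The wrapped variant keeps a separator at both ends, which is the form suggested above for use with the "contains" operator.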
>>
>> > I think it is maybe because I didn't replace "<" and ">" in my
>> > morphological tags with their XML entities yet. Please, correct me if I'm
>> > wrong.
>>
>> That won't help!  With -x, cwb-encode will decode the XML entities in
>> your input file and you'll end up with < and > in the indexed corpus.  You
>> could encode without the -x flag, but then your annotation strings will be
>>
>>         &lt;n&gt;&lt;sg&gt;&lt;px3sp&gt;&lt;nom&gt;
>>
>> which happens to display nicely only until HTML escaping in CQPweb is
>> fixed – and you will have to search for
>>
>>         [pos = ".*&lt;nom&gt;.*"]
>>
>> instead of
>>
>>         [pos = ".*<nom>.*"]
>>
>> > 7) I also saw the button 'Export corpus -> Export whole corpus'. Does
>> > that mean that users can download the whole corpus? Is it possible to turn
>> > it off somehow?
>>
>> AFAIK, only users with the "full access privilege" are allowed to
>> download a corpus.  So if you want to disable downloads, simply keep to
>> "normal access".
>>
>>
>> Best,
>> Stefan
>>
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>>
>
>