[CWB] Escape "<" and ">" symbols

mansur 6688000 at gmail.com
Fri Mar 9 08:12:27 CET 2018


Oh, thank you, Andrew! The "Manage annotations" menu helped :)

On 9 March 2018 at 09:59, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:

> Have you configured the pos attribute as your primary annotation (either
> by setting it as such when indexing, or via the “Manage annotation”
> controls)?
>
>
>
> “Show tags” displays the primary annotation, but the system needs to know
> which that is in order to do so.
>
>
>
> Andrew.
>
>
>
> *From:* cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] *On Behalf Of* mansur
> *Sent:* 09 March 2018 06:48
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
> *Subject:* Re: [CWB] Escape "<" and ">" symbols
>
>
>
> Hello!
>
> Following your advice, I'm now using tags like this:
>
> n:nom:pl
>
> But when I press "Show tags" in the concordance, it still does not show
> the tags:
> Нурый_ -_ Биктимернең_ энесе_ ,_ комсомол_ ячейкасы_ секретаре_ ._
> Мөршидә_ -_ Нурыйның_ йөри_ торгам_ кызы_ ._
> Әпрэй_ -_ ялкау_ ._
>
> Maybe I need to configure it somewhere?
>
> Columns in my vrt file:
>
> word
>
> lemma
>
> pos
>
> tags
>
> Thank you!
>
> With best wishes,
>
> Mansur
>
>
>
> On 5 March 2018 at 15:53, Hardie, Andrew <a.hardie at lancaster.ac.uk> wrote:
>
> If you use | as the separator, then you can treat the attribute as a
> feature set, which might be useful. The encoding tutorial describes what
> feature sets allow you to do.
>
>
>
> If you don’t care about it being a feature set, then you can use any
> character. People often do use : as a joiner, but there is no reason not
> to use ; or , instead if that makes more sense for your purposes. I’d
> suggest not using . because it is a regular expression metacharacter.
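A small Python sketch of the separator point above, for illustration only: CQP uses its own (PCRE) regex engine, and `re.fullmatch` stands in here for CQP's whole-string matching. The helper names `join_tags` and `pattern_for` are hypothetical.

```python
import re

def join_tags(tags, sep=":"):
    """Join morphological tag components with the chosen separator."""
    return sep.join(tags)

def pattern_for(component, sep=":"):
    """Regex matching a whole tag string that has `component` as one of its
    separator-delimited parts. re.escape() protects separators that are
    regex metacharacters, such as '.' or '|'."""
    s, c = re.escape(sep), re.escape(component)
    return rf"(?:.*{s})?{c}(?:{s}.*)?"

tags = ["n", "sg", "px3sp", "nom"]
colon = join_tags(tags)  # 'n:sg:px3sp:nom'

# An unescaped '.' matches any character, so a dotted pattern over-matches:
print(bool(re.fullmatch(r"n.sg.*", "nXsg:acc")))      # True (probably unintended)
print(bool(re.fullmatch(pattern_for("nom"), colon)))  # True
print(bool(re.fullmatch(pattern_for("om"), colon)))   # False
```

With ":" or "," as the joiner, no escaping is needed in everyday queries, which is one practical reason to prefer them over ".".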
>
>
>
> best
>
>
>
> Andrew.
>
>
>
>
>
> *From:* cwb-bounces at sslmit.unibo.it [mailto:cwb-bounces at sslmit.unibo.it] *On Behalf Of* mansur
> *Sent:* 05 March 2018 12:46
> *To:* Open source development of the Corpus WorkBench <cwb at sslmit.unibo.it>
> *Subject:* Re: [CWB] Escape "<" and ">" symbols
>
>
>
> Hello, Stefan, Andrew and others!!!
>
> You advised using a tagging style like:
>
> n:sg:px3sp:nom
> or
> n|sg|px3sp|nom
>
> Is there any particular reason to use ":" or "|" instead of "<" and ">"? Is
> it possible to use "," (comma)? What do you usually use in your projects?
>
>
>
> Thank you!
>
> With best wishes,
>
> Mansur
>
>
>
> On 22 February 2018 at 11:52, Stefan Evert <stefanML at collocations.de> wrote:
>
> Dear Mansur,
>
> most of the remaining issues are related to CQPweb, so Andrew will be in a
> much better position to answer them and help you with the debugging.  Some
> of them are clearly (mis-)configuration issues, e.g. the failure to locate
> the CEQL backend that is part of CQPweb or the failure to run CQP.
>
> Are you working with an up-to-date version of CQPweb checked out from the
> SVN repository?
>
>
> > 3) After rebooting the computer, no search works at all:
> > ERROR: CQP backend startup failed; the reported CQP version [] could not be parsed.
> > But from the command line I can perform searches with 'cqp -e' and it
> > seems to be working; at least I can see search results.
>
> This suggests that you have CQP installed, but in a "private" path that's
> only visible to your user account and not to the Web server running
> CQPweb.  You may also need to configure CQPweb and set appropriate paths
> there.
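A minimal Python sketch for checking which `cqp` a given PATH resolves to, using the standard `shutil.which`. The helper name `find_cqp` and the sample PATH `/usr/bin:/bin` are assumptions for illustration. Running it both from your own account and as the web-server user (e.g. `www-data` on Debian-like systems, also an assumption) shows whether CQP is only visible to you.

```python
import os
import shutil

def find_cqp(path_env=None):
    """Return the full path of 'cqp' as resolved with the given PATH string
    (default: this process's PATH), or None if it is not found."""
    return shutil.which("cqp", path=path_env)

# If only your own account finds cqp (e.g. in ~/bin or /usr/local/bin),
# a web server with a shorter PATH will not find it.
print("PATH:", os.environ.get("PATH", ""))
print("cqp found at:", find_cqp())
# A typical minimal daemon PATH (an assumption; yours may differ):
print("with /usr/bin:/bin only:", find_cqp("/usr/bin:/bin"))
```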
>
> > 4) Is it possible to restrict a search to a range of dates according to
> > the 'date' attribute?
> > <text id="" date=?????>
>
> I think Andrew is working on support for date attributes in CQPweb.
>
> In plain CQP, there are two ways of doing date searches:
>
> a) The reasonable way: Store your dates in a simple standard format – I
> prefer ISO YYYY-MM-DD, so alphabetical and chronological sort order are the
> same – and then construct regular expressions for your suitable date
> ranges, e.g. in the global constraint of a CQP query:
>
>         … :: match.text_date = "2011-03.*";  # anything in March 2011
>
>         … :: match.text_date = "1990-(01-(1[2-9]|[23]\d)|02-.*|03-([0-1]\d|2[0-4]))";  # 12 Jan 1990 .. 24 Mar 1990
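The date regexes above can be sanity-checked outside CQP. This Python sketch assumes that Python's `re` behaves like CQP's regex engine for these simple patterns, and uses `re.fullmatch` because CQP matches a regex against the whole attribute value. The names `JAN12_MAR24_1990` and `in_range` are hypothetical.

```python
import re

# Second date-range regex from the queries above (12 Jan 1990 .. 24 Mar 1990).
JAN12_MAR24_1990 = r"1990-(01-(1[2-9]|[23]\d)|02-.*|03-([0-1]\d|2[0-4]))"

def in_range(pattern: str, iso_date: str) -> bool:
    # fullmatch: CQP regexes must match the entire attribute value
    return re.fullmatch(pattern, iso_date) is not None

for d in ["1990-01-11", "1990-01-12", "1990-02-28", "1990-03-24", "1990-03-25"]:
    print(d, in_range(JAN12_MAR24_1990, d))
# 1990-01-11 False
# 1990-01-12 True
# 1990-02-28 True
# 1990-03-24 True
# 1990-03-25 False

print(in_range(r"2011-03.*", "2011-03-15"))  # True
```

Checking the boundary days (the day before, the first day, the last day, the day after) is usually enough to catch an off-by-one in a hand-written range.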
>
> b) The "I'm a Unix hacker way": convert your dates to 32-bit integers and
> use numeric comparisons.  The obvious choice would be consecutive numbers
> for days (or even seconds as in Unix timestamps), but conversion from/to
> human-readable dates will be complicated.  However, you could encode the
> ISO-format above _without_ the hyphens to get 8-digit numbers, e.g.
>
>         <text id="…" date="20180222">
>
> and then cast to integers for numerical comparisons:
>
>         … :: int(match.text_date) >= 19900112 & int(match.text_date) <= 19900324;
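A Python sketch of the 8-digit integer encoding, assuming ISO `YYYY-MM-DD` input; `iso_to_int` and `int_to_iso` are hypothetical helper names, not part of CWB. It shows that numeric order matches chronological order and that the largest possible value fits in a signed 32-bit integer.

```python
from datetime import date

def iso_to_int(iso_date: str) -> int:
    """'2018-02-22' -> 20180222 (the hyphen-free encoding)."""
    date.fromisoformat(iso_date)  # raises ValueError for impossible dates
    return int(iso_date.replace("-", ""))

def int_to_iso(n: int) -> str:
    """20180222 -> '2018-02-22' (zero-padded back to 8 digits)."""
    s = f"{n:08d}"
    return f"{s[:4]}-{s[4:6]}-{s[6:]}"

lo, hi = iso_to_int("1990-01-12"), iso_to_int("1990-03-24")
print(lo <= iso_to_int("1990-02-01") <= hi)  # True
print(int_to_iso(20180222))                  # 2018-02-22
print(iso_to_int("9999-12-31") < 2**31 - 1)  # True: fits in 32 bits
```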
>
> Nice trick, isn't it?
>
> > 5) When I press the 'Show tags' button I get
> > 2012_ нче_ елда_ республикада_ 55_ мең_ 839_ бала_ дөньяга_ килгән_ ._
> > but no tags.
>
> That's because CQPweb failed to do proper HTML-escaping for the annotation
> strings (which is not only inconvenient but also a security risk).
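To illustrate the escaping issue, here is a minimal Python sketch of what a concordance display would need to do before inserting an annotation into HTML; `render_token` is a hypothetical function, not CQPweb code. Without escaping, a browser parses `<n>` as an (unknown) element and renders nothing, which would explain the bare `бала_` tokens.

```python
import html

def render_token(word: str, tag: str) -> str:
    """Hypothetical renderer for 'word_tag' in an HTML concordance line.
    Escaping both parts makes '<', '>' and '&' show up as literal text."""
    return f"{html.escape(word)}_{html.escape(tag)}"

print(render_token("бала", "<n><sg><nom>"))
# бала_&lt;n&gt;&lt;sg&gt;&lt;nom&gt;
```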
>
>         @Andrew: has this bug been fixed in the latest CQPweb code?
>
> I've been bitten by similar issues before and would recommend avoiding
> HTML metacharacters (and other funny things) in annotation strings. It is
> better to recode them to something like
>
>         n:sg:px3sp:nom
>
> or even
>
>         |n|sg|px3sp|nom|
>
> so you can use the "contains" operator in searches.
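A Python sketch of the suggested recoding, assuming the original tags look like `<n><sg><px3sp><nom>`; `recode` and `contains` are hypothetical helpers. `contains` only approximates, in plain Python, the membership test that CQP's "contains" operator performs on a feature-set attribute.

```python
import re
from typing import Tuple

TAG = re.compile(r"<([^<>]+)>")

def recode(angle_tags: str) -> Tuple[str, str]:
    """'<n><sg><px3sp><nom>' -> ('n:sg:px3sp:nom', '|n|sg|px3sp|nom|')"""
    parts = TAG.findall(angle_tags)
    colon = ":".join(parts)
    feature_set = "|" + "".join(p + "|" for p in parts)  # just '|' if no tags
    return colon, feature_set

def contains(feature_set: str, value: str) -> bool:
    """Plain-Python analogue of testing a feature set for one value."""
    return value in feature_set.strip("|").split("|")

colon, fs = recode("<n><sg><px3sp><nom>")
print(colon)                # n:sg:px3sp:nom
print(fs)                   # |n|sg|px3sp|nom|
print(contains(fs, "nom"))  # True
print(contains(fs, "om"))   # False
```

Unlike a regex such as `.*nom.*`, the membership test never matches a partial component like "om".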
>
> > I think it may be because I haven't yet replaced "<" and ">" in my
> > morphological tags with their XML entities. Please correct me if I'm
> > wrong.
>
> That won't help!  With -x, cwb-encode will decode the XML entities in your
> input file and you'll end up with < and > in the indexed corpus.  You could
> encode without the -x flag, but then your annotation strings will be
>
>         &lt;n&gt;&lt;sg&gt;&lt;px3sp&gt;&lt;nom&gt;
>
> which happens to display nicely only until HTML escaping in CQPweb is
> fixed – and you will have to search for
>
>         [pos = ".*&lt;nom&gt;.*"]
>
> instead of
>
>         [pos = ".*<nom>.*"]
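A Python sketch of the effect described above. It only imitates `cwb-encode -x` with `html.unescape` (which decodes more entities than the five predefined XML ones), and uses `re.fullmatch` to mimic CQP's whole-string matching.

```python
import html
import re

vrt_field = "&lt;n&gt;&lt;sg&gt;&lt;px3sp&gt;&lt;nom&gt;"
indexed_with_x = html.unescape(vrt_field)  # '<n><sg><px3sp><nom>'
indexed_without_x = vrt_field              # entities kept literally

print(bool(re.fullmatch(r".*<nom>.*", indexed_with_x)))          # True
print(bool(re.fullmatch(r".*&lt;nom&gt;.*", indexed_without_x)))  # True
print(bool(re.fullmatch(r".*<nom>.*", indexed_without_x)))       # False
```

Either way the angle brackets end up somewhere awkward, which is why recoding to ":" or "|" sidesteps the problem.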
>
> > 7) I also saw the button 'Export corpus -> Export whole corpus'. Does
> > that mean that users can download the whole corpus? Is it possible to
> > turn it off somehow?
>
> AFAIK, only users with the "full access privilege" are allowed to download
> a corpus. So if you want to disable downloads, simply keep users at
> "normal access".
>
>
> Best,
> Stefan
>
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>