[CWB] UNREADABLE

mansur 6688000 at gmail.com
Sat Mar 10 08:57:28 CET 2018


Hey!

Stefan, I did as you said and replaced all whitespace in multiword
tokens with the "_" symbol. These words are now displayed correctly in the
concordance. I can also search for them in CQL mode:
[word="кеше.*_генә"]
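
For anyone who wants to do the same replacement, here is a minimal sketch
of the idea. It assumes a tab-separated vertical (.vrt) input with the word
form in the first column; the function name and the "N" tag in the sample
are only illustrative, not part of CWB.

```python
import re


def join_multiword_token(line: str) -> str:
    """Replace whitespace inside the word form (first column) with '_'.

    Assumes one token per line, columns separated by tabs.
    The trailing newline, if any, is dropped from the result.
    """
    # XML-style structural tags such as <s> or </text> are passed through.
    if line.startswith("<"):
        return line
    columns = line.rstrip("\n").split("\t")
    # Only the word form is changed; the other annotations stay as they are.
    columns[0] = re.sub(r"\s+", "_", columns[0].strip())
    return "\t".join(columns)


if __name__ == "__main__":
    sample = ["<s>", "кеше генә\tN", "</s>"]
    for row in sample:
        print(join_multiword_token(row))
```

The corpus has to be re-encoded after rewriting the input file for the
change to show up in the concordance.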

But I can't run the same query in CEQL mode:
"кеше_генә"

Syntax error
Sorry, your simple query [[[ кеше_генә ]]] contains a syntax error.
**Error:** no attribute defined for part-of-speech tags (internal error) -
when parsing '' генә '' as **pos_tag**, **pos_constraint** - when parsing
'' кеше_генә '' as **token_expression**, **phrase_element** - at this
location: '' кеше_генә ''**<==**'' '' - when parsing '' кеше_генә '' as
**phrase_query** - when parsing '' кеше_генә '' as **default**

Did I do something wrong?

Best,
Mansur

On 9 March 2018 at 12:29, Stefan Evert <stefanML at collocations.de> wrote:

>
> > On 9 Mar 2018, at 09:37, Hardie, Andrew <a.hardie at lancaster.ac.uk>
> wrote:
> >
> > If you’ve got multiword tokens with spaces, then the first element and second
> element will be treated as separate tokens because the CQP concordance line
> uses space as its token delimiter. But this means the first element will
> have no tag, which is why a word-and-tag combination is not read.
>
> You also won't be able to find such multiword tokens with simple (CEQL)
> queries:
>
>         кеше генә
>
> searches for a sequence of two separate tokens "кеше" and "генә".
>
> I would recommend writing multiword tokens with an underscore, i.e.
> "кеше_генә" etc.  You'll just have to document this so users know to
> specify the underscore in searches.
>
> Best,
> Stefan