From michells at ims.uni-stuttgart.de Thu Jul 7 11:46:59 2011
From: michells at ims.uni-stuttgart.de (Lukas Michelbacher)
Date: Thu Jul 7 11:47:15 2011
Subject: [CWB] maximum corpus size and structural attributes
Message-ID:

Hello,

Do structural attributes count towards the 2^31 token boundary?

Lukas

--
Dipl.-Ling. Lukas Michelbacher
Institute for Natural Language Processing
University of Stuttgart
phone: +49 (0)711-685-84587
fax : +49 (0)711-685-81366
email: michells@ims.uni-stuttgart.de

From stefanML at collocations.de Thu Jul 7 12:31:30 2011
From: stefanML at collocations.de (Stefan Evert)
Date: Thu Jul 7 12:31:39 2011
Subject: [CWB] maximum corpus size and structural attributes
In-Reply-To:
References:
Message-ID:

> Do structural attributes count towards the 2^31 token boundary?

No, they're stored as pairs of start and end positions rather than included in the token stream. You should be able to build a corpus containing exactly 2^31 - 1 tokens.

Best,
Stefan

From noreply at sourceforge.net Wed Jul 13 18:01:35 2011
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jul 13 18:01:50 2011
Subject: [CWB] [ cwb-Feature Requests-2824094 ] CQPweb: query-diff postprocess
Message-ID: <20110713160140.AF8E48CC7E@einstein.sslmit.unibo.it>

Feature Requests item #2824094, was opened at 2009-07-20 05:25
Message generated for change (Comment added) made by nobody
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=722306&aid=2824094&group_id=131809

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: CQPweb
Group: None
Status: Open
Priority: 6
Private: No
Submitted By: Andrew Hardie (andrewhardie)
Assigned to: Andrew Hardie (andrewhardie)
Summary: CQPweb: query-diff postprocess

Initial Comment:
Idea for an additional postprocess function: query-diff ie difference between two queries.
Have an option on the concordance-page menu "filter out results shared with another query". This goes to an admin page, where the user's saved queries are listed.

The original query the user came from is q1, the saved query they select is q2.

Running the postprocess creates a new query containing all && only the hits in q1 that do not occur in q2.

How to implement: create a temporary MySQL table for q2's conc lines (begin position / end position [implemented as one field or 2?]).

Then, dump q1.

Read the dump line by line -- check the temp table to see if it's there,
--------If it is, skip it.
--------If it isn't, write it to tempfile

Finally, if there is anything in tempfile, add the number of hits at the start (creating tempfile2) && undump tempfile2 to create the new query.

----------------------------------------------------------------------

Comment By: Stefan Evert (schtepf)
Date: 2009-10-25 17:27

Message:
From our experience working interactively in CQP, it's even more useful to be able to run subqueries, i.e. filter query results either by collocates (tokens with a certain property within a specified range, e.g. a finite verb within 3 words) or by another CQP query.

This could easily be implemented using "set keyword" and subqueries in CQP, but the results would have to be stored as saved queries (because they can't easily be reproduced when they're dropped from the cache).
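
[Editorial sketch] The query-diff procedure outlined in the initial comment (keep exactly those hits of q1 whose begin/end positions do not occur in q2, then prepend the hit count before undumping) might look roughly like this in Python. This is only an illustration, not CQPweb code: the function names `query_diff` and `format_undump` are hypothetical, an in-memory set stands in for the temporary MySQL table, and each hit is assumed to be a (begin, end) pair of corpus positions written as two tab-separated numbers per line.

```python
def query_diff(q1_hits, q2_hits):
    """Return the hits of q1, in their original order, that do not occur in q2.

    Each hit is a (begin, end) tuple of corpus positions. The set plays the
    role of the temporary table holding q2's concordance lines (assumption).
    """
    q2_lookup = set(q2_hits)
    return [hit for hit in q1_hits if hit not in q2_lookup]


def format_undump(hits):
    """Build the text of 'tempfile2': the number of hits, then one hit per line."""
    lines = [str(len(hits))]
    lines.extend(f"{begin}\t{end}" for begin, end in hits)
    return "\n".join(lines) + "\n"


# Hypothetical example data: q1 is the query the user came from,
# q2 is the saved query selected for filtering.
q1 = [(10, 12), (40, 41), (97, 99)]
q2 = [(40, 41), (200, 203)]

diff = query_diff(q1, q2)
undump_text = format_undump(diff)
```

A caller would skip the undump step when `diff` is empty, matching the "if there is anything in tempfile" condition above. Keeping begin and end together as one tuple corresponds to the "one field" variant of the open question in the initial comment.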
----------------------------------------------------------------------

From noreply at sourceforge.net Wed Jul 20 11:55:48 2011
From: noreply at sourceforge.net (SourceForge.net)
Date: Wed Jul 20 11:56:16 2011
Subject: [CWB] [ cwb-Bugs-3341353 ] tabulate crashes with empty query result
Message-ID: <20110720095601.DA6DB8C249@einstein.sslmit.unibo.it>

Bugs item #3341353, was opened at 2011-06-28 19:00
Message generated for change (Comment added) made by schtepf
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=722303&aid=3341353&group_id=131809

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

Category: CQP engine
Group: None
>Status: Closed
>Resolution: Fixed
Priority: 5
Private: No
Submitted By: Nobody/Anonymous (nobody)
Assigned to: Nobody/Anonymous (nobody)
Summary: tabulate crashes with empty query result

Initial Comment:
Hi,

Version: 3.2.7, x86_64 GNU/Linux

DEWIKI2> [lemma = "?u?ere"];
0 matches.
DEWIKI2> tabulate Last match > "WORD_HITS/?u?ere.hits";
cqp: output.c:822: pt_validate_anchor: Assertion `cl->range && cl->size > 0' failed.
Aborted

The error is reproducible with my setup.
Lukas

--
Lukas Michelbacher
Institute for Natural Language Processing
University of Stuttgart
http://www.ims.uni-stuttgart.de/~michells/

----------------------------------------------------------------------

>Comment By: Stefan Evert (schtepf)
Date: 2011-07-20 11:55

Message:
fixed in 3.0.1 and main trunk (r244)

----------------------------------------------------------------------

From waldenfels at issl.unibe.ch Thu Jul 28 10:11:40 2011
From: waldenfels at issl.unibe.ch (Ruprecht von Waldenfels)
Date: Thu Jul 28 10:12:00 2011
Subject: [CWB] UTF8 Bug
Message-ID: <4E3119BC.5020404@issl.unibe.ch>

Dear everybody,

it seems the latest version of CWB does not properly deal with UTF-8 when it comes to the calculation of context size. If 25 characters are defined as context size, in some cases illegal characters are output, presumably because of truncation.

This is a real problem if one works with XML, since the output is then no longer valid XML. However, the problem is easy to avoid by choosing a different context measure.

I did not know how to submit this as a bug.

All the best!

Ruprecht

PS: a sample corpus (hope this makes it through the web servers!):

????????? ????? . ? ???????? . ??? ?? ??????? ???????? ??????? ????? ?????? . ?? ?? ????? ?????? ,

--
------------------------------------------------
Ruprecht von Waldenfels
Universitaet Bern
Institut fuer slavische Sprachen und Literaturen
Laenggassstrasse 49 - CH 3005 Bern 9
------------------------------------------------
Tel: +41 31 631 35 83 / Fax: +41 31 631 39 90
Tel: +49 761 214 66 72 / Mob.: +49 163 230 34 23
------------------------------------------------
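
[Editorial sketch] The UTF-8 report above attributes the illegal characters to truncation. As an illustration of that presumed cause only (this is not CWB's code, and whether CWB actually counts bytes here is an assumption), the Python sketch below shows how cutting a UTF-8 string at a fixed byte count can split a multi-byte character, and how backing up to a character boundary avoids it. The function names are hypothetical.

```python
def truncate_bytes_naive(text, limit):
    """Cut the UTF-8 encoding of text after `limit` bytes, ignoring character boundaries."""
    return text.encode("utf-8")[:limit]


def truncate_utf8_safe(text, limit):
    """Cut the UTF-8 encoding of text to at most `limit` bytes without splitting a character."""
    data = text.encode("utf-8")
    if len(data) <= limit:
        return data
    cut = limit
    # UTF-8 continuation bytes look like 10xxxxxx. If the first excluded byte
    # is one of them, the cut falls inside a character, so move it back.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut]


sample = "Привет"  # six Cyrillic letters, two bytes each in UTF-8

naive = truncate_bytes_naive(sample, 5)
safe = truncate_utf8_safe(sample, 5)

try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False

safe_text = safe.decode("utf-8")
```

With a limit of 5 bytes the naive cut ends on the lead byte of the third letter and no longer decodes, whereas the boundary-aware cut stops after the second letter. Counting context in whole tokens rather than characters sidesteps the issue entirely, in line with the workaround mentioned in the message.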