[CWB] [cwb:feature-requests] #12 CQPweb: XML support

Andrew Hardie andrewhardie at users.sf.net
Mon Aug 1 00:40:13 CEST 2016


- **status**: open --> closed
- **Group**:  --> TODO-3.5
- **Comment**:

done in Q1 2016



---

**[feature-requests:#12] CQPweb: XML support**

**Status:** closed
**Group:** TODO-3.5
**Labels:** CQPweb 
**Created:** Sun Jun 14, 2009 11:58 PM UTC by Andrew Hardie
**Last Updated:** Wed Dec 12, 2012 05:26 AM UTC
**Owner:** Andrew Hardie


This is the big enhancement for version 3.0: many, MANY users have asked for it.

Just as the "text-based restrictions" parallel the "written text restrictions" in BNCweb, so the "XML-based restrictions" will need to parallel the "utterance-by-speaker-type" system in BNCweb.

Each XML span (i.e. s-attribute) that is to be covered in this way (and note, not all of the XML in a given corpus needs to be) will need to be identified by the combination of (a) an element name and (b) some given attribute. Its entry in the database will then look a bit like this:

xml_metadata_for_CORPUSNAME   [parallel to text_metadata_for_CORPUSNAME]

id           gender   class   ...   CQPbegin   CQPend
------------------------------------------------------
u|who|S933   m        AB      ...   \d\d\d\d   \d\d\d\d
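As a rough illustration of the proposed table, the following Python sketch builds rows keyed by an element|attribute|value identifier. It assumes the spans of the u element and their who values have already been extracted from the corpus (for example by decoding the s-attribute with the CWB command-line tools). The names XmlMetadataRow, make_id and speaker_info, and all of the data, are hypothetical and do not reflect CQPweb's actual code.

```python
from dataclasses import dataclass


@dataclass
class XmlMetadataRow:
    """One row of a hypothetical xml_metadata_for_CORPUSNAME table."""
    id: str                   # "element|attribute|value", e.g. "u|who|S933"
    metadata: dict[str, str]  # classification columns, e.g. gender, class
    cqp_begin: int            # first corpus position of the span
    cqp_end: int              # last corpus position of the span


def make_id(element: str, attribute: str, value: str) -> str:
    """Combine element name, identifying attribute and its value into one key."""
    return f"{element}|{attribute}|{value}"


# Invented spans of <u>: (begin, end, attribute values).
spans = [
    (100, 142, {"who": "S933"}),
    (143, 160, {"who": "S934"}),
]
# Invented speaker metadata, looked up via the "who" attribute.
speaker_info = {
    "S933": {"gender": "m", "class": "AB"},
    "S934": {"gender": "f", "class": "C1"},
}

rows = [
    XmlMetadataRow(make_id("u", "who", attrs["who"]),
                   speaker_info[attrs["who"]], begin, end)
    for begin, end, attrs in spans
]
```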

Note, however, that this kind of "natural" system for XML identifiers won't work, because the XML segment is not *uniquely* identified (a single speaker, e.g. who="S933", will usually produce many u elements). There are two possible solutions:
(1) allow CQPbegin and CQPend to contain *multiple* CWB indexes;
(2) enforce uniqueness of XML identifiers, so that the attribute "who" could not be used to identify u, but "id" could be.
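The uniqueness problem and the two candidate solutions can be sketched in Python as follows. The spans and speaker codes are invented, and ranges_by_id and is_unique_identifier are hypothetical names used only for illustration; this is not how CQPweb implements either option.

```python
from collections import defaultdict

# Invented <u> spans: (begin, end, who). Speaker S933 speaks twice,
# so the key "u|who|S933" does not identify a single span.
spans = [(100, 142, "S933"), (143, 160, "S934"), (161, 190, "S933")]

# Solution (1): let one id map to a list of (CQPbegin, CQPend) ranges.
ranges_by_id: dict[str, list[tuple[int, int]]] = defaultdict(list)
for begin, end, who in spans:
    ranges_by_id[f"u|who|{who}"].append((begin, end))


# Solution (2): accept an attribute as an identifier only if every
# span has a distinct value for it.
def is_unique_identifier(values: list[str]) -> bool:
    return len(values) == len(set(values))


who_values = [who for _, _, who in spans]
```

Under (1) the who attribute stays usable but each row must carry several ranges; under (2) who is rejected and an attribute such as id, with one value per span, would have to be chosen instead.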

Neither of these is entirely satisfactory and this needs careful thinking about.

Also note that every different s-attribute will require (a) a different set of CWB frequency indexes and (b) a separate set of frequency tables. This function will be **VERY** hungry for disk space.
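To see why disk usage grows so quickly, the following hypothetical sketch builds one word-frequency table per (classification column, value) pair for a single s-attribute. CQPweb itself would keep such data in its database and CWB indexes rather than in in-memory Counter objects; the tokens, the spans and the build_frequency_tables function are assumptions for illustration only.

```python
from collections import Counter

# Invented token stream; list index = corpus position.
tokens = ["the", "cat", "sat", "on", "the", "mat", "yes", "the", "dog"]

# Invented <u> spans with metadata (begin and end are inclusive).
spans = [
    (0, 5, {"gender": "m", "class": "AB"}),
    (6, 8, {"gender": "f", "class": "AB"}),
]


def build_frequency_tables(tokens, spans, columns):
    """Return one Counter per (column, value) pair found in the spans."""
    tables = {}
    for column in columns:
        for begin, end, meta in spans:
            key = (column, meta[column])
            tables.setdefault(key, Counter()).update(tokens[begin:end + 1])
    return tables


tables = build_frequency_tables(tokens, spans, ["gender", "class"])
```

Here one s-attribute with two classification columns already yields three tables; every further s-attribute covered in this way adds another full set.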


---

Sent from sourceforge.net because cwb at sslmit.unibo.it is subscribed to https://sourceforge.net/p/cwb/feature-requests/


