[CWB] CWB Digest, Vol 84, Issue 20

Stefan Evert stefanML at collocations.de
Thu Jan 30 16:24:40 CET 2014


On 30 Jan 2014, at 15:57, Andres Chandia <andres at chandia.net> wrote:

> As always you're right:
> 
> <text id="10038" year="" url_source="http://www.vanity-rechner.de" error="0.444444">
> <text id="10038" year="" url_source="http://www.vanity-rechner.de" error="0.461538">
> <text id="10038" year="" url_source="http://www.vanity-rechner.de" error="0.000000">
> <text id="10038" year="" url_source="http://www.vanity-rechner.de" error="0.000000">
> <text id="10038" year="" url_source="http://www.vanity-rechner.de" error="0.000000">
> <text id="10038" year="" url_source="http://www.vanity-rechner.de" error="0.000000">
> 
> 
> So I have to look for a way to renumber the IDs...

CQPweb isn't designed to work with sentence collections like sdeWaC and some other recent Web corpora.  How did you assign text IDs to the sentences?  One possibility would be to group all sentences with the same ID into a single <text> unit, and keep the original <s> tags (which you seem to have turned into <text> elements).
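
The regrouping suggested above could be sketched roughly as follows. This is only an illustration under several assumptions: the input is a verticalized file in which every sentence is wrapped in its own <text ...> ... </text> pair, sentences that share an ID are adjacent, and the per-sentence error value is moved onto the restored <s> tag because it differs between sentences. The function name regroup is made up for this example.

```python
import re

# Matches attribute="value" pairs inside a start tag.
ATTR_RE = re.compile(r'(\w+)="([^"]*)"')


def regroup(lines):
    """Merge adjacent <text> units that share an id into one <text>,
    turning each original unit back into an <s> region.

    Assumes one tag or token per line and that units with the same id
    are consecutive in the input.
    """
    current_id = None
    in_text = False
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("<text "):
            attrs = dict(ATTR_RE.findall(line))
            if not in_text or attrs.get("id") != current_id:
                if in_text:
                    yield "</text>"  # close the previous merged text
                current_id = attrs.get("id")
                in_text = True
                # Keep text-level attributes; "error" varies per sentence.
                kept = " ".join(
                    f'{k}="{v}"' for k, v in attrs.items() if k != "error"
                )
                yield f"<text {kept}>"
            # The per-sentence error value goes onto the <s> tag (assumption).
            yield f'<s error="{attrs.get("error", "")}">'
        elif line == "</text>":
            yield "</s>"  # the old text boundary becomes a sentence boundary
        else:
            yield line  # token line, passed through unchanged
    if in_text:
        yield "</text>"
```

Calling list(regroup(open("corpus.vrt", encoding="utf-8"))) on such a file (the file name is hypothetical) would give one <text> per distinct run of IDs, with the original units preserved as <s> regions inside it. If sentences with the same ID are not adjacent, they would need to be sorted or collected by ID first.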

Cheers,
Stefan


