[CWB] How to append corpus data into an existing corpora?
Hardie, Andrew
a.hardie at lancaster.ac.uk
Tue Oct 17 15:09:44 CEST 2023
Well, if you think users would prefer to have the corpus at a given URL change regularly, nothing stops you running your server that way. I can only advise on best practice based on my own experience.
Certainly, adding data to an existing index will never be supported at the level of cwb-encode (etc.), because that would drastically complicate the database model required.
best
Andrew.
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of wzzhang at shisu.edu.cn
Sent: Tuesday, October 17, 2023 1:41 AM
To: cwb <cwb at sslmit.unibo.it>
Subject: Re: [CWB] CWB Digest, Vol 199, Issue 7
Thank you, Hardie! I understand that overwriting an existing index will break cached queries and other things. However, from an administrator's or user's point of view, it is very normal to append new VRT files to an existing corpus and to keep using the same URL for the updated corpus. If the URL changes, it seems very weird.
________________________________
Vincent Zhang
From: cwb-request <cwb-request at sslmit.unibo.it>
Date: 2023-10-16 20:32
To: cwb <cwb at sslmit.unibo.it>
Subject: CWB Digest, Vol 199, Issue 7
Send CWB mailing list submissions to
cwb at sslmit.unibo.it
To subscribe or unsubscribe via the World Wide Web, visit
http://liste.sslmit.unibo.it/mailman/listinfo/cwb
or, via email, send a message with subject or body 'help' to
cwb-request at sslmit.unibo.it
You can reach the person managing the list at
cwb-owner at sslmit.unibo.it
When replying, please edit your Subject line so it is more specific
than "Re: Contents of CWB digest..."
Today's Topics:
1. Re: How to append corpus data into an existing corpora?
(Hardie, Andrew)
----------------------------------------------------------------------
Message: 1
Date: Mon, 16 Oct 2023 12:31:51 +0000
From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
To: Open source development of the Corpus WorkBench
<cwb at sslmit.unibo.it>
Subject: Re: [CWB] How to append corpus data into an existing corpora?
Message-ID:
<LO4P265MB3485B676F09D9441E65BC6AFCBD7A at LO4P265MB3485.GBRP265.PROD.OUTLOOK.COM>
Content-Type: text/plain; charset="utf-8"
I mean it cannot be done at all. You need to start over. As you indicate, because this:
>> we can instead only run cwb-encode command to re-index and overwrite the existing corpora index
= starting over. So it's starting over whether you do it via the web UI or the CLI.
But overwriting the existing index is a bad idea, because any saved queries that referenced the index will still point there, but now they are no longer pointing at the same data.
Better to have parallel names with a changeable suffix:
mycorpus-01
mycorpus-02
...
or
mycorpus-20231015
mycorpus-20231016
...
That way there will be no confusion about which corpus any given saved query is associated with (whether or not you opt to delete older indexes).
best
Andrew.
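The rebuild-under-a-new-name workflow described above can be sketched roughly as follows. This is only an illustration, not part of the original message: the helper names `versioned_name` and `rebuild_commands`, the example paths, and the choice of `-c utf8` are assumptions, and the corpus-specific `-P`/`-S` attribute declarations for cwb-encode are deliberately left out. The sketch only builds the command lines; it does not run them.

```python
from datetime import date
from pathlib import PurePosixPath


def versioned_name(base: str, snapshot: date) -> str:
    """Return a dated corpus name such as 'mycorpus-20231016' (hypothetical helper)."""
    return f"{base}-{snapshot:%Y%m%d}"


def rebuild_commands(base, snapshot, vrt_files, data_root, registry_dir):
    """Build argv lists for indexing ALL files (old and new) under a new name.

    Nothing is appended to an existing index: every snapshot is encoded
    from scratch into its own data directory and registry entry.
    """
    name = versioned_name(base, snapshot)
    data_dir = PurePosixPath(data_root) / name          # must exist before encoding
    registry_file = PurePosixPath(registry_dir) / name  # registry file name is lowercase

    # Assumption: UTF-8 input. Add the -P / -S declarations your VRT files need.
    encode = ["cwb-encode", "-d", str(data_dir), "-R", str(registry_file), "-c", "utf8"]
    for vrt in vrt_files:
        encode += ["-f", str(vrt)]  # one -f per input file, original files first

    # cwb-makeall takes the corpus ID in upper case.
    makeall = ["cwb-makeall", "-r", str(registry_dir), "-V", name.upper()]
    return name, [encode, makeall]


if __name__ == "__main__":
    name, commands = rebuild_commands(
        "mycorpus",
        date(2023, 10, 16),
        ["original.vrt", "additions.vrt"],
        "/corpora/data",
        "/corpora/registry",
    )
    print(name)
    for argv in commands:
        print(" ".join(argv))
```

Because each snapshot gets its own name, saved queries against an older index keep pointing at the data they were run on; older indexes can be kept or deleted independently.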
From: cwb-bounces at sslmit.unibo.it <cwb-bounces at sslmit.unibo.it> On Behalf Of ???
Sent: Monday, October 16, 2023 12:46 PM
To: cwb at sslmit.unibo.it
Subject: Re: [CWB] CWB Digest, Vol 199, Issue 5
Thank you, Andrew! Do you mean we cannot do it on the admin-ui webpage, and we can instead only run cwb-encode command to re-index and overwrite the existing corpora index? If so, it really sucks. It cannot be done by adding more files via the web-ui.
Vincent Zhang
From: cwb-request at sslmit.unibo.it
Date: 2023-10-16 18:00:01
To: cwb at sslmit.unibo.it
Subject: CWB Digest, Vol 199, Issue 5
>Today's Topics:
>
> 1. How to append corpus data into an existing corpora?
> (wzzhang at shisu.edu.cn)
> 2. Re: How to append corpus data into an existing corpora?
> (Hardie, Andrew)
>
>
>----------------------------------------------------------------------
>
>Message: 1
>Date: Mon, 16 Oct 2023 13:59:39 +0800
>From: "wzzhang at shisu.edu.cn" <wzzhang at shisu.edu.cn>
>To: cwb <cwb at sslmit.unibo.it>
>Subject: [CWB] How to append corpus data into an existing corpora?
>Message-ID: <202310161358581732745 at shisu.edu.cn>
>Content-Type: text/plain; charset="gb2312"
>
>Hello everyone,
>I could not find any way to append a new VRT file to an existing corpus. If this feature is lacking, how can a corpus be sustainably improved?
>
>
>
>Vincent Zhang
>Institute of Corpus Studies and Applications, Shanghai International Studies University
>
>------------------------------
>
>Message: 2
>Date: Mon, 16 Oct 2023 06:19:46 +0000
>From: "Hardie, Andrew" <a.hardie at lancaster.ac.uk>
>To: Open source development of the Corpus WorkBench
> <cwb at sslmit.unibo.it>
>Subject: Re: [CWB] How to append corpus data into an existing corpora?
>Message-ID:
> <LO4P265MB3485AD0D1262A6549EBA62EECBD7A at LO4P265MB3485.GBRP265.PROD.OUTLOOK.COM>
>
>Content-Type: text/plain; charset="us-ascii"
>
>That's because you can't do it.
>
>You have to create a new corpus index from your original files with your new files appended to them.
>
>Each CWB index then corresponds to the state of your corpus at some particular moment in time. (This is actually desirable from the point of view of replicability of results.)
>
>best
>
>Andrew.
>------------------------------
>
>_______________________________________________
>CWB mailing list
>CWB at sslmit.unibo.it
>http://liste.sslmit.unibo.it/mailman/listinfo/cwb
>
>
>End of CWB Digest, Vol 199, Issue 5
>***********************************
------------------------------
_______________________________________________
CWB mailing list
CWB at sslmit.unibo.it
http://liste.sslmit.unibo.it/mailman/listinfo/cwb
End of CWB Digest, Vol 199, Issue 7
***********************************