From graham.ranger at univ-avignon.fr Sun Dec 21 14:17:26 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger -- UAPV)
Date: Sun, 21 Dec 2025 14:17:26 +0100
Subject: [CWB] Research using cqpweb
In-Reply-To: <54a4424d-66d5-4c18-8c1d-a133a76d782d@uni-saarland.de>
References: <50a17441-961d-477d-9d7c-9eff289bd827@uni-saarland.de> <9A4962A4-9BFC-46A3-8349-F3EA0F29EA99@collocations.de> <54a4424d-66d5-4c18-8c1d-a133a76d782d@uni-saarland.de>
Message-ID: <7ec32a46-f2e2-44ac-baa8-23c5052f9742@univ-avignon.fr>

Hello everybody,

Does anybody know if there is a resource listing research (or teaching, come to that) carried out using cqpweb?

Thanks in advance.

Best,
Graham.

From graham.ranger at univ-avignon.fr Sun Dec 21 14:24:53 2025
From: graham.ranger at univ-avignon.fr (Graham Ranger -- UAPV)
Date: Sun, 21 Dec 2025 14:24:53 +0100
Subject: [CWB] cqpweb and phonetic transcription
In-Reply-To: <54a4424d-66d5-4c18-8c1d-a133a76d782d@uni-saarland.de>
References: <50a17441-961d-477d-9d7c-9eff289bd827@uni-saarland.de> <9A4962A4-9BFC-46A3-8349-F3EA0F29EA99@collocations.de> <54a4424d-66d5-4c18-8c1d-a133a76d782d@uni-saarland.de>
Message-ID: <90f89b4f-8682-4382-99cb-b4b030e143da@univ-avignon.fr>

Hello again,

A second question, on a different thread for clarity: does anybody have experience with text and phonetic transcription?

Specifically, I have transcriptions of interviews made 30-40 years ago in a form of regional French that only had 40 speakers at the time. I have 1) IPA transcriptions, with one or two local conventions for pauses, etc. and 2) reformulations in standard French. The variety being exclusively oral, this is all I have.

Now, I would imagine that I could do this either as a corpus and its "translation" or as a single corpus with the transcriptions as sentence-level attributes or something like that. Would the first type allow for searches that start with the IPA transcription?
The second type appears of rather limited interest, since searches would need to start with the reformulation.

One last question: I think that the audio could be linked to the files as metadata. Is this right?

In short, any accounts of user experiences with similar corpora would be very helpful!

Best,
Graham.

From a.hardie at lancaster.ac.uk Sun Dec 21 15:39:23 2025
From: a.hardie at lancaster.ac.uk (Hardie, Andrew)
Date: Sun, 21 Dec 2025 14:39:23 +0000
Subject: [CWB] cqpweb and phonetic transcription
In-Reply-To: <90f89b4f-8682-4382-99cb-b4b030e143da@univ-avignon.fr>
References: <50a17441-961d-477d-9d7c-9eff289bd827@uni-saarland.de> <9A4962A4-9BFC-46A3-8349-F3EA0F29EA99@collocations.de> <54a4424d-66d5-4c18-8c1d-a133a76d782d@uni-saarland.de> <90f89b4f-8682-4382-99cb-b4b030e143da@univ-avignon.fr>
Message-ID:

I've indexed various corpora whose primary token stream was an IPA transcription (because the language was one without a written form). It works just as normal. Remember CQPweb as software is totally agnostic as to the script that the data uses, so IPA is just as good as Latin, Greek, Cyrillic, Japanese, or whatever.

But that means that, just like data in any other script, you need it to be tokenised, and any word-level annotation needs to be presented alongside the tokens as extra columns in the VRT file.

So for instance you can have IPA as an annotation, alongside others possibly, e.g. a POS as here:

my        maɪ       POSSPRO
name      ne:m      NOUN
is        ɪz        VERB
Andrew    andɹu:    NOUN

Or you can have the primary data be in IPA, and then either add or don't add the orthographic form as annotation:

maɪ       my
ne:m      name
ɪz        is
andɹu:    Andrew

IN SUM, if your standard French and your IPA transcriptions line up word by word, you can use one of them as an annotation on the other. Then, you can search on either in the usual way using either CQL or simple query. This is the best and most flexible approach.
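
As a rough illustration of the word-aligned layout described above, the following Python sketch writes one token per line, with the annotations as tab-separated columns. The function name to_vrt, the <text> and <s> element names, the id "interview01" and the sample tokens (modelled on the example above) are all assumptions made for this sketch, not something stated in the thread; the structural attributes a given CQPweb installation expects should be checked against its own documentation.

```python
from xml.sax.saxutils import quoteattr


def to_vrt(text_id, sentences):
    """Return VRT-style text: one token per line, columns separated by tabs.

    `sentences` is a list of sentences. Each sentence is a list of tuples
    whose first item is the primary token and whose remaining items are
    word-level annotations (here: orthographic form, then POS).
    """
    # <text> and <s> are assumed element names for this sketch.
    lines = [f"<text id={quoteattr(text_id)}>"]
    for sentence in sentences:
        lines.append("<s>")
        for columns in sentence:
            # A tab or newline inside a value would corrupt the column layout.
            if any("\t" in c or "\n" in c for c in columns):
                raise ValueError(f"tab or newline inside a column: {columns!r}")
            lines.append("\t".join(columns))
        lines.append("</s>")
    lines.append("</text>")
    return "\n".join(lines) + "\n"


# Hypothetical sample: IPA as the primary token, orthography and POS as annotations.
sample = [[
    ("maɪ", "my", "POSSPRO"),
    ("ne:m", "name", "NOUN"),
    ("ɪz", "is", "VERB"),
    ("andɹu:", "Andrew", "NOUN"),
]]

vrt = to_vrt("interview01", sample)
print(vrt, end="")
```

Swapping the first two items of each tuple would give the other arrangement, with the orthographic form as the primary token and the IPA as an annotation. The sketch does not handle tokens that themselves begin with "<", which would need separate treatment.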
If the word lineup doesn't match, so you can't do it as per above, then either of the techniques you mention, i.e. giving the standard French as a sentence-level translation, or using two "parallel" corpora, would work. Neither is the ideal way to handle this kind of data. But if you don't have tokenisation lineup, then you might have to go with one of these.

>> Would the first type allow for searches that start with the IPA transcription?

So long as your IPA data is either the "word" (first column of the input) or an annotation (second column), you can search it.

(Your users would need an IPA soft keyboard of course. I am working on adding soft keyboards, but it's not complete yet.)

>> One last question: I think that the audio could be linked to the files as metadata. Is this right?

Yes. See admin manual section 7.5.1. Provide the address of the files with the "audio:" prefix.

best

Andrew.

From graham.ranger at univ-avignon.fr Sun Dec 21 19:17:09 2025
From: graham.ranger at univ-avignon.fr (graham.ranger)
Date: Sun, 21 Dec 2025 19:17:09 +0100
Subject: [CWB] cqpweb and phonetic transcription
In-Reply-To:
Message-ID: <20251221181709.A5BCE2027E@zmtaauth05.partage.renater.fr>

Brilliant! Thanks as ever for your full and precise answer, Andrew.

I think that, given the material, I'm probably looking at a parallel corpus setup. I'm going to have fun transforming my colleague's fairly anarchic Word files into something palatable, but that's another story!

Best,
Graham.
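
Whether the word-by-word layout is available depends on whether the two versions of each utterance have the same number of tokens. A minimal Python sketch of such a check follows, assuming the utterances have already been extracted as pairs of strings and that splitting on whitespace is an acceptable tokenisation. Both are assumptions, the function name misaligned is hypothetical, and the sample pairs are invented for illustration rather than taken from any real corpus.

```python
def misaligned(pairs):
    """Return the indices of utterance pairs whose token counts differ.

    `pairs` is a list of (ipa_utterance, orthographic_utterance) strings.
    Tokens are obtained by splitting on whitespace, which is an assumption.
    """
    return [
        i
        for i, (ipa, ortho) in enumerate(pairs)
        if len(ipa.split()) != len(ortho.split())
    ]


# Invented sample: the first pair lines up token by token, the second does not.
pairs = [
    ("maɪ ne:m ɪz andɹu:", "my name is Andrew"),
    ("maɪ ne:m", "my first name"),
]

bad = misaligned(pairs)
print(f"{len(bad)} of {len(pairs)} utterances do not line up: {bad}")
```

Equal token counts are a necessary condition only: two versions can have the same length and still not correspond word for word, so utterances that pass this check would still need a manual look before one version is used as an annotation on the other.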