[CWB] Accessing phonetic transcriptions in CQPweb

Javier Pueyo javier.pueyo at gmail.com
Mon Jun 4 14:54:37 CEST 2018

Hi Eva,

For my historical (and spoken) Spanish corpora I also have 4 columns:
paleographic form (or spoken form), PoS, lemma, and normalized form. The
first column should be the one you want to search without any CEQL
shortcut. To show "what was actually said" by default, you should define
the 4th column (cGAT-Transcript) as your first one. If you do not want to
do that, you could instead define the cGAT-Transcript as the alternate view
in extended context (and as a gloss in the KWIC results).
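If you go for the first option, the columns have to be swapped in the input file before the corpus is indexed again. Here is a minimal Python sketch of that step, assuming the data is a tab-separated vertical file in Eva's order (normalized, PoS, lemma, cGAT-Transcript). The function name and the sample token "ham" are made up for illustration, and the attribute declarations used at indexing time would need to follow the new order too.

```python
def reorder_vrt_line(line: str, order=(3, 1, 2, 0)) -> str:
    """Reorder the columns of one token line of a vertical (VRT) file.

    Assumed input order:  normalized, PoS, lemma, cGAT-Transcript.
    Default output order: cGAT-Transcript, PoS, lemma, normalized.
    Blank lines and XML-like structural lines are returned unchanged.
    Caveat: a token that itself begins with "<" would be mistaken for
    markup by this simple check.
    """
    stripped = line.rstrip("\n")
    if not stripped or stripped.startswith("<"):
        return stripped
    cols = stripped.split("\t")
    if len(cols) != len(order):
        raise ValueError(f"expected {len(order)} columns, got {len(cols)}")
    return "\t".join(cols[i] for i in order)


if __name__ == "__main__":
    # Hypothetical token line: normalized, PoS, lemma, transcript.
    print(reorder_vrt_line("haben\tVAFIN\thaben\tham\n"))
```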

Although we can definitely use the CQP syntax to search for the
normalized forms, e.g. [normalized="haben"], I wanted a shortcut for a fourth
column within the CEQL syntax, which is normally limited to 3 + 2
shortcuts: word (word) / PoS (_POS) / Lemma ({LEMMA}) + SimplePoS
(_{SimplePOS}) / Lemma/SimplePoS combined ({LEMMA/SimplePOS}). So I came up
with a really "dirty trick": I created an additional entry in my
"Simplified PoS mapping table", like this:

"*" => "*"

and then I declared the "normalized" column to be used as the "Combination
annotation". So now I can use the Combination annotation shortcut {haben/*} to
search for "normalized form 'haben' having ANY Simplified PoS". I said it
was a "dirty trick", and I mean it: it will impact CEQL performance if
you have a really big corpus. Also, since my corpora are only in Spanish, I
do not need the original {Lemma/SimplifiedPOS} shortcut (but you might).
However, I really do need my normalized columns to be accessible within the
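
For the plain CQP route mentioned earlier, which needs no mapping trick, a query like [normalized="haben"] can also be built programmatically. The following is only a sketch: the helper name is hypothetical, "normalized" is the attribute name from my own setup, and it assumes that backslash-escaping regex metacharacters is what you want for a literal match.

```python
import re


def cqp_token_query(attribute: str, value: str, literal: bool = True) -> str:
    """Build a one-token CQP query such as [normalized="haben"].

    With literal=True, regex metacharacters in `value` are
    backslash-escaped so the value is matched as plain text.
    Values containing a double quote are rejected instead of guessing
    how they should be quoted.
    """
    if '"' in value:
        raise ValueError("double quotes in the value are not handled here")
    pattern = re.escape(value) if literal else value
    return f'[{attribute}="{pattern}"]'


if __name__ == "__main__":
    print(cqp_token_query("normalized", "haben"))
```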



2018-06-04 7:26 GMT-04:00 Eva Bretschneider <
eva.bretschneider at uni-leipzig.de>:

> Dear everybody,
> I have a question regarding my corpora:
> The texts are transcriptions of spoken German written with cGAT. They are
> edited so the first column in the data is "normalized", meaning the
> transcription was adjusted to "normal" writing. The second column is the
> POS-tag, third the lemma and the fourth is the cGAT-Transcript.
> My question is: Is there a way to display this fourth column when
> accessing the corpus? E.g. searching for {haben} and displaying "what was
> actually said", meaning the transcript in the fourth column?
> Thanks a lot for any help,
> best regards
> Eva
> _______________________________________________
> CWB mailing list
> CWB at sslmit.unibo.it
> http://liste.sslmit.unibo.it/mailman/listinfo/cwb