<html><head><meta http-equiv="content-type" content="text/html; charset=utf-8"></head><body style="overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">Dear CWB members,</span><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">Maybe someone can help with this. I want to set up a workflow for annotating and editing CWB corpora in R, and I have some open issues.</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">What works so far:</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">1) Using polmineR: C.old <- decode([CORPUS], to = "data.table"). This works fine and creates a data.table of the token stream with p_attributes as well as s_attributes in columns. </span></div><div><span style="font-size: 14px;"><span class="Apple-tab-span" style="white-space:pre">        </span>- The CWB corpus contains the following s_attributes: "corpus", "text", "text_id", "s", "s_id", "s_polarity", "s_subjectivity"</span></div><div><span style="font-size: 14px;"><span class="Apple-tab-span" style="white-space:pre">        </span>- The decoded data.table C.old contains columns for all of these, with "corpus", "text", and "s" being empty</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">2) Using cwbtools, I do:</span></div><div><span style="font-size: 14px;"><span class="Apple-tab-span" style="white-space:pre">        </span>- C.new <- CorpusData$new()</span></div><div><span style="font-size: 14px;"><span class="Apple-tab-span" style="white-space:pre">        </span>- C.new$tokenstream <- C.old</span></div><div><span style="font-size: 14px;"><span class="Apple-tab-span" style="white-space:pre">        </span>- cpos_max_min <- function(x) list(cpos_left = min(x[["cpos"]]), cpos_right = max(x[["cpos"]]))</span></div><div><span style="font-size: 14px;"><span 
class="Apple-tab-span" style="white-space:pre">        </span>- C.new$metadata <- C.new$tokenstream[, cpos_max_min(.SD), by = text_id]</span></div><div><div><span style="font-size: 14px;"><span class="Apple-tab-span" style="white-space:pre">        </span>- C.new$tokenstream[, text_id := NULL]</span></div></div><div><span style="font-size: 14px;"><span class="Apple-tab-span" style="white-space:pre">        </span>- then I use C.new$encode(…)</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">While this works in principle, the s_attributes part of the resulting registry file differs from the original </span><span style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-size: 14px;">(see excerpts below)</span><span style="font-size: 14px;">, and I’m not sure yet whether this might cause problems. More importantly, it is unclear to me how I could use this approach while also keeping the segmentation of the corpus into sentences, including the annotations s_id, s_polarity, and s_subjectivity.</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">Does anyone have any pointers as to how I could re-encode a corpus in R so that the result is similar or even identical to the corpus I decoded?</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">Best,</span></div><div><span style="font-size: 14px;">Thomas</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">The s_attributes part of the original registry file looks like this:</span></div><div><span style="font-size: 14px;"><br></span></div><div><div><span style="font-size: 14px;">##</span></div><div><span style="font-size: 14px;">## s-attributes (structural 
markup)</span></div><div><span style="font-size: 14px;">##</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;"># &lt;corpus&gt; ... &lt;/corpus&gt;</span></div><div><span style="font-size: 14px;">STRUCTURE corpus</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;"># &lt;text id=".."&gt; ... &lt;/text&gt;</span></div><div><span style="font-size: 14px;"># (no recursive embedding allowed)</span></div><div><span style="font-size: 14px;">STRUCTURE text</span></div><div><span style="font-size: 14px;">STRUCTURE text_id # [annotations]</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;"># &lt;s id=".." polarity=".." subjectivity=".."&gt; ... &lt;/s&gt;</span></div><div><span style="font-size: 14px;"># (no recursive embedding allowed)</span></div><div><span style="font-size: 14px;">STRUCTURE s</span></div><div><span style="font-size: 14px;">STRUCTURE s_id # [annotations]</span></div><div><span style="font-size: 14px;">STRUCTURE s_polarity # [annotations]</span></div><div><span style="font-size: 14px;">STRUCTURE s_subjectivity # [annotations]</span></div><div><span style="font-size: 14px;"><br></span></div></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">The corresponding part of the registry file for C.new is simply: </span></div><div><span style="font-size: 14px;"><br></span></div><div><div><span style="font-size: 14px;">## s-attributes</span></div><div><span style="font-size: 14px;">##</span></div><div><span style="font-size: 14px;"><br></span></div><div><span style="font-size: 14px;">STRUCTURE text_id</span></div></div><div><span style="font-size: 14px;"><br></span></div><span style="font-size: 14px;"><br><br></span><div>
<meta charset="UTF-8"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; 
line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: 
space; line-break: after-white-space;"><div dir="auto" style="font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;"><br class="Apple-interchange-newline">-------------------------------------------------------------------------------------</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">Dr. Thomas C. 
Messerli</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">Postdoctoral Teaching and Research Fellow (Oberassistent)</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">Department of Languages and Literatures, Universität Basel</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">Englisches Seminar</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span 
style="font-size: 14px;">Nadelberg 6</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">CH-4051 Basel</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;"><br></span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">Office 15 </span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">+41 61 207 27 82</span></div><div dir="auto" 
style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;"><br></span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;">http://www.thomasmesserli.org<br>thomas.messerli@unibas.ch</span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;"><br></span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;"><br></span></div><div dir="auto" style="caret-color: rgb(0, 0, 0); color: rgb(0, 
0, 0); font-family: Helvetica; font-style: normal; font-variant-caps: normal; font-weight: normal; letter-spacing: normal; text-align: start; text-indent: 0px; text-transform: none; white-space: normal; word-spacing: 0px; -webkit-text-stroke-width: 0px; text-decoration: none; overflow-wrap: break-word; -webkit-nbsp-mode: space; line-break: after-white-space;"><span style="font-size: 14px;"><br></span></div></div></div></div></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"></span></div><span style="font-size: 14px;"><br class="Apple-interchange-newline"><br class="Apple-interchange-newline">
</span></div>
<br></body></html>