[CWB] List all the values of a specific attribute, is it possible?

David Lukeš david.lukes at ff.cuni.cz
Thu Nov 30 13:56:40 CET 2017


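If you need to list them directly from the vertical (.vrt) file rather than from the encoded corpus, you can use grep (GNU grep, since -P enables Perl-compatible regular expressions):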

     grep -oP '(?<=title=").*?(?=")' vss.vrt
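
Since the question is about checking which documents are already present, it may help to collapse repeated values as well. The following is only a sketch under assumptions: it needs GNU grep (for -P), and sample.vrt is a made-up three-story file standing in for vss.vrt, not data from the tutorial.

```shell
# Hypothetical sample standing in for vss.vrt (not the tutorial's data)
workdir=$(mktemp -d)
cat > "$workdir/sample.vrt" <<'EOF'
<story num="4" title="A Thrilling Experience">
<s>
Tick NN tick
. SENT .
</s>
</story>
<story num="5" title="Another Story">
<s>
Tock NN tock
. SENT .
</s>
</story>
<story num="6" title="A Thrilling Experience">
<s>
Tick VB tick
. SENT .
</s>
</story>
EOF

# Extract every title value, then sort and drop duplicates.
# LC_ALL=C makes the sort order independent of the local locale.
titles=$(grep -oP '(?<=title=").*?(?=")' "$workdir/sample.vrt" | LC_ALL=C sort -u)
printf '%s\n' "$titles"
```

Note that the lookbehind matches any attribute whose name ends in title, so a hypothetical subtitle="..." attribute would be picked up too.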

David

On 11/30/2017 01:49 PM, Martin Hammarstedt wrote:
>
> Hi,
>
> You can use the cwb-scan-corpus tool, like this:
>
>     cwb-scan-corpus CORPUS story_title
>
> Best regards,
> Martin
>
>
> On 2017-11-30 13:38, Hugo SG wrote:
>> Dear all,
>>
>> I would like to know if there is a way to list all the values of a 
>> specific attribute.
>>
>> I mean, using one example from the Corpus Encoding Tutorial (Version 
>> 3.4) [0], is there any way to list all the values of the title 
>> attribute of the corpus shown in the vss.vrt file (page 6)?
>>
>> <?xml version="1.0" encoding="ISO-8859-1" standalone="yes" ?>
>> <!-- A Thrilling Experience -->
>> <story num="4" title="A Thrilling Experience">
>> <p>
>> <s>
>> Tick NN tick
>> . SENT .
>> </s>
>> <s>
>> A DT a
>> clock NN clock
>> . SENT .
>> </s>
>> <s>
>> Tick VB tick
>> , , ,
>> tick VB tick
>> . SENT .
>> </s>
>> </p>
>> ...
>> </story>
>>
>>
>> I would like to know this because I need to check which documents are 
>> already present in my corpus. The document identifiers are encoded 
>> as an attribute.
>>
>> I understand that this may not be possible, and that I should have 
>> kept track of it beforehand, but I am asking just in case.
>>
>> Thank you in advance.
>>
>> Best regards,
>> Hugo
>>
>> [0] : http://cwb.sourceforge.net/files/CWB_Encoding_Tutorial.pdf
>>
>>
>> -- 
>> Hugo Sanjurjo González
>> Personal Investigador en Formación
>> Área de Ingeniería de Sistemas y Automática
>> Dep. Ingeniería Eléctrica y de Sistemas y Automática
>>
>> Facultad de Filosofía y Letras - Dep. Filología Moderna - Despacho 320
>> Campus de Vegazana s/n 24071
>> Universidad de León
>> León
>>
>> Tel. +34 987 291088
>>
>>
>> _______________________________________________
>> CWB mailing list
>> CWB at sslmit.unibo.it
>> http://liste.sslmit.unibo.it/mailman/listinfo/cwb
