[CWB] xml files

Ingrid Sör ingrid.e.sor at gmail.com
Wed Dec 17 11:25:14 CET 2014


Hi,

I hope this is the right forum for the following questions.
I am trying to get frequency data for Swedish nouns from certain corpora in
the Swedish "Språkbanken". They make their files available for download in
XML format, so I am now trying to make them usable with CWB. I read in the
CWB encoding tutorial that the files need to be in .vrt format before they
can be encoded, and that this conversion can be done easily via XSLT.
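For reference, a .vrt file has one token per line, with the word form and its p-attributes separated by tabs, and XML structure tags (sentences, texts) on lines of their own. Because the layout is that simple, noun counts can be sanity-checked outside CWB too. The Python sketch below is only an illustration and is not part of CWB; it assumes the part-of-speech tag is in column 1, the lemma in column 2, and that nouns are tagged "NN". All three are hypothetical defaults to adjust to the real files.

```python
from collections import Counter


def noun_frequencies(vrt_lines, pos_col=1, lemma_col=2, noun_tag="NN"):
    """Count lemma frequencies of tokens whose POS column equals noun_tag.

    Column 0 is the word form; pos_col, lemma_col and noun_tag are
    hypothetical defaults that depend on how the VRT file was written.
    """
    counts = Counter()
    for line in vrt_lines:
        line = line.rstrip("\n")
        # Skip blank lines and structure tags such as <sentence> or </text>.
        if not line or line.startswith("<"):
            continue
        cols = line.split("\t")
        # Ignore lines that have too few columns instead of crashing.
        if len(cols) > max(pos_col, lemma_col) and cols[pos_col] == noun_tag:
            counts[cols[lemma_col]] += 1
    return counts
```

Usage would be something like `noun_frequencies(open("corpus.vrt", encoding="utf-8")).most_common(20)`. Once the corpus is actually encoded, the same counts would normally come from CQP queries instead.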

Is this the best way to go about it? I am not really familiar with XSLT,
and I think it would take some time to learn it on my own, so if XSLT is
the solution I would be very grateful if anyone has a "standard" XSLT
stylesheet that I could adapt. Or is there another way? I have been using
sed in my Ubuntu terminal to put each tag or word on a line of its own, but
it seems complicated to also make the p-attributes tab-separated that way
(at the moment they are attributes inside the <w> tags).
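As one possible alternative to XSLT, here is a minimal Python sketch using the standard library's ElementTree. It assumes tokens are `<w>` elements whose p-attributes are stored as XML attributes; the attribute names `pos` and `lemma` are hypothetical and should be replaced by whatever the downloaded files use. Every other element (sentence, text, and so on) is copied as a structure tag on its own line.

```python
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr

# Hypothetical: token attributes to export as p-attributes, in column order.
P_ATTRS = ["pos", "lemma"]


def start_tag(elem):
    """Render an element's start tag, e.g. <sentence id="s1">."""
    attrs = "".join(f" {k}={quoteattr(v)}" for k, v in elem.attrib.items())
    return f"<{elem.tag}{attrs}>"


def to_vrt(elem, out):
    """Append VRT lines for elem and its descendants to the list out."""
    if elem.tag == "w":
        word = (elem.text or "").strip()
        # A missing attribute becomes "_" so every token line has the same columns.
        cols = [word] + [elem.get(a, "_") for a in P_ATTRS]
        out.append("\t".join(cols))
        return
    # Non-token elements become structure tags on their own lines.
    out.append(start_tag(elem))
    for child in elem:
        to_vrt(child, out)
    out.append(f"</{elem.tag}>")


def convert(root):
    """Convert a parsed XML tree (root element) to VRT text."""
    lines = []
    to_vrt(root, lines)
    return "\n".join(lines) + "\n"


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python xml2vrt.py corpus.xml > corpus.vrt
    sys.stdout.write(convert(ET.parse(sys.argv[1]).getroot()))
```

This loads the whole file into memory; for very large corpora, `ET.iterparse` would be the usual way to stream it instead. The resulting columns still have to be declared to cwb-encode as p-attributes (`-P`) and the tags as s-attributes (`-S`).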

Sorry if I am asking about rudimentary things; I am very new to CWB and
corpus work. Thanks for any help!
Best regards,
Ingrid