User interfaces, WaC Tk (Re: [Sigwac] Re: bootcats / large crawls)

Niels Ott niels at drni.de
Mon Aug 28 13:57:02 CEST 2006


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Andy Roberts wrote:
> I think one of the key issues is improving the degree of choice that's
> available to people in the WAC domain. I'm the first to admit that
> jBootCat is the least functional version available at the moment,
> although I hope that will improve in the near future. But even in this
> case, I, like Marco (I guess), am quite fond of the low-level, more
> technical approach to many tasks - I'm just drawn to *making*
> front-ends!

In my opinion, the choice of user interface depends on the intended
users. It would probably be a bad idea to have a concordancer run
entirely in a console window with vi-style commands.

But to me, building a corpus from the web isn't a task that anybody does
seriously in the five minutes before lunch break. So users will probably
find that they need to learn about the process of building their corpus
anyway, and the effort of learning command-line or configuration-file
options will turn out to be a minor one.

For the Web as Corpus Toolkit (http://www.drni.de/wac-tk/ ), the only
graphical option I could see would be a configuration wizard.

Btw: there will likely be a more convenient release later this year;
the toolkit is currently being developed further, as Lexical Computing
is using it for some projects.
Features will include the processing of arbitrary (text) data, not only
files downloaded by ParaGet. We also plan to enable safe processing of
XML tags and entities; the new tokenizer can already do this.
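The post does not say how the new tokenizer handles markup. As a rough illustration of what "safe processing of XML tags and entities" can mean, here is a minimal Python sketch, assuming a regex-based design (hypothetical, not the WaC Toolkit's actual code; the names `TOKEN_RE` and `tokenize` are made up for this example) that emits every tag and every entity as one token instead of splitting on `<`, `&` or `;`.

```python
import re

# Hypothetical illustration only: NOT the WaC Toolkit tokenizer.
# Alternatives are tried left to right, so tags and entities take
# precedence over plain words and punctuation.
TOKEN_RE = re.compile(
    r"""
      <[^<>]+>                                     # an XML tag such as <p> or </s>, kept whole
    | &(?:\#\d+|\#x[0-9A-Fa-f]+|[A-Za-z]\w*);      # an entity such as &amp;, &#233; or &#xE9;
    | \w+(?:[-']\w+)*                              # a word, allowing internal hyphens/apostrophes
    | [^\w\s]                                      # any other single non-space character
    """,
    re.VERBOSE,
)


def tokenize(text: str) -> list[str]:
    """Split text into tokens, never breaking inside a tag or an entity."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokenize("<p>Tom &amp; Jerry.</p>"))
```

Here `tokenize('<p>Tom &amp; Jerry.</p>')` keeps `<p>`, `&amp;` and `</p>` intact, while a bare `&` that does not start an entity falls through to the punctuation rule. The pattern is deliberately naive: any `<...>` span counts as a tag, so text such as `a < b > c` would be misread as markup.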

Best,

  Niels


- --
Me & Myself & All The Rest: http://www.drni.de/
"If you were happy every day of your life, you wouldn't be human. You'd
be a game show host." (Winona Ryder)
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.2.2 (GNU/Linux)

iD8DBQFE8toObosnVosUgx0RArVpAJ9cAXvQRRSxmhspXcIEW1l+fs0CYwCfdLhy
lcSqwCinlIwgBtxyp9/9uHo=
=RALB
-----END PGP SIGNATURE-----

