[Sigwac] Legal issues with crawling
Cyrus Shaoul
cyrus.shaoul at ualberta.ca
Thu Aug 31 18:39:11 CEST 2006
>
>
> Can anyone offer any advice on this issue?
>
Andy,
I have been looking into this issue, and I am completely confounded by
it. The only idea I have right now for a scalable way to find
unencumbered content is to use the Creative Commons (CC) search
engines; see http://search.creativecommons.org/
Both Yahoo and Google offer CC search. Yahoo even lets you run a
"Creative Commons Search" with the restrictions "Find content I can use
for commercial purposes." and "Find content I can modify, adapt, or
build upon."
If the WAC crawlers used these search engines, through their APIs, as a
source of seed URLs and then confirmed the CC license RDF data on each
web page, there should be little fear of later reprisals from web site
publishers. (But I am no lawyer.)
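To make the confirmation step concrete, here is a minimal Python sketch. It assumes a fetched page declares its license either with a rel="license" link or with the embedded RDF block (often wrapped in an HTML comment) that the CC license chooser generated. The function names are hypothetical, the search-API seeding step is not shown, and public-domain dedications are not recognized. It maps Yahoo's two restrictions onto license codes: no "nc" means commercial use is allowed, and no "nd" means derivatives are allowed.

```python
import re
from html.parser import HTMLParser

# Matches CC license URLs such as http://creativecommons.org/licenses/by-sa/2.5/
# and captures the license code ("by-sa").
CC_LICENSE_RE = re.compile(
    r"https?://(?:www\.)?creativecommons\.org/licenses/([a-z-]+)/", re.IGNORECASE
)
# Matches the <license rdf:resource="..."> element of embedded CC RDF.
RDF_LICENSE_RE = re.compile(
    r'<(?:cc:)?license\s+rdf:resource="([^"]+)"', re.IGNORECASE
)


class LicenseLinkParser(HTMLParser):
    """Collects href values of <a> and <link> elements with rel="license"."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag in ("a", "link"):
            a = dict(attrs)
            rels = (a.get("rel") or "").lower().split()
            if "license" in rels and a.get("href"):
                self.hrefs.append(a["href"])


def find_cc_licenses(html):
    """Return the CC license codes (e.g. "by-sa") declared on a page."""
    parser = LicenseLinkParser()
    parser.feed(html)
    parser.close()
    # Embedded RDF usually sits inside an HTML comment, which HTMLParser does
    # not parse as tags, so the raw text is scanned with a regex as well.
    candidates = parser.hrefs + RDF_LICENSE_RE.findall(html)
    codes = []
    for url in candidates:
        m = CC_LICENSE_RE.match(url)
        if m:
            codes.append(m.group(1).lower())
    return codes


def usable_for_corpus(html):
    """True if some declared CC license allows commercial use and derivatives."""
    for code in find_cc_licenses(html):
        parts = code.split("-")
        if "nc" not in parts and "nd" not in parts:
            return True
    return False
```

A page that declares several licenses passes if any of them qualifies; a more cautious crawler might instead require all of them to.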
The question is: what percentage of web pages carry a CC license at all,
and is that enough for SIGWAC's purposes?
That, I don't know.
Yours,
Cyrus
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
Cyrus Shaoul
http://www.psych.ualberta.ca/~westburylab/
University of Alberta
780-492-5843
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}