[Sigwac] Legal issues with crawling

Thu Aug 31 18:39:11 CEST 2006

> Can anyone offer any advice on this issue?
Andy,

I have been looking into this issue, and I am completely confounded by
it. The only idea I have right now for a scalable way to find
unencumbered content is to use the CC search engines.
See: http://search.creativecommons.org/

Yahoo and Google both offer it. Yahoo even lets you do a "Creative
Commons Search" with the restrictions "Find content I can use for
commercial purposes." and "Find content I can modify, adapt, or build
upon."

If the WAC crawlers use these search engines through their APIs as a
source of seeds, and then confirm the CC license RDF data on each web
page, there should be little fear of later reprisals from web site
publishers. (But I am no lawyer.)
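
The confirmation step could look roughly like the sketch below. It is
only an illustration, not a tested crawler component: it assumes the
license shows up in the page source as a creativecommons.org/licenses/
URL (as it does in the embedded RDF block), and the function names are
made up for this example. It reads the "nc" and "nd" parts of the
license code to mirror the two Yahoo restrictions, and it rejects a page
if any license found on it fails a requirement.

```python
import re

# Assumption: the license appears in the page source as a URL such as
# http://creativecommons.org/licenses/by-nc/2.5/ (e.g. inside the RDF block).
CC_LICENSE_URL = re.compile(
    r"https?://creativecommons\.org/licenses/([a-z\-]+)/(\d+\.\d+)",
    re.IGNORECASE,
)


def find_cc_licenses(html):
    """Return the set of (license code, version) pairs found in the page."""
    return {(code.lower(), version)
            for code, version in CC_LICENSE_URL.findall(html)}


def allows_commercial_use(code):
    """True unless the license code carries the NonCommercial element."""
    return "nc" not in code.split("-")


def allows_modification(code):
    """True unless the license code carries the NoDerivs element."""
    return "nd" not in code.split("-")


def is_acceptable(html, need_commercial=True, need_modify=True):
    """Conservative check: every license found must meet the requirements."""
    licenses = find_cc_licenses(html)
    if not licenses:
        return False  # no CC license detected, so skip the page
    for code, _version in licenses:
        if need_commercial and not allows_commercial_use(code):
            return False
        if need_modify and not allows_modification(code):
            return False
    return True
```

A crawler would call is_acceptable() on the fetched source of each seed
page before keeping it. Finding a license URL does not tell you which
parts of the page the license covers, so this is at best a first filter.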

The question is: what percentage of web pages have a CC license at all?
Is that enough for SIGWAC's purposes? That, I don't know.

Yours,

Cyrus

=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}
Cyrus Shaoul
http://www.psych.ualberta.ca/~westburylab/
University of Alberta
780-492-5843
=[=]={=}=[=]={=}=[=]={=}=[=]={=}=[=]={=}