Google is building a program that will submit searches to public databases on the Web to try to ascertain their contents. The goal is to index and make available information that is not currently reachable through search, such as flight schedules and fares, to use an example from the CNet article. This development raises two important questions. First, are there legal obstacles to Google mining data from public databases? Second, who will pay the bandwidth and CPU charges that Google's activities generate?
On the first question, it remains to be seen whether anyone will object on legal grounds to the searches. Google can certainly provide a way for companies to opt out of the searches using standard robot/user agent techniques currently employed to manage search engine crawlers, which may make the legal issues moot.
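As a rough illustration of the opt-out approach mentioned above, the sketch below uses Python's standard urllib.robotparser to show how a well-behaved crawler might check a site's robots.txt rules before querying its search system. The /search path and the example.com URLs are hypothetical, and nothing here is confirmed to be how Google's database crawler actually behaves.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for a site that wants to keep Googlebot
# away from its search form while leaving everything else open.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /search

User-agent: *
Allow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the robots.txt rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Illustrative URLs only; example.com stands in for any site.
    for url in ("https://example.com/search?q=flights",
                "https://example.com/about"):
        print(url, is_allowed(ROBOTS_TXT, "Googlebot", url))
```

A site that publishes rules like these is relying on the crawler to honor them; robots.txt is a convention, not an enforcement mechanism.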
On the second question, there is a very real prospect that Google will add significant traffic to a site's search system, potentially costing the company that maintains the site in both bandwidth and server charges. For sites hosted in a cloud environment, where usage is metered, those costs could be precisely quantified. So who will pay for the additional traffic? If Google provides an opt-out solution that companies can easily deploy, one could argue that any company that neglects to opt out is implicitly allowing Google to conduct the searches and so agreeing to bear the associated costs.
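To make the cost question concrete, here is a back-of-the-envelope sketch of how a site on metered cloud hosting might estimate what crawler-driven searches add to its bill. Every parameter (request volume, response size, CPU time per query, and both prices) is a placeholder the reader would need to replace with figures from their own logs and provider; none of them come from Google or any real price list.

```python
def estimate_crawler_cost(requests_per_day: float,
                          avg_response_kb: float,
                          cpu_seconds_per_request: float,
                          price_per_gb: float,
                          price_per_cpu_hour: float,
                          days: int = 30) -> dict:
    """Estimate the bandwidth and CPU cost of crawler traffic.

    All inputs are assumptions supplied by the caller.
    Sizes use decimal units (1 GB = 1,000,000 KB).
    """
    total_requests = requests_per_day * days
    bandwidth_gb = total_requests * avg_response_kb / 1_000_000
    cpu_hours = total_requests * cpu_seconds_per_request / 3600
    bandwidth_cost = bandwidth_gb * price_per_gb
    cpu_cost = cpu_hours * price_per_cpu_hour
    return {
        "bandwidth_gb": bandwidth_gb,
        "cpu_hours": cpu_hours,
        "bandwidth_cost": bandwidth_cost,
        "cpu_cost": cpu_cost,
        "total_cost": bandwidth_cost + cpu_cost,
    }


if __name__ == "__main__":
    # Purely illustrative inputs, not measurements.
    estimate = estimate_crawler_cost(
        requests_per_day=1000,
        avg_response_kb=50,
        cpu_seconds_per_request=0.12,
        price_per_gb=0.10,
        price_per_cpu_hour=0.05,
    )
    for name, value in estimate.items():
        print(f"{name}: {value:.2f}")
```

The point is less the specific total than the fact that, with metered hosting, the crawler's share of the bill can be isolated from server logs and attributed to a single user agent.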
On the other hand, one could argue that Google has an obligation to proactively notify companies if it plans to change the way it indexes their systems in a way that may force them to incur additional costs, which effectively takes us back to the first question of legal issues.
In the bigger picture, Google's move is just a first step in what will inevitably be industry-wide attempts to better expose and share data buried in databases around the world. Though the Semantic Web has so far failed to attract a huge following, we can reasonably expect that either it or some other technology will take hold and begin to shape the next generation of knowledge sharing on the Internet.