Theft of Credit Cards Highlights Challenges of Large Databases

The recent disclosure of a data theft of massive proportions (130 million credit cards) reveals the challenges institutions face in securing large stores of sensitive data against sophisticated hackers determined to breach secured systems. The attacks demonstrate the weaknesses of traditional approaches to application and data security.

Who Owns Your Data?

News broke today that Amazon remotely erased copies of some George Orwell books from Kindle devices. Putting aside the irony of Orwell's books being erased, we must all ask ourselves what the ramifications are of putting our data on devices and systems controlled by others. In our increasingly connected world of always-on, wired and wireless access to everything from our books to our personal information, we should all ask a very central philosophical question: who owns our data?

Whether the issue is Amazon deleting a book you paid for on a device you own, government workers improperly accessing private records, private industry workers improperly accessing records of public figures, companies attempting to assert ownership over data you place in their hands, or hackers stealing data from public and private databases, the twin issues of data ownership and security have become central to an emerging threat to the success of the Internet as a trusted medium. We believe that users own their data, and we are working on an exciting new product set that will address these fundamental issues by putting data ownership and security in the hands of users.

MySQL Workbench for Linux

Since I use Ubuntu Linux on my desktop for day-to-day development, I am always looking for good technical tools for the Linux desktop. One category that has lagged behind Windows is database modeling and management. There are some freeware tools, but none that provides the features I need.

Fortunately, MySQL has released MySQL Workbench for Windows, MacOS, and Linux. There are community (read: free) and commercial editions available. The community edition provides reverse and forward engineering between a model and a MySQL database, and its "sync with model" feature is great. I have just finished porting a model of the Colony application platform into MySQL Workbench. Kudos to the MySQL team for a good release.

Google's Mission to Penetrate the Deep Web

Google is building a software program that will run searches against public databases on the Web to determine their contents. The goal is to index and make available information that search engines do not currently reach, such as flight schedules and fares, to use an example from the CNet article. This development raises two important questions. First, does Google face any legal issues in mining data from public databases? Second, who will pay for the bandwidth and CPU charges that Google's activity generates?

On the first question, it remains to be seen whether anyone will object to the searches on legal grounds. Google could give companies a way to opt out using the standard robots.txt and user-agent techniques already used to manage search engine crawlers, which may make the legal issues moot.
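To make the opt-out idea concrete, here is a minimal sketch using Python's standard `urllib.robotparser` module to evaluate robots.txt rules. It assumes a hypothetical crawler user agent named `ExampleFormCrawler` and a site that wants to keep that crawler out of its `/search` pages while leaving other crawlers unaffected; the user-agent token Google would actually use for these searches is not something the post specifies.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep an example crawler out of the search pages,
# while every other user agent may fetch everything.
ROBOTS_TXT = """\
User-agent: ExampleFormCrawler
Disallow: /search

User-agent: *
Allow: /
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt rules from a string instead of fetching them over HTTP."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

# The opted-out crawler is refused the search pages but may still fetch other paths.
print(parser.can_fetch("ExampleFormCrawler", "https://www.example.com/search?q=flights"))  # False
print(parser.can_fetch("ExampleFormCrawler", "https://www.example.com/about"))  # True
# Other crawlers fall through to the "*" rules and are allowed.
print(parser.can_fetch("SomeOtherBot", "https://www.example.com/search?q=flights"))  # True
```

Against a live site the parser would instead be pointed at the real file with `set_url()` and `read()`. Note that robots.txt is advisory: it only works as an opt-out if the crawler chooses to honor it.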

On the second question, there is a very real prospect that Google will add significant traffic to a site's search system, costing the company that maintains the site in both bandwidth and server charges. For sites hosted in a cloud environment, those costs could be precisely quantified. So who will pay for the additional traffic? If Google provides an opt-out mechanism that companies can easily deploy, one could argue that any company that neglects to opt out is implicitly allowing Google to conduct the searches and thereby agreeing to incur the associated costs.
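As a rough illustration of how such costs might be quantified, the estimate reduces to simple arithmetic: queries per month times (response size times egress price, plus CPU time per query times compute price). The function below is a sketch of that calculation only; every traffic figure and rate in the example is a hypothetical placeholder, not an actual cloud provider's price or Google's real crawl volume.

```python
def monthly_crawl_cost(
    queries_per_day: int,
    avg_response_bytes: int,
    egress_usd_per_gb: float,
    cpu_seconds_per_query: float,
    cpu_usd_per_hour: float,
    days: int = 30,
) -> float:
    """Rough monthly cost of extra crawler queries: bandwidth (egress) plus server CPU time."""
    queries = queries_per_day * days
    egress_gb = queries * avg_response_bytes / 1e9  # decimal gigabytes transferred
    cpu_hours = queries * cpu_seconds_per_query / 3600  # compute time consumed
    return egress_gb * egress_usd_per_gb + cpu_hours * cpu_usd_per_hour


# Hypothetical inputs, for illustration only (not real prices or traffic figures).
estimate = monthly_crawl_cost(
    queries_per_day=100_000,
    avg_response_bytes=50_000,
    egress_usd_per_gb=0.10,
    cpu_seconds_per_query=0.2,
    cpu_usd_per_hour=0.10,
)
print(f"${estimate:.2f} per month")  # $31.67 per month
```

With these made-up inputs (3 million extra queries a month, 150 GB of egress, about 167 CPU-hours), the estimate comes to about $31.67 a month; a real site would substitute figures from its own traffic logs and its provider's price sheet.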

On the other hand, one could argue that Google has an obligation to proactively notify companies if it plans to change how it indexes their systems in ways that may force them to incur additional costs, which effectively takes us back to the first question of legal issues.

In the bigger picture, Google's move is just a first step in what will inevitably become a broader industry effort to expose and share data buried in databases around the world. Though the Semantic Web has so far failed to attract a large following, we can reasonably expect that either it or some other technology will take hold and begin to shape the next generation of knowledge sharing on the Internet.

Restoring a large dataset in MySQL

Working with large datasets in MySQL, I have been experimenting with different methods of restoring a large database backup. If the backup (dump) file has already been produced, you can open the MySQL command-line client and use these commands to restore the database:

mysql> SET FOREIGN_KEY_CHECKS = 0;
mysql> SOURCE dump_file_name
mysql> SET FOREIGN_KEY_CHECKS = 1;
mysql> COMMIT;

This tip, and more about backing up and restoring MySQL, can be found on the SpikeSource blog.
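For scripted restores, the same sequence can be run non-interactively by streaming the dump into the `mysql` command-line client, since `SOURCE` is a client command rather than SQL. The Python sketch below is one possible way to do that, not the SpikeSource method: it wraps the dump with the statements from the session above and copies the file in chunks, so a large dump is never loaded into memory at once. It assumes the `mysql` client is on the `PATH`, that credentials come from a MySQL option file such as `~/.my.cnf`, and that the dump is UTF-8; the file, database, and user names in the comment are hypothetical.

```python
import io
import shutil
import subprocess
from typing import TextIO

# The statements run around SOURCE in the interactive session, as a header and footer.
HEADER = "SET FOREIGN_KEY_CHECKS = 0;\n"
FOOTER = "SET FOREIGN_KEY_CHECKS = 1;\nCOMMIT;\n"


def write_restore_script(dump: TextIO, out: TextIO) -> None:
    """Write header, the dump's contents (copied in chunks), and footer to out."""
    out.write(HEADER)
    shutil.copyfileobj(dump, out)
    out.write("\n")  # make sure the footer starts on its own line
    out.write(FOOTER)


def restore(dump_path: str, database: str, user: str) -> None:
    """Stream the wrapped dump into the mysql client (credentials assumed to come from ~/.my.cnf)."""
    proc = subprocess.Popen(
        ["mysql", "-u", user, database],
        stdin=subprocess.PIPE,
        encoding="utf-8",
    )
    with open(dump_path, encoding="utf-8") as dump:
        write_restore_script(dump, proc.stdin)
    proc.stdin.close()
    if proc.wait() != 0:
        raise RuntimeError(f"mysql exited with status {proc.returncode}")


# In-memory demonstration; a real run would be restore("backup.sql", "mydb", "admin").
demo_out = io.StringIO()
write_restore_script(io.StringIO("INSERT INTO t VALUES (1);"), demo_out)
script = demo_out.getvalue()
print(script)
```

Streaming keeps memory use flat regardless of dump size. If the client exits early on an error, the write to its stdin will fail with a broken pipe, which a production script would want to catch and report.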

BlogCFC was created by Raymond Camden. This blog is running version 5.8.001.