Wednesday, December 1, 2010
Freebase Gridworks hits 2.0, earns the new name Google Refine
Google Refine, the revamp of Freebase Gridworks (one of Google's summer acquisitions), reached version 2.0 last month, adding an extension backend to the already powerful open source data cleaning software.
Freebase Gridworks gained a significant following within the data journalism community thanks to its ease of use, particularly in how it lets users apply text-matching algorithms without having to implement them.
Of course, "ease of use" is very much a relative term. If you have no idea what transforming data with regular expressions means or what n-gram clustering is, well, this isn't going to do much for you. But for journalists and programmers whose job is to make data friendlier on a daily basis, Google Refine is a godsend.
Still wondering what exactly it does? Using algorithms designed to analyze text for similarities and discrepancies, Google Refine can help standardize a set of data that is mostly consistent, but is off just enough to be frustrating to deal with.
These issues can be as simple as a collection of event summaries written with different mixes of uppercase and lowercase (such as "Basketball game; BASKETBALL GAME; basketball game; BasKetBalL GAmE"). If you have ever had the pleasure of looking through a data log created by multiple authors who never agreed on a standard, you have likely come across an issue like this one.
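To make the idea concrete, here is a minimal Python sketch of how values like the ones above could be grouped and standardized. It is only an illustration of the general "key collision" approach, not Refine's actual code: the function names (fingerprint, cluster, standardize) and the rule of keeping the most common spelling are assumptions made for this example.

```python
import re
from collections import Counter, defaultdict


def fingerprint(value: str) -> str:
    """Build a normalized key: trim, lowercase, drop punctuation,
    then sort the unique words. (Hypothetical helper for illustration.)"""
    cleaned = re.sub(r"[^\w\s]", "", value.strip().lower())
    return " ".join(sorted(set(cleaned.split())))


def cluster(values):
    """Group values whose normalized keys collide."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return dict(groups)


def standardize(values):
    """Replace each value with the most common spelling in its group.
    Choosing the most common spelling is an assumption of this sketch."""
    canonical = {
        key: Counter(members).most_common(1)[0][0]
        for key, members in cluster(values).items()
    }
    return [canonical[fingerprint(value)] for value in values]


events = [
    "Basketball game",
    "BASKETBALL GAME",
    "basketball game",
    "BasKetBalL GAmE",
    "Basketball game",
]

for original, cleaned in zip(events, standardize(events)):
    print(f"{original!r} -> {cleaned!r}")
```

All five entries share the key "basketball game", so they land in one group and are rewritten to the spelling that appears most often. In Refine itself the equivalent step is done through the interface, with the user reviewing each suggested group before merging it.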
Unfortunately, Google Refine does not do much in terms of "holding a new user's hand," but again, the services it provides are still primarily aimed at a niche audience. A few times, Refine also made my MacBook Pro uncomfortably hot during use, and quitting the program typically required a force quit.
But as with all things Google and beta, it is bound to get more stable. Eventually.
If you are interested in learning more about Google Refine, visit the Refine Google Code page, or jump right in with the three-part YouTube introduction series created by the Google Refine team.