Wednesday, 21 April 2010

GeoCoding with Kettle : new plugin

Hi all,
I created a plugin for geocoding addresses in Kettle v3.5. The plugin uses the Google Maps API. You can learn more about this API HERE.

What is Geocoding ?

According to Wikipedia, geocoding is “the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes”. Normalization is the process of cleaning an input address and putting it into a normalized, standardized format.
Reverse geocoding is the opposite: finding a complete address from GPS coordinates.
Raised relief map … a basic tool for geocoding.

The plugin

For the moment, this is a basic V1 release, but it is fully working. More features (advanced geocoding) are planned.
Here is the plugin screen in Kettle. It is a basic screen, as you can see. You need to enter the following:
  • GMapKey: your Google Maps key. Geocoding works without it, at least for me, but I recommend signing up for the API and using your own Google key.
  • Input Address Field: the address field, from the incoming rows, that you want to geocode.
  • Normalized address: the name of the column in which the normalized address will be stored.
  • City Field: the name of the column in which the city name will be stored.
  • GPS Coord Fields: the name of the column in which the GPS coordinates will be stored.
Here is the main Kettle screen with a transformation sample.

Let’s see how it works

For the example above, I used 4 row creation steps to create 4 types of addresses (French, USA, Asia, Africa). Here is the output: a code, a raw address (with typos and words out of order) and a comment.
Now let’s imagine we want to normalize the content of the Raw address field and retrieve the corresponding GPS coordinates for each address. To do that, we fill in the plugin screen with the following information: your GMap key, the “Raw address” input field, and the names for the normalized address, the city field and the GPS coords.
Now we can connect everything and start the transformation. The plugin sends a geocoding request to the Google Maps API for each address. The result set looks like this:
The original fields are still there (Code, Raw address and Comment), but the plugin added 3 more fields, using the names you set up previously (Norm_address, City and GPS_Coord). As you can see, the address is normalized and formatted, thanks to the Google Maps API. The GPS coords are given as: lat / lng.
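To make the row transformation concrete, here is a minimal Python sketch of what happens to each row. It is not the plugin's actual code (the plugin is a Kettle step packed as a jar). The `enrich_row` function and the `fake_geocode` stand-in are hypothetical names invented for this illustration, and the stand-in returns made-up values instead of calling any real geocoding service.

```python
from typing import Callable, Dict, Tuple

# Hypothetical geocoder signature (an assumption for this sketch):
# raw address -> (normalized address, city, latitude, longitude)
Geocoder = Callable[[str], Tuple[str, str, float, float]]


def enrich_row(row: Dict[str, str], geocode: Geocoder,
               input_field: str = "Raw address",
               norm_field: str = "Norm_address",
               city_field: str = "City",
               gps_field: str = "GPS_Coord") -> Dict[str, str]:
    """Return a copy of the row with the three geocoding fields appended."""
    normalized, city, lat, lng = geocode(row[input_field])
    out = dict(row)  # the original fields are kept as they are
    out[norm_field] = normalized
    out[city_field] = city
    out[gps_field] = f"{lat} / {lng}"  # "lat / lng" layout described in the post
    return out


def fake_geocode(address: str) -> Tuple[str, str, float, float]:
    # Stand-in for a real geocoding service; the values are made up.
    return ("1 Example Street, Exampleville", "Exampleville", 1.5, -2.25)


row = {"Code": "1", "Raw address": "exmple st 1 exampleville", "Comment": "typo"}
result = enrich_row(row, fake_geocode)
```

Passing the geocoder in as a function keeps the sketch independent of any particular service, so the same row logic would apply if the geocoder were swapped for another one.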


After some reading, I noticed that you can send up to 15,000 geocoding requests per day. This is a limitation of the Google Maps API. I didn’t try to go above 15,000 addresses / geocoding requests. I’ll let you check this (create 15,001 lines in the row creation steps …).
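If you want to stay under that limit on the client side, one option is to count requests and stop before the quota is exceeded. The sketch below is only an illustration under assumptions: `geocode_with_quota` and `QuotaExceeded` are hypothetical names, the 15,000 figure is the one quoted above (check the current API terms), and the counter covers a single run, so it assumes one run per day.

```python
from typing import Callable, Iterable, Iterator

DAILY_LIMIT = 15000  # figure quoted in the post; verify against the API terms


class QuotaExceeded(Exception):
    """Raised when a run would send more requests than the allowed limit."""


def geocode_with_quota(addresses: Iterable[str],
                       geocode: Callable[[str], object],
                       limit: int = DAILY_LIMIT) -> Iterator[object]:
    """Yield geocode(address) for each address, refusing to exceed `limit` calls."""
    for count, address in enumerate(addresses, start=1):
        if count > limit:
            # Stop before making the request that would go over the quota.
            raise QuotaExceeded(f"limit of {limit} requests reached")
        yield geocode(address)
```

Because the check happens before each call, the request that would go over the limit is never sent.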

I want it

No problem. You can download the plugin HERE (plugin, xml file and icon) and test it. As usual, everything is packed into a single jar using fatjar.

What’s next ?

This is a basic geocoding process. I’m currently working on something more powerful, with more features: using all the API attributes, letting the user choose which attributes to include or leave out, reverse geocoding, etc.
If this plugin is useful to you, please tell me more about your needs. I will be happy to upgrade it for your usage.


Anonymous said...

Unable to give you a heart, so here is a reply to push up your post.

enricbiosca said...

Congrats on your work! I think it's a good idea. I will try it in a few days and I promise feedback.

Roland Bouman said...

Hi Vincent!

great stuff...but I think this is not compatible with the terms of use of Google's API.

Look at parts 9 and 10 of http://code.google.com/intl/fr/apis/maps/terms.html - this is under the "terms of use" link on the API page you linked to. Basically, these terms of use say you can only use the API to display maps on a publicly available free-of-charge website.

I think the geonames service (http://www.geonames.org/) provides geodata under a less restrictive license, but you should probably check that out in detail - I am not a lawyer.

Vincent Teyssier said...

Hi Roland,
Thanks for your message.
You are right about the terms of use. I sent an email to Google to ask whether I'm definitely out of bounds (still no reply). If so, I will switch to another geocoder.
I'm currently working in a big web marketing company that is 100% .NET framework, and I noticed a lot of people are using the Google API that way (boo).

Sylvain said...

Vincent, FYI, in our projects, here is what governs usage as far as the license is concerned.
Case 1. Your website is open to the general public => you are allowed to use Google Maps
Case 2. Your website requires authentication (login/pwd): you must pay for a license

But in this specific case it is a bit more subtle, because the Google API is being called to retrieve data.
I'm curious (and impatient) to see Google's reply


Sylvain said...

Otherwise, your plugin works wonderfully. I ran a few tests on personal addresses and it geocodes really well (Google's algorithm seems pretty good...)


Vincent Teyssier said...

Thanks Sylvain!
I posted my message to the Google group dedicated to Google Maps developers and I'm still waiting for an answer. I may be one of the first to use this API in conjunction with an ETL tool...
Wait and see...

Juan Pablo Serra said...

Great work Vincent!!
It is very helpful and works excellently for me.

Just one small thing: do you think it would be difficult to return the country, just as you return the city?

I was thinking that could be handy for validating the returned latitude/longitude.


Anonymous said...

It seems you are an expert in this field; your remarks are very interesting, thank you.

- Daniel