Hi all,
Today, I had to create a currency repository for one of my client in financial services. Easy with Informatica connected to one of the real time financial interfaces (Bloomberg, Reuters, etc …). The challenge was to gather data for all major currencies and store the Euro exchange rates over time. Easy I as said. It was done in no time.
Then, on coffee time, I thought : “How to do that with Kettle, with a complete free approach ?”. Here again, easy. Let me explain.
The data sources
From 2003 to 2005, I worked for the very official French National Bank (Banque de France - BDF) and the European Central Bank (ECB). Since that time, I still have a lot of links and data sources about economics and stats. For Euro exchange rates, we have 2 possible data sources which are FREE of access and usage :
- RSS streams : broadcasted daily, at 14:15 (not a minute more), these RSS are easily accessible and you can leverage them to build your own exchange rate repository. Some data transformation are needed but it is very simple. These RSS streams offer 5 days of historical data. They are available on the ECB website HERE.
- XML file : like the RSS Stream, a daily XML file is available. It contains only data for the current day. Here again, you can easily load it with Kettle and the XML step with the proper parameters and configuration. The XML file is available on the ECB website HERE.
Let’s see everything in detail.
Solution Nb1 : RSS streams from European Central Bank.
If you go on the RSS page (HERE), you will see a lot of streams available for every currency on the market.
The RSS stream is easy to read and understand. Let’s click on the first one : US Dollar. As you can see, we have the currency exchange rate with the Euro and a date. We will need to do some parsing here.
Okay, we have now the RSS links and we just had a quick overview of its internal structure, now time to go playing with Kettle. First we put a “RSS Reader” on the workbench and we set it up. As you can see, I added each RSS link corresponding to each currency. No possibility to use a single RSS stream for all currencies (I will investigate this point).
On the second tab, nothing to do, just be sure you have a 0 in the field “Max number”.
On the third tab, Fields, we want to choose only 2 fields : “Date de publication” (exchange rate timestamp) and “Titre” (the string holding the exchange rate for the Euro). Let’s use String as datatype. Below is the Field tab.
If you hit the preview button, you will see the RSS stream popping out on your screen like this. Cool.
Okay, now we have to process our data in order to feed a table, somewhere on your datawarehouse or your application. Let’s have a look at the transformation I did in 3 mins for that purpose.
You can see the RSS Reader on the left, no need to go futher on this one. Then some other steps :
Field split : the field Titre, as we saw it with the RSS Reader, is a long string containing all what we need : the exchange rate and the international currency code (USD, EUR, CHF …). With this step, I split the Titre field into two new fields that will hold the exchange rate (Cours) and the currency code (Devise). The delimter is a space, that’s why you can’t see it on the screeshot below :
RATES data input : We feed the target database / table. Take care to manage your historical data here, remember the RSS stream is sending 5 days of historical data each day.
Keep single currency : That’s the second part of the transformation. Here, we need to build a table with the couples : currency code / currency name. Remember we only have the currency code (USD …) and it would be nice to build a tiny dimension with the real name of the country and the currency. This step will only keep distinct values.
Map ECB currency code / currency name : the previous distinct values will be mapped with the currency real and long names. Look at the configuration screen below, a new field is created (Devise) and you can copy paste the currency names from the ECB page where we grabbed the RSS links. The currency codes are international standards, no risk to see them changing one morning.
CURRENCIES data input : Final step, we feed the currency dimension with the couples code / currency name. Since it is a typical short dimension and new currencies are not frequent, you can update this dimension once in a while, or when a new RSS is added …
If we make a quick extract of the data we created, the output will look like this for the rates (left) and the currencies (right).
Solution Nb2 : XML file from the European Central Bank.
Well, doing the same with an XML input is possible and easy too. First, we need to find the appropriate XML file. This one can be found HERE. It is quite simple and interesting data is the timestamp, the currency and the rate. Remember, this XML file only contain data for one day.
This XML file is available everyday at 14:15, so you can schedule your job to run in order to gather the lastest data.
Let’s process this file now. For this, we will create a transformation looking like this one below. We will also create a currency dimension table, like we did for the previous example.
XML Extract : this is the core component of this transformation. It can read an XML file, parse it, process it, based on your Xpath query. Let’s have a look about configuration.
The link to the XML file has to be written in the main tab. The second tab is more sensitive : here we have to specify a XPath for the document. In our case, the XPath must be : /gesmes:Envelope/*[name()='Cube']/*[name()='Cube']/*[name()='Cube']
The last tab (Fields) is also very important : we need to indicate the XPath attributes in order to reach the elements we need. In our case, @currency, @rate and @time are mandatory.
RATES : the XML file is directly written into a target database / table. No need for custom transformation here. The target data looks like the previous example.
Map ECB currency code / country : we still need to create a tiny dimension with the couples :currency code / currency full name. Same process as the previous example. Luckily, the currency codes are a worldwide standard and we can re use the step from the previous transformation.
CURRENCIES : finally, the currency dimension is written into its target database / table.
The rates are on the left while the currencies – inchanged – are on the right. Remember : only one day of data is available with the XML file. You will maybe notice that the date format is different compared to the RSS data (yyyy-MM-dd versus dd/MM/yy) but this is something you can easily manage if necessary.
Well, this was a quick and handy way to reach some official currency data and process it in Kettle. Of course, you can easily customize and optimize these jobs.
If you have troubles running theses examples, feel free to reach me and I will provide you with the transformation files.

2 commentaires:
Excellent post! thanks for sharing this :)
You r welcome Roland !!!
I'm working on S3 interactivity now : files, stats ...
Godness, nites are too short.
Post a Comment