<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-1890171231785089767</id><updated>2012-01-29T10:51:01.213-08:00</updated><category term='Kettle'/><category term='Data Management'/><category term='Introduction'/><category term='Talend'/><category term='Readings'/><category term='Marketae'/><category term='Cloud Computing'/><category term='ColorMyTail tail unix linux NET log'/><category term='Postgresql'/><category term='JRubik'/><category term='Music'/><category term='Geocoding'/><category term='Mondrian'/><category term='Storage'/><category term='JasperSoft'/><category term='BOBatchConverter'/><title type='text'>Open BI</title><subtitle type='html'>Open BI is dedicated to Open Source Business Intelligence tools.</subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><generator version='7.00' 
uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>85</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-775356795816701280</id><published>2011-07-26T07:09:00.001-07:00</published><updated>2011-07-27T01:30:33.571-07:00</updated><title type='text'>Query Twitter with Talend to see what people think about …</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;I am almost on holiday after a very hard-working year, so I have some beach time.&lt;/p&gt;  &lt;p&gt;This morning, I tried to query &lt;a href="http://www.twitter.com" target="_blank"&gt;Twitter&lt;/a&gt; and process the data. My goal is to quickly build a data set showing what people are saying about … let’s say, &lt;strong&gt;Obama&lt;/strong&gt;. That is easy: Twitter provides an interface to run queries and get the results back in JSON format.&lt;/p&gt;  &lt;p&gt;My proposal is to implement a basic &lt;strong&gt;word frequency analysis&lt;/strong&gt;. 
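&lt;/p&gt;  &lt;p&gt;To make the idea concrete, here is a minimal sketch of such a word frequency analysis in plain Java. It is not the Talend job itself: the class name WordFrequency, the method topWords and the sample tweets are hypothetical, and treating “special chars” as anything that is not a letter or a digit is an assumption. It lower-cases the text, splits it on spaces, drops words shorter than 4 letters, counts the remaining words and returns the most frequent ones as “word:count” lines.&lt;/p&gt;
&lt;pre&gt;&lt;code&gt;
```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper, not part of the Talend job: the same steps in plain Java.
class WordFrequency {

    // Returns the most frequent words as "word:count" lines, highest count first.
    static String[] topWords(String[] tweets, int limit) {
        return Arrays.stream(tweets)
            .map(String::toLowerCase)                         // lower-case the text
            .flatMap(text -> Arrays.stream(text.split(" ")))  // one row per word, split on space
            .filter(word -> word.length() >= 4)               // drop words shorter than 4 letters
            // Assumption: a "special char" is anything that is not a letter or a digit.
            .filter(word -> word.chars().allMatch(Character::isLetterOrDigit))
            // Count occurrences per word; the TreeMap keeps the words in alphabetical order.
            .collect(Collectors.groupingBy(Function.identity(), TreeMap::new, Collectors.counting()))
            .entrySet().stream()
            .sorted(Map.Entry.comparingByValue(Comparator.reverseOrder())) // highest counts first
            .limit(limit)                                     // keep only the top rows
            .map(entry -> entry.getKey() + ":" + entry.getValue())
            .toArray(String[]::new);
    }

    public static void main(String[] args) {
        String[] tweets = { "Obama speaks today", "obama budget talks today", "The budget | deal" };
        for (String line : topWords(tweets, 20)) {
            System.out.println(line);
        }
    }
}
```
&lt;/code&gt;&lt;/pre&gt;
&lt;p&gt;Words with the same count come out in alphabetical order, because the counts are collected into a TreeMap and the sort is stable. The rest of this post builds the same steps with Talend components.&lt;/p&gt;  &lt;p&gt;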
My tool is Talend, but this process is also easy to set up using Java, PHP, Python, shell, Kettle …&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-Ud5LvpZL1m4/Ti7Kp_EEHiI/AAAAAAAAA1c/sVLPJ4P0AS8/s1600-h/image%25255B4%25255D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/-vEW4hrk3DUU/Ti7KqXfkp4I/AAAAAAAAA1g/7GwPf0HURHs/image_thumb%25255B2%25255D.png?imgmax=800" width="648" height="262" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;Here are some details : &lt;/strong&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;strong&gt;fileInputJSON &lt;/strong&gt;: reads the JSON returned by the Twitter search engine. The syntax is : &lt;a href="http://search.twitter.com/search.json?&amp;amp;q=YOUR_QUERY_HERE&amp;amp;rpp=10000"&gt;http://search.twitter.com/search.json?&amp;amp;q=YOUR_QUERY_HERE&amp;amp;rpp=10000&lt;/a&gt;, where YOUR_QUERY_HERE is the keyword you want to search for. Don’t forget to map the response to a string column :       &lt;ul&gt;       &lt;li&gt;&lt;strong&gt;answer ====&amp;gt; “$..text”&lt;/strong&gt; : where “$..text” is the JSON path of the tweet message you want to read. &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;tmap&lt;/strong&gt; : converts the text to lower case. I tried to do it on the fly in the JSON step, but it did not work. &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;tnormalize&lt;/strong&gt; : splits the answer column into one row per word (the separator is a space).&lt;/li&gt;    &lt;li&gt;&lt;strong&gt;tfilter&lt;/strong&gt; : filters out “|” (pipes) and other special characters, and drops words shorter than 4 letters. 
&lt;/li&gt;    &lt;li&gt;&lt;strong&gt;taggregate&lt;/strong&gt; : groups the rows by word and adds a new column named &lt;strong&gt;nb&lt;/strong&gt; that stores the count for each distinct word. &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;tsortrow&lt;/strong&gt; : now that you have &lt;strong&gt;words and counts&lt;/strong&gt;, sort the data from the highest count to the lowest (desc). &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;tsamplerow&lt;/strong&gt; : as we only want to read the &lt;strong&gt;top 20 words&lt;/strong&gt;, we create a sample of the sorted rows with the range “1..20”. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Then I print everything to the console using a tlogrow with “:” as the separator.&lt;/p&gt;  &lt;p&gt;Here are the results for the keyword “obama”; the query was run on July 26 at 16:00.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/-YrR2LOKNLwk/Ti7Kqw-k6kI/AAAAAAAAA1k/dOszCOZLW8Q/s1600-h/image%25255B12%25255D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-krH1cE8LTa4/Ti7Kr5P12uI/AAAAAAAAA1o/QHLy0PEVCPA/image_thumb%25255B6%25255D.png?imgmax=800" width="177" height="334" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Finally, I use &lt;a href="http://www.wordle.net/contact" target="_blank"&gt;Jonathan Feinberg&lt;/a&gt;’s &lt;a href="http://www.wordle.net/" target="_blank"&gt;Wordle&lt;/a&gt; to create a nice word map from the above results.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/-692Lm_gfK-Y/Ti7KslqK5EI/AAAAAAAAA1s/_cVl90W3-Vs/s1600-h/image%25255B17%25255D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: 
auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/-0Ob4rvUb51I/Ti7KtSgyAZI/AAAAAAAAA1w/J659ARIKcXI/image_thumb%25255B9%25255D.png?imgmax=800" width="485" height="315" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Setting up the process and testing it took only 20 minutes. Of course, it can be improved a lot by adding string cleaning, custom data filtering… or better : &lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Capture who created the tweet and who received it : you can then build relationship networks with, for instance, &lt;a href="http://networkx.lanl.gov/" target="_blank"&gt;networkx&lt;/a&gt;. &lt;/li&gt;    &lt;li&gt;Send the data to a powerful text analytics framework like &lt;a href="http://www.nltk.org/" target="_blank"&gt;NLTK&lt;/a&gt; for deeper analysis. &lt;/li&gt;    &lt;li&gt;Create the word map picture in Talend. I have some source code for this, but I still have to work on it … &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;I’ll see how to add one of these features soon.&lt;/p&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/775356795816701280/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=775356795816701280' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/775356795816701280'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/775356795816701280'/><link rel='alternate' type='text/html' 
href='http://open-bi.blogspot.com/2011/07/query-twitter-with-talend-to-see-what.html' title='Query Twitter with Talend to see what people think about …'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/-vEW4hrk3DUU/Ti7KqXfkp4I/AAAAAAAAA1g/7GwPf0HURHs/s72-c/image_thumb%25255B2%25255D.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3412418302734972141</id><published>2011-07-19T08:59:00.001-07:00</published><updated>2011-07-19T09:13:07.555-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Talend'/><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Interfacing Talend with Amazon SDB (AWS SDB) – quick way</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;I had the following challenge : &lt;b&gt;read FTP account information&lt;/b&gt; (FTP server, username, password, target directory)&lt;b&gt; stored in &lt;a href="http://aws.amazon.com/" target="_blank"&gt;Amazon&lt;/a&gt; &lt;a href="http://en.wikipedia.org/wiki/Amazon_SimpleDB" target="_blank"&gt;SDB&lt;/a&gt; and use it in a Talend transformation&lt;/b&gt;, published as a web service. I hope you know about &lt;b&gt;SDB&lt;/b&gt;. For those who don’t, &lt;b&gt;SDB&lt;/b&gt; is a key/value database provided by Amazon, so you can call SDB a &lt;a href="http://en.wikipedia.org/wiki/NoSQL" target="_blank"&gt;&lt;b&gt;NoSQL&lt;/b&gt;&lt;/a&gt; database.&lt;br /&gt;I played with the SDB API from Amazon, and succeeded after some coding and “Talending”. Here is how I did it. 
&lt;br /&gt;Below is only a small part of a much larger project, made up of a large collection of web services, created for my client.&lt;br /&gt;This project does the following : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Query a database using dynamic parameters&lt;/b&gt; given by the user at run time from a &lt;b&gt;&lt;a href="http://www.adobe.com/products/flex/" target="_blank"&gt;Flex&lt;/a&gt;&lt;/b&gt; portal (a query engine like Business Objects !), &lt;/li&gt;&lt;li&gt;Return a &lt;b&gt;row count&lt;/b&gt; for the query to the portal. I’m working in web marketing : counting people (segments) before creating a campaign is very important … &lt;/li&gt;&lt;li&gt;&lt;b&gt;Generate an extract of the data&lt;/b&gt; and process this extract according to the parameters given by the user (separator, encoding, splitting, zipping …), &lt;/li&gt;&lt;li&gt;&lt;b&gt;Send this data file to different “tubes”&lt;/b&gt; : router, FTP, &lt;a href="http://docs.amazonwebservices.com/AmazonS3/latest/gsg/" target="_blank"&gt;&lt;b&gt;AWS S3&lt;/b&gt;&lt;/a&gt;, local download … &lt;/li&gt;&lt;/ul&gt;Here we focus on the FTP sending part, using SDB to retrieve the account information.&lt;br /&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;&lt;h3&gt;The process.&lt;/h3&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;&lt;a href="http://lh4.ggpht.com/-Pro1Q5Dd7T4/TiWp7HjKgQI/AAAAAAAAA0g/wdTa3tT7cWc/s1600-h/image14.png"&gt;&lt;img alt="image" border="0" height="109" src="http://lh5.ggpht.com/-temD8-idRpo/TiWp7g4Fz8I/AAAAAAAAA0k/fbjkWp7DYTw/image_thumb6.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="552" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;The job (partial).&lt;/h3&gt;&lt;a href="http://lh6.ggpht.com/-i-V8-wSYTII/TiWp7yepnjI/AAAAAAAAA0o/DAkt5svNjHc/s1600-h/image3.png"&gt;&lt;img 
alt="image" border="0" height="337" src="http://lh6.ggpht.com/-MEelhrSNNMQ/TiWp8RLHbyI/AAAAAAAAA0s/3Dq7T9wFM1o/image_thumb1.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="750" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;&lt;h3&gt;Data structure in AWS SDB.&lt;/h3&gt;I’m using a very nice Firefox plugin to get quick and easy access to my SDB ecosystem : &lt;a href="http://code.google.com/p/sdbtool/" target="_blank"&gt;sdbtool&lt;/a&gt;. My data structure is simple (“dd” is of course not the true value …) : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Item : ftp      &lt;ul&gt;&lt;li&gt;Attribute name : Address              &lt;ul&gt;&lt;li&gt;Attribute value : dd &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Attribute name : Login              &lt;ul&gt;&lt;li&gt;Attribute value : dd &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Attribute name : PKey              &lt;ul&gt;&lt;li&gt;Attribute value : dd &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Attribute name : Password              &lt;ul&gt;&lt;li&gt;Attribute value : dd &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;Attribute name : Port              &lt;ul&gt;&lt;li&gt;Attribute value : 21 &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;Here is a screen capture of my sdbtool view : &lt;br /&gt;&lt;a href="http://lh6.ggpht.com/-F_657uS0rCg/TiWp87Zh-OI/AAAAAAAAA0w/DeC-aGB9W8Y/s1600-h/image%25255B10%25255D.png"&gt;&lt;img alt="image" border="0" height="459" src="http://lh6.ggpht.com/-9YeY7mQEbzs/TiWp9ZuH2WI/AAAAAAAAA00/niUZjYTCW2Y/image_thumb%25255B5%25255D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" 
width="710" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h3&gt;Explanations.&lt;/h3&gt;First, we load all the required libraries using the tlibraryload component.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;aws-java-sdk-1.0.14.jar &lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;commons-codec-1.3.jar &lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;commons-httpclient-3.0.1.jar &lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;commons-logging-1.1.1.jar &lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;jackson-core-asl-1.4.3.jar &lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;stax-api-1.0.1.jar &lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;stax-1.2.0.jar &lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;To use aws-java-sdk-1.0.14.jar, I also had to write some import statements. These imports are required to be able to use the AWS SDK.&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/-ZdLA6-XuYNM/TiWp9gm-0ZI/AAAAAAAAA04/DqK3e8vw3y0/s1600-h/image%25255B5%25255D.png"&gt;&lt;img alt="image" border="0" height="301" src="http://lh3.ggpht.com/-mS7-UzRKXx0/TiWp-dRAP1I/AAAAAAAAA08/Gfj_nFW85z8/image_thumb%25255B2%25255D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="614" /&gt;&lt;/a&gt;&lt;br /&gt;Then we have a tRowGenerator in which I create the value for the variable myDomain, which will be used in the SDB queries (see code below). You can skip this step; I created it only for quick testing.&lt;br /&gt;Then we have to code a small tJavaRow. This Java code will : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;connect to AWS SDB. You must have an account for AWS SDB. 
&lt;/b&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;run several queries, using SQL, to retrieve the FTP account information :&lt;/b&gt;       &lt;ul&gt;&lt;li&gt;FTP server address &lt;/li&gt;&lt;li&gt;FTP server login &lt;/li&gt;&lt;li&gt;FTP server password &lt;/li&gt;&lt;li&gt;FTP server port &lt;/li&gt;&lt;li&gt;FTP server pkey, if needed &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;b&gt;store the query results in output_row.[name] so they can be used in the Talend process. &lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;Here is the code in the tJavaRow. First we create some credentials (use yours) and then set the endpoint to the SDB address from AWS : &lt;a href="https://sdb.eu-west-1.amazonaws.com/"&gt;https://sdb.eu-west-1.amazonaws.com&lt;/a&gt;. Be careful to set a valid endpoint for a valid AWS region.&lt;br /&gt;The code is ultimately simple : create a string containing your SQL query, run it with select() and call getItems() on the result. Items are sent back; simply loop over getAttributes() to retrieve the value you need.&lt;br /&gt;I chose, for simplicity, to run a separate query for each attribute I need from SDB. 
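&lt;br /&gt;As a rough sketch of a more compact variant, the five attributes could be requested with a single select expression. The helper below is hypothetical (the class SdbSelect and the method buildSelect are not part of the original job) and only builds the query string; it assumes the SimpleDB quoting convention that a backtick in a domain name and a single quote in a string value are escaped by doubling them.&lt;br /&gt;
&lt;pre&gt;&lt;code&gt;
```java
// Hypothetical helper, not part of the original job: builds one SimpleDB
// select expression that returns several attributes at once.
class SdbSelect {

    static String buildSelect(String[] attributes, String domain, String clientCode) {
        // Assumed SimpleDB quoting rules: a backtick inside a domain name and a
        // single quote inside a string value are escaped by doubling them.
        String quotedDomain = "`" + domain.replace("`", "``") + "`";
        String quotedValue = "'" + clientCode.replace("'", "''") + "'";
        return "select " + String.join(", ", attributes)
                + " from " + quotedDomain
                + " where code_client = " + quotedValue;
    }

    public static void main(String[] args) {
        // Attribute and domain names are the ones used in the tJavaRow code of this post.
        String[] attributes = { "FTP_Address", "FTP_Login", "FTP_Pass", "FTP_Port", "FTP_PKey" };
        System.out.println(buildSelect(attributes, "Clients", "my_client"));
    }
}
```
&lt;/code&gt;&lt;/pre&gt;
The resulting string can be passed to new SelectRequest(...) as in the tJavaRow code of this post; each returned attribute would then have to be matched by name, for instance with attribute.getName(), before being copied into output_row.&lt;br /&gt;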
Of course, this can be written more compactly.&lt;br /&gt;&lt;pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: andale mono, lucida console, monaco, fixed, monospace; font-size: 12px; height: 955px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 96.36%;"&gt;&lt;code&gt;&lt;span style="font-size: xx-small;"&gt;// Placeholder credentials : replace them with your own AWS access key and secret key.
BasicAWSCredentials credentials = new BasicAWSCredentials("KL45LKJ4325MLKJ2345", "LKJ45LKJmlkjdlkjGRhjKLJSFSDG432534");

AmazonSimpleDB sdb = new AmazonSimpleDBClient(credentials);
sdb.setEndpoint("https://sdb.eu-west-1.amazonaws.com");
try {
    String myDomain = "Clients";

    // FTP server address
    String selectExpression = "select FTP_Address from `" + myDomain + "` where code_client = '" + context.client_name + "'";
    SelectRequest selectRequest = new SelectRequest(selectExpression);
    for (Item item : sdb.select(selectRequest).getItems()) {
        for (Attribute attribute : item.getAttributes()) {
            output_row.FTP_Address = attribute.getValue();
        }
    }

    // FTP server login
    selectExpression = "select FTP_Login from `" + myDomain + "` where code_client = '" + context.client_name + "'";
    selectRequest = new SelectRequest(selectExpression);
    for (Item item : sdb.select(selectRequest).getItems()) {
        for (Attribute attribute : item.getAttributes()) {
            output_row.FTP_Login = attribute.getValue();
        }
    }

    // FTP server password
    selectExpression = "select FTP_Pass from `" + myDomain + "` where code_client = '" + context.client_name + "'";
    selectRequest = new SelectRequest(selectExpression);
    for (Item item : sdb.select(selectRequest).getItems()) {
        for (Attribute attribute : item.getAttributes()) {
            output_row.FTP_Pass = attribute.getValue();
        }
    }

    // FTP server port : SDB returns a string, so convert it to an Integer
    selectExpression = "select FTP_Port from `" + myDomain + "` where code_client = '" + context.client_name + "'";
    selectRequest = new SelectRequest(selectExpression);
    for (Item item : sdb.select(selectRequest).getItems()) {
        for (Attribute attribute : item.getAttributes()) {
            output_row.FTP_Port = Integer.valueOf(attribute.getValue());
        }
    }

    // FTP server pkey, if needed
    selectExpression = "select FTP_PKey from `" + myDomain + "` where code_client = '" + context.client_name + "'";
    selectRequest = new SelectRequest(selectExpression);
    for (Item item : sdb.select(selectRequest).getItems()) {
        for (Attribute attribute : item.getAttributes()) {
            output_row.FTP_PKey = attribute.getValue();
        }
    }
} catch (AmazonServiceException ase) {
    // The request reached AWS but was rejected.
    System.out.println("AWSException");
    System.out.println("ErrorMsg:    " + ase.getMessage());
    System.out.println("HTTPStatcode: " + ase.getStatusCode());
    System.out.println("AWS Errcode:   " + ase.getErrorCode());
    System.out.println("Errortype:       " + ase.getErrorType());
    System.out.println("RequestID:       " + ase.getRequestId());
} catch (AmazonClientException ace) {
    // The client failed before getting a usable response from AWS.
    System.out.println("AWSClientException");
    System.out.println("Error Message: " + ace.getMessage());
}&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Final.&lt;/h3&gt;After retrieving all the items I need for the FTP transfer (server, username, password, port, or location of the SSH key), I store them in global variables. These global variables are then passed as arguments to two heavily customized scripts (needed in my case) that send the files : plain FTP, or SFTP when needed. Finally, I capture some useful information from the custom FTP scripts, process it in a tmap and send it to a tBufferOutput step. 
That way, I can return feedback over SOAP when this web service is called.&lt;br /&gt;&lt;br /&gt;This post is very concise; feel free to ask me for more information about this process.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Links.&lt;/h3&gt;&lt;ul&gt;&lt;li&gt;&lt;a href="http://aws.amazon.com/fr/documentation/simpledb/" target="_blank"&gt;&lt;b&gt;Amazon SDB documentation&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://aws.amazon.com/code?_encoding=UTF8&amp;amp;jiveRedirect=1" target="_blank"&gt;&lt;b&gt;Amazon SDB code and samples&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://code.google.com/p/sdbtool/" target="_blank"&gt;&lt;b&gt;Amazon SDB Tool (free)&lt;/b&gt;&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3412418302734972141/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3412418302734972141' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3412418302734972141'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3412418302734972141'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2011/07/interfacing-talend-with-amazon-sdb-aws.html' title='Interfacing Talend with Amazon SDB (AWS SDB) – quick way'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-temD8-idRpo/TiWp7g4Fz8I/AAAAAAAAA0k/fbjkWp7DYTw/s72-c/image_thumb6.png?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1922182763045732849</id><published>2011-06-21T09:00:00.001-07:00</published><updated>2011-07-19T07:21:33.937-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>… about AWS cloud, Talend, Jaspersoft, Postgresql and typical EC2 internal addressing issues …</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;I’m terribly late with this article, initially scheduled for January 2011 … sorry. It may be a bit outdated by now, but I am publishing it anyway …&lt;br /&gt;Let’s talk about EC2 cloud computing, Talend, Postgresql and JasperServer in a basic setup.&lt;br /&gt;You already know all the pros and cons of cloud computing, so I won’t talk about that. As for me, I love cloud computing and use it every day because of these particular advantages : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Scalability&lt;/b&gt; : scale any instance up or down according to your needs, &lt;/li&gt;&lt;li&gt;&lt;b&gt;Flexibility &lt;/b&gt;: create your own instances, boot them, create quick sandboxes, replicate data … &lt;/li&gt;&lt;li&gt;&lt;b&gt;Pay per use&lt;/b&gt; : you pay for what you use (CPU, storage, security …),&lt;/li&gt;&lt;li&gt;&lt;b&gt;Opex, no capex !&lt;/b&gt;&lt;/li&gt;&lt;/ul&gt;Cloud computing is still something new, and it is not surprising to discover software that is not ready for it or not fully “cloud compliant”. 
I recently faced such an issue when implementing Postgresql, Talend and Jaspersoft, which remain my preferred open source BI tools.&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;First issue&lt;/h3&gt;Let’s imagine we have a single server, hosting Postgresql. No big deal as long as we use this instance in a simple way : I can start my instance, host data on a persistent EBS volume, connect to it and stop it whenever I want. By using an Elastic IP, I can assign a “fixed” IP address to this server and easily set up a connection string. &lt;span style="font-size: xx-small;"&gt;Note on 16/12/2010 : Amazon now offers a DNS service.&lt;/span&gt;&lt;br /&gt;Now let’s imagine we need a typical BI architecture (tiers) : one &lt;b&gt;ETL&lt;/b&gt; (Talend or Pentaho of course !), a &lt;b&gt;Postgresql&lt;/b&gt; database in the middle and &lt;b&gt;Jaspersoft&lt;/b&gt; for reporting. &lt;br /&gt;That’s a bit more complex because &lt;b&gt;we need our Postgresql server to allow connections from the ETL and from the reporting tool&lt;/b&gt;. On top of that, we want to fully leverage all cloud computing features : &lt;b&gt;stop&lt;/b&gt; the servers when they are not used, &lt;b&gt;boot&lt;/b&gt; them when the service is needed, maybe change their network properties ... &lt;b&gt;eventually we want this to be fully automated and working without any human action&lt;/b&gt; such as changing the connection strings or starting/stopping the servers … &lt;br /&gt;Let’s have a look at a little diagram now. As you can see, our architecture is now up and running. We are also using &lt;b&gt;Elastic IPs for each server&lt;/b&gt;, which is required for the following demonstration. 
&lt;span style="font-size: xx-small;"&gt;IPs are fake.&lt;/span&gt;&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/-CKcdxdXM4Po/TgC__2ulDII/AAAAAAAAAyE/qsol-Dr0Yyo/s1600-h/image4.png"&gt;&lt;img alt="image" border="0" height="475" src="http://lh5.ggpht.com/-S9J2AxkH-J8/TgDAAUinxQI/AAAAAAAAAyI/doiBFgbIln8/image_thumb1.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="748" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;How to read Public DNS, Private DNS and Elastic IPs on AWS EC2 ?&lt;/b&gt;&lt;br /&gt;Imagine we have an instance running. This instance has an Elastic IP, &lt;b&gt;&lt;span style="color: red;"&gt;46.52.186.25&lt;/span&gt;&lt;/b&gt;, and a private IP address, &lt;b&gt;&lt;span style="color: blue;"&gt;11.235.33.6&lt;/span&gt;&lt;/b&gt;.&lt;br /&gt;The Private DNS name is : ip-&lt;b&gt;&lt;span style="color: blue;"&gt;11-235-33-6&lt;/span&gt;&lt;/b&gt;.eu-west-1.compute.internal&lt;br /&gt;The Public DNS name is : ec2-&lt;span style="color: red;"&gt;&lt;b&gt;46-52-186-25&lt;/b&gt;&lt;/span&gt;.eu-west-1.compute.amazonaws.com&lt;br /&gt;You see the relationship ? Each DNS name embeds the corresponding IP address, with dashes instead of dots.&lt;br /&gt;&lt;br /&gt;Ok, now, &lt;b&gt;how do you think we should configure the Postgresql server to allow connections from the ETL server and from the Reporting server&lt;/b&gt; ? Easy, here is one answer : &lt;br /&gt;&lt;ol&gt;&lt;li&gt;By making the ETL Server and the reporting server point to Postgresql. For that, we will use this nice little &lt;b&gt;Elastic IP&lt;/b&gt; we previously set up for the Postgresql server because it’s soooo easy to do it that way … &lt;/li&gt;&lt;li&gt;By writing the ETL server Elastic IP and the reporting server Elastic IP into the Postgresql &lt;b&gt;pg_hba.conf&lt;/b&gt; of course … because here again it is soooo easy and natural to do so. 
&lt;/li&gt;&lt;li&gt;Don’t forget to open the corresponding ports in your security groups (see picture above). &lt;/li&gt;&lt;/ol&gt;Ok, easy. Let’s go for it. We make Talend and Jasper point to Postgresql like this : &lt;br /&gt;&lt;b&gt;Jasper server connection screen : Postgresql database &amp;lt;===&amp;gt; Jasperserver&lt;/b&gt;&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/-C_RYWM1J3iM/TgDAAywts8I/AAAAAAAAAyM/Iq8aQf6ut1k/s1600-h/image%25255B21%25255D.png"&gt;&lt;img alt="image" border="0" height="448" src="http://lh4.ggpht.com/-IZUorFOSUo4/TgDABqQte3I/AAAAAAAAAyQ/Mhh70Te2sFU/image_thumb%25255B8%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="606" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;b&gt;Talend client connection screen : your client &amp;lt;===&amp;gt; Talend server&lt;/b&gt;&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/-Ma5zmW8_s4A/TgDACO4xxnI/AAAAAAAAAyU/cbXzM5YFUg0/s1600-h/image22.png"&gt;&lt;img alt="image" border="0" height="494" src="http://lh6.ggpht.com/-Y3qZ6adDdKE/TgDACqw7jCI/AAAAAAAAAyY/DPtti1GMCrY/image_thumb11.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="604" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Talend server connection screen : Talend server &amp;lt;===&amp;gt; Postgresql database&lt;/b&gt;&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/-RToMxYBly1w/TgDADED3DUI/AAAAAAAAAyc/vNBv3Exx4Ik/s1600-h/image%25255B30%25255D.png"&gt;&lt;img alt="image" border="0" height="589" src="http://lh6.ggpht.com/-xJKYaS8o460/TgDADkXDynI/AAAAAAAAAyg/Nk7w_dOGtBc/image_thumb%25255B13%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; 
display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="602" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Then we write the &lt;b&gt;Elastic IPs into the pg_hba.conf&lt;/b&gt; file like this, in order to allow the Talend server and JasperServer to connect to the Postgresql database. This is a basic pg_hba.conf; I encourage you to add stronger authentication. &lt;br /&gt;&lt;a href="http://lh6.ggpht.com/-mpbsFxH4dZs/TgDADwyCKGI/AAAAAAAAAyk/HDe2DMq6ntc/s1600-h/image33.png"&gt;&lt;img alt="image" border="0" height="239" src="http://lh5.ggpht.com/-Bieccu8G5QQ/TgDAEWyHNcI/AAAAAAAAAyo/qDYX_iItK9w/image_thumb16.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="653" /&gt;&lt;/a&gt;&lt;br /&gt;We are done. Don’t forget to adjust the security groups like this : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;b&gt;Talend Server :&lt;/b&gt; allow 8080, allow 22 &lt;/li&gt;&lt;li&gt;&lt;b&gt;Postgresql Server :&lt;/b&gt; allow 5432, allow 22 &lt;/li&gt;&lt;li&gt;&lt;b&gt;Jasperserver :&lt;/b&gt; allow 80 (or 443 if https), allow 22 &lt;/li&gt;&lt;/ul&gt;Okay, this setup works; you can test it. &lt;br /&gt;But wait … that’s &lt;b&gt;not the right way to do it !&lt;/b&gt; By using the &lt;b&gt;&lt;span style="color: red;"&gt;elastic IPs to set up communication between each server/node&lt;/span&gt;&lt;/b&gt;, we just created a weird monster that makes the traffic &lt;span style="color: red;"&gt;&lt;b&gt;go&lt;/b&gt; &lt;b&gt;OUT&lt;/b&gt;&lt;/span&gt; of the cloud and &lt;span style="color: red;"&gt;&lt;b&gt;come&lt;/b&gt; &lt;b&gt;BACK INTO&lt;/b&gt;&lt;/span&gt; the cloud. Don’t forget you are paying for that. 
Look at this diagram.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/-EHZonnuuHq8/TgDAE9Pl6yI/AAAAAAAAAys/kXQidPj190M/s1600-h/image41.png"&gt;&lt;img alt="image" border="0" height="342" src="http://lh5.ggpht.com/-dOql6155RkY/TgDAFpjafKI/AAAAAAAAAyw/gBovaQJJwPQ/image_thumb2.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="504" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h3&gt;First solution&lt;/h3&gt;The best practice is to &lt;b&gt;avoid using elastic IPs&lt;/b&gt; for network &lt;b&gt;traffic between servers that are hosted inside the EC2 cloud&lt;/b&gt;. Instead, use EC2 &lt;b&gt;internal addresses&lt;/b&gt;.&lt;br /&gt;Ok, but … wait a minute. &lt;br /&gt;&lt;ul&gt;&lt;li&gt;How do I &lt;b&gt;retrieve the internal address&lt;/b&gt; from inside EC2 ?&amp;nbsp; &lt;/li&gt;&lt;/ul&gt;The solution relies on a poorly documented EC2 feature : &lt;b&gt;&lt;span style="color: red;"&gt;when you resolve an EC2 public DNS name &lt;u&gt;from inside EC2&lt;/u&gt;, you get back the corresponding &lt;u&gt;internal IP address&lt;/u&gt;.&lt;/span&gt; Just what we need !!!!&lt;/b&gt;&lt;br /&gt;For instance, if you look up your ETL Server from your Postgresql server, using the famous &lt;b&gt;&lt;i&gt;host&lt;/i&gt;&lt;/b&gt; command, you will get : &lt;br /&gt;&lt;a href="http://lh4.ggpht.com/-L8tFwZ7_NiU/TgDAFwt0R2I/AAAAAAAAAy0/Zf30fG-ko3Q/s1600-h/image37.png"&gt;&lt;img alt="image" border="0" height="110" src="http://lh3.ggpht.com/-Gm42z34b7FQ/TgDAGHq2EkI/AAAAAAAAAy4/ixKA8zOAoyc/image_thumb18.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" 
title="image" width="671" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;You see what you have to do ? Replace all Elastic IPs, except the one used by your Talend client, with internal IPs. That way, your internal traffic won’t leave the cloud, as shown below.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/-u-AVzx5vr3c/TgDAG5MWLUI/AAAAAAAAAy8/5KE6YqggOKw/s1600-h/image8.png"&gt;&lt;img alt="image" border="0" height="400" src="http://lh6.ggpht.com/-LuMtgJVStwQ/TgDAHTwAI0I/AAAAAAAAAzA/EgarRZjS3gg/image_thumb4.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="465" /&gt;&lt;/a&gt;&lt;br /&gt;After switching to internal addressing, the connection screens look like this : &lt;br /&gt;&lt;b&gt;Jasper server connection screen : Postgresql database &amp;lt;===&amp;gt; Jasperserver&lt;/b&gt;&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/-eH5fYUMJX3I/TgDAH3TLW3I/AAAAAAAAAzE/BuJD_N3IUw0/s1600-h/image%25255B12%25255D.png"&gt;&lt;img alt="image" border="0" height="384" src="http://lh4.ggpht.com/-ygBKF8Wy5JY/TgDAIQy6hqI/AAAAAAAAAzI/7NuhyG3pmR4/image_thumb%25255B4%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="616" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;b&gt;Talend server connection screen : Talend server &amp;lt;===&amp;gt; Postgresql database&lt;/b&gt;&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/-Wlv_KGJzBmw/TgDAJCMLgdI/AAAAAAAAAzM/8FhifeUNw5E/s1600-h/image%25255B16%25255D.png"&gt;&lt;img alt="image" border="0" height="704" src="http://lh6.ggpht.com/-bpIO1c_-0u0/TgDAJj9LWjI/AAAAAAAAAzQ/9n5aV09bQVQ/image_thumb%25255B5%25255D.png?imgmax=800" 
style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; margin: 0px; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="735" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Second issue&lt;/h3&gt;Well, ok, we solved our first issue : &lt;b&gt;using internal addresses between the ETL server and the Postgresql server&lt;/b&gt;. But I can see two other issues : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Postgresql still &lt;b&gt;does not accept DNS names&lt;/b&gt; in the pg_hba.conf ! Only IP addresses are allowed. So we can’t ask Postgresql and pg_hba.conf to resolve the DNS name for us. &lt;/li&gt;&lt;li&gt;What if I decide to &lt;b&gt;reboot the ETL server, or the Reporting server ?&lt;/b&gt; These &lt;b&gt;internal addresses&lt;/b&gt; are nice but they change each time I reboot / restart a server in EC2. So how do I keep my Postgresql &lt;b&gt;pg_hba.conf updated with frequently changing addresses&lt;/b&gt; ? &lt;/li&gt;&lt;/ul&gt;&lt;div align="center"&gt;&lt;a href="http://lh5.ggpht.com/-O8wEFZNPqHM/TgDAKIzAjzI/AAAAAAAAAzU/sQVl8jEZzUY/s1600-h/image%25255B34%25255D.png"&gt;&lt;img alt="image" border="0" height="136" src="http://lh4.ggpht.com/-rePW9tbpO-E/TgDAKoR9XGI/AAAAAAAAAzY/AozDSzP1UMU/image_thumb%25255B15%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="525" /&gt;&lt;/a&gt;&lt;span style="font-size: xx-small;"&gt;…not allowed …&lt;/span&gt;&lt;/div&gt;&lt;h3&gt;Second solution&lt;/h3&gt;No, there is still no support for DNS entries in the pg_hba.conf. I know this is a long awaited feature, at least by me. 
But, unless I’m wrong (tell me), &lt;b&gt;writing down a DNS name in pg_hba.conf won’t work&lt;/b&gt; and the server won’t start.&lt;br /&gt;We need to find a way to update the pg_hba.conf with the current EC2 internal addresses of the ETL server and the Reporting server. Easy, we will use a bit of shell code here. This script retrieves the &lt;b&gt;internal IP address&lt;/b&gt; of each server (ETL and JasperServer) with the &lt;i&gt;&lt;b&gt;host&lt;/b&gt;&lt;/i&gt; command and &lt;b&gt;updates this address in the pg_hba.conf&lt;/b&gt; with some sed or awk. Then, on receiving a SIGHUP, the Postgresql server applies the new address configuration.&lt;br /&gt;Nothing complex, but success relies on good timing.&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/-Rr0HbzTvs-s/TgDALAADp9I/AAAAAAAAAzc/vQCMD8dINO0/s1600-h/image%25255B8%25255D.png"&gt;&lt;img alt="image" border="0" height="118" src="http://lh3.ggpht.com/-7EYGX7u3qh0/TgDALRcZ9DI/AAAAAAAAAzg/Vju_gUCx1P0/image_thumb%25255B3%25255D.png?imgmax=800" style="background-image: none; border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="741" /&gt;&lt;/a&gt;&lt;br /&gt;Note here : I created an &lt;b&gt;ORCHESTRATOR&lt;/b&gt;, a specialized instance in EC2, to monitor all my servers. &lt;b&gt;This orchestrator runs this kind of script as soon as it detects any change in the internal addressing scheme&lt;/b&gt;. This ORCHESTRATOR will be detailed in a future article (I made several public presentations, and a lot of people seem interested …).&lt;br /&gt;And now the shell script. It looks up the internal address, then updates the corresponding line. For that, you must keep your file tidy : each line to update needs a label. 
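&lt;br /&gt;To make the “labels” concrete, here is a minimal sketch of how the relevant part of pg_hba.conf could be laid out. It assumes that each label is a comment line containing the exact text the script searches for (“Talend server connexion” and “JasperServer connexion”) and that the host line to rewrite comes immediately after it. The addresses are placeholders, not real values.&lt;br /&gt;
&lt;pre&gt;&lt;code&gt;# Talend server connexion (label line : the script rewrites the line just below)
host    all         all         10.0.0.11/32       md5

# JasperServer connexion (label line : the script rewrites the line just below)
host    all         all         10.0.0.12/32       md5
&lt;/code&gt;&lt;/pre&gt;
Each label should appear only once in the file, otherwise grep returns several line numbers and the script can no longer compute a single target line. The update script itself follows.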
&lt;br /&gt;&lt;pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: andale mono, lucida console, monaco, fixed, monospace; font-size: 12px; height: 641px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 96.36%;"&gt;&lt;code&gt;&lt;span style="font-size: xx-small;"&gt;################################ &lt;br /&gt;#                              # &lt;br /&gt;#      IP address lookup       # &lt;br /&gt;#                              # &lt;br /&gt;################################ &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;# POSTGRES (DATABASE) Server&lt;br /&gt;# Public DNS : ec2-12-345-678-999.eu-west-1.compute.amazonaws.com &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;# TALEND (ETL) Server&lt;br /&gt;ETL_SERVER=`host ec2-11-222-33-444.eu-west-1.compute.amazonaws.com | sed 's/.*has address //g'` &lt;br /&gt;&lt;br /&gt;# JASPER (BI &amp;amp; reports) Server&lt;br /&gt;JASPER_SERVER=`host ec2-22-33-444-555.eu-west-1.compute.amazonaws.com | sed 's/.*has address //g'` &lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span 
style="font-size: xx-small;"&gt;# Echoing all&lt;br /&gt;echo "" &lt;br /&gt;echo "################## EC2 Addresses Update ######################" &lt;br /&gt;echo "Will update EC2 Talend Server address with : " $ETL_SERVER &lt;br /&gt;echo "Will update EC2 Jasper Server address with : " $JASPER_SERVER &lt;br /&gt;echo "" &lt;br /&gt;&lt;br /&gt;# Find the Talend server label and replace the line that follows it&lt;br /&gt;TALEND_NB=`grep -n "Talend server connexion" /mnt/postgres/data/pg_hba.conf | cut -d":" -f1` &lt;br /&gt;TALEND_NB=$((TALEND_NB+1)) &lt;br /&gt;sed -i "$TALEND_NB s%.*%host    all         all         $ETL_SERVER/32     md5%" /mnt/postgres/data/pg_hba.conf &lt;br /&gt;&lt;br /&gt;# Find the JasperServer label and replace the line that follows it&lt;br /&gt;JASPER_NB=`grep -n "JasperServer connexion" /mnt/postgres/data/pg_hba.conf | cut -d":" -f1` &lt;br /&gt;JASPER_NB=$((JASPER_NB+1)) &lt;br /&gt;sed -i "$JASPER_NB s%.*%host    all         all         $JASPER_SERVER/32     md5%" /mnt/postgres/data/pg_hba.conf &lt;br /&gt;&lt;br /&gt;# Ask Postgresql to re-read pg_hba.conf (sends SIGHUP).&lt;br /&gt;# Assumption : pg_ctl is on the PATH and this script runs as the postgres user.&lt;br /&gt;pg_ctl reload -D /mnt/postgres/data&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;The end&lt;/h3&gt;Having a small (or even big) BI architecture up and running in EC2 is not a big deal. 
Setting it up properly – so that you don’t pay extra fees – is something different and needs some thought beforehand. The addressing issue, although technically simple, can have a negative impact on your project if you don’t handle it from the start.&lt;br /&gt;&lt;br /&gt;I recommend that any AWS / EC2 user (BI or not) create their own admin tools and scripts, based on the various available APIs, in order to&amp;nbsp; :&lt;br /&gt;&lt;ul&gt;&lt;li&gt;reduce reaction time,&lt;/li&gt;&lt;li&gt;be fully independent,&lt;/li&gt;&lt;li&gt;save time (graphical tools are nice but need clicks, clicks and clicks …)&lt;/li&gt;&lt;/ul&gt;Some useful links to AWS / EC2 documentation : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&amp;nbsp;&lt;a href="http://aws.amazon.com/archives/Amazon%20EC2?_encoding=UTF8&amp;amp;jiveRedirect=1"&gt;AWS Documentation archive (newest first)&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://aws.amazon.com/articles/1346?_encoding=UTF8&amp;amp;queryArg=searchQuery&amp;amp;x=0&amp;amp;fromSearch=1&amp;amp;y=0&amp;amp;searchPath=articles&amp;amp;searchQuery=elastic%20ip"&gt;Elastic IPs documentation and API&lt;/a&gt;&amp;nbsp;&lt;/li&gt;&lt;/ul&gt;Feel free to contact me if this article is not clear enough.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1922182763045732849?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/1922182763045732849/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1922182763045732849' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1922182763045732849'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1922182763045732849'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2011/06/about-aws-cloud-talend-jaspersoft.html' title='… about AWS cloud, Talend, Jaspersoft, Postgresql and typical EC2 internal addressing issues …'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/-S9J2AxkH-J8/TgDAAUinxQI/AAAAAAAAAyI/doiBFgbIln8/s72-c/image_thumb1.png?imgmax=800' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3278933647083132114</id><published>2011-03-28T14:51:00.001-07:00</published><updated>2011-03-28T14:52:11.503-07:00</updated><title type='text'>Wordle, wordmap, word clouds … what’s in a name ?</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;Wordmaps, wordles, word clouds … are pretty popular these days. But they are still “difficult” to generate : no API, often proprietary code, often web-only tools, and good-looking fonts that are hard to render ….&lt;/p&gt;  &lt;p&gt;Last week, I was attending a client meeting and we showed some wordmaps of our own, created with &lt;a href="http://www.r-project.org/" target="_blank"&gt;R&lt;/a&gt; : great success. If you are involved in data mining or simply data visualization, these wordmaps / wordles are definitely a must-have.&lt;/p&gt;  &lt;p&gt;Here is a wordle I created using the excellent web site &lt;a href="http://www.wordle.net/" target="_blank"&gt;wordle.net&lt;/a&gt;, created by &lt;a href="http://www.wordle.net/contact" target="_blank"&gt;Jonathan Feinberg&lt;/a&gt;. 
Simply paste some of your data (prepared and formatted beforehand) into the text field, press generate and the magic happens !&lt;/p&gt;  &lt;p align="center"&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TZECzbDCgoI/AAAAAAAAAx8/eo8hmsk9-yg/s1600-h/image%5B4%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TZEC0HoovpI/AAAAAAAAAyA/Cim1n1gMJZs/image_thumb%5B2%5D.png?imgmax=800" width="622" height="434" /&gt;&lt;/a&gt;&lt;font size="1"&gt;Wordmap based on the keywords used to reach this blog.&lt;/font&gt;&lt;/p&gt;  &lt;p align="left"&gt;&lt;a href="http://www.comportemental.fr/blog" target="_blank"&gt;Matthias Oehler&lt;/a&gt; and I are currently working on a web service that creates wordmaps. The process will be : send a SOAP message containing your data and you get your PNG wordmap in return. 
Promising, huh ?&lt;/p&gt;  &lt;p align="left"&gt;Stay tuned …&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3278933647083132114?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3278933647083132114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3278933647083132114' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3278933647083132114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3278933647083132114'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2011/03/wordle-wordmap-whats-in-name.html' title='Wordle, wordmap, word clouds … what’s in a name ?'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/TZEC0HoovpI/AAAAAAAAAyA/Cim1n1gMJZs/s72-c/image_thumb%5B2%5D.png?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4431080405094708378</id><published>2011-03-24T04:56:00.001-07:00</published><updated>2011-03-28T08:22:06.422-07:00</updated><title type='text'>New Data cleansing / mining using Google Refine in the cloud with AWS</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;As promised, an article about a really nice piece of 
software that makes data cleansing and data mining jobs fun.&lt;/p&gt;  &lt;p&gt;Let’s implement &lt;a href="http://code.google.com/p/google-refine/" target="_blank"&gt;Google Refine&lt;/a&gt; in Amazon Web Services (aka “le cloud”).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtLj4RiudI/AAAAAAAAAxE/q60upfFGUeg/s1600-h/image%5B45%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TYtLkK83_nI/AAAAAAAAAxI/TOoGveLVbAI/image_thumb%5B17%5D.png?imgmax=800" width="26" height="27" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;h3&gt;Google refine ?&lt;/h3&gt;  &lt;p&gt;According to Google, “&lt;a href="http://code.google.com/p/google-refine/" target="_blank"&gt;Google Refine&lt;/a&gt; is a power tool for working with messy data, cleaning it up, transforming it from one format into another, extending it with web services, and linking it to databases like &lt;a href="http://www.freebase.com/"&gt;Freebase&lt;/a&gt;.”&lt;/p&gt;  &lt;p&gt;With Google Refine, it’s easy to load big data files and process the data : merging cells, clustering, grouping, adding key-values, transcoding, data modification / data customization with web service calls … Just imagine an Excel grid, but on steroids.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TYtLkVblG_I/AAAAAAAAAxM/shisPSQqE9I/s1600-h/image%5B46%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" 
src="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtLkojHNyI/AAAAAAAAAxQ/W127xFVzTEI/image_thumb%5B18%5D.png?imgmax=800" width="24" height="25" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;h3&gt;Let’s go now …&lt;/h3&gt;  &lt;p&gt;As usual, you first need a valid AWS / EC2 account. Once that is done, you need an instance (a server). I recommend using a Fedora Core instance for Google Refine instead of an Ubuntu one. I’m a great fan of Ubuntu and use it in a lot of crucial apps, but I faced many issues when running Google Refine on top of an Ubuntu Lucid AMI (RAM usage, freezing, erratic JDK behaviour). Please choose the Fedora one instead : amazon/fedora-8-x86_64-v1.14-std. Of course, let’s go with 64 bits and with an EBS volume attached to the instance in order to keep the data over time (everything not located on EBS is ephemeral).&lt;/p&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/TYtSgH2XqAI/AAAAAAAAAxU/94flpRMKSZE/s1600-h/image%5B53%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TYtSgqjpYsI/AAAAAAAAAxY/AKbiPs-zUkU/image_thumb%5B21%5D.png?imgmax=800" width="244" height="148" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;p&gt;I won’t detail how to create and start an instance on AWS, but here are the major steps. You can easily go through these steps by using the Firefox plugin called &lt;a href="http://aws.amazon.com/developertools/609?_encoding=UTF8&amp;amp;jiveRedirect=1" target="_blank"&gt;Elasticfox&lt;/a&gt;. 
Or, for those with muscles, use the AWS EC2 API tools, which can be downloaded &lt;a href="http://s3.amazonaws.com/ec2-downloads/ec2-api-tools.zip" target="_blank"&gt;here&lt;/a&gt;.&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Create a key pair and download the private key : this will give you SSH access to your instance. &lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TYtHRJZYcSI/AAAAAAAAAwU/Bls8xSSrw4s/s1600-h/image%5B5%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/TYtHRm3ihLI/AAAAAAAAAwY/CobmdjJXHlQ/image_thumb%5B1%5D.png?imgmax=800" width="244" height="164" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;ul&gt;   &lt;li&gt;Create a security group : create a dedicated security group for your instance and open the following ports : 22 for SSH and 3333 for the Google Refine web service. &lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/TYtHR4I0RlI/AAAAAAAAAwc/E1g3FiWTu3E/s1600-h/image%5B2%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/TYtHSEzoSDI/AAAAAAAAAwg/KZNxK7yKrUE/image_thumb.png?imgmax=800" width="244" height="142" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;ul&gt;   &lt;li&gt;Choose your instance : amazon/fedora-8-x86_64-v1.14-std is a good choice. 
This corresponds to AMI ami-1d042f69 &lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtHSdF8wWI/AAAAAAAAAwk/ql7O9OYBv9s/s1600-h/image%5B10%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtHSpTja6I/AAAAAAAAAwo/nVj-vneLxlk/image_thumb%5B4%5D.png?imgmax=800" width="496" height="103" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;ul&gt;   &lt;li&gt;Run your instance :      &lt;ul&gt;       &lt;li&gt;Be careful with the availability zone. This one is in the USA, but you can also place your instance in Ireland or in Asia. Make sure you do not break the law by placing sensitive data outside your safe harbor. &lt;/li&gt;        &lt;li&gt;Be sure you assign the key pair and the security group you created in the first two steps. &lt;/li&gt;        &lt;li&gt;Choose an instance size. As I have plenty of money, I chose an m2.xlarge instance with plenty of RAM. 
&lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtHTEqZLGI/AAAAAAAAAws/G3bcBtWztUA/s1600-h/image%5B13%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TYtHTYMh8EI/AAAAAAAAAww/NZjH7aeBzt0/image_thumb%5B5%5D.png?imgmax=800" width="157" height="244" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;ul&gt;   &lt;li&gt;SSH into your instance : you can use &lt;a href="http://www.chiark.greenend.org.uk/~sgtatham/putty/download.html" target="_blank"&gt;PuTTY&lt;/a&gt;, but don’t forget to convert the pem key (Unix-style private key) into a ppk key (PuTTY’s own private key format). &lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/TYtShIZ7-6I/AAAAAAAAAxc/Tta-pE50cCs/s1600-h/image%5B50%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtShYKIKjI/AAAAAAAAAxg/zEGNFiO4tws/image_thumb%5B20%5D.png?imgmax=800" width="302" height="60" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;ul&gt;   &lt;li&gt;Create a volume and attach it to your newly running instance. This volume will be used to store Google Refine itself and all the data you will work with.      &lt;ul&gt;       &lt;li&gt;You can create a volume by using Elasticfox and attach it to the instance. Below you can find the different prompts. I chose 100 GB, but you can make it smaller. 
&lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt; &lt;/ul&gt;  &lt;blockquote&gt;   &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TYtKkzDHC-I/AAAAAAAAAw0/rGBsduXohp8/s1600-h/image%5B16%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TYtKlDjOfXI/AAAAAAAAAw4/ak4aPqSXYrI/image_thumb%5B6%5D.png?imgmax=800" width="244" height="121" /&gt;&lt;/a&gt;&lt;/p&gt;    &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TYtKlrg6poI/AAAAAAAAAw8/uH-KTKghniA/s1600-h/image%5B22%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; margin: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TYtKl7wcLxI/AAAAAAAAAxA/hL79jtNcAg0/image_thumb%5B8%5D.png?imgmax=800" width="244" height="131" /&gt;&lt;/a&gt;&lt;/p&gt; &lt;/blockquote&gt;  &lt;ul&gt;   &lt;ul&gt;     &lt;li&gt;Once the volume is created and attached to the instance, simply create a filesystem on it, using ext3 for example : &lt;em&gt;sudo mkfs.ext3 /dev/sdf&lt;/em&gt; &lt;/li&gt;      &lt;li&gt;Create a mount point in /mnt : &lt;em&gt;sudo mkdir /mnt/refine&lt;/em&gt; &lt;/li&gt;      &lt;li&gt;Add the mount point to /etc/fstab &lt;/li&gt;      &lt;li&gt;Mount the newly created volume : &lt;em&gt;sudo mount /mnt/refine&lt;/em&gt; &lt;/li&gt;   &lt;/ul&gt;    &lt;li&gt;Download and install JDK 1.6      &lt;ul&gt;       &lt;li&gt;&lt;em&gt;sudo wget &lt;a 
href="http://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_Developer-Site/en_US/-/USD/VerifyItem-Start/jdk-6u24-linux-x64.bin?BundledLineItemUUID=msmJ_hCxeGwAAAEuarYWpm05&amp;amp;OrderID=uCSJ_hCxvZsAAAEuWrYWpm05&amp;amp;ProductID=oSKJ_hCwOlYAAAEtBcoADqmS&amp;amp;FileName=/jdk-6u24-linux-x64.bin"&gt;http://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_Developer-Site/en_US/-/USD/VerifyItem-Start/jdk-6u24-linux-x64.bin?BundledLineItemUUID=msmJ_hCxeGwAAAEuarYWpm05&amp;amp;OrderID=uCSJ_hCxvZsAAAEuWrYWpm05&amp;amp;ProductID=oSKJ_hCwOlYAAAEtBcoADqmS&amp;amp;FileName=/jdk-6u24-linux-x64.bin&lt;/a&gt;&lt;/em&gt; &lt;/li&gt;        &lt;li&gt;Install the JDK : &lt;em&gt;sudo sh jdkxxxxx.bin&lt;/em&gt; &lt;/li&gt;        &lt;li&gt;Don’t forget to add jdk_home and path to your bash profile. &lt;/li&gt;        &lt;li&gt;Test your JDK by simply typing java in a shell. &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;Finally, assign an Elastic IP to your instance. This will make it easier to connect to your instance and start using Google Refine. Once again, using Elasticfox will save you a lot of time. (Of course, you can assign an Elastic IP before connecting with SSH, which is more logical, I admit …).      &lt;blockquote&gt;       &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtSh_puQcI/AAAAAAAAAxk/4BkAA7b6X1E/s1600-h/image%5B62%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/TYtSiAvssKI/AAAAAAAAAxo/ERSBJCCLC6E/image_thumb%5B24%5D.png?imgmax=800" width="244" height="121" /&gt;&lt;/a&gt;&lt;/p&gt;     &lt;/blockquote&gt;   &lt;/li&gt;    &lt;li&gt;Download and install Google Refine. This is really a no brainer …&amp;#160; &lt;ul&gt;       &lt;li&gt;Go into /mnt/refine (that means going into your EBS volume). 
&lt;/li&gt;        &lt;li&gt;Download Refine : &lt;em&gt;sudo wget &lt;/em&gt;&lt;a title="http://google-refine.googlecode.com/files/google-refine-2.0-r1836.tar.gz" href="http://google-refine.googlecode.com/files/google-refine-2.0-r1836.tar.gz"&gt;&lt;em&gt;http://google-refine.googlecode.com/files/google-refine-2.0-r1836.tar.gz&lt;/em&gt;&lt;/a&gt; &lt;/li&gt;        &lt;li&gt;Unzip and untar the archive :          &lt;ul&gt;           &lt;li&gt;&lt;em&gt;sudo tar xzf google-refine-2.0-r1836.tar.gz&lt;/em&gt; &lt;/li&gt;         &lt;/ul&gt;       &lt;/li&gt;        &lt;li&gt;Start Google Refine          &lt;ul&gt;           &lt;li&gt;&lt;em&gt;sh refine -i 0.0.0.0 -m 8000M&lt;/em&gt; &lt;/li&gt;            &lt;li&gt;That means : start refine, listen on all addresses and assign 8 GB of memory. Hey, that’s what we need here when playing with data ! &lt;/li&gt;            &lt;li&gt;Google Refine starts ….              &lt;p&gt;&lt;font size="1"&gt;Starting Google Refine at '&lt;/font&gt;&lt;a href="http://0.0.0.0:3333/'"&gt;&lt;font size="1"&gt;http://0.0.0.0:3333/'&lt;/font&gt;&lt;/a&gt;&lt;/p&gt;              &lt;p&gt;&lt;font size="1"&gt;10:11:01.905 [refine_server] Starting Server bound to '0.0.0.0:3333' (0ms)                  &lt;br /&gt;10:11:01.906 [refine_server] Max memory size: 8000M (1ms)                   &lt;br /&gt;10:11:01.955 [refine_server] Initializing context: '/' from '/mnt/refine/google-refine-2.0/webapp' (49ms)                   &lt;br /&gt;10:11:03.288 [refine] Starting Google Refine 2.0 [r1836]... 
(1333ms)                   &lt;br /&gt;10:11:03.297 [FileProjectManager] Using workspace directory: /root/.local/share/google/refine (9ms)                   &lt;br /&gt;10:11:03.299 [FileProjectManager] Loading workspace: /root/.local/share/google/refine/workspace.json (2ms)&lt;/font&gt;&lt;/p&gt;           &lt;/li&gt;         &lt;/ul&gt;       &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt; &lt;/ul&gt;  &lt;h3&gt;&amp;#160;&lt;/h3&gt;  &lt;h3&gt;Time to play now !&lt;/h3&gt;  &lt;p&gt;OK, most of the work is done. I hope this quick AWS EC2 walkthrough is sufficient for most of you. If not, feel free to contact me.&lt;/p&gt;  &lt;p&gt;Now let’s play with Google Refine. Simply open your browser and point it to the Elastic IP address you assigned to your instance : &lt;a title="http://ec2-79-125-28-175.eu-west-1.compute.amazonaws.com:3333/" href="http://your IP Adress:3333/"&gt;http://your IP Address:3333/&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;… and you are done !&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TYtSika8xTI/AAAAAAAAAxs/KZ_4VxnfEnE/s1600-h/image%5B73%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/TYtSi3p1jgI/AAAAAAAAAxw/H7O409WvRxI/image_thumb%5B29%5D.png?imgmax=800" width="525" height="269" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;OK, I have some more time, so let’s create a simple project. 
Simply choose a file in the Data File zone, name your project and provide some more information about your file : separator, header, limit, auto detect value types …&lt;/p&gt;  &lt;p&gt;After a short while, booom, your file is loaded and you have access to your data, ready to work on it.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/TYtXShMtbII/AAAAAAAAAx0/wY19NeL6miI/s1600-h/image%5B79%5D.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/TYtXTJ4VLOI/AAAAAAAAAx4/2C8DwJPpb98/image_thumb%5B33%5D.png?imgmax=800" width="529" height="213" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Stay tuned for the upcoming articles : I will show you how to fully leverage Google Refine and how to enrich your data with spectacular added value and services.&lt;/p&gt;  &lt;p&gt;One last thing : you definitely need to go and read &lt;a href="http://www.comportemental.fr/blog" target="_blank"&gt;my friend’s blog about datamining&lt;/a&gt;. His name is Matthias Oehler (dataminer) and he is a kind of &lt;a href="http://www.r-project.org/" target="_blank"&gt;R&lt;/a&gt; and &lt;a href="http://code.google.com/p/google-refine/" target="_blank"&gt;Google Refine&lt;/a&gt; wizard. You will learn a lot from reading his articles as soon as his website opens (from a few hours to a few days according to him …).&lt;/p&gt;  &lt;p&gt;More information&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;You can compile and deploy Google Refine to &lt;a href="http://code.google.com/intl/fr/appengine/" target="_blank"&gt;Google AppEngine&lt;/a&gt;, which is an even better solution. 
&lt;/li&gt;    &lt;li&gt;You can find some screencasts about Google Refine &lt;a href="http://code.google.com/p/google-refine/wiki/Screencasts" target="_blank"&gt;here&lt;/a&gt;. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Feel free to contact me if needed.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4431080405094708378?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4431080405094708378/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4431080405094708378' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4431080405094708378'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4431080405094708378'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2011/03/new-data-cleansing-mining-using-google.html' title='New Data cleansing / mining using Google Refine in the cloud with AWS'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_hTlcWbt-BP4/TYtLkK83_nI/AAAAAAAAAxI/TOoGveLVbAI/s72-c/image_thumb%5B17%5D.png?imgmax=800' height='72' 
width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-6006559566252730176</id><published>2011-03-23T12:31:00.000-07:00</published><updated>2011-07-27T01:34:45.682-07:00</updated><title type='text'>PageRank 4 and next coming articles ...</title><content type='html'>Hi all,  &lt;br /&gt;  &lt;br /&gt;First, I want to thank everybody (well, people reading this blog) for my PageRank 4.  &lt;br /&gt;  &lt;br /&gt;&lt;a href="http://lh5.ggpht.com/-toUHpj0MjqU/Ti_No7RedCI/AAAAAAAAA14/eluxE0FcAXs/s1600-h/image%25255B3%25255D.png"&gt;&lt;img style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top: 0px; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/-GarLZHs2j-w/Ti_NpRrK94I/AAAAAAAAA18/xUzJvfwXAWk/image_thumb%25255B1%25255D.png?imgmax=800" width="218" height="68" /&gt;&lt;/a&gt;  &lt;br /&gt;  &lt;br /&gt;  &lt;div&gt;It's been a long time since I wrote a post here. 
As you may remember, I'm deeply involved in a datamining startup as CTO.&lt;/div&gt;  &lt;br /&gt;Be prepared, because I'll be back shortly with a lot of articles, experiences and feedback about :   &lt;br /&gt;  &lt;ul&gt;   &lt;li&gt;Datamining&lt;/li&gt;    &lt;li&gt;Google APIs&lt;/li&gt;    &lt;li&gt;&lt;a href="http://code.google.com/p/google-refine/"&gt;Google Refine&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;BI and Cloud computing with &lt;a href="http://aws.amazon.com/"&gt;Amazon Web Services&lt;/a&gt;&lt;/li&gt;    &lt;li&gt;JSON, SOAP and BI in a cloud environment.&lt;/li&gt; &lt;/ul&gt; Vincent  &lt;br /&gt;  &lt;br /&gt;  &lt;div&gt;&lt;/div&gt;  &lt;div&gt;&lt;/div&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-6006559566252730176?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/6006559566252730176/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=6006559566252730176' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6006559566252730176'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6006559566252730176'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2011/03/hi-all-first-i-want-to-thank-everybody.html' title='PageRank 4 and next coming articles ...'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/-GarLZHs2j-w/Ti_NpRrK94I/AAAAAAAAA18/xUzJvfwXAWk/s72-c/image_thumb%25255B1%25255D.png?imgmax=800' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-7155806561015282963</id><published>2011-01-31T13:27:00.000-08:00</published><updated>2011-01-31T13:27:04.632-08:00</updated><title type='text'>Stuff The Internet Says On Scalability For January 28, 2011</title><content type='html'>&lt;a href="http://highscalability.com/blog/2011/1/28/stuff-the-internet-says-on-scalability-for-january-28-2011.html"&gt;Stuff The Internet Says On Scalability For January 28, 2011&lt;/a&gt;: " &lt;p&gt;&lt;img alt="" align="right" src="http://farm5.static.flickr.com/4088/4997942872_671232a8b0_o.jpg" /&gt;&lt;/p&gt;&lt;br /&gt;&lt;p&gt; Submitted for your reading pleasure...&lt;/p&gt;&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;Something we get to say more often than you might expect - funny NoSQL comic: &lt;a href="http://geekandpoke.typepad.com/geekandpoke/2011/01/nosql.html"&gt;How to Write a CV&lt;/a&gt; (SFW)&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Playtomic shows hows how to &lt;a href="http://playtomic.com/blog/post/53-handling-over-300-million-ev"&gt;handle over 300 million events per day, in real time, on a budget&lt;/a&gt;. &lt;/li&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://www.datacenterknowledge.com/archives/2011/01/24/more-speed-at-80000-a-millisecond/?utm-source=feedburner&amp;amp;utm-medium=feed&amp;amp;utm-campaign=Feed:+DataCenterKnowledge+(Data+Center+Knowledge)"&gt;More Speed, at $80,000 a Millisecond&lt;/a&gt;. Does &lt;a href="http://highscalability.com/blog/2009/7/25/latency-is-everywhere-and-it-costs-you-sales-how-to-crush-it.html"&gt;latency matter&lt;/a&gt;? 
Oh yes...&lt;em&gt;“On the Chicago to New York route in the US, three milliseconds can mean the difference between US$2,000 a month and US$250,000 a month.”&lt;/em&gt;&lt;/li&gt;&lt;br /&gt;&lt;li&gt;Quotable Quotes&lt;br /&gt;&lt;ul&gt;&lt;br /&gt;&lt;li&gt;&lt;a href="http://twitter.com/#!/jkalucki/statuses/28333480144281600"&gt;@jkalucki&lt;/a&gt;: Throwing 1,920 CPUs and 4TB of RAM at an annoyance, as you do. @jointheflock&lt;/li&gt;&lt;br /&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;"&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-7155806561015282963?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='related' href='http://highscalability.com/blog/2011/1/28/stuff-the-internet-says-on-scalability-for-january-28-2011.html' title='Stuff The Internet Says On Scalability For January 28, 2011'/><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/7155806561015282963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=7155806561015282963' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7155806561015282963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7155806561015282963'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2011/01/stuff-internet-says-on-scalability-for.html' title='Stuff The Internet Says On Scalability For January 28, 2011'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2093049413141078912</id><published>2011-01-02T05:10:00.001-08:00</published><updated>2011-01-02T05:11:38.047-08:00</updated><title type='text'>In-Memory analytics. 2011 the real take off ?</title><content type='html'>&lt;p&gt;Hi all, and happy new year to everybody.&lt;/p&gt;  &lt;p&gt;Are traditional &lt;strong&gt;databases&lt;/strong&gt; and &lt;strong&gt;data centers&lt;/strong&gt; going to &lt;strong&gt;change&lt;/strong&gt; in order to be &lt;strong&gt;able to meet&lt;/strong&gt; the ever-increasing demand for &lt;strong&gt;performance and real time&lt;/strong&gt; ? For BI and real time data analysis, I believe so. The traditional data path (with its I/O challenges) now has to evolve after remaining almost untouched for 30 years. &lt;/p&gt;  &lt;p&gt;In-Memory Analytics (in conjunction with cloud computing or infrastructure like &lt;strong&gt;&lt;a href="http://hive.apache.org/"&gt;Hive&lt;/a&gt;&lt;/strong&gt;) :&amp;#160; the next big thing in 2011 ?&lt;/p&gt;  &lt;p&gt;Read more &lt;a href="http://www.ecrmguide.com/article.php/3918891/analytics-and-in-memory-databases-are-changing-data-centers.htm"&gt;&lt;strong&gt;here&lt;/strong&gt;&lt;/a&gt;, an interesting article from the &lt;a href="http://www.ecrmguide.com"&gt;eCRMguide&lt;/a&gt; website. 
The comments (below the article) are also valuable.&lt;/p&gt;  &lt;p align="center"&gt;&lt;img style="display: block; float: none; margin-left: auto; margin-right: auto" src="http://explore.toshiba.com/images/showcase/access-memory-hero.jpg" width="227" height="240" /&gt;&lt;font size="1"&gt;A real time database&lt;/font&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2093049413141078912?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2093049413141078912/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2093049413141078912' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2093049413141078912'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2093049413141078912'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2011/01/in-memory-analytics-2011-real-take-off.html' title='In-Memory analytics. 
2011 the real take off ?'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-6622345654226608604</id><published>2010-12-20T02:26:00.001-08:00</published><updated>2010-12-20T02:26:01.996-08:00</updated><title type='text'>New Pentaho Kettle book</title><content type='html'>&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;At this time last year, I wrote an article about the first Pentaho book published. Simply called &lt;strong&gt;Pentaho Solutions&lt;/strong&gt;, this book covers the basics of datawarehousing and Pentaho tools. You can find the original article &lt;a href="http://open-bi.blogspot.com/2009/12/pentaho-solutions-book-by-roland-bouman.html"&gt;here&lt;/a&gt; and order this valuable book &lt;a href="http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0470484322.html"&gt;here&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Recently, I received my own review copy of a long-awaited Pentaho book : &lt;strong&gt;Pentaho Kettle Solutions – Building Open Source ETL Solutions with Pentaho Data Integration.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Thanks to Roland, Matt and Jos for sending me this new book.&lt;/p&gt;  &lt;p align="center"&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TQ8vMQz9RuI/AAAAAAAAAvg/fwblrBmuSr8/s1600-h/image4.png"&gt;&lt;img style="background-image: none; border-right-width: 0px; padding-left: 0px; padding-right: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/TQ8vOAQ1HPI/AAAAAAAAAvk/NaYryvH9R9Q/image_thumb2.png?imgmax=800" 
width="255" height="329" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p align="center"&gt;ISBN: 978-0-470-63517-9 &lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;&lt;strong&gt;Matt Casters &lt;/strong&gt;is the Pentaho Chief of Data Integration and Kettle founder (&lt;strong&gt;he is Kettle’s dad&lt;/strong&gt;). Have a look at his prolific blog &lt;a href="http://www.ibridge.be/"&gt;here&lt;/a&gt;.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;&lt;strong&gt;Roland&lt;/strong&gt; is an IT expert whose experience ranges from web application development and business process analysis to business intelligence. He co-authored the &lt;/font&gt;&lt;a href="http://store.vervante.com/c/v/595352502.html"&gt;&lt;font face="Arial"&gt;MySQL Cluster 5.1 Certification Study Guide&lt;/font&gt;&lt;/a&gt;&lt;font face="Arial"&gt;. Please have a look at his &lt;/font&gt;&lt;a href="http://rpbouman.blogspot.com/"&gt;&lt;font face="Arial"&gt;blog&lt;/font&gt;&lt;/a&gt;&lt;font face="Arial"&gt;.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;&lt;strong&gt;Jos&lt;/strong&gt; is a BI expert with more than 15 years of experience. He created &lt;a href="http://www.tholis.com"&gt;Tholis Consulting&lt;/a&gt; and also covers BI developments for the Dutch Database Magazine.&lt;/font&gt;&lt;/p&gt;  &lt;h3&gt;&amp;#160;&lt;/h3&gt;  &lt;h3&gt;What is this book about ?&lt;/h3&gt;  &lt;p&gt;The first book, &lt;strong&gt;Pentaho Solutions&lt;/strong&gt;, was aimed at introducing the basics of BI and Pentaho usage. Now, with this new book, we go deeper into hardcore &lt;strong&gt;dataprocessing&lt;/strong&gt; and &lt;strong&gt;datawarehousing&lt;/strong&gt; using &lt;strong&gt;Kettle&lt;/strong&gt;. But it is not exclusively focused on Kettle : a strong emphasis is placed on data processing basics, techniques and theory (Codd vs Kimball …). 
Reading this book will take you to the next level on these two topics : &lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;strong&gt;Data processing and how to build / feed a datawarehouse,&lt;/strong&gt; &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Kettle development, customization and advanced usage.&lt;/strong&gt; &lt;/li&gt; &lt;/ul&gt;  &lt;h3&gt;&amp;#160;&lt;/h3&gt;  &lt;h3&gt;Book summary&lt;/h3&gt;  &lt;ul&gt;   &lt;li&gt;&lt;strong&gt;Introduction&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;&lt;strong&gt;What’s an ETL&lt;/strong&gt; and what are Kettle key concepts &lt;/li&gt;        &lt;li&gt;How to &lt;strong&gt;install Kettle&lt;/strong&gt; and configure it &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Real life example&lt;/strong&gt; : Sakila datawarehouse &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;ETL and ETL subsystems&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;What are the famous &lt;strong&gt;ETL Subsystems&lt;/strong&gt; (Kimball). A very detailed and inspiring chapter. &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Extraction&lt;/strong&gt; &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Cleansing and conforming&lt;/strong&gt; &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Handling dimension&lt;/strong&gt; tables &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Loading fact&lt;/strong&gt; tables &lt;/li&gt;        &lt;li&gt;Working with &lt;strong&gt;OLAP data&lt;/strong&gt;. &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Management and deployment&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;Typical ETL &lt;strong&gt;development lifecycle.&lt;/strong&gt; A must read here ! &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Scheduling&lt;/strong&gt; and &lt;strong&gt;monitoring&lt;/strong&gt; &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Versioning&lt;/strong&gt; &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Lineage&lt;/strong&gt; and &lt;strong&gt;auditing&lt;/strong&gt;. 
&lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Performance and scalability&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;&lt;strong&gt;Performance tuning.&lt;/strong&gt; Here again, a must read that will give you precious information on how to make your Kettle set up reach the heights of performance and stability. &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Parallelization, clustering and Partitioning&lt;/strong&gt; : my favorite. You have big data and / or strong constraints ? Think parallel and start building your own Kettle cluster / parallel set up. In my opinion, the best chapter ever written on this topic, all ETLs included. &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Dynamic clustering in the cloud&lt;/strong&gt;. Once again my favorite. You all know my passion for Cloud Computing ! Very technical chapter, you need real experience using AWS tools and APIs. &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Advanced topics&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;&lt;strong&gt;Data Vault Management&lt;/strong&gt; : interesting concept. You will learn about Data Vault and discover this mixed (Codd with 3NF / Kimball with star schema) approach in detail. &lt;/li&gt;        &lt;li&gt;Handling &lt;strong&gt;complex data&lt;/strong&gt; formats. &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Web Services.&lt;/strong&gt; I love that one too ! More and more datawarehouses are now fed by web services. Learn how to feed yours by leveraging Kettle. &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Kettle integration&lt;/strong&gt; &lt;/li&gt;        &lt;li&gt;&lt;strong&gt;Extending Kettle.&lt;/strong&gt; Yummy ! If, like me, you created your own Kettle plugins or want to, this chapter is a must read. Java programming experience is needed. 
&lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;The Kettle Ecosystem&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;Kettle &lt;strong&gt;enterprise edition features&lt;/strong&gt; : comparative matrix. &lt;/li&gt;        &lt;li&gt;Built in &lt;strong&gt;variables&lt;/strong&gt; and &lt;strong&gt;properties&lt;/strong&gt; &lt;strong&gt;reference&lt;/strong&gt; : a must read in order to be aware of Kettle internals, and be able to create fully automated / self sufficient jobs. &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt; &lt;/ul&gt;  &lt;h3&gt;My opinion&lt;/h3&gt;  &lt;p&gt;This book is a fantastic &lt;strong&gt;concentration&lt;/strong&gt; of &lt;strong&gt;knowledge&lt;/strong&gt;. You will learn about ETL basics, advanced topics, performance management, Kettle development and cloud dataprocessing. Matt, Roland and Jos took on a risky challenge : &lt;strong&gt;writing a book that stretches from basic knowledge to high-level techniques while staying focused on how to use Kettle to solve actual and concrete data problems.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;They succeeded.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;This book is now sitting on my reference BI shelf : it has entered my personal &lt;strong&gt;BI Book Hall of Fame&lt;/strong&gt;.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-6622345654226608604?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/6622345654226608604/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=6622345654226608604' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6622345654226608604'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6622345654226608604'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/12/new-pentaho-kettle-book.html' title='New Pentaho Kettle book'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_hTlcWbt-BP4/TQ8vOAQ1HPI/AAAAAAAAAvk/NaYryvH9R9Q/s72-c/image_thumb2.png?imgmax=800' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4867663215604177485</id><published>2010-12-14T02:56:00.001-08:00</published><updated>2010-12-15T07:59:53.169-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Talend'/><title type='text'>Talend : Customer feed back</title><content type='html'>Hi all,&lt;br /&gt;Here is an &lt;a href="http://www.talend.com/open-source-provider/casestudy/CaseStudy_ScoreMD_FR.php"&gt;&lt;strong&gt;interview&lt;/strong&gt;&lt;/a&gt; I gave to &lt;a href="http://www.talend.com/"&gt;Talend&lt;/a&gt; in order to explain what I did as Architect for &lt;a href="http://www.score-md.com/"&gt;Score-MD&lt;/a&gt;.&lt;br /&gt;This article is written in French, sorry. 
I can translate if needed.&lt;br /&gt;The key points are : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Cloud environment (AWS) with home brew orchestrator, &lt;/li&gt;&lt;li&gt;Heterogeneous data sources and large data volume, &lt;/li&gt;&lt;li&gt;Flexibility, scalability and cost management (by leveraging all cloud features) &lt;/li&gt;&lt;/ul&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TQdNeJ90_dI/AAAAAAAAAvY/CLWn8Cw8IPI/s1600-h/image%5B3%5D.png"&gt;&lt;img alt="image" border="0" height="160" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TQdNeoEBtDI/AAAAAAAAAvc/kr_zRPxt-mo/image_thumb%5B1%5D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="449" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4867663215604177485?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4867663215604177485/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4867663215604177485' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4867663215604177485'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4867663215604177485'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/12/talend-customer-feed-back.html' title='Talend : Customer feed back'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_hTlcWbt-BP4/TQdNeoEBtDI/AAAAAAAAAvc/kr_zRPxt-mo/s72-c/image_thumb%5B1%5D.png?imgmax=800' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1767433008547155303</id><published>2010-12-06T13:44:00.001-08:00</published><updated>2010-12-15T07:59:34.056-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Update : Amazon Simple DB data loading with Kettle</title><content type='html'>Hi all,&lt;br /&gt;This is just an update for this old article : &lt;a href="http://open-bi.blogspot.com/2010/03/amazon-simpledb-data-loading-with.html"&gt;Amazon Simple DB data loading with Kettle&lt;/a&gt;.&lt;br /&gt;The files are now back and you can download them : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;You can find the Kettle transformation &lt;a href="http://www.decisionsystems-studio.fr/Downloads/Feed_SimpleDB.ktr"&gt;HERE&lt;/a&gt;. &lt;/li&gt;&lt;li&gt;You can find the Jscript &lt;a href="http://www.decisionsystems-studio.fr/Downloads/JScript.txt"&gt;HERE&lt;/a&gt;. &lt;/li&gt;&lt;li&gt;You can find my little flat file &lt;a href="http://www.decisionsystems-studio.fr/Downloads/FlatData.csv"&gt;HERE&lt;/a&gt;. 
&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1767433008547155303?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/1767433008547155303/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1767433008547155303' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1767433008547155303'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1767433008547155303'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/12/update-amazon-simple-db-data-loading.html' title='Update : Amazon Simple DB data loading with Kettle'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-7974549735777182128</id><published>2010-11-30T05:27:00.001-08:00</published><updated>2010-11-30T05:27:43.110-08:00</updated><title type='text'>Just discovered that … love it !</title><content type='html'>&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.commandlinefu.com/commands/browse"&gt;commandlinefu.com&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;No excuse for not being a shell guru now.&lt;/p&gt;  &lt;p&gt;Shell rulez …&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TPT7y3JY9TI/AAAAAAAAAvM/zTkuOwNce8g/s1600-h/image%5B5%5D.png"&gt;&lt;img 
style="background-image: none; border-bottom: 0px; border-left: 0px; padding-left: 0px; padding-right: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px; padding-top: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/TPT7zfNqjII/AAAAAAAAAvQ/Xeqi87z4E_4/image_thumb%5B3%5D.png?imgmax=800" width="674" height="484" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-7974549735777182128?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/7974549735777182128/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=7974549735777182128' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7974549735777182128'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7974549735777182128'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/11/just-discovered-that-love-it.html' title='Just discovered that … love it !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_hTlcWbt-BP4/TPT7zfNqjII/AAAAAAAAAvQ/Xeqi87z4E_4/s72-c/image_thumb%5B3%5D.png?imgmax=800' height='72' 
width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2043980462775098032</id><published>2010-11-26T07:23:00.001-08:00</published><updated>2010-12-15T08:00:12.610-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Postgresql dumps and storage on S3 : the sequel … using dynamic temp EBS</title><content type='html'>Hi all,&lt;br /&gt;Here you can find a &lt;strong&gt;new&lt;/strong&gt; and customized approach for Postgresql dumps and storage on S3. Based on my &lt;a href="http://open-bi.blogspot.com/2010/11/postgresql-dumps-and-storage-on-s3.html"&gt;previous post&lt;/a&gt;, this script uses the EC2 API and adds &lt;strong&gt;new features&lt;/strong&gt;. It will : &lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Create an EBS volume on EC2,&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Prepare the EBS volume filesystem and mount it on your database server (in the EC2 cloud),&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Run the postgresql dump utility and store the dumps on the EBS volume,&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Send the dump files to S3, into the bucket you want,&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Manage dump file collections in the bucket : clean / delete the previous dump files according to the retention date/period,&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Un-mount and delete the EBS volume, so that you no longer pay for something you don’t need / don’t use.&lt;/strong&gt; &lt;/li&gt;&lt;/ol&gt;&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TO_Q7wTlf0I/AAAAAAAAAu4/V8TCxoApxA4/s1600-h/image%5B14%5D.png"&gt;&lt;img alt="image" border="0" height="494" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TO_Q8XY1FdI/AAAAAAAAAvA/cNuIwL2m4SQ/image_thumb%5B8%5D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; 
border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="740" /&gt;&lt;/a&gt;&lt;br /&gt;Here is a description of the script steps : &lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Create the temporary EBS volume,&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;VOLUME=`ec2-create-volume&amp;nbsp; --size $VOL_SIZE --region $REGION -z $AVAIL_ZONE | cut -f2`&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Attach the volume, build the filesystem and mount it as /mnt/something&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;STATE=`ec2-attach-volume $VOLUME --instance $INSTANCE --device $DEVICE --region $REGION | cut -f5`&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Start the database dump&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;su - postgres -c "$PG_DUMP $BASE | gzip | split -b $SPLITS - $DUMP_FILE"&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Send the dump files to S3 with s3cmd&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;s3cmd -p put $DUMP_FILE* $S3ENDPOINT&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Detach the EBS volume&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;STATE=`ec2-detach-volume $VOLUME --region $REGION | cut -f5`&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Clean up the historical dump files on S3, based on the retention period&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;for FILENAME in `$S3PUT ls $S3ENDPOINT | cut -d":" -f3`&amp;nbsp; ====&amp;gt; delete&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ol&gt;Once the script has run, the temporary EBS volume is no longer attached to the server and can be deleted.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TO_Q85zDG9I/AAAAAAAAAvE/Qb4N2Ymw0HY/s1600-h/image%5B20%5D.png"&gt;&lt;img alt="image" border="0" height="219" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TO_Q9aHp4JI/AAAAAAAAAvI/lK05W_T1HCk/image_thumb%5B12%5D.png?imgmax=800" style="background-image: none; 
border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="440" /&gt;&lt;/a&gt;&lt;br /&gt;Below, you can find the script. Please note : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;use database_name = bucket name, for convenience. Otherwise you will have to modify the script.&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;use database_name as parameter $1.&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;the final routine (lookup on s3 bucket files for cleanup) is a modified version of the one found &lt;a href="http://www.segmentationfault.es/2010/06/how-to-backup-your-server-to-amazon-s3/"&gt;here&lt;/a&gt;.&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Be sure you have a working installation for the EC2 API (+path)&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Variables to check : &lt;/strong&gt;&lt;ul&gt;&lt;li&gt;Java path &lt;/li&gt;&lt;li&gt;Availability zone (EC2) &lt;/li&gt;&lt;li&gt;Region (EC2) &lt;/li&gt;&lt;li&gt;Device (something from sdf to sdxxxx) &lt;/li&gt;&lt;li&gt;Dump dir : will be hosted on the temp EBS once mounted on /mnt/something &lt;/li&gt;&lt;li&gt;Volume size : the size of the volume to create. Be sure you have enough space for your dumps &lt;/li&gt;&lt;li&gt;Splits : size for multi dump files. 
&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: andale mono, lucida console, monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"&gt;&lt;code&gt;#!/bin/bash&lt;br /&gt;######################################################&lt;br /&gt;#                                                    #&lt;br /&gt;#        POSTGRES startup script                     #&lt;br /&gt;# Author : Vincent Teyssier                          #&lt;br /&gt;# Date   : 20/11/2010                                #&lt;br /&gt;#                                                    #&lt;br /&gt;######################################################&lt;br /&gt;#&lt;br /&gt;# General variables&lt;br /&gt;export JAVA_HOME="/mnt/postgres/jdk1.6.0_21"&lt;br /&gt;export EC2_HOME="/mnt/postgres/ec2-api-tools-1.3-53907"&lt;br /&gt;export EC2_PRIVATE_KEY="/mnt/postgres/ec2-api-tools-1.3-53907/keys/pk-file.pem"&lt;br /&gt;export EC2_CERT="/mnt/postgres/ec2-api-tools-1.3-53907/keys/cert-file.pem"&lt;br /&gt;export JDK_HOME="${JAVA_HOME}"&lt;br /&gt;export PATH="${JAVA_HOME}/bin:${PATH}"&lt;br /&gt;export PATH="$PATH:$EC2_HOME/bin"&lt;br /&gt;&lt;br /&gt;# EC2 Variables&lt;br /&gt;AVAIL_ZONE="eu-west-1a"&lt;br /&gt;REGION="eu-west-1"&lt;br /&gt;INSTANCE="your EC2 instance id"&lt;br /&gt;DEVICE="/dev/sdh"&lt;br /&gt;DUMP_DIR="/mnt/postgres/dumps"&lt;br /&gt;VOL_SIZE="100" # in GB&lt;br /&gt;&lt;br /&gt;# Dump variables&lt;br /&gt;PG_DUMP="/usr/lib/postgresql/8.4/bin/pg_dump"&lt;br /&gt;S3PUT="/mnt/postgres/s3tools/s3cmd-1.0.0-rc1/s3cmd"&lt;br /&gt;BASE=$1&lt;br /&gt;SPLITS="70m"&lt;br /&gt;DUMP_TIME=`date +"%Y%m%d"`&lt;br /&gt;DUMP_FILE="$DUMP_DIR/$DUMP_TIME.gz"&lt;br 
/&gt;S3ENDPOINT="s3://postgresql-dumps/pilotroi/$BASE/"&lt;br /&gt;&lt;br /&gt;echo "***********************************************"&lt;br /&gt;echo "*    DUMP/ BACKUP PROCESS                     *"&lt;br /&gt;echo "*    Starting at :" `date` &lt;br /&gt;echo "***********************************************"&lt;br /&gt;&lt;br /&gt;# Create volume&lt;br /&gt;echo ""&lt;br /&gt;echo "Create Volume"&lt;br /&gt;VOLUME=`ec2-create-volume  --size $VOL_SIZE --region $REGION -z $AVAIL_ZONE | cut -f2`&lt;br /&gt;while ! ec2-describe-volumes $VOLUME --region $REGION | grep -q available; do sleep 1; done&lt;br /&gt;echo "Created volume " $VOLUME "with size of "  $VOL_SIZE " Gb"&lt;br /&gt;&lt;br /&gt;# Attaching volume&lt;br /&gt;echo "***********************************************"&lt;br /&gt;echo "Now attaching volume"&lt;br /&gt;STATE=`ec2-attach-volume $VOLUME --instance $INSTANCE --device $DEVICE --region $REGION | cut -f5`&lt;br /&gt;while ! ec2-describe-volumes $VOLUME --region $REGION | grep -q attached; do sleep 1; done&lt;br /&gt;echo "Volume " $VOLUME "has state : " $STATE&lt;br /&gt;&lt;br /&gt;# Building filesystem and mounting&lt;br /&gt;echo "***********************************************"&lt;br /&gt;echo "Now building filesystem"&lt;br /&gt;while [ ! -e $DEVICE ]; do echo -n .; sleep 1; done&lt;br /&gt;sudo mkfs.ext3 -F $DEVICE&lt;br /&gt;if [ ! 
-d $DUMP_DIR ]&lt;br /&gt;then&lt;br /&gt;sudo mkdir $DUMP_DIR&lt;br /&gt;fi&lt;br /&gt;sudo mount $DEVICE $DUMP_DIR&lt;br /&gt;sudo chown postgres:postgres $DUMP_DIR&lt;br /&gt;sudo chmod 777 $DUMP_DIR&lt;br /&gt;su - postgres -c "touch file.txt"&lt;br /&gt;echo "Volume " $VOLUME "mounted on " $DUMP_DIR&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;# Dumping&lt;br /&gt;echo "**********************************************"&lt;br /&gt;echo "Dump started at " `date`&lt;br /&gt;echo "su - postgres -c $PG_DUMP $BASE | gzip | split -b $SPLITS - $DUMP_FILE"&lt;br /&gt;su - postgres -c "$PG_DUMP $BASE | gzip | split -b $SPLITS - $DUMP_FILE"&lt;br /&gt;echo "Dump ended at " `date`&lt;br /&gt;# Sending&lt;br /&gt;echo "**********************************************"&lt;br /&gt;echo "Send to S3 started at " `date`&lt;br /&gt;$S3PUT -p put $DUMP_FILE* $S3ENDPOINT&lt;br /&gt;echo "Send ended at " `date`&lt;br /&gt;#echo "**********************************************"&lt;br /&gt;#echo "Deleting local dump files"&lt;br /&gt;#rm $DUMP_DIR/$BASE.gz*&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;# Cleaning all&lt;br /&gt;sudo umount $DUMP_DIR&lt;br /&gt;echo "$DUMP_DIR unmounted"&lt;br /&gt;sudo rm -Rf $DUMP_DIR&lt;br /&gt;&lt;br /&gt;# Detaching volume&lt;br /&gt;echo "***********************************************"&lt;br /&gt;echo "Now detaching volume"&lt;br /&gt;STATE=`ec2-detach-volume $VOLUME --region $REGION | cut -f5`&lt;br /&gt;while ! 
ec2-describe-volumes $VOLUME --region $REGION | grep -q available; do sleep 1; done&lt;br /&gt;echo "Volume $VOLUME has now state : $STATE and can be deleted"&lt;br /&gt;ec2-delete-volume $VOLUME --region $REGION&lt;br /&gt;echo "Volume $VOLUME currently deleting"&lt;br /&gt;echo "Database dump is finished at : " `date`&lt;br /&gt;echo "***********************************************"&lt;br /&gt;echo "Cleanup process"&lt;br /&gt;echo "Starting at :" `date`&lt;br /&gt;&lt;br /&gt;LIMIT=`date --date="5 day ago" +"%Y%m%d"`&lt;br /&gt;echo $LIMIT&lt;br /&gt;echo `date '+%F %T'` - Timestamp of 5 days ago: $LIMIT&lt;br /&gt;echo `date '+%F %T'` - Getting the list of available backups&lt;br /&gt;TOTAL=0&lt;br /&gt;for FILENAME in `$S3PUT ls $S3ENDPOINT | cut -d":" -f3`; do&lt;br /&gt;if [[ $FILENAME =~ ([0-9]*)\.gz* ]]&lt;br /&gt;then&lt;br /&gt;echo ${BASH_REMATCH[1]}&lt;br /&gt;TIMESTAMP=${BASH_REMATCH[1]}&lt;br /&gt;echo `date '+%F %T'` - Reading metadata of: $FILENAME&lt;br /&gt;echo -e "\tFilename: $FILENAME"&lt;br /&gt;echo -e "\tTimestamp: $TIMESTAMP"&lt;br /&gt;if [[ $TIMESTAMP -le $LIMIT ]]; then&lt;br /&gt;let "TOTAL=TOTAL+1"&lt;br /&gt;echo -e "\tResult: Backup deleted\n"&lt;br /&gt;$S3PUT del "s3:$FILENAME"&lt;br /&gt;else&lt;br /&gt;echo -e "\tResult: Backup kept\n"&lt;br /&gt;fi&lt;br /&gt;fi&lt;br /&gt;done&lt;br /&gt;echo `date '+%F %T'` - $TOTAL old backups removed&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;It’s working nicely on all my postgres servers. 
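&lt;br /&gt;&lt;br /&gt;The retention test from the cleanup loop can be tried on its own before pointing it at a real bucket. This is only a minimal sketch : the function names dump_stamp and retention_action are hypothetical (they are not part of the script above), and it assumes dump names that carry an 8 digit YYYYMMDD stamp right before .gz, as produced by DUMP_TIME.
&lt;pre&gt;&lt;code&gt;
```shell
#!/bin/bash
# Hypothetical helpers, not part of the original script: they isolate the
# YYYYMMDD comparison that the cleanup loop performs on each S3 file name.

# Print the YYYYMMDD stamp found just before ".gz" in a dump file name.
dump_stamp() {
    local name="$1"
    if [[ $name =~ ([0-9]{8})\.gz ]]; then
        echo "${BASH_REMATCH[1]}"
    fi
}

# Print "delete" when the stamp is on or before the limit, "keep" when it
# is more recent, and "skip" when the name carries no stamp at all.
retention_action() {
    local stamp limit="$2"
    stamp=$(dump_stamp "$1")
    if [[ -z $stamp ]]; then
        echo "skip"
    elif [[ $stamp -le $limit ]]; then
        echo "delete"
    else
        echo "keep"
    fi
}

# Example with a fixed limit instead of the date computed by the script.
retention_action "//postgresql-dumps/pilotroi/client_base/20101120.gzaa" 20101125
retention_action "//postgresql-dumps/pilotroi/client_base/20101126.gzaa" 20101125
```
&lt;/code&gt;&lt;/pre&gt;
With a limit of 20101125 the two calls print delete and then keep. Like the -le test in the loop, a dump stamped exactly on the limit day is removed.&lt;br /&gt;&lt;br /&gt;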
I’m currently finalizing a custom script for retrieving and re-creating a database back from the dump file collection.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;Feel free to contact me for any question.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2043980462775098032?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2043980462775098032/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2043980462775098032' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2043980462775098032'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2043980462775098032'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/11/postgresql-dumps-and-storage-on-s3_26.html' title='Postgresql dumps and storage on S3 : the sequel … using dynamic temp EBS'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/TO_Q8XY1FdI/AAAAAAAAAvA/cNuIwL2m4SQ/s72-c/image_thumb%5B8%5D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2020891959801513847</id><published>2010-11-22T02:18:00.001-08:00</published><updated>2010-12-15T08:00:45.733-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Cloud Computing'/><title type='text'>Postgresql dumps and storage on S3</title><content type='html'>Hi all,&lt;br /&gt;I recently had to manage all my backups from &lt;a href="http://www.postgresql.org/" target="_blank"&gt;Postgresql&lt;/a&gt; instances on the cloud. For all these cloud-based backup jobs, I’m using a three-way approach. Maybe you think it’s a bit redundant, but I don’t like surprises.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Database dump to S3 :&lt;/strong&gt; daily, depending on criticality … &lt;/li&gt;&lt;li&gt;&lt;strong&gt;EC2 instance snapshot :&lt;/strong&gt; daily or weekly, depending … I like to know I can keep several generations of instance images, even if I do not change them very often now … &lt;/li&gt;&lt;li&gt;&lt;strong&gt;EBS volume snapshot :&lt;/strong&gt; hourly or daily, depending on criticality. &lt;/li&gt;&lt;/ul&gt;Today, I want to share with you what I did for the first backup process : &lt;strong&gt;database dump to S3. &lt;/strong&gt;Of course, you need to have knowledge of &lt;a href="http://aws.amazon.com/s3/" target="_blank"&gt;Amazon S3&lt;/a&gt; (as well as an account) for that. In upcoming articles, I will show you what I created in order to automate and manage all EC2 instance &amp;amp; volume snapshots.&lt;br /&gt;For the moment let’s go back to the db dump to S3. My process is a simple shell script that will : &lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Connect to Postgresql,&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Start a database dump (database name is passed as script argument),&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Split the dump into several pieces (chunk size is passed as script argument),&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Send the dump files to S3.&lt;/strong&gt; &lt;/li&gt;&lt;/ol&gt;Note : your postgresql server (from which you want to dump) does not have to be hosted on AWS/EC2. 
This process also works with on-premises architectures.&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TOpDQ7B04QI/AAAAAAAAAuU/M7otNga7fFY/s1600-h/image%5B18%5D.png"&gt;&lt;img alt="image" border="0" height="280" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TOpDRV4aJ4I/AAAAAAAAAuY/UCHAGsSnXkA/image_thumb%5B10%5D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="582" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Tools of the trade : pg_dump and s3cmd&lt;/h3&gt;&lt;a href="http://s3tools.org/s3cmd"&gt;&lt;img alt="image" border="0" height="47" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TOpDSJqLOyI/AAAAAAAAAuc/SYL1lraFyEI/image4%5B1%5D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="330" /&gt;&lt;/a&gt;&lt;br /&gt;I’m using a wonderful tool called &lt;strong&gt;s3cmd&lt;/strong&gt;, which you can find &lt;a href="http://s3tools.org/s3cmd" target="_blank"&gt;here&lt;/a&gt;. This tool allows you to send, retrieve and manage data on Amazon S3. It also offers bucket management, GPG encryption and https transfer. A very good command line tool that will find its place in your EC2 toolbox.&lt;br /&gt;Installing s3cmd is quite easy and, when you run &lt;strong&gt;s3cmd --configure&lt;/strong&gt;, you will be prompted for your S3 account credentials and other parameters. 
You can later find these settings in the file called &lt;strong&gt;.s3cfg&lt;/strong&gt; in your home directory.&lt;br /&gt;The commands are quite simple : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Send file to S3 :&lt;/strong&gt;&amp;nbsp;&lt;strong&gt;s3cmd&lt;/strong&gt; &lt;strong&gt;&lt;em&gt;put&lt;/em&gt;&lt;/strong&gt;&amp;nbsp;&lt;em&gt;file_to_send&lt;/em&gt; &lt;em&gt;your_s3_endpoint&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;List buckets :&lt;/strong&gt; &lt;strong&gt;&lt;em&gt;s3cmd ls&lt;/em&gt;&lt;/strong&gt; &lt;/li&gt;&lt;li&gt;&lt;strong&gt;List content of a bucket&lt;/strong&gt; : &lt;strong&gt;&lt;em&gt;s3cmd ls&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;bucket_name&lt;/em&gt;, where bucket_name has the form s3://bucket_name &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Retrieve a file from a bucket :&lt;/strong&gt; &lt;strong&gt;&lt;em&gt;s3cmd get&lt;/em&gt;&lt;/strong&gt; &lt;em&gt;s3://bucket_name/file_name&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;As you can see, the usage is really easy and fits perfectly in a shell script.&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TOpDS04nX9I/AAAAAAAAAug/IXVQV6TmP6A/s1600-h/image%5B8%5D.png"&gt;&lt;img alt="image" border="0" height="46" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TOpDTr-FkiI/AAAAAAAAAuk/tLgYr3ltFoI/image_thumb%5B4%5D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="331" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;strong&gt;Pg_dump&lt;/strong&gt; is the regular backup client application for Postgresql databases. Its syntax, simplified here, is : &lt;strong&gt;pg_dump “dbname” &amp;gt; outfile&lt;/strong&gt;. You can learn more about pg_dump &lt;a href="http://www.postgresql.org/docs/8.0/interactive/backup.html"&gt;here&lt;/a&gt;. 
For this example, I will use the following command : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;pg_dump&lt;/strong&gt;&lt;em&gt; db_name&lt;/em&gt; | &lt;strong&gt;gzip&lt;/strong&gt; | &lt;strong&gt;split&lt;/strong&gt; &lt;em&gt;-b 50m&lt;/em&gt;&amp;nbsp;&lt;em&gt;- dump_file&lt;/em&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;pg_dump : the command,&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;db_name : the name of the db you want to dump,&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;gzip : compress the dump stream,&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;split : split the dump into several pieces,&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;-b 50m : split size = 50 MB,&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;- : read the data to split from standard input,&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;dump_file : the output prefix (split appends aa, ab … to it, so a prefix ending in .gz gives files ending in .gzaa, .gzab …).&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;em&gt;Add -v to pg_dump if you want a verbose log.&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h3&gt;Process prerequisites &lt;/h3&gt;First, I created an internal account for my Postgresql database. This account is only allowed to connect from the database server itself (localhost). It has no password : only “trust” authentication is configured for it in pg_hba.conf. Select rights were also granted.&lt;br /&gt;Then, I created a dedicated directory on the database server to hold the dump files. This directory only holds the dump files for a very limited time : they are deleted after being pushed to S3 with s3cmd. I recommend storing these temp dump files on a separate and dedicated EBS volume.&lt;br /&gt;Last, we need a minimal setup on the S3 side. Create the appropriate bucket hierarchy you need. 
Mine is quite simple for this backup process : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;1st level :&lt;/strong&gt; /postgresql-dumps &lt;ul&gt;&lt;li&gt;&lt;strong&gt;2nd level :&lt;/strong&gt; product name : “pilotroi”, the product name of my company (we have several products) &lt;ul&gt;&lt;li&gt;&lt;strong&gt;3rd level :&lt;/strong&gt; /database name, one folder per database. This is the first parameter I will pass to the shell script. For flexibility, I use the exact same name as the postgres database I want to dump. This allows me to build my bucket endpoint like : &lt;ul&gt;&lt;li&gt;&lt;strong&gt;s3://postgresql-dumps/pilotroi/$BASE/&lt;/strong&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;h3&gt;Custom script&lt;/h3&gt;Below is my custom script. A lot of “echo”, huh ? As I said, I don’t like surprises and prefer having verbose logfiles. This script is scheduled with cron and needs two parameters : &lt;br /&gt;&lt;ol&gt;&lt;li&gt;&lt;strong&gt;Database name :&lt;/strong&gt; the name of the db you need to dump. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Splits :&lt;/strong&gt; the size of the chunks you want to generate. For instance, 50m generates files with a maximum size of 50 MB. 
&lt;/li&gt;&lt;/ol&gt;Call the script like that, using two params ($1 and $2) : &lt;strong&gt;sh s3_pg_dump.sh client_base 50m&lt;/strong&gt;&lt;br /&gt;&lt;pre style="background-color: #eeeeee; border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; border-right: #999999 1px dashed; border-top: #999999 1px dashed; color: black; font-family: Andale Mono, Lucida Console, Monaco, fixed, monospace; font-size: 12px; line-height: 14px; overflow: auto; padding-bottom: 5px; padding-left: 5px; padding-right: 5px; padding-top: 5px; width: 100%;"&gt;&lt;code&gt;#!/bin/bash&lt;br /&gt;&lt;br /&gt;################################## &lt;br /&gt;#                                # &lt;br /&gt;# POSTGRES DUMPS FOR PILOTROI    # &lt;br /&gt;# Vincent Teyssier               # &lt;br /&gt;# 19/11/2010                     # &lt;br /&gt;#                                # &lt;br /&gt;##################################&lt;br /&gt;&lt;br /&gt;echo "******************************************************" &lt;br /&gt;echo "Database dump is starting at : " `date` &lt;br /&gt;echo "******************************************************"&lt;br /&gt;&lt;br /&gt;PG_DUMP="/usr/lib/postgresql/8.4/bin/pg_dump" &lt;br /&gt;S3PUT="/mnt/postgres/s3tools/s3cmd-1.0.0-rc1/s3cmd"&lt;br /&gt;&lt;br /&gt;BASE=$1 &lt;br /&gt;SPLITS=$2 &lt;br /&gt;DUMP_FILE="/mnt/postgres/dumps/$BASE.gz" &lt;br /&gt;S3ENDPOINT="s3://postgresql-dumps/pilotroi/$BASE/"&lt;br /&gt;&lt;br /&gt;echo "*****************************" &lt;br /&gt;echo "Parameters : " &lt;br /&gt;echo "Base : " $BASE &lt;br /&gt;echo "Splits : " $SPLITS &lt;br /&gt;echo "S3 Endpoint : " $S3ENDPOINT &lt;br /&gt;echo "*****************************"&lt;br /&gt;&lt;br /&gt;echo "Dump started at " `date` &lt;br /&gt;su - postgres -c "$PG_DUMP $BASE | gzip | split -b $SPLITS - $DUMP_FILE" &lt;br /&gt;echo "Dump ended at " `date` &lt;br /&gt;echo "*****************************" &lt;br /&gt;echo "Send to S3 started at " `date` &lt;br /&gt;$S3PUT put 
$DUMP_FILE* $S3ENDPOINT &lt;br /&gt;echo "Send ended at " `date` &lt;br /&gt;echo "*****************************" &lt;br /&gt;echo "Deleting local dump files" &lt;br /&gt;rm /mnt/postgres/dumps/$BASE.gz*&lt;br /&gt;&lt;br /&gt;echo "******************************************************" &lt;br /&gt;echo "Database dump is finished at : " `date` &lt;br /&gt;echo "******************************************************" &lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;&amp;nbsp;&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Process output&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;This script file will produce the following output.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TOpDUKbnKmI/AAAAAAAAAuo/PLF19jNgmrE/s1600-h/image%5B4%5D.png"&gt;&lt;img alt="image" border="0" height="304" src="http://lh4.ggpht.com/_hTlcWbt-BP4/TOpDVEtV9QI/AAAAAAAAAus/e2beQEvIy1E/image_thumb%5B2%5D.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="562" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Data is now on S3&lt;/h3&gt;&lt;br /&gt;&lt;br /&gt;Using S3 Explorer Firefox plugin, you can see all our files are now stored in their dedicated bucket on S3.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TOpDWNGTCrI/AAAAAAAAAuw/I89EUuq8OGE/s1600-h/image11.png"&gt;&lt;img alt="image" border="0" height="252" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TOpDWxmpejI/AAAAAAAAAu0/GHs-YTIPtys/image_thumb5.png?imgmax=800" style="background-image: none; border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; 
display: inline; padding-left: 0px; padding-right: 0px; padding-top: 0px;" title="image" width="747" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;As usual, drop me a message to learn more if needed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2020891959801513847?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2020891959801513847/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2020891959801513847' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2020891959801513847'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2020891959801513847'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/11/postgresql-dumps-and-storage-on-s3.html' title='Postgresql dumps and storage on S3'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/TOpDRV4aJ4I/AAAAAAAAAuY/UCHAGsSnXkA/s72-c/image_thumb%5B10%5D.png?imgmax=800' height='72' width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2231178031621183345</id><published>2010-11-09T04:36:00.001-08:00</published><updated>2010-11-09T04:36:57.323-08:00</updated><title type='text'>Open source Datamining with 
R</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;Today, I want to share a very valuable link. You may know that I’m currently working with cloud architectures, map/reduce, etc., but I’m also implementing open source datamining tools like RapidMiner and the famous &lt;strong&gt;R&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;You can find &lt;a href="http://romainfrancois.blog.free.fr/"&gt;&lt;strong&gt;here&lt;/strong&gt;&lt;/a&gt; a real goldmine : the blog of THE French &lt;strong&gt;R&lt;/strong&gt; specialist, &lt;strong&gt;Romain François&lt;/strong&gt;. His blog covers everything you want to know about &lt;strong&gt;R&lt;/strong&gt;, from implementation to advanced features and analytics.&lt;/p&gt;  &lt;p&gt;He recently gave me some valuable advice about custom compilation of RProtobuf.&lt;/p&gt;  &lt;p&gt;Enjoy your visit.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://www.r-project.org/index.html"&gt;&lt;img style="display: block; float: none; margin-left: auto; margin-right: auto" border="0" alt="R logo" src="http://www.r-project.org/Rlogo.jpg" width="152" height="118" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2231178031621183345?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2231178031621183345/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2231178031621183345' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2231178031621183345'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2231178031621183345'/><link rel='alternate' type='text/html' 
href='http://open-bi.blogspot.com/2010/11/open-source-datamining-with-r.html' title='Open source Datamining with R'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3135730850290267309</id><published>2010-11-03T04:05:00.001-07:00</published><updated>2010-11-03T04:05:16.681-07:00</updated><title type='text'>Meeting with Brian Gentile (CEO – Jaspersoft) in Paris</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;Yesterday, I had the pleasure of meeting &lt;strong&gt;Brian Gentile&lt;/strong&gt;, &lt;strong&gt;&lt;a href="http://www.jaspersoft.com"&gt;Jaspersoft&lt;/a&gt; CEO&lt;/strong&gt;, in Paris. We talked for an hour and a half over coffee about BI, the French market, new trends, cloud computing, etc., and of course about Jaspersoft&lt;font size="1"&gt; (on October 7, 2010, Jaspersoft announced &lt;/font&gt;&lt;a href="http://www.jaspersoft.com/press/jaspersoft-delivers-the-most-powerful-and-affordable-business-intelligence-reporting-server-fo"&gt;&lt;font size="1"&gt;JasperReports Server Professional Edition&lt;/font&gt;&lt;/a&gt;&lt;font size="1"&gt;).&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;It was delightful to speak with someone who has a true 360° vision of our industry, no limits to his imagination, and elegant yet realistic and pragmatic visions of the future. Vision is really important to me and to my way of doing business. I’m often sad to see how little vision we have in France …&lt;/p&gt;  &lt;p&gt;I’m not a Jaspersoft spokesman, but I can say that the following topics are, right now, part of their technology roadmap. 
And that’s really exciting !&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;Hive, &lt;/li&gt;    &lt;li&gt;Distributed data processing with Hadoop, map / reduce …, &lt;/li&gt;    &lt;li&gt;Cloud computing, &lt;/li&gt;    &lt;li&gt;Accurate geo location features, using best of breed APIs, &lt;/li&gt;    &lt;li&gt;Data analysis integrated with tools such as R, to name one, &lt;/li&gt;    &lt;li&gt;Large scale analytics : from unstructured data to predictive analytics, &lt;/li&gt;    &lt;li&gt;Elegant mobile BI and real time BI for, as Brian Gentile says, “analytics everywhere”. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;Thanks to &lt;strong&gt;Naoual Mameche&lt;/strong&gt;, &lt;strong&gt;EMEA Account Manager&lt;/strong&gt;, for organizing the meeting.&lt;/p&gt;  &lt;p align="left"&gt;You can read more from Brian Gentile on his blog, &lt;strong&gt;&lt;a href="http://openbookonbi.blogspot.com"&gt;The Open Book on BI&lt;/a&gt;&lt;/strong&gt;.&lt;/p&gt;  &lt;p align="center"&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TNFB6VXhiuI/AAAAAAAAAuM/WILF5ML3tJ4/s1600-h/Vincent%26BrianGentile%5B3%5D.jpg"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Vincent&amp;amp;BrianGentile" border="0" alt="Vincent&amp;amp;BrianGentile" src="http://lh5.ggpht.com/_hTlcWbt-BP4/TNFB63eAROI/AAAAAAAAAuQ/YP_uVUKFLxA/Vincent%26BrianGentile_thumb%5B1%5D.jpg?imgmax=800" width="309" height="406" /&gt;&lt;/a&gt;&lt;font size="1"&gt;Shooot, I just turned 37 y/o and my starting baldness is gaining ground …&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;As a reminder, I implemented Jaspersoft for &lt;a href="http://www.score-md.com"&gt;Score-MD&lt;/a&gt; this year (the ASP-style datamining / scoring start-up I’m currently working for). 
We are using a cloud infrastructure on EC2 and we are using Jaspersoft for BI &amp;amp; reporting features delivered to our final customers (some are ww and 24/24).&lt;/p&gt;  &lt;p&gt;Also implemented in Score-MD, full cloud : Talend (TIS), PostgreSQL, R, Rapid Miner and Protocol Buffers.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3135730850290267309?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3135730850290267309/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3135730850290267309' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3135730850290267309'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3135730850290267309'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/11/meeting-with-brian-gentile-ceo.html' title='Meeting with Brian Gentile (CEO – Jaspersoft) in Paris'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_hTlcWbt-BP4/TNFB63eAROI/AAAAAAAAAuQ/YP_uVUKFLxA/s72-c/Vincent%26BrianGentile_thumb%5B1%5D.jpg?imgmax=800' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8266319556313428332</id><published>2010-10-14T07:05:00.001-07:00</published><updated>2010-12-15T08:01:02.114-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Modified Firefox ElasticFox plugin (handling micro instances)</title><content type='html'>Hi all,&lt;br /&gt;&lt;strong&gt;ElasticFox&lt;/strong&gt; is a wonderful plugin for all you AWS and Firefox geeks. &lt;br /&gt;We all know AWS recently added a new instance size : &lt;strong&gt;micro&lt;/strong&gt;. Micro instances are really small setups with about 613 MB of RAM and are very useful for building small servers, controllers, DNS, FTP servers …&lt;br /&gt;Unfortunately, there is no “micro” instance size in ElasticFox so far; have a look below.&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TLcOHvmH_8I/AAAAAAAAAt8/jVZna4cFxSA/s1600-h/image%5B11%5D.png"&gt;&lt;img alt="image" border="0" height="401" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TLcOICfqOHI/AAAAAAAAAuA/pmud34yPI5A/image_thumb%5B5%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="299" /&gt;&lt;/a&gt; &lt;br /&gt;That’s why I did my own hack. Note : this is a personal hack, not a new version of ElasticFox. Now you can launch a new instance using the “&lt;strong&gt;micro&lt;/strong&gt;” size. This works perfectly for me (beware when using micro instances with Ubuntu Lucid : there are issues with the fstab, as described at &lt;a href="http://alestic.com/"&gt;http://alestic.com/&lt;/a&gt;). 
&lt;br /&gt;Please let me know if this hack is useful and works for you as well.&lt;br /&gt;You can download this Elasticfox plugin &lt;a href="http://www.score-md.com/elasticfox/elasticfox-1.7.000000.xpi"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;div align="center"&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/TLcOITF5xdI/AAAAAAAAAuE/DToDQhfw90o/s1600-h/image%5B27%5D.png"&gt;&lt;img alt="image" border="0" height="404" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TLcOI96kWBI/AAAAAAAAAuI/GAs39cPs5eE/image_thumb%5B13%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="301" /&gt;&lt;/a&gt;&lt;span style="font-size: xx-small;"&gt;The new micro instance size&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8266319556313428332?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8266319556313428332/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8266319556313428332' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8266319556313428332'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8266319556313428332'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/10/modified-firefox-elasticfox-plugin.html' title='Modified Firefox ElasticFox plugin (handling micro instances)'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/TLcOICfqOHI/AAAAAAAAAuA/pmud34yPI5A/s72-c/image_thumb%5B5%5D.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2825071196429529823</id><published>2010-10-11T15:22:00.001-07:00</published><updated>2010-12-15T08:01:55.403-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='JasperSoft'/><title type='text'>Easy spatial reporting with jscript and JasperSoft</title><content type='html'>Hi all,&lt;br /&gt;Today, a quick post to demonstrate how to create some spatial reporting using simple javascript and Jasper server. Hey, I almost forgot … everything is hosted on Amazon Cloud Computing of course&amp;nbsp; ;)).&lt;br /&gt;Here is the scenario : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Extract data from the database : French departments and a simple KPI, the total number of contacts.&lt;/li&gt;&lt;li&gt;Create an XML file for the extracted data because we don’t want live queries running on the database. Given the limited amount of data we need, it’s better to query a simple xml file here.&lt;/li&gt;&lt;li&gt;Add spatial coordinates (GPS coordinates for each French department) using the Google API or the Yahoo API.&lt;/li&gt;&lt;li&gt;Implement some javascript to : &lt;ul&gt;&lt;li&gt;Call a google map, &lt;/li&gt;&lt;li&gt;Place markers on the map, one for each French department.&lt;/li&gt;&lt;li&gt;Add some html to display the values for each map marker (number of contacts).&lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;&lt;h3&gt;Data files&lt;/h3&gt;Ok, let’s go : extract the data and generate an XML file containing it. This is easy using the built-in XML functions of PostgreSQL. 
Below we have an excerpt showing the structure : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;name : number of the French department (from 01 to 95)&lt;/li&gt;&lt;li&gt;nb_contacts : our main KPI, the number of contacts&lt;/li&gt;&lt;li&gt;lat and lng : latitude and longitude.&lt;/li&gt;&lt;/ul&gt;I will explain, in a future post, how to geocode data using different APIs. For now, consider we have spatial data coming out of the database.&lt;br /&gt;I will create two files : one for Paris and its suburbs, the other one for the whole French territory. This way I will be able to choose between a focus on Paris (economic center) or the rest of the country. The names are 'cp_stats-idf.xml' (Paris and suburbs) and 'cp_stats-province.xml' (rest of the country).&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;&amp;lt;?xml version="1.0" encoding="ISO-8859-1" standalone="no"?&amp;gt; &lt;br /&gt;&amp;lt;markers&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;marker&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;01&amp;lt;/name&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;nb_contacts&amp;gt;39600&amp;lt;/nb_contacts&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;lat&amp;gt;46.24757&amp;lt;/lat&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;lng&amp;gt;5.1307683&amp;lt;/lng&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/marker&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;marker&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;02&amp;lt;/name&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;nb_contacts&amp;gt;25507&amp;lt;/nb_contacts&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
&amp;lt;lat&amp;gt;49.47692&amp;lt;/lat&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;lng&amp;gt;3.4417367&amp;lt;/lng&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/marker&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;marker&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;03&amp;lt;/name&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;nb_contacts&amp;gt;14143&amp;lt;/nb_contacts&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;lat&amp;gt;46.311554&amp;lt;/lat&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;lng&amp;gt;3.4167655&amp;lt;/lng&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/marker&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;marker&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;name&amp;gt;04&amp;lt;/name&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;nb_contacts&amp;gt;14814&amp;lt;/nb_contacts&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;lat&amp;gt;44.077873&amp;lt;/lat&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;lng&amp;gt;6.2375946&amp;lt;/lng&amp;gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; &amp;lt;/marker&amp;gt; &lt;br /&gt;&amp;lt;/markers&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;h3&gt;Coding&lt;/h3&gt;Now, let’s use some JavaScript. First, we need the &lt;strong&gt;jQuery&lt;/strong&gt; library, written by John Resig (&lt;a href="http://jquery.com/" title="http://jquery.com"&gt;http://jquery.com/&lt;/a&gt;). 
Download it and put the library in your project directory.&lt;br /&gt;Now, let’s code the second JavaScript file, the one that will create the Google Map and place the markers. Here is the code; I called it ‘google_put_markers.js’.&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;$(document).ready(function() { &lt;br /&gt;&amp;nbsp; // Size the map container, then center the map on Paris &lt;br /&gt;&amp;nbsp; $("#map").css({ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; height: 500, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; width: 700 &lt;br /&gt;&amp;nbsp; }); &lt;br /&gt;&amp;nbsp; var LatLngCoord = new google.maps.LatLng(48.859112, 2.346800); &lt;br /&gt;&amp;nbsp; MAP.init('#map', LatLngCoord, 11);&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;&amp;nbsp; // Each button loads one of the two XML data files &lt;br /&gt;&amp;nbsp; $("#show_PAR").click(function(e){ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAP.PutMarkers('cp_stats-idf.xml'); &lt;br /&gt;&amp;nbsp; }); &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;&amp;nbsp; $("#show_PRO").click(function(e){ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAP.PutMarkers('cp_stats-province.xml'); &lt;br /&gt;&amp;nbsp; }); &lt;br /&gt;}); &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;var markersArray = []; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;// Remove all current markers from the map and empty the array &lt;br /&gt;function clearOverlays() { &lt;br /&gt;&amp;nbsp; for (var i = 0; i &amp;lt; markersArray.length; i++) { &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; markersArray[i].setMap(null); &lt;br /&gt;&amp;nbsp; } &lt;br /&gt;&amp;nbsp; markersArray.length = 0; &lt;br /&gt;} &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;var 
MAP = { &lt;br /&gt;&amp;nbsp; map: null, &lt;br /&gt;&amp;nbsp; bounds: null &lt;br /&gt;};&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;MAP.init = function(selector, latLng, zoom) { &lt;br /&gt;&amp;nbsp; var myOptions = { &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; zoom: zoom, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; center: latLng, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; mapTypeId: google.maps.MapTypeId.ROADMAP &lt;br /&gt;&amp;nbsp; }; &lt;br /&gt;&amp;nbsp; this.map = new google.maps.Map($(selector)[0], myOptions); &lt;br /&gt;&amp;nbsp; this.bounds = new google.maps.LatLngBounds(); &lt;br /&gt;}; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;MAP.PutMarkers = function(filename) { &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; clearOverlays(); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Start from empty bounds so the map fits only the markers of this file &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAP.bounds = new google.maps.LatLngBounds(); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; $.get(filename, function(xml){ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; $(xml).find("marker").each(function(){ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var name = $(this).find('name').text(); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var address = $(this).find('nb_contacts').text(); // number of contacts &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var lat = $(this).find('lat').text(); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var lng = $(this).find('lng').text(); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var point = new google.maps.LatLng(parseFloat(lat),parseFloat(lng)); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAP.bounds.extend(point); 
&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var image = 'logo.png'; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var marker = new google.maps.Marker({ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; position: point, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; map: MAP.map, &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; title:"Statistiques", &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; icon: image &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var infoWindow = new google.maps.InfoWindow(); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; var html='&amp;lt;strong&amp;gt;Département : '+name+'&amp;lt;/strong&amp;gt;&amp;lt;br /&amp;gt;'+address + ' contacts'; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; markersArray.push(marker); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; google.maps.event.addListener(marker, 'click', function() { &lt;br 
/&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; infoWindow.setContent(html); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; infoWindow.open(MAP.map, marker); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MAP.map.fitBounds(MAP.bounds); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; }); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; }); &lt;br /&gt;}&lt;/span&gt;&lt;br /&gt;As you can see, we first create a map centered on Paris. We then have functions for clearing the map (when switching between the regional view and the Paris view), placing markers on that map (triggered by the two buttons) and displaying infowindows.&lt;br /&gt;&lt;h3&gt;HTML&lt;/h3&gt;Ok, almost done. Now we need a simple html file to tie everything together. Here is the file. As you can see, I created two buttons : one for displaying the markers for the whole French territory, and the other for displaying only the markers for the core economic center (Paris and suburbs). 
I call this file ‘display_geostats.html’.&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;&amp;lt;!DOCTYPE html&amp;gt; &lt;br /&gt;&amp;lt;html&amp;gt; &lt;br /&gt;&amp;lt;head&amp;gt; &lt;br /&gt;&amp;lt;title&amp;gt;Google Maps&amp;lt;/title&amp;gt; &lt;br /&gt;&amp;lt;script type="text/javascript" src="http://maps.google.com/maps/api/js?sensor=false"&amp;gt;&amp;lt;/script&amp;gt; &lt;br /&gt;&amp;lt;!-- jQuery must be loaded before the script that uses it --&amp;gt; &lt;br /&gt;&amp;lt;script type="text/javascript" src="js/jquery-1.4.1.min.js"&amp;gt;&amp;lt;/script&amp;gt; &lt;br /&gt;&amp;lt;script type="text/javascript" src="js/google_put_markers.js"&amp;gt;&amp;lt;/script&amp;gt; &lt;br /&gt;&amp;lt;/head&amp;gt; &lt;br /&gt;&amp;lt;body&amp;gt;&amp;lt;div id="map"&amp;gt;&amp;lt;/div&amp;gt; &lt;br /&gt;&amp;lt;div &amp;gt;&amp;lt;input type="button" id="show_PAR" value="Paris et region parisienne" /&amp;gt;&amp;lt;input type="button" id="show_PRO" value="Province" /&amp;gt;&amp;lt;/div&amp;gt; &lt;br /&gt;&amp;lt;div &amp;gt;&amp;lt;/div&amp;gt; &lt;br /&gt;&amp;lt;/body&amp;gt; &lt;br /&gt;&amp;lt;/html&amp;gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;h3&gt;Final check&lt;/h3&gt;Ok, this is what we should have now : the html file, two XML data files, a javascript directory (typically called ‘js’) containing two files (the jQuery library and the file for displaying the markers, which I called 
‘google_put_markers.js’), and the logo file (png). Below is a snapshot of my setup.&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/TLONxgLvFYI/AAAAAAAAAtM/TAIWC8Xojg4/s1600-h/image%5B2%5D.png"&gt;&lt;img alt="image" border="0" height="114" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TLONxwMENLI/AAAAAAAAAtQ/SvSjoNIcVFA/image_thumb.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="244" /&gt;&lt;/a&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TLONyceXnrI/AAAAAAAAAtU/65xsBtb-efw/s1600-h/image%5B5%5D.png"&gt;&lt;img alt="image" border="0" height="59" src="http://lh3.ggpht.com/_hTlcWbt-BP4/TLONy1v4-iI/AAAAAAAAAtY/dF_GxE3qYnw/image_thumb%5B1%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="200" /&gt;&lt;/a&gt;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&lt;h3&gt;Let’s test it&lt;/h3&gt;Ok, now let’s open the html file. You should see something like the pictures below. By default, we see the view called ‘Province’, which means ‘the whole country’ (pic 1). Each French department has a marker (on this map, the marker is the logo of the company I’m currently working for). 
If we click one of these markers, the typical Google Maps info window appears with the ‘number of contacts’ KPI in it (pic 2).&lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;Ps : better results with Firefox …&lt;/span&gt;&lt;br /&gt;&lt;div align="center"&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/TLON15z8FVI/AAAAAAAAAtc/yO2giZotmTg/s1600-h/image%5B9%5D.png"&gt;&lt;img alt="image" border="0" height="371" src="http://lh5.ggpht.com/_hTlcWbt-BP4/TLON384yA8I/AAAAAAAAAtg/qz3KutQM9uo/image_thumb%5B3%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="486" /&gt;&lt;/a&gt;&lt;em&gt; Pic 1 : the whole country&lt;/em&gt;&lt;/div&gt;&lt;br /&gt;&lt;div align="center"&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/TLON6uR5PUI/AAAAAAAAAtk/9Jpan42u20Y/s1600-h/image%5B15%5D.png"&gt;&lt;img alt="image" border="0" height="376" src="http://lh5.ggpht.com/_hTlcWbt-BP4/TLON8TQsjWI/AAAAAAAAAto/vToNxc5poAs/image_thumb%5B7%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="486" /&gt;&lt;/a&gt; &lt;em&gt;Pic 2 : displaying the content of a marker&lt;/em&gt;&lt;/div&gt;Now let’s click on the button ‘Paris et region parisienne’ (meaning Paris and its suburbs) to display a different map : a zoom on Paris with new markers for this region.&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/TLON_E0YHWI/AAAAAAAAAts/SO8I0ZUjBTQ/s1600-h/image%5B19%5D.png"&gt;&lt;img alt="image" border="0" height="370" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TLOOA_-rmTI/AAAAAAAAAtw/jQGViTDN53Q/image_thumb%5B9%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="484" 
/&gt;&lt;/a&gt; &lt;br /&gt;&lt;h3&gt;JasperSoft / JasperServer integration&lt;/h3&gt;Ok, now we need to integrate everything into JasperServer. This is easy : deploy the whole directory we created above (the html file renamed to a jsp file, the data files, and the js directory with the two javascript files). Create the report as you would any other report, and you are ready to display it inside your JasperServer portal. &lt;br /&gt;&lt;span style="font-size: xx-small;"&gt;Sorry, I can’t give you the portal address … ;)&lt;/span&gt;&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/TLOOCoX9NiI/AAAAAAAAAt0/TyTGXpPUuMo/s1600-h/image%5B30%5D.png"&gt;&lt;img alt="image" border="0" height="391" src="http://lh6.ggpht.com/_hTlcWbt-BP4/TLOOD5jHInI/AAAAAAAAAt4/OaTuoY4xREM/image_thumb%5B14%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="628" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;h3&gt;Summary&lt;/h3&gt;This was a very simple (and short) example of how to : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Generate maps with Google Maps,&lt;/li&gt;&lt;li&gt;Place markers on these maps,&lt;/li&gt;&lt;li&gt;Display dynamic information (coming from the db) inside the markers,&lt;/li&gt;&lt;li&gt;Put everything, simply, into the JasperSoft portal.&lt;/li&gt;&lt;/ul&gt;In the near future, I will code more dynamic routines to display icons inside the infowindows and to add colorization / dynamic formatting …&lt;br /&gt;Please feel free to contact me for more in-depth explanations or more details.&lt;br /&gt;The next article will be about using a dozen lines of java to run geocoding lookups with Google or Yahoo (quickest !).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2825071196429529823?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://open-bi.blogspot.com/feeds/2825071196429529823/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2825071196429529823' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2825071196429529823'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2825071196429529823'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/10/easy-spatial-reporting-with-jscript-and.html' title='Easy spatial reporting with jscript and JasperSoft'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_hTlcWbt-BP4/TLONxwMENLI/AAAAAAAAAtQ/SvSjoNIcVFA/s72-c/image_thumb.png?imgmax=800' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8842070524052111180</id><published>2010-09-17T01:26:00.000-07:00</published><updated>2010-09-17T01:26:59.571-07:00</updated><title type='text'>News !!</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;I've been quite busy since June. Sorry for that.&lt;br /&gt;I'm currently taking part in building a data mining company. 
We offer direct marketing scoring and segmentation in ASP mode.&lt;br /&gt;Quite a lot of work here.&lt;br /&gt;&lt;br /&gt;Our architecture is : &lt;br /&gt;- Postgresql (around 1 TB)&lt;br /&gt;- JasperSoft&lt;br /&gt;- Talend (TIS)&lt;br /&gt;- R, RapidMiner&lt;br /&gt;- Ubuntu.&lt;br /&gt;&lt;br /&gt;I have done a cloud implementation on AWS. It works really well, and it is a good way to manage IT costs as operating expenses (no CAPEX, only OPEX) : 4000 $ per month for the whole thing, since I manage uptime and downtime dynamically depending on the workload and processing windows.&lt;br /&gt;Today, we are in production and have 8 more clients to implement before 31/12/2010.&lt;br /&gt;&lt;br /&gt;We used a lot of open source, as you can see, and we had to create a lot of programs, plugins etc ...&lt;br /&gt;I have, in stock, enough material to write 100 articles here !&lt;br /&gt;I promise I will soon find some time to share all this knowledge with you.&lt;br /&gt;&lt;br /&gt;Best regards,&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8842070524052111180?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8842070524052111180/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8842070524052111180' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8842070524052111180'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8842070524052111180'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/09/news.html' title='News !!'/><author><name>Vincent 
Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1773134478679974798</id><published>2010-05-18T08:54:00.001-07:00</published><updated>2010-12-15T08:02:15.518-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Postgresql'/><title type='text'>Data replication with Postgresql and Slony</title><content type='html'>Hi all,&lt;br /&gt;I have been working on data replication around Postgresql. For that purpose, I used the famous Slony replication engine. It is a very nice piece of work, but it sorely lacks documentation and how-tos. So, here is mine. First, I want to say that my work was inspired by a similar article called “PostgreSQL, Slony-I, pgadmin, win32, First time config / installation”, found on the &lt;a href="http://ccstone.blogspot.com/2009/03/postgresql-slony-i-pgadmin-win32-first.html"&gt;ChiChing means&lt;/a&gt; blog.&lt;br /&gt;&lt;h4&gt;The tools&lt;/h4&gt;For this tutorial, you need your favourite Linux box running a &lt;a href="http://www.postgresql.org/"&gt;Postgresql&lt;/a&gt; database, with &lt;a href="http://www.slony.info/"&gt;Slony&lt;/a&gt; installed. I encourage you to use the &lt;a href="http://pgfoundry.org/projects/stackbuilder"&gt;Postgres Application Stack builder&lt;/a&gt;, which is a “download and installation wizard combined with a set of pre-configured packages to complement PostgreSQL's one-click installers on Windows, Mac and Linux.” It is a very nice installer that takes care of the whole deployment process. 
On top of that, you will also use your trusted pgAdmin, because this tutorial sets up replication mainly through the GUI (we’ll look at script mode later …).&lt;br /&gt;Of course, you need to be root.&lt;br /&gt;&lt;div align="center"&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K3-gq1ESI/AAAAAAAAAqs/KSNMpeb1mtQ/s1600-h/image%5B3%5D.png"&gt;&lt;img alt="image" border="0" height="287" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K3-3yYCKI/AAAAAAAAAqw/bHgPwOh51IM/image_thumb%5B1%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="409" /&gt;&lt;/a&gt; &lt;em&gt;Stack installer main screen&lt;/em&gt;&lt;/div&gt;&lt;div align="center"&gt;&lt;em&gt;&lt;/em&gt;&lt;/div&gt;&lt;h4&gt;Database setup&lt;/h4&gt;For this tutorial, you want a source table – Fact_Table in database SOURCE and schema Schema_A -&amp;nbsp; to be replicated into a target table – Fact_Table in database TARGET and schema Schema_A (same schema name). For the moment, I’m still having trouble replicating data into a target schema with a different name.&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K3_EOTEwI/AAAAAAAAAq0/ooYeefZqIeE/s1600-h/image%5B35%5D.png"&gt;&lt;img alt="image" border="0" height="319" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K3_0yb81I/AAAAAAAAAq4/nLXGvJx1tMU/image_thumb%5B15%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="456" /&gt;&lt;/a&gt;&lt;br /&gt;If you don’t have a table ready for this tutorial, here is some code to create a small and very simple fact table. Any table you want to replicate needs a unique constraint (create a primary key and move on). 
Create it on both the SOURCE and TARGET databases.&lt;br /&gt;&lt;blockquote&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;CREATE TABLE "Schema_A"."Fact_Table"&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;(&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;"ID" bigint NOT NULL,&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;"DATE" date,&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;"PRODUCT" character(100),&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;"PRICE" double precision,&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;"Quantity" double precision,&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;CONSTRAINT "PK_FACTS" PRIMARY KEY ("ID")&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="color: #004080;"&gt;&lt;strong&gt;);&lt;/strong&gt;&lt;/span&gt;&lt;/blockquote&gt;In this tutorial, I assume the following : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;You want to replicate data from database=SOURCE, schema=Schema_A and table=Fact_Table to database=TARGET, schema=Schema_A and table=Fact_Table.&lt;/li&gt;&lt;li&gt;You use the built-in postgres user.&lt;/li&gt;&lt;li&gt;Both source and target schemas / tables have been previously created. Slony can manage DDL changes / propagation during replication, but this will be explained in a later article.&lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Setting up a simple replication process with pgadmin III&lt;/h4&gt;First create two databases, one called SOURCE and the other called TARGET.&amp;nbsp; The &lt;strong&gt;SOURCE&lt;/strong&gt; will be the &lt;strong&gt;Master&lt;/strong&gt;, while the &lt;strong&gt;TARGET&lt;/strong&gt; will be the &lt;strong&gt;Slave&lt;/strong&gt;. Right-click on Databases and choose “New Database” (see below). 
Once done, you will have two more databases in your object browser (below).&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4AH6ENVI/AAAAAAAAAq8/U4sqQ_l0pzA/s1600-h/image%5B8%5D.png"&gt;&lt;img alt="image" border="0" height="317" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4AjFCXfI/AAAAAAAAArA/QirozsZ5Tzg/image_thumb%5B4%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="484" /&gt;&lt;/a&gt;&lt;br /&gt;Now it is time to create the replication objects : everything is done in the Replication items inside the two newly created databases. The theory is simple : the two replication objects need references to each other; have a look below : &lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4BP8mRZI/AAAAAAAAArE/x2_DjeQXmmM/s1600-h/image%5B88%5D.png"&gt;&lt;img alt="image" border="0" height="282" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4BrjC4cI/AAAAAAAAArI/M1ApmRAo2OY/image_thumb%5B42%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="463" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h5&gt;1 – Create base replication objects (screen caps below)&lt;/h5&gt;For the Master : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Right click on SOURCE : Replication and choose "New Slony-I Cluster"&lt;/li&gt;&lt;li&gt;Join existing cluster : Do not check this !&lt;/li&gt;&lt;li&gt;Cluster name : choose a name, here it is : REP_CLUSTER&lt;/li&gt;&lt;li&gt;Local Node Left : 1 &lt;/li&gt;&lt;li&gt;Local Node Right : Master Node&lt;/li&gt;&lt;li&gt;Admin Node Left : 99&lt;/li&gt;&lt;li&gt;Admin Node Right : Admin Node&lt;/li&gt;&lt;li&gt;Press OK.&lt;/li&gt;&lt;/ul&gt;For the Slave : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Right click on TARGET : Replication and choose "New Slony-I Cluster"&lt;/li&gt;&lt;li&gt;Join existing cluster 
: Has to be checked&lt;/li&gt;&lt;li&gt;Server : localhost&lt;/li&gt;&lt;li&gt;Database : SOURCE&lt;/li&gt;&lt;li&gt;Cluster name : REP_CLUSTER (should be automatically recognized)&lt;/li&gt;&lt;li&gt;Local Node Left : 10 &lt;/li&gt;&lt;li&gt;Local Node Right : Slave Node&lt;/li&gt;&lt;li&gt;Admin Node : 99 - Admin Node&lt;/li&gt;&lt;li&gt;Press OK.&lt;/li&gt;&lt;/ul&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4B7DPzDI/AAAAAAAAArM/l6vTTAM861k/s1600-h/image%5B15%5D.png"&gt;&lt;img alt="image" border="0" height="244" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4CdUlq3I/AAAAAAAAArQ/Cvb6UJj1K4M/image_thumb%5B7%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="195" /&gt;&lt;/a&gt;&amp;nbsp; &lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4C4djE-I/AAAAAAAAArU/RTA8EugNGYk/s1600-h/image%5B21%5D.png"&gt;&lt;img alt="image" border="0" height="244" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S_K4DLphH3I/AAAAAAAAArY/gDGNmAiiFdQ/image_thumb%5B9%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="195" /&gt;&lt;/a&gt;&lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;em&gt;Master 
Node&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Slave Node&lt;/em&gt;&lt;br /&gt;&lt;h5&gt;2 – Create path setup (screen caps below)&lt;/h5&gt;For the Master : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Right click on SOURCE : Replication : sample : Nodes : Master Node : Path and choose New path&lt;/li&gt;&lt;li&gt;Server : 10 - Slave Node&lt;/li&gt;&lt;li&gt;Connect Info : host=localhost port=5432 user=postgres password=yourpassword dbname=TARGET&lt;/li&gt;&lt;li&gt;Conn retry : 10&lt;/li&gt;&lt;li&gt;Press OK.&lt;/li&gt;&lt;/ul&gt;For the Slave : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;Right click on TARGET : Replication : sample : Nodes : Slave Node : Path and choose New path&lt;/li&gt;&lt;li&gt;Server : 1 - Master Node&lt;/li&gt;&lt;li&gt;Connect Info : host=localhost port=5432 user=postgres password=yourpassword dbname=SOURCE&lt;/li&gt;&lt;li&gt;Conn retry : 10&lt;/li&gt;&lt;li&gt;Press OK.&lt;/li&gt;&lt;/ul&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K4DkeRtpI/AAAAAAAAArc/axjg-QSMnmI/s1600-h/image%5B31%5D.png"&gt;&lt;img alt="image" border="0" height="190" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4EPpwWNI/AAAAAAAAArg/fEbIzgdaQus/image_thumb%5B13%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="244" /&gt;&lt;/a&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S_K4Ei8jXXI/AAAAAAAAArk/o_ixeytT1YM/s1600-h/image%5B28%5D.png"&gt;&lt;img alt="image" border="0" height="190" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4E4kRl3I/AAAAAAAAAro/y7fiMHP8vgI/image_thumb%5B12%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 
0px; display: inline;" title="image" width="244" /&gt;&lt;/a&gt;&amp;nbsp;&amp;nbsp; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Path for &lt;em&gt;Master Node&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Path for Slave Node&lt;/em&gt;&lt;br /&gt;&lt;h5&gt;3 – Define replication set (screen caps below)&lt;/h5&gt;&lt;ul&gt;&lt;li&gt;Right click on db SOURCE : Replication : REP_CLUSTER: Replication Sets and choose new Replication Set&lt;/li&gt;&lt;li&gt;ID : 1&lt;/li&gt;&lt;li&gt;Comment : Replication Set 1&lt;/li&gt;&lt;/ul&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4FVI1v0I/AAAAAAAAArs/5fhjKTZFX_M/s1600-h/image%5B39%5D.png"&gt;&lt;img alt="image" border="0" height="236" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4Fzm9CwI/AAAAAAAAArw/ZMXG7pVwhhE/image_thumb%5B17%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="302" /&gt;&lt;/a&gt;&lt;br /&gt;Now we choose the table we want to be replicated from SOURCE to TARGET.&lt;br /&gt;&lt;ul&gt;&lt;li&gt;Right click on Replication Sets : sample set : Tables and choose New tables&lt;/li&gt;&lt;li&gt;Table : Any table of your choice&lt;/li&gt;&lt;li&gt;ID : 1&lt;/li&gt;&lt;li&gt;Index : Auto filled when choosing table. 
It is important to have a unique constraint on the table you want to replicate (a primary key, for instance).&lt;/li&gt;&lt;li&gt;Comment : a nice comment&lt;/li&gt;&lt;li&gt;Press OK.&lt;/li&gt;&lt;/ul&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4GPf-MhI/AAAAAAAAAr0/n0PaQMV0K3k/s1600-h/image%5B43%5D.png"&gt;&lt;img alt="image" border="0" height="243" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K4GWciRvI/AAAAAAAAAr4/F7Ap91WfLd4/image_thumb%5B19%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="311" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;h5&gt;4 – Create a new Subscription (screen caps below)&lt;/h5&gt;&lt;ul&gt;&lt;li&gt;Right click on SOURCE : Replication : sample : Replication Sets : sample set : Subscription and choose New subscription.&lt;/li&gt;&lt;li&gt;Origin : write “1”&lt;/li&gt;&lt;li&gt;Provider : Choose “1 - Master Node”&lt;/li&gt;&lt;li&gt;Receiver : Choose “10 - Slave Node”&lt;/li&gt;&lt;li&gt;Can forward : Keep it unchecked&lt;/li&gt;&lt;li&gt;Press OK.&lt;/li&gt;&lt;/ul&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4G6NbRRI/AAAAAAAAAr8/Yn-fMs5Oz6k/s1600-h/image%5B47%5D.png"&gt;&lt;img alt="image" border="0" height="253" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4HQZ_uuI/AAAAAAAAAsA/PgJ10Vrn__s/image_thumb%5B21%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="319" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;h5&gt;5 – Object browser overview&lt;/h5&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; &lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K4HijxtZI/AAAAAAAAAsE/2zp584GrFyI/s1600-h/image%5B51%5D.png"&gt;&lt;img alt="image" border="0" height="446" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4IAjMf2I/AAAAAAAAAsI/b_IG3v2q6T4/image_thumb%5B23%5D.png?imgmax=800" style="border-bottom: 
0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="254" /&gt;&lt;/a&gt;&amp;nbsp;&amp;nbsp; &lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4IkYE9rI/AAAAAAAAAsM/_gpbEaZNZVo/s1600-h/image%5B56%5D.png"&gt;&lt;img alt="image" border="0" height="452" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4IzZnqzI/AAAAAAAAAsQ/CE21WJFlehw/image_thumb%5B26%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="253" /&gt;&lt;/a&gt; &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Replication object for the Master&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; Replication object for the Slave&lt;br /&gt;&lt;br /&gt;Well, we are done with the database setup using pgAdmin III. Now it’s time to … start the replication !&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;Start the replication&lt;/h4&gt;Now it’s time to write some Linux commands. Go to /opt/PostgreSQL/8.3/bin (or any other location, depending on your Linux flavour and / or Postgresql installation path). You will find some interesting programs there. Have a look at &lt;strong&gt;slon&lt;/strong&gt;. This is the Slony daemon, which you will start twice : once for the Master, once for the Slave. 
&lt;br /&gt;Note : if you were setting up replication between two physical servers, you would have to start &lt;strong&gt;slon&lt;/strong&gt; on each machine.&lt;br /&gt;The &lt;strong&gt;slon&lt;/strong&gt; syntax is quite simple, here it is : &lt;br /&gt;&lt;strong&gt;slon [Options] [ClusterName] [ConnectionInfo]&lt;/strong&gt;&lt;br /&gt;For the Master : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;./slon -d 1 REP_CLUSTER "user=postgres password=yourpassword host=localhost port=5432 dbname=SOURCE"&lt;/li&gt;&lt;/ul&gt;For the Slave : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;./slon -d 1 REP_CLUSTER "user=postgres password=yourpassword host=localhost port=5432 dbname=TARGET"&lt;/li&gt;&lt;/ul&gt;The command line output looks like this for the Master : &lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S_K4JXo4kFI/AAAAAAAAAsU/m2xASJRcVJk/s1600-h/image%5B60%5D.png"&gt;&lt;img alt="image" border="0" height="417" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4J0lRMvI/AAAAAAAAAsY/OKbLGy2YmBE/image_thumb%5B28%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="540" /&gt;&lt;/a&gt; And for the Slave, it is a bit different : &lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S_K4KWzgxyI/AAAAAAAAAsg/CRTb2FtCSiQ/s1600-h/image%5B70%5D.png"&gt;&lt;img alt="image" border="0" height="473" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4KymNqdI/AAAAAAAAAsk/8ObsuMqdNnI/image_thumb%5B32%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="561" /&gt;&lt;/a&gt; We can see a successful table truncate (on the target Fact_Table, which was empty). 
We also have a “copy set 1 done”, which means our replication is running fine.&lt;br /&gt;Let’s have a look at the target table to see if everything went fine …&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K4LaA6t1I/AAAAAAAAAso/IMRNl_zaWXI/s1600-h/image%5B74%5D.png"&gt;&lt;img alt="image" border="0" height="359" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S_K4LqeqLnI/AAAAAAAAAss/C4UKRcvCx_4/image_thumb%5B34%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="528" /&gt;&lt;/a&gt; Yeeeeah, it went fine : our Source data is here in the Target table. Now let’s add some data into the Source table …&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4MBPepBI/AAAAAAAAAsw/nIanqI17lyo/s1600-h/image%5B79%5D.png"&gt;&lt;img alt="image" border="0" height="369" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4MmqElXI/AAAAAAAAAs0/thYR_qDF9o4/image_thumb%5B37%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="538" /&gt;&lt;/a&gt; And let’s check on the Target db whether the replication did the job we expect …&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S_K4M4rv2sI/AAAAAAAAAs4/Zbxc6ngPfCY/s1600-h/image%5B84%5D.png"&gt;&lt;img alt="image" border="0" height="376" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S_K4NV8u42I/AAAAAAAAAs8/7gpzhJwdjlM/image_thumb%5B40%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="549" /&gt;&lt;/a&gt; … yes Sir, it worked !! 
You can also check by reading the &lt;strong&gt;slon&lt;/strong&gt; command line output.&lt;br /&gt;You are now ready to create your own custom replication scripts using Slony.&lt;br /&gt;I hope this tutorial helped you. Keep in touch, or contact me if you need more information.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1773134478679974798?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/1773134478679974798/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1773134478679974798' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1773134478679974798'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1773134478679974798'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/05/data-replication-with-postgresql-and.html' title='Data replication with Postgresql and Slony'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/S_K3-3yYCKI/AAAAAAAAAqw/bHgPwOh51IM/s72-c/image_thumb%5B1%5D.png?imgmax=800' height='72' 
width='72'/><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4408915782039935522</id><published>2010-05-05T06:07:00.001-07:00</published><updated>2010-12-15T08:02:28.908-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Geocoding'/><title type='text'>A quick ‘n cheap geocoding process using Talend, Soap and web service …</title><content type='html'>Hi all,&lt;br /&gt;Today, let’s play with &lt;a href="http://fr.talend.com/"&gt;Talend&lt;/a&gt; Open Studio (TOS). Here is a quick and cheap way to geocode IP addresses with Talend, using a web service call. For this purpose, I will call an IP lookup service provided by &lt;a href="http://www.ippages.com/"&gt;ippages.com&lt;/a&gt;. With the free version, you can send 20000 IP geocoding queries, which is good.&lt;br /&gt;First, let’s design our job. I use a tWebServiceInput step, configured as shown below.&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S-Ftdf2oX7I/AAAAAAAAAqU/Y5mHGKkxZ0E/s1600-h/image%5B3%5D.png"&gt;&lt;img alt="image" border="0" height="455" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S-Ftd1vSiQI/AAAAAAAAAqY/gfTHktxlqeQ/image_thumb%5B1%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="636" /&gt;&lt;/a&gt; &lt;br /&gt;As you can see, we have to tell this step where the WSDL is. In this case, the web service interface is : &lt;a href="http://www.ippages.com/soap2008/lookupserver.php?wsdl"&gt;http://www.ippages.com/soap2008/lookupserver.php?wsdl&lt;/a&gt;. Then, don’t forget to call the appropriate method : callshowmyip_lookup. Finally, the web service expects 7 parameters : put the IP address in the first one and, for instance, the country code in the fourth one. 
Make sure you store the web service answer in a dedicated row (for me it is named “line”), and your tWebServiceInput step is ready.&lt;br /&gt;The ippages web service will send you back an XML answer. You will have to process this XML.&lt;br /&gt;Now you can export your IP geocoding data into a new XML file, like in my simple example. Or, better, you can use a tExtractXMLField to parse your XML with XPath and distribute the data you need to any further step. Your XPath expression will start with /root/row/line.&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S-FteMdFJYI/AAAAAAAAAqc/kROocqh41FQ/s1600-h/image%5B7%5D.png"&gt;&lt;img alt="image" border="0" height="103" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S-Ftera_MeI/AAAAAAAAAqg/L-6WqWgNuxk/image_thumb%5B3%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="126" /&gt;&lt;/a&gt;&lt;br /&gt;The XML answer from ippages will look like this : &lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S-FtfMh7vtI/AAAAAAAAAqk/q78ngDTr4IM/s1600-h/image%5B11%5D.png"&gt;&lt;img alt="image" border="0" height="168" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S-FtfVjL1WI/AAAAAAAAAqo/RtMNk9SYy14/image_thumb%5B5%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="638" /&gt;&lt;/a&gt;&lt;br /&gt;The free usage of ippages has some restrictions : some information is not available, and you have to subscribe to a commercial offer to get it. 
If you have a look at the XML above, you will notice that you still get a lot of interesting information with the free offer.&lt;br /&gt;Happy geocoding.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4408915782039935522?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4408915782039935522/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4408915782039935522' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4408915782039935522'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4408915782039935522'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/05/quick-geocoding-process-using-talend.html' title='A quick ‘n cheap geocoding process using Talend, Soap and web service …'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_hTlcWbt-BP4/S-Ftd1vSiQI/AAAAAAAAAqY/gfTHktxlqeQ/s72-c/image_thumb%5B1%5D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5662462418160029226</id><published>2010-04-26T08:50:00.001-07:00</published><updated>2010-12-15T08:02:44.882-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Plugins 
and Kettle V4</title><content type='html'>Hi all,&lt;br /&gt;A quick word to say that the following plugins &lt;span style="color: green;"&gt;&lt;strong&gt;are running fine with &lt;a href="http://www.pentaho.com/pdi_4/?rm=y"&gt;Kettle V4&lt;/a&gt;&lt;/strong&gt;&lt;/span&gt; : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S9W2NGBLmqI/AAAAAAAAAp0/SpIr5YnQDSc/s1600-h/image%5B28%5D.png"&gt;&lt;img alt="image" border="0" height="55" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S9W2NiqFHII/AAAAAAAAAp4/UYkwH4r4eVI/image_thumb%5B20%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="69" /&gt;&lt;/a&gt;&lt;a href="http://code.google.com/p/kqr/"&gt;Qkr&lt;/a&gt; &lt;/strong&gt;: &lt;strong&gt;Transformation plugin to create QRCodes coming from DBs.&lt;/strong&gt;&lt;/li&gt;&lt;li&gt;&amp;nbsp;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S9W2N_aIB7I/AAAAAAAAAp8/C2YyK8SaCk4/s1600-h/image%5B26%5D.png"&gt;&lt;img alt="image" border="0" height="45" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S9W2OoI0kWI/AAAAAAAAAqA/i5RTOEI9sq0/image_thumb%5B18%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="67" /&gt;&lt;/a&gt;&lt;strong&gt;&lt;a href="http://code.google.com/p/kgeocoding/"&gt;Kgeocoding&lt;/a&gt;&lt;/strong&gt; : &lt;strong&gt;Transformation plugin for geocoding addresses.&lt;/strong&gt;&lt;/li&gt;&lt;li&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S9W2O-ReqkI/AAAAAAAAAqE/gXEHgGwhswY/s1600-h/image%5B3%5D.png"&gt;&lt;img alt="image" border="0" height="49" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S9W2PVHbReI/AAAAAAAAAqI/ZHSFXkyKkIA/image_thumb%5B1%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="68" /&gt;&lt;/a&gt;&amp;nbsp;&lt;strong&gt;&lt;a 
href="http://code.google.com/p/krrd/"&gt;krrd&lt;/a&gt;&lt;/strong&gt; : &lt;strong&gt;Transformation plugin to feed RRD files (RTG …).&lt;/strong&gt;&lt;/li&gt;&lt;/ul&gt;&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S9W2PhX5wzI/AAAAAAAAAqM/x8qaEmmRNrE/s1600-h/image%5B31%5D.png"&gt;&lt;img alt="image" border="0" height="72" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S9W2QHGJ9fI/AAAAAAAAAqQ/K5HRROHI0Us/image_thumb%5B21%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="83" /&gt;&lt;/a&gt; The plugin SendToS3 (job plugin)&lt;span style="color: red;"&gt; &lt;strong&gt;is not working on Kettle V4 for the moment.&lt;/strong&gt;&lt;/span&gt; Some AbstractMethodError is flying around. I have to find why and patch …&lt;br /&gt;Vincent&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5662462418160029226?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/5662462418160029226/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5662462418160029226' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5662462418160029226'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5662462418160029226'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/04/plugins-and-kettle-v4.html' title='Plugins and Kettle V4'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_hTlcWbt-BP4/S9W2NiqFHII/AAAAAAAAAp4/UYkwH4r4eVI/s72-c/image_thumb%5B20%5D.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5040720160883529818</id><published>2010-04-26T04:57:00.000-07:00</published><updated>2010-12-15T08:03:00.625-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Geocoding with Kettle : new plugin</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;The Kettle geocoding plugin is &lt;a href="http://code.google.com/p/kgeocoding/downloads/list"&gt;HERE&lt;/a&gt;.&lt;br /&gt;But shhhhhh .... don't tell it too loud .... ;)&lt;br /&gt;And give me feedback if you want new features, improvements, etc ...&lt;br /&gt;&lt;br /&gt;Ps : some usefull infos about the API term of use &lt;a href="http://groups.google.com/group/google-maps-api/browse_thread/thread/1e5699692b75d6e0/acc6ba5e41d5deca#acc6ba5e41d5deca"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Vincent&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5040720160883529818?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/5040720160883529818/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5040720160883529818' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5040720160883529818'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5040720160883529818'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/04/geocoding-with-kettle-new-plugin_26.html' title='Geocoding with Kettle : new plugin'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-6743791875788633930</id><published>2010-04-22T06:10:00.001-07:00</published><updated>2010-12-15T08:03:14.701-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Geocoding'/><title type='text'>Kettle plugin with Google map GeoCoding : I’m wrong !</title><content type='html'>Hi all,&lt;br /&gt;Here is a link to a thread I started this morning after I was warned that my usage of the Google Maps API was wrong. Note that the limit for geocoding is now 2500 and no longer 15000, which is much less convenient for an ETL process.&lt;br /&gt;&lt;a href="http://groups.google.com/group/google-maps-api/browse_thread/thread/1e5699692b75d6e0/acc6ba5e41d5deca#acc6ba5e41d5deca"&gt;GeoCoding plugin thread on Google groups.&lt;/a&gt;&lt;br /&gt;Since I’m currently violating the terms of use, the plugin will be unavailable for download from today at 00:00 CET.&lt;br /&gt;I’m sorry for that. 
I will start coding a new geocoding plugin for Kettle, using another – free -&amp;nbsp; geocoder / API …&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-6743791875788633930?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/6743791875788633930/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=6743791875788633930' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6743791875788633930'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6743791875788633930'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/04/kettle-plugin-with-google-map-geocoding.html' title='Kettle plugin with Google map GeoCoding : I’m wrong !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8289870627618168246</id><published>2010-04-21T11:03:00.001-07:00</published><updated>2010-12-15T08:03:30.504-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>GeoCoding with Kettle : new plugin</title><content type='html'>Hi all,&lt;br /&gt;I created a plugin for &lt;strong&gt;geocoding&lt;/strong&gt; addresses into &lt;a href="http://www.pentaho.com/pdi_4/"&gt;Kettle&lt;/a&gt; v3.5. This plugin is using the google maps API. 
You can learn more about this API &lt;a href="http://code.google.com/intl/fr/apis/maps/"&gt;HERE&lt;/a&gt;.&lt;br /&gt;&lt;h4&gt;What is Geocoding ?&lt;/h4&gt;According to Wikipedia, &lt;strong&gt;geocoding&lt;/strong&gt; is “the process of finding associated geographic coordinates (often expressed as latitude and longitude) from other geographic data, such as street addresses, or zip codes”. Normalization is the process of cleaning an input address and putting it into a standardized format.&lt;br /&gt;&lt;strong&gt;Reverse geocoding&lt;/strong&gt; is the opposite : finding a complete address from GPS coordinates.&lt;br /&gt;&lt;div align="center"&gt;&lt;img height="254" src="http://www.newsgab.com/forum/attachments/celebrity-pictures/17844d1159815922-rachel-hunter-world-map-bodypaint-swimsuit-ra_hu120-b.jpg" style="display: block; float: none; margin-left: auto; margin-right: auto;" width="124" /&gt;&lt;em&gt;Raised relief map … a basic tool for geocoding.&lt;/em&gt;&lt;/div&gt;&lt;h4&gt;The plugin&lt;/h4&gt;For the moment, it is a basic &lt;strong&gt;V1 release, but fully working&lt;/strong&gt;. More features (advanced geocoding) are about to be added.&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S889zsh71XI/AAAAAAAAAo0/A3oJNEcCd3E/s1600-h/image27.png"&gt;&lt;img alt="image" border="0" height="65" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S8890H1l4VI/AAAAAAAAAo4/-VbR5WlwLac/image_thumb13.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline;" title="image" width="80" /&gt;&lt;/a&gt; &lt;br /&gt;Here is the plugin screen in Kettle. As you can see, it is a basic screen. You need to enter the following : &lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;GMapKey :&lt;/strong&gt; your Google Maps key. The geocoding works without it … well, it does for me. But I recommend that you sign up for the API and use your own Google key. 
&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Input Address Field :&lt;/strong&gt; the field, from the incoming rows, that holds the address you want to geocode. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Normalized address :&lt;/strong&gt; the name of the column in which the normalized address will be stored. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;City Field :&lt;/strong&gt; the name of the column in which the city name will be stored. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;GPS Coord Fields :&lt;/strong&gt; the name of the column in which the GPS coordinates will be stored. &lt;/li&gt;&lt;/ul&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S8890-jR1PI/AAAAAAAAAo8/xwaCkuk8IC4/s1600-h/image11.png"&gt;&lt;img alt="image" border="0" height="268" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S8891oydwtI/AAAAAAAAApA/XZbUPEtLmak/image_thumb5.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="396" /&gt;&lt;/a&gt; &lt;br /&gt;Here is the main Kettle screen with a sample transformation.&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S8892cyI5BI/AAAAAAAAApE/oLpElsqTTKg/s1600-h/image7.png"&gt;&lt;img alt="image" border="0" height="300" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S88927wegkI/AAAAAAAAApI/KWwtL2tOZVk/image_thumb3.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline;" title="image" width="631" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;h4&gt;Let’s see how it works&lt;/h4&gt;For the example above, I used 4 row creation steps to create 4 types of addresses (France, USA, Asia, Africa). 
Here is the output : a code, a raw address (with typos and words out of order) and a comment.&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S8893cGvasI/AAAAAAAAApM/1ld0E0HAczg/s1600-h/image16.png"&gt;&lt;img alt="image" border="0" height="193" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S8894GUQchI/AAAAAAAAApQ/yuArJZ_sURE/image_thumb8.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="537" /&gt;&lt;/a&gt;&lt;br /&gt;Now let’s imagine we want to normalize the content of the Raw address field and retrieve the corresponding GPS coordinates for each address. To do so, we fill in the plugin screen with the following information : your GMap key, the “Raw address” input field and the names for the normalized address, the city field and the GPS coords.&lt;br /&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S8894gOYBFI/AAAAAAAAApU/jfr6Tn8rbvY/s1600-h/image20.png"&gt;&lt;img alt="image" border="0" height="287" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S8895UJ8EpI/AAAAAAAAApY/hQ5XnOjmHAo/image_thumb10.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="425" /&gt;&lt;/a&gt;&lt;br /&gt;Now we can connect everything and start the transformation. The plugin sends a geocoding request to the Google Maps API for each address. 
The result set looks like this :&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S89l2cybFUI/AAAAAAAAApk/BExhAPJJasg/s1600-h/image21%5B1%5D.png"&gt;&lt;img alt="image" border="0" height="175" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S8896emu09I/AAAAAAAAApo/aXqxl3mc3xw/image21_thumb.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline;" title="image" width="715" /&gt;&lt;/a&gt;&lt;br /&gt;The original fields are still here (Code, Raw address and Comment), but the plugin added 3 more fields according to the names you set up previously (&lt;strong&gt;Norm_address, City and GPS_Coord&lt;/strong&gt;). As you can see, the address is normalized and formatted, thanks to the Google Maps API. The GPS coords are given as lat / lng.&lt;br /&gt;&lt;h4&gt;Limits&lt;/h4&gt;After some reading, I noticed you can ask for geocoding up to 15,000 times per day. This is a limitation of the Google Maps API. I didn’t try to go above 15,000 addresses / geocoding requests. I’ll let you check this (create 15001 lines in the row creation steps …).&lt;br /&gt;&lt;h4&gt;I want it&lt;/h4&gt;No problem. You can download the plugin &lt;a href="http://code.google.com/p/kgeocoding/downloads/list"&gt;HERE&lt;/a&gt; (plugin, xml file and icon) and test it. As usual, everything is packed into a single jar using fatjar.&lt;br /&gt;&lt;h4&gt;What’s next ?&lt;/h4&gt;&lt;strong&gt;This is a basic geocoding process&lt;/strong&gt;. I’m currently working on something more powerful, with more features : using all the API attributes, letting the user choose which attributes he wants or doesn’t want, reverse geocoding … etc …&lt;br /&gt;Please, if this plugin is useful to you, tell me more about your needs. 
I will be happy to upgrade this plugin for your usage.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8289870627618168246?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8289870627618168246/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8289870627618168246' title='9 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8289870627618168246'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8289870627618168246'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/04/geocoding-with-kettle-new-plugin.html' title='GeoCoding with Kettle : new plugin'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_hTlcWbt-BP4/S8890H1l4VI/AAAAAAAAAo4/-VbR5WlwLac/s72-c/image_thumb13.png?imgmax=800' height='72' width='72'/><thr:total>9</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8006519809048682617</id><published>2010-04-13T11:35:00.000-07:00</published><updated>2010-04-13T12:00:56.592-07:00</updated><title type='text'>OSBI &amp; ATOL</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;Today, I want to put Sylvain Decloix, OSBI and ATOL into the lights.&lt;br /&gt;&lt;br /&gt;Sylvain Decloix is a french BI Manager focused on Open 
Source, working for the &lt;a href="http://www.atolcd.com/"&gt;ATOL&lt;/a&gt; company.&lt;br /&gt;He has a blog &lt;a href="http://www.osbi.fr/"&gt;HERE&lt;/a&gt;, where you can find very good articles (in French) about BI, data integration, data warehousing and reporting.&lt;br /&gt;Not only is he a consultant and a manager able to provide high-quality support on projects, Sylvain is also what we can call a visionary : he has very good knowledge of the BI industry and often delivers very sharp analyses of the market, its trends and its players.&lt;br /&gt;On top of that, he is very experienced with spatial data management and geocoding.&lt;br /&gt;&lt;br /&gt;I recommend visiting his blog regularly, and if you are looking for BI skills and consulting in France, you can definitely ask ATOL for assistance.&lt;br /&gt;&lt;br /&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 256px; DISPLAY: block; HEIGHT: 96px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5459698808975983314" border="0" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/S8S_I9mjAtI/AAAAAAAAAn8/oFBgF-hF7VA/s320/ATOL.bmp" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8006519809048682617?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8006519809048682617/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8006519809048682617' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8006519809048682617'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8006519809048682617'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/04/osbi-atol.html' title='OSBI 
&amp; ATOL'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_hTlcWbt-BP4/S8S_I9mjAtI/AAAAAAAAAn8/oFBgF-hF7VA/s72-c/ATOL.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-7588816210454842990</id><published>2010-04-12T05:41:00.001-07:00</published><updated>2010-04-12T05:58:45.862-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>QRCode encoder with Kettle : new plugin</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;&lt;p&gt;I’ve made a new Kettle plugin to answer a specific need from one of my clients in the retail industry : a QRCode encoder.&lt;/p&gt;&lt;h4&gt;What is a QrCode ?&lt;/h4&gt;&lt;p&gt;It is a matrix code created by Denso Wave, a Japanese company. QR stands for “quick response” because of the high speed decoding process. Today, most of our mobile phones (ex : iPhone) can read (decode) these QRCodes and use the information within to feed an application (agenda, address book …). Some companies use them to encode data within a logistics process. These codes are also widely used in ads. 
More information &lt;a href="http://en.wikipedia.org/wiki/QR_Code"&gt;HERE&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S8MUxmRNsrI/AAAAAAAAAnU/lK9oRC-JhIQ/s1600-h/image2.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S8MUy2XRkWI/AAAAAAAAAnY/5b1Lt2P7sQc/image_thumb.png?imgmax=800" width="132" height="132" /&gt;&lt;/a&gt;  &lt;img src="http://likeiknowit.files.wordpress.com/2010/03/qr-code.jpg" /&gt;&lt;/p&gt;&lt;p align="center"&gt;&lt;em&gt;One of the codes above is mine. But which one ?&lt;/em&gt;&lt;/p&gt;&lt;p align="center"&gt;&lt;em&gt;&lt;/em&gt;&lt;/p&gt;&lt;h4&gt;The Kettle plugin&lt;/h4&gt;&lt;p&gt;I used the famous ZXing Java library to encode any data into a QRCode. This library is quite complete and powerful, yet sometimes a bit difficult to install and understand (read the wiki). You can find it &lt;a href="http://code.google.com/p/zxing/"&gt;HERE&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The plugin is quite simple : a single screen holds all the needed information.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S8MUziWgIoI/AAAAAAAAAnc/gibM42cPJMI/s1600-h/image6.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: block; FLOAT: none; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; MARGIN-LEFT: auto; BORDER-LEFT-WIDTH: 0px; MARGIN-RIGHT: auto" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S8MU1b0q6qI/AAAAAAAAAng/QXjTk6ZRahY/image_thumb2.png?imgmax=800" width="513" height="347" /&gt;&lt;/a&gt; Here are the fields : &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Step name :&lt;/strong&gt; the step name. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;QRColumn :&lt;/strong&gt; the column you want to encode, coming from a previous step. 
&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Size x :&lt;/strong&gt; the X size of your generated QRCode file (recommended : 128). &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Size y :&lt;/strong&gt; the Y size of your generated QRCode file (recommended : 128). &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Picture format :&lt;/strong&gt; png or gif &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Destination dir :&lt;/strong&gt; the destination directory where all the pictures / QRCodes will be written. &lt;/li&gt;&lt;/ul&gt;&lt;h4&gt;Running the Plugin&lt;/h4&gt;&lt;p&gt;Very simple. Have a look at my sample transformation below. It has a data input coming from a csv file (it could be a query or anything else) and the plugin itself. The plugin will read all the incoming data from the QRColumn you specified and will create a file for each value. The generated files will share the same base name, with a counter in the filename corresponding to the row number.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S8MU2TmzfZI/AAAAAAAAAnk/wMttuCxTfmw/s1600-h/image15.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S8MU30uLfVI/AAAAAAAAAno/i2RZaUBbxa8/image_thumb7.png?imgmax=800" width="641" height="448" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;The files generated from my example are now on my C:\ drive, as specified in the plugin window.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S8MU49GMHgI/AAAAAAAAAns/VnEvWszGoPA/s1600-h/image19.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S8MU6Gom7jI/AAAAAAAAAnw/XGW1dQS65AA/image_thumb9.png?imgmax=800" width="649" height="337" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;h4&gt;I want it 
!&lt;/h4&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Ok, no problem, you can find the package &lt;a href="http://code.google.com/p/kqr/"&gt;HERE&lt;/a&gt;. As usual, everything is compiled with fat jar in order to have only one jar file. The package holds : &lt;/p&gt;&lt;ul&gt;&lt;li&gt;The plugin itself (+ xml file and icon file), &lt;/li&gt;&lt;li&gt;The data sample csv file, &lt;/li&gt;&lt;li&gt;My QRCode, to add me to your favorites ;) &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Don’t forget to have a look to &lt;a href="http://www.pentaho.com/"&gt;Pentaho&lt;/a&gt; and the new &lt;a href="http://www.pentaho.com/pdi_4/?hp=y"&gt;Kettle / PDI release 4&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;Have fun and keep me informed about your usage / testing or new feature request.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-7588816210454842990?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/7588816210454842990/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=7588816210454842990' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7588816210454842990'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7588816210454842990'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/04/qrcode-encoder-with-kettle-new-plugin_12.html' title='QRCode encoder with Kettle : new plugin'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_hTlcWbt-BP4/S8MUy2XRkWI/AAAAAAAAAnY/5b1Lt2P7sQc/s72-c/image_thumb.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5827691439689404577</id><published>2010-04-08T12:55:00.001-07:00</published><updated>2010-12-15T08:03:45.112-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Plugin : Update RRD Tool with Kettle (Cacti, MRTG …).</title><content type='html'>Hi all,&lt;br /&gt;This post will give you more details about my new Kettle plugin to feed RRDTools database.&lt;br /&gt;&lt;h4&gt;What is RRDTool ?&lt;/h4&gt;According to &lt;strong&gt;Tobias Oetiker&lt;/strong&gt;, RRDtool is the OpenSource industry standard, high performance data logging and graphing system for time series data. Use it to write your custom monitoring shell scripts or create whole applications using its Perl, Python, Ruby, TCL or PHP bindings.&lt;br /&gt;You can learn more about RRDTools and &lt;strong&gt;Tobias Oetiker&lt;/strong&gt; fantastic work on his homepage &lt;a href="http://oss.oetiker.ch/rrdtool/"&gt;HERE&lt;/a&gt;.&lt;br /&gt;With RRDTools, you can easily store time series data and create realtime graphics like the one below. 
For instance, I use it today for one of my clients in Paris in order to monitor various real-time business / IT indicators : travel booking, passenger reservation, search engine load, xml proxy load and mainframe usage.&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S740gn6uh2I/AAAAAAAAAl8/S1wEbpBpeWg/s1600-h/image3.png"&gt;&lt;img alt="image" border="0" height="146" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S740h6qRg_I/AAAAAAAAAmA/CA0_wlXGjp0/image_thumb1.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="400" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;h4&gt;The libs I used&lt;/h4&gt;I used JRobin, a Java port of &lt;strong&gt;Tobias Oetiker&lt;/strong&gt;’s RRDTools. JRobin was made by the talented &lt;strong&gt;Sasa Markovic&lt;/strong&gt;. The JRobin home pages are &lt;a href="http://oldwww.jrobin.org/"&gt;HERE&lt;/a&gt; and &lt;a href="http://www.jrobin.org/index.php/Main_Page"&gt;HERE&lt;/a&gt;. I recommend a visit in order to be fully aware of all JRobin features.&lt;br /&gt;According to Sasa Markovic, “JRobin is a 100% pure java implementation of &lt;a href="http://www.rrdtool.org/"&gt;RRDTool's&lt;/a&gt; functionality. It follows the same logic and uses the same data sources, archive types and definitions as RRDTool does. JRobin supports all standard operations on Round Robin Database (RRD) files: CREATE, UPDATE, FETCH, LAST, DUMP, XPORT&amp;nbsp; and GRAPH. JRobin's API is made for those who are familiar with RRDTool's concepts and logic, but prefer to work with pure java. 
If you provide the same data to RRDTool and JRobin, you will get exactly the same results and graphs.” &lt;br /&gt;I can confirm all of this.&lt;br /&gt;The graphical rendering looks very good, as you can see below.&lt;br /&gt;&lt;img alt="" border="0" height="212" src="http://oldwww.jrobin.org/images/gallery/demo0.png" width="278" /&gt;&lt;img alt="" border="0" height="211" src="http://oldwww.jrobin.org/images/gallery/complexdemo3.png" width="323" /&gt;&lt;br /&gt;&lt;img alt="" border="0" height="237" src="http://oldwww.jrobin.org/images/gallery/zarama1.png" width="283" /&gt;&lt;img alt="" border="0" height="236" src="http://oldwww.jrobin.org/images/gallery/demo1.png" width="326" /&gt;&lt;br /&gt;&lt;h4&gt;&amp;nbsp;&lt;/h4&gt;&lt;h4&gt;The plugin&lt;/h4&gt;First, I recommend reading carefully everything related to RRDTools and JRobin. You need to be familiar with this technology before going further.&lt;br /&gt;The Kettle plugin is quite simple : a single user interface to create an RRD file, add archives and feed the file.&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S740iIWB4bI/AAAAAAAAAmE/Vs2tOGb2Eyg/s1600-h/image10.png"&gt;&lt;img align="left" alt="image" border="0" height="87" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S740ipMZvqI/AAAAAAAAAmI/u44Q41WZmdk/image_thumb4.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline; margin-left: 0px; margin-right: 0px;" title="image" width="110" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S740jDAtjpI/AAAAAAAAAmM/WE-XlrzbzyI/s1600-h/image7.png"&gt;&lt;img alt="image" border="0" height="382" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S740jyZef-I/AAAAAAAAAmQ/4yGtnEmSx-4/image_thumb3.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="303" /&gt;&lt;/a&gt; A short 
description of this window :&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;Nom étape (sorry, it is in French; it will be translated) :&lt;/strong&gt; Step name &lt;/li&gt;&lt;li&gt;&lt;strong&gt;RRD File :&lt;/strong&gt; the RRD file to be created. This file will hold all your time series data and archives. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Datasource :&lt;/strong&gt; an RRD file can have 1 or more datasources. For the moment, my plugin is restricted to 1 datasource, which is enough most of the time. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Type :&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;&lt;strong&gt;Gauge :&lt;/strong&gt;&lt;/em&gt; Does not store the rate of change; it saves the actual value itself. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;Counter :&lt;/em&gt;&lt;/strong&gt; Stores the rate of change of the value over a step period (assumes the value is always increasing). Ex : traffic counters. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;Derive :&lt;/em&gt;&lt;/strong&gt; The same as Counter, but it also handles decreasing values, which give negative rates. Ex : free disk space. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;Absolute :&lt;/em&gt;&lt;/strong&gt; Stores the rate of change, assuming the previous value is 0 (for counters that are reset each time they are read). &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Heartbeat :&lt;/strong&gt; If the RRD file does not receive a value (PDP, primary data point) within 300 seconds, it will wait for another 300 seconds (total = 600 seconds). If there is still no value after 600 seconds, UNKNOWN will be stored. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;Starttime :&lt;/strong&gt; The starting point of the RRD file. It must be a unix timestamp. In a future release, I will code a converter and place it into the user interface. You can easily compute unix timestamps by using this &lt;a href="http://unixepoch.com/index.htm"&gt;web page&lt;/a&gt; or this &lt;a href="http://www.epochconverter.com/"&gt;one&lt;/a&gt;. This timestamp must be lower than the first one coming from your data. 
&lt;/li&gt;&lt;li&gt;&lt;strong&gt;Min and Max :&lt;/strong&gt; The minimum value and the maximum value, if predictable. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;The combo zone :&lt;/strong&gt; This combo box is used to define the RRAs (Round Robin Archives). An RRA defines how the consolidated data is stored. There are 4 main parameters : &lt;ul&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;CF, for Consolidation Function :&lt;/em&gt;&lt;/strong&gt; &lt;ul&gt;&lt;li&gt;&lt;em&gt;AVERAGE : Store the average value&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;MIN : Store the minimum value&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;MAX : Store the max value&lt;/em&gt; &lt;/li&gt;&lt;li&gt;&lt;em&gt;LAST : Store the last known value&lt;/em&gt; &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;xff :&lt;/em&gt;&lt;/strong&gt; XFiles factor. This is the fraction of values that can be unknown without the consolidated value itself being flagged as UNKNOWN. Must be between 0 and 1, in 0.1 increments. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;Steps :&lt;/em&gt;&lt;/strong&gt;&amp;nbsp; Number of values to be consolidated with the chosen CF. Must be an integer. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;&lt;em&gt;Rows :&lt;/em&gt;&lt;/strong&gt; Number of consolidated values to keep. Must be an integer. &lt;/li&gt;&lt;/ul&gt;&lt;/li&gt;&lt;/ul&gt;The Add button adds the RRA (round robin archive) to the combo list; the listed RRAs are then used when the RRD file is created. 
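&lt;br /&gt;To get a feel for what an RRA actually keeps, here is a minimal sketch in plain Java. It makes no JRobin calls; the class and method names are hypothetical, and the 300-second base step is an assumption. It computes how much history an archive covers (step * steps * rows) and how many unknown values one consolidated row tolerates for a given xff, following the usual RRDtool reading of that ratio.

```java
class RraCoverage {
    // Seconds of history one archive can hold: step * steps * rows.
    static long coverageSeconds(long stepSeconds, int steps, int rows) {
        return stepSeconds * steps * rows;
    }

    // Number of unknown primary values tolerated in one consolidated row,
    // given the xff ratio (0.0 to 1.0). Approximate reading of the xff rule.
    static int maxUnknown(double xff, int steps) {
        return (int) Math.floor(xff * steps);
    }

    public static void main(String[] args) {
        long step = 300L; // assumed base step, in seconds

        // Archive with 1 step and 24 rows: every value is kept as is.
        System.out.println(coverageSeconds(step, 1, 24));

        // Hypothetical hourly archive (12 steps of 300 s) kept for one week.
        System.out.println(coverageSeconds(step, 12, 168));

        // With xff = 0.5, up to half of the 12 values may be unknown.
        System.out.println(maxUnknown(0.5, 12));
    }
}
```

With a single archive of 1 step and 24 rows and a 300-second step, the file only keeps two hours of data, so size your rows accordingly.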
&lt;br /&gt;Once the RRD file is successfully created, you will see a short message at the bottom of the user interface.&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S740kAT1x-I/AAAAAAAAAmU/IXQeGvfOQMc/s1600-h/image40.png"&gt;&lt;img alt="image" border="0" height="213" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S740kvYe6xI/AAAAAAAAAmY/N-_A3rLKjFQ/image_thumb20.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="357" /&gt;&lt;/a&gt; &lt;br /&gt;Let’s have a look at some RRD file internals, using another nice tool from &lt;a href="http://oldwww.jrobin.org/index.html"&gt;Sasa Markovic&lt;/a&gt; : RRD inspector. I’m sure you will easily understand the RRD structure, even if you are not already familiar with RRDtool.&lt;br /&gt;This screen shows an RRD file created with one datasource called Speed, of type GAUGE, with a heartbeat of 600 seconds and with minimum and maximum values set to 0 and 2000. 
&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S740lJJrllI/AAAAAAAAAmc/XuO9_ftMnCQ/s1600-h/image15.png"&gt;&lt;img alt="image" border="0" height="328" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S740mHHiCWI/AAAAAAAAAmg/uwVrhV-Nzdo/image_thumb7.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="470" /&gt;&lt;/a&gt;&lt;br /&gt;This RRD file also has a single RRA (round robin archive), using the AVERAGE consolidation function, with xff set to 0.5, 1 step (each value is stored as is, so no averaging actually happens) and 24 rows. This file was created using the Kettle plugin on my C:\ hard drive.&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S740m5GB5PI/AAAAAAAAAmk/KD1fPujty-Q/s1600-h/image19.png"&gt;&lt;img alt="image" border="0" height="327" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S740nSggVDI/AAAAAAAAAmo/UkAfTC8VjVE/image_thumb9.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="472" /&gt;&lt;/a&gt;&lt;br /&gt;If we select the “Archive data” panel, we can see all the values currently stored in the RRD file.&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S740oLmKqBI/AAAAAAAAAms/CgWwV9QN7UY/s1600-h/image27.png"&gt;&lt;img alt="image" border="0" height="342" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S740oz8o2YI/AAAAAAAAAmw/P9YAQYqE3xo/image_thumb13.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="489" /&gt;&lt;/a&gt; &lt;br /&gt;RRD inspector is a fantastic little tool, very useful when creating RRD files and checking that everything is correct.&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;How to use the plugin ?&lt;/h4&gt;Very 
simple. First, you create an RRD file using the user interface shown above. Then check that the file has actually been created, just to be sure. Finally, you can connect the step to a previous one in Kettle. In my example, I used a flat file containing some simple timestamps and values.&lt;br /&gt;Here is my flat file : a unix timestamp at 5-minute intervals (starting Thu, 8 Apr 2010 12:00:00 UTC) and some simple values from 5 to 140.&lt;br /&gt;TimeStamp;Value &lt;br /&gt;1270728000;5 &lt;br /&gt;1270728300;10 &lt;br /&gt;1270728600;15 &lt;br /&gt;1270728900;20 &lt;br /&gt;1270729200;25 &lt;br /&gt;1270729500;40 &lt;br /&gt;1270729800;50 &lt;br /&gt;1270730100;60 &lt;br /&gt;1270730400;70 &lt;br /&gt;1270730700;80 &lt;br /&gt;1270731000;90 &lt;br /&gt;1270731300;100 &lt;br /&gt;1270731600;120 &lt;br /&gt;1270731900;140&lt;br /&gt;And here is my sample transformation.&lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S740pBMJYkI/AAAAAAAAAm0/Ozb8G3PxurU/s1600-h/image31.png"&gt;&lt;img alt="image" border="0" height="193" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S740pnkoTbI/AAAAAAAAAm4/4DywsHBzsGg/image_thumb15.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="464" /&gt;&lt;/a&gt; &lt;br /&gt;Hit play, and voilà … the plugin will feed the RRD file and give you a nice output log for each value. 
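&lt;br /&gt;If you would rather build or check such timestamps in code than with a web converter, the following is a small sketch in plain Java (java.time, Java 8 or later). The class and method names are hypothetical and it is not part of the plugin; it computes the unix timestamp of a UTC date and splits one "TimeStamp;Value" line of the flat file.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.ZonedDateTime;

class EpochCheck {
    // Unix timestamp (in seconds) for a date and time expressed in UTC.
    static long toEpochSeconds(int year, int month, int day, int hour, int minute) {
        return ZonedDateTime.of(year, month, day, hour, minute, 0, 0, ZoneOffset.UTC)
                .toEpochSecond();
    }

    // Split one "TimeStamp;Value" line of the flat file into two numbers.
    static long[] parseLine(String line) {
        String[] parts = line.trim().split(";");
        return new long[] { Long.parseLong(parts[0]), Long.parseLong(parts[1]) };
    }

    public static void main(String[] args) {
        // First timestamp of the sample file: Thu, 8 Apr 2010 12:00:00 UTC.
        System.out.println(toEpochSeconds(2010, 4, 8, 12, 0)); // prints 1270728000

        // Second line of the sample file, converted back to a readable date.
        long[] row = parseLine("1270728300;10 ");
        System.out.println(Instant.ofEpochSecond(row[0]) + " value=" + row[1]);
    }
}
```

The first value printed, 1270728000, matches the first line of the flat file.&lt;br /&gt;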
&lt;strong&gt;In short : an RRD file only expects a unix timestamp and a value.&lt;/strong&gt;&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S740qcr_uXI/AAAAAAAAAm8/uLNmQKiTOJc/s1600-h/image36.png"&gt;&lt;img alt="image" border="0" height="446" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S740rjwZMWI/AAAAAAAAAnA/crngf4V_wZo/image_thumb18.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="image" width="476" /&gt;&lt;/a&gt; &lt;br /&gt;&lt;h4&gt;Generating a graph&lt;/h4&gt;Well, this is not really Kettle-related, but here is some code to create graphs from your RRD file, previously loaded with Kettle.&lt;br /&gt;This simple Java snippet …&lt;br /&gt;import java.awt.Color; &lt;br /&gt;import java.io.IOException; &lt;br /&gt;import org.jrobin.core.RrdException; &lt;br /&gt;import org.jrobin.graph.RrdGraph; &lt;br /&gt;import org.jrobin.graph.RrdGraphDef; &lt;br /&gt;&lt;br /&gt;public static void RenderRRDGraph(long TimeStart, long TimeStop, String Consol, String RRDGraphFormat) throws IOException, RrdException{ &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Define the graph; the output file extension comes from RRDGraphFormat &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RrdGraphDef graphDef = new RrdGraphDef(); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; graphDef.setVerticalLabel("m/s"); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; graphDef.setTimeSpan(TimeStart, TimeStop); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Read the "speed" datasource from the RRD file, using the given consolidation function &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; graphDef.datasource("myspeed", "C:\\testRRD", "speed", Consol); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; // Draw it as a red line, 2 pixels wide, without legend &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; graphDef.line("myspeed", new Color(0xFF, 0, 0), null, 2); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; graphDef.setFilename("C:\\testRRD." 
+ RRDGraphFormat); &lt;br /&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; RrdGraph graph = new RrdGraph(graphDef); &lt;br /&gt;}&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S740ry1PpTI/AAAAAAAAAnE/pmU4VGGMXY4/s1600-h/testRRD3.png"&gt;&lt;img alt="testRRD" border="0" height="120" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S740svhD2UI/AAAAAAAAAnI/qXusjL86JyA/testRRD_thumb1.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: block; float: none; margin-left: auto; margin-right: auto;" title="testRRD" width="379" /&gt;&lt;/a&gt; &lt;br /&gt;… will create this png.&lt;br /&gt;Very simple, as you can see (this example is really basic compared to what can be done, but I can’t show you any of the graphs I built for my client – I am under NDA). You can now imagine creating some real-time graphs (RTG) using this technology.&lt;br /&gt;&lt;br /&gt;&lt;h4&gt;The package&lt;/h4&gt;Let’s go back to Pentaho and Kettle : I created a package for you. You will find the plugin itself (compiled and archived under Eclipse using Fat Jar in order to embed the JRobin library), the icon, the xml file, the flat file and a sample transformation (the one described above).&lt;br /&gt;This package can be downloaded from its &lt;a href="http://code.google.com/p/krrd/downloads/list"&gt;Google code page&lt;/a&gt;.&lt;br /&gt;Please keep me informed about your testing, and feel free to contact me if further features (or fixes !) 
are needed.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5827691439689404577?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/5827691439689404577/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5827691439689404577' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5827691439689404577'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5827691439689404577'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/04/plugin-update-rrd-tool-with-kettle.html' title='Plugin : Update RRD Tool with Kettle (Cacti, MRTG …).'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_hTlcWbt-BP4/S740h6qRg_I/AAAAAAAAAmA/CA0_wlXGjp0/s72-c/image_thumb1.png?imgmax=800' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-646720809311720082</id><published>2010-04-07T12:48:00.001-07:00</published><updated>2010-04-07T12:48:09.099-07:00</updated><title type='text'>Update RRD Tool with Kettle (Cacti …). New plugin.</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;I finally finished my new &lt;a href="http://kettle.pentaho.org/"&gt;Kettle&lt;/a&gt; plugin. 
This one is used to &lt;strong&gt;create&lt;/strong&gt; and &lt;strong&gt;manage&lt;/strong&gt; RRD files as well as to &lt;strong&gt;feed these RRD files&lt;/strong&gt;. Of course, you need to use &lt;a href="http://www.pentaho.com/"&gt;Pentaho Data Integration Tool&lt;/a&gt; for that (aka Kettle).&lt;/p&gt;  &lt;p&gt;Reminder : RRD – Round Robin Database – is an open source tool for storage and retrieval of time series data. Very useful when you need to monitor a couple of values in real time.&lt;/p&gt;  &lt;p&gt;This new plugin is fully working and &lt;strong&gt;will be released tomorrow&lt;/strong&gt; for the community. I still have to do some cleaning, write a how-to and describe all the libraries I used (&lt;u&gt;and give full credit to their authors&lt;/u&gt;). But for the moment, you can find some pictures below showing the plugin running in Kettle with a simple transformation sending data from a csv file.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S7zhalujEbI/AAAAAAAAAls/fOuBlH-YFE0/s1600-h/image4.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S7zhb5yhCPI/AAAAAAAAAlw/UcjlYf7Du3k/image_thumb2.png?imgmax=800" width="641" height="453" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;em&gt;&lt;font size="1"&gt;Above : You can create one datasource (enough for a first release) and as many consolidations as you want (very useful), using the grid and the helper to choose the right values.&lt;/font&gt;&lt;/em&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S7zhdNL9-sI/AAAAAAAAAl0/59J0JkXVFkQ/s1600-h/image14.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" 
src="http://lh3.ggpht.com/_hTlcWbt-BP4/S7zhd5Cz94I/AAAAAAAAAl4/vJiO2GUn0oc/image_thumb8.png?imgmax=800" width="634" height="441" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;Here is what you can do with RRDtool !!! Pretty sexy, no ? I will use this Kettle plugin for one of my clients in Paris, who wants to monitor sales, booking and search engine optimization in real time, using a window gadget or a web page. The charts will look like these examples …&lt;/p&gt;  &lt;p&gt;&lt;img border="0" alt="" src="http://oldwww.jrobin.org/images/mrtg/rrdtool-daily.png" width="295" height="204" /&gt;&lt;img border="0" alt="" src="http://oldwww.jrobin.org/images/graph_api/traffic_jrobin.png" width="299" height="204" /&gt;&lt;/p&gt;  &lt;p&gt;&lt;/p&gt;  &lt;p&gt;After this first release, I think I will add some features such as dumping values or – better – updating the charts after the data has been fed. Ah, no … one has priority : convert epoch to human readable date …. ;) (because my plugin needs a unix timestamp for the moment), but that’s an easy one.&lt;/p&gt;  &lt;p&gt;Please contact me if you are interested in testing this plugin.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-646720809311720082?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/646720809311720082/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=646720809311720082' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/646720809311720082'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/646720809311720082'/><link rel='alternate' type='text/html' 
href='http://open-bi.blogspot.com/2010/04/update-rrd-tool-with-kettle-cacti-new_07.html' title='Update RRD Tool with Kettle (Cacti …). New plugin.'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_hTlcWbt-BP4/S7zhb5yhCPI/AAAAAAAAAlw/UcjlYf7Du3k/s72-c/image_thumb2.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-7265627310831429427</id><published>2010-03-29T08:44:00.001-07:00</published><updated>2010-12-15T08:01:41.169-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Download new Kettle Job Plugin : SendToS3 – Release 2</title><content type='html'>Hi all,&lt;br /&gt;As mentioned in my previous post (below), release 2 of my Kettle Job Plugin is available for download &lt;a href="https://code.google.com/p/ks3/"&gt;HERE&lt;/a&gt;. This release offers a bucket management screen (creation, listing, assign …).&lt;br /&gt;More to come : as requested, I will add some metadata / object attributes to be linked with the file in S3. Very useful if you need, like me, to manage file collections. 
Have a look below for an example of object attributes.&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S7DK0xsHWmI/AAAAAAAAAlk/nhO7I4LzyhE/s1600-h/image%5B3%5D.png"&gt;&lt;img alt="image" border="0" height="475" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S7DK1Qkdq1I/AAAAAAAAAlo/IGjocPl2iGg/image_thumb%5B1%5D.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline;" title="image" width="629" /&gt;&lt;/a&gt; &lt;br /&gt;Enjoy,&lt;br /&gt;Vincent&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-7265627310831429427?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/7265627310831429427/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=7265627310831429427' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7265627310831429427'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7265627310831429427'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/download-new-kettle-job-plugin-sendtos3.html' title='Download new Kettle Job Plugin : SendToS3 – Release 2'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh3.ggpht.com/_hTlcWbt-BP4/S7DK1Qkdq1I/AAAAAAAAAlo/IGjocPl2iGg/s72-c/image_thumb%5B1%5D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5239522566792465923</id><published>2010-03-26T04:21:00.001-07:00</published><updated>2010-12-15T08:01:27.870-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>NEW FEATURE : Kettle Job Plugin : SendToS3</title><content type='html'>Hi all,&lt;br /&gt;Here is an overview of a new feature for the Kettle Job plugin “&lt;strong&gt;Send files to Amazon S3&lt;/strong&gt;”. As you can see, I added a screen for bucket management : creation, listing, autorefresh … For those who are new to S3, a bucket is – more or less – like a folder but with specific S3 constraints.&lt;br /&gt;During the last week, I was asked to add some more features : zip file, file renaming, file timestamping … Seems people are sending a lot of things on S3 now …&lt;br /&gt;I will find time to add these features during April.&lt;br /&gt;Still need to clean my code and I will soon push a release on the plugin &lt;a href="https://code.google.com/p/ks3/"&gt;Google code page&lt;/a&gt;.&lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S6yYmnCKveI/AAAAAAAAAlc/8wPCXLkT8is/s1600-h/image%5B12%5D.png"&gt;&lt;img alt="image" border="0" height="439" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S6yYnBFdtBI/AAAAAAAAAlg/ulKS5_7PZfQ/image_thumb%5B6%5D.png?imgmax=800" style="border-bottom-width: 0px; border-left-width: 0px; border-right-width: 0px; border-top-width: 0px; display: inline;" title="image" width="643" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5239522566792465923?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://open-bi.blogspot.com/feeds/5239522566792465923/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5239522566792465923' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5239522566792465923'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5239522566792465923'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/update-kettle-job-plugin-send-files-to.html' title='NEW FEATURE : Kettle Job Plugin : SendToS3'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/S6yYnBFdtBI/AAAAAAAAAlg/ulKS5_7PZfQ/s72-c/image_thumb%5B6%5D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3380753725664498370</id><published>2010-03-22T08:29:00.001-07:00</published><updated>2010-03-26T04:08:14.827-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Kettle job plugin : send files to Amazon S3</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;I created a Kettle job plugin that sends files to Amazon S3. It is based on the &lt;a href="http://jets3t.s3.amazonaws.com/index.html"&gt;JetS3t&lt;/a&gt; toolkit (jets3t-0.7.2). 
You can download the plugin &lt;a href="https://code.google.com/p/ks3/"&gt;HERE&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;Below is the plugin GUI with an example; you can also see the log output.&lt;/p&gt;  &lt;p&gt;The plugin needs : &lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;strong&gt;Access Key&lt;/strong&gt; : Your S3 access key. You must have an &lt;a href="http://aws.amazon.com/s3/"&gt;S3&lt;/a&gt; account. &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Private Key&lt;/strong&gt; : Your private key (the AWS secret access key). This key won’t be displayed in the Kettle log output. &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;S3 Bucket&lt;/strong&gt; : A bucket is like a directory. It must already exist. &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;Filename&lt;/strong&gt; : The path and filename of the file you want to send to S3. &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S6ea8NyqFgI/AAAAAAAAAlU/sDceUn2O3j0/s1600-h/Spoon_workbench%5B1%5D.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="Spoon_workbench" border="0" alt="Spoon_workbench" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S6ea9Ph3P0I/AAAAAAAAAlY/QFrYk7Ba4J0/Spoon_workbench_thumb%5B1%5D.png?imgmax=800" width="642" height="410" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Once pushed to Amazon S3, you can see your file in the target bucket (here, a silly Windows DLL was sent).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S6eMwTlyUmI/AAAAAAAAAlE/9jMKHGft_hE/s1600-h/File_in_S3%5B1%5D.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="File_in_S3" border="0" alt="File_in_S3" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S6eMw0paouI/AAAAAAAAAlI/1yeadNe49O0/File_in_S3_thumb%5B1%5D.png?imgmax=800" width="478" height="367" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;And, this is the job icon.&lt;/p&gt;  &lt;p&gt;&lt;a 
href="http://lh6.ggpht.com/_hTlcWbt-BP4/S6eMxmRQfFI/AAAAAAAAAlM/3WjTBD1902M/s1600-h/SS3%5B2%5D.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="SS3" border="0" alt="SS3" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S6eMyL-z4II/AAAAAAAAAlQ/AaqOR3NYCsM/SS3_thumb.png?imgmax=800" width="132" height="132" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;I will soon add some new features like : bucket creation in the UI, bucket listing, xml parsing for S3 return code and maybe encryption.&lt;/p&gt;  &lt;p&gt;Feel free to contact me.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3380753725664498370?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3380753725664498370/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3380753725664498370' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3380753725664498370'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3380753725664498370'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/kettel-job-plugin-send-files-to-amazon.html' title='Kettle job plugin : send files to Amazon S3'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' 
url='http://lh3.ggpht.com/_hTlcWbt-BP4/S6ea9Ph3P0I/AAAAAAAAAlY/QFrYk7Ba4J0/s72-c/Spoon_workbench_thumb%5B1%5D.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8768110827467076172</id><published>2010-03-19T09:34:00.001-07:00</published><updated>2010-03-19T09:34:34.053-07:00</updated><title type='text'>The beast</title><content type='html'>&lt;p&gt;&amp;#160;&lt;/p&gt;  &lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;Here is “my” IBM SVC 4800 SAN (middle &amp;amp; bottom) for one of my projects in Paris. Fibre Channel SAN.&lt;/p&gt;  &lt;p&gt;&lt;strong&gt;10 TB and 24 GB of cache memory. I’m expecting good I/O.&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;Yum.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S6OnlXWXHCI/AAAAAAAAAk0/9e863cJIptM/s1600-h/IMG_1720%5B5%5D.jpg"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="IMG_1720" border="0" alt="IMG_1720" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S6OnmIsvRuI/AAAAAAAAAk4/0JKdRoD_4LQ/IMG_1720_thumb%5B3%5D.jpg?imgmax=800" width="369" height="479" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8768110827467076172?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8768110827467076172/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8768110827467076172' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8768110827467076172'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8768110827467076172'/><link rel='alternate' 
type='text/html' href='http://open-bi.blogspot.com/2010/03/beast.html' title='The beast'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_hTlcWbt-BP4/S6OnmIsvRuI/AAAAAAAAAk4/0JKdRoD_4LQ/s72-c/IMG_1720_thumb%5B3%5D.jpg?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3059670332723787497</id><published>2010-03-15T04:08:00.000-07:00</published><updated>2010-03-22T09:22:35.425-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>WIP : New book about Kettle</title><content type='html'>&lt;p&gt;Hi all,&lt;br /&gt;&lt;br /&gt;I'm excited (not usual for a Monday morning) to give you this info / link to Roland Bouman's blog.&lt;br /&gt;He - with Matt Casters and Jos Van Dongen - is preparing a &lt;strong&gt;book about Kettle&lt;/strong&gt; ! 
Everything you ever wanted to know about sex … sorry … about &lt;strong&gt;Kettle&lt;/strong&gt; will be there !&lt;br /&gt;Wait until September 2010 ....&lt;br /&gt;&lt;/p&gt;&lt;p&gt;More info here :&lt;br /&gt;&lt;a href="http://rpbouman.blogspot.com/2010/03/writing-another-book-pentaho-kettle.html"&gt;http://rpbouman.blogspot.com/2010/03/writing-another-book-pentaho-kettle.html&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://www.wiley.com/WileyCDA/WileyTitle/productCd-0470635177.html"&gt;&lt;img src="http://media.wiley.com/product_data/coverImage300/77/04706351/0470635177.jpg" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3059670332723787497?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3059670332723787497/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3059670332723787497' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3059670332723787497'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3059670332723787497'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/wip-new-book-about-kettle.html' title='WIP : New book about Kettle'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2211743881635720388</id><published>2010-03-10T16:02:00.001-08:00</published><updated>2010-03-22T09:22:47.247-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Currency repository with kettle</title><content type='html'>&lt;p&gt;&lt;/p&gt;&lt;p&gt;Hi all,&lt;/p&gt;&lt;p&gt;Today, I had to create a currency repository for one of my clients in financial services. Easy with Informatica connected to one of the real time financial interfaces (Bloomberg, Reuters, etc …). The challenge was to gather data for all major currencies and store the Euro exchange rates over time. Easy, as I said. It was done in no time.&lt;/p&gt;&lt;p&gt;Then, at coffee time, I thought : “How could I do that with Kettle, with a completely free approach ?”. Here again, easy. Let me explain.&lt;/p&gt;&lt;h3&gt;The data sources&lt;/h3&gt;&lt;p&gt;From 2003 to 2005, I worked for the very official &lt;strong&gt;French National Bank (Banque de France - BDF) and the European Central Bank (ECB).&lt;/strong&gt; Since then, I have kept a lot of links and data sources about economics and stats. For Euro exchange rates, we have 2 possible data sources which are FREE to access and use : &lt;/p&gt;&lt;ul&gt;&lt;li&gt;&lt;strong&gt;RSS streams&lt;/strong&gt; : broadcast daily, at 14:15 (not a minute more), these RSS streams are easily accessible and you can leverage them to build your own exchange rate repository. Some data transformations are needed but they are very simple. These RSS streams offer 5 days of historical data. They are available on the ECB website &lt;a href="http://www.ecb.europa.eu/home/html/rss.en.html"&gt;HERE&lt;/a&gt;. &lt;/li&gt;&lt;li&gt;&lt;strong&gt;XML file&lt;/strong&gt; : like the RSS streams, a daily XML file is available. 
It contains only data for the current day. Here again, you can easily load it with Kettle, using the XML step with the proper parameters and configuration. The XML file is available on the ECB website &lt;a href="http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"&gt;HERE&lt;/a&gt;. &lt;/li&gt;&lt;/ul&gt;&lt;p&gt;Let’s see everything in detail.&lt;/p&gt;&lt;h3&gt;Solution Nb1 : RSS streams from the European Central Bank.&lt;/h3&gt;&lt;p&gt;If you go to the RSS page (&lt;a href="http://www.ecb.europa.eu/home/html/rss.en.html"&gt;HERE&lt;/a&gt;), you will see a lot of streams available for every currency on the market.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gytSp7zvI/AAAAAAAAAhI/eed71YMvxP8/s1600-h/image%5B3%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gyuIcFBxI/AAAAAAAAAhM/vdVhFvcLJE0/image_thumb%5B1%5D.png?imgmax=800" width="443" height="257" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;The RSS stream is easy to read and understand. Let’s click on the first one : US Dollar. As you can see, we have the currency exchange rate against the Euro and a date. We will need to do some parsing here.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gyuqnm3WI/AAAAAAAAAhQ/3n-GlIVs970/s1600-h/image%5B7%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gyvPtdhsI/AAAAAAAAAhU/8sbB4TEEyIM/image_thumb%5B3%5D.png?imgmax=800" width="635" height="91" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;Okay, we now have the RSS links and a quick overview of their internal structure. Time to play with Kettle. First we put an “RSS Reader” on the workbench and set it up. 
As you can see, I added one RSS link per currency. I found no way to use a single RSS stream for all currencies (I will investigate this point).&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gywZuOfLI/AAAAAAAAAhY/jf522xw8NC4/s1600-h/image%5B19%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gyxOOictI/AAAAAAAAAhc/M-PPZ3V8KCQ/image_thumb%5B9%5D.png?imgmax=800" width="559" height="380" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;On the second tab, there is nothing to do; just make sure you have a 0 in the field “Max number”.&lt;/p&gt;&lt;p&gt;On the third tab, Fields, we want to choose only 2 fields : “Date de publication” (exchange rate timestamp) and “Titre” (the string holding the exchange rate for the Euro). Let’s use String as the data type. Below is the Fields tab.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gyxoSaSBI/AAAAAAAAAhg/S6DUtcwVXak/s1600-h/image%5B18%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gyyAjAlZI/AAAAAAAAAhk/ZTPvcnM46UM/image_thumb%5B8%5D.png?imgmax=800" width="561" height="379" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;If you hit the preview button, you will see the RSS stream popping out on your screen like this. 
Cool.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gyyrdU9MI/AAAAAAAAAho/O45Tcw0GJbA/s1600-h/image%5B23%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gyzn-ZJtI/AAAAAAAAAhs/NqfWYNadirM/image_thumb%5B11%5D.png?imgmax=800" width="470" height="344" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;Okay, now we have to process our data in order to feed a table, somewhere in your data warehouse or your application. Let’s have a look at the transformation I built in 3 minutes for that purpose.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gyz5nSJmI/AAAAAAAAAhw/hYpLYiBr2a4/s1600-h/image%5B40%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy0l1OhAI/AAAAAAAAAh0/jc27KMnifTw/image_thumb%5B20%5D.png?imgmax=800" width="622" height="291" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;You can see the RSS Reader on the left, no need to go further on this one. Then some other steps : &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gy03dvKEI/AAAAAAAAAh4/ZGXbJCbXmPs/s1600-h/image%5B52%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy1Rp7gRI/AAAAAAAAAh8/mEeXvcbvYPw/image_thumb%5B26%5D.png?imgmax=800" width="59" height="66" /&gt;&lt;/a&gt; &lt;strong&gt;Field split&lt;/strong&gt; : the field Titre, as we saw with the RSS Reader, is a long string containing everything we need : the exchange rate and the international currency code (USD, EUR, CHF …). 
With this step, I split the Titre field into two new fields that will hold the exchange rate (Cours) and the currency code (Devise). The delimiter is a space, that’s why you can’t see it on the screenshot below : &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy1we9l1I/AAAAAAAAAiA/-qz4kMNIcPw/s1600-h/image%5B36%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gy2ZOa8wI/AAAAAAAAAiE/LBtVWL8bET0/image_thumb%5B18%5D.png?imgmax=800" width="628" height="228" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gy23-uXgI/AAAAAAAAAiI/vvyQ6LObhmU/s1600-h/image%5B64%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy3cHH_kI/AAAAAAAAAiM/gZpQ9j-dcs4/image_thumb%5B32%5D.png?imgmax=800" width="42" height="48" /&gt;&lt;/a&gt; &lt;strong&gt;RATES data input :&lt;/strong&gt; We feed the target database / table. Take care to manage your historical data here : remember that the RSS stream sends 5 days of historical data each day.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gy305nv0I/AAAAAAAAAiQ/rRTr6RbmZ58/s1600-h/image%5B63%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy4U6TjdI/AAAAAAAAAiU/J8ZS7d4IY_k/image_thumb%5B31%5D.png?imgmax=800" width="91" height="68" /&gt;&lt;/a&gt; &lt;strong&gt;Keep single currency :&lt;/strong&gt; That’s the second part of the transformation. Here, we need to build a table with the pairs : currency code / currency name. 
Remember we only have the currency code (USD …) and it would be nice to build a tiny dimension with the real name of the country and the currency. This step will only keep distinct values.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gy4wCq3VI/AAAAAAAAAiY/4JXivWnjbOA/s1600-h/image%5B44%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy5WXzTNI/AAAAAAAAAic/HPX9kXyGzeQ/image_thumb%5B22%5D.png?imgmax=800" width="457" height="249" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gy6NUrEbI/AAAAAAAAAig/j8Ra0aqgYKc/s1600-h/image%5B62%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy6tG6OCI/AAAAAAAAAik/ArCIOlC2SlA/image_thumb%5B30%5D.png?imgmax=800" width="115" height="50" /&gt;&lt;/a&gt; &lt;strong&gt;Map ECB currency code / currency name&lt;/strong&gt; : the previous distinct values will be mapped with the currency real and long names. Look at the configuration screen below, a new field is created (Devise) and you can copy paste the currency names from the ECB page where we grabbed the RSS links. 
The currency codes are international standards, so there is no risk of them changing one morning.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5g0ddVjpyI/AAAAAAAAAko/sAgCtjsjQgI/s1600-h/image%5B123%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: inline; BORDER-TOP: 0px; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5g0eLlUSTI/AAAAAAAAAks/hIPEwZU3G24/image_thumb%5B61%5D.png?imgmax=800" width="511" height="517" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gy8QeEu2I/AAAAAAAAAiw/9S6Z5lyI1YQ/s1600-h/image%5B70%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gy87vz8cI/AAAAAAAAAi0/QRieUyk6jBk/image_thumb%5B36%5D.png?imgmax=800" width="61" height="52" /&gt;&lt;/a&gt;&lt;strong&gt; CURRENCIES data input : &lt;/strong&gt;Final step, we feed the currency dimension with the code / currency name pairs. 
Since it is a typical short dimension and new currencies are not frequent, you can update this dimension once in a while, or when a new RSS is added …&lt;/p&gt;&lt;p&gt;If we make a quick extract of the data we created, the output will look like this for the rates (left) and the currencies (right).&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gy9fR0-iI/AAAAAAAAAi4/tH5vSrH4D0g/s1600-h/image%5B73%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gy9wIcuTI/AAAAAAAAAi8/IzCbGO2V-n0/image_thumb%5B37%5D.png?imgmax=800" width="177" height="244" /&gt;&lt;/a&gt; &lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gy-c8Zz5I/AAAAAAAAAjA/agMC8ZeZ-j8/s1600-h/image%5B76%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy-37RjVI/AAAAAAAAAjE/WMjMF3VjOFM/image_thumb%5B38%5D.png?imgmax=800" width="125" height="244" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;h3&gt;Solution Nb2 : XML file from the European Central Bank.&lt;/h3&gt;&lt;p&gt;Well, doing the same with an XML input is possible and easy too. First, we need to find the appropriate XML file. This one can be found &lt;a href="http://www.ecb.europa.eu/stats/eurofxref/eurofxref-daily.xml"&gt;HERE&lt;/a&gt;. It is quite simple and interesting data is the timestamp, the currency and the rate. 
Remember, this XML file only contains data for one day.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gy_ppfeWI/AAAAAAAAAjI/SoA2XROk8Ps/s1600-h/image%5B81%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gzAqCDPkI/AAAAAAAAAjM/nJVXDU4n9-s/image_thumb%5B41%5D.png?imgmax=800" width="625" height="491" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;This XML file is available every day at 14:15, so you can schedule your job to run then and gather the latest data.&lt;/p&gt;&lt;p&gt;Let’s process this file now. For this, we will create a transformation looking like the one below. We will also create a currency dimension table, like we did for the previous example.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gzBK-_MKI/AAAAAAAAAjQ/JoUH68bhYQY/s1600-h/image%5B90%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gzCanu_OI/AAAAAAAAAjU/66DbBvdt1Vw/image_thumb%5B46%5D.png?imgmax=800" width="623" height="334" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gzCjKPxKI/AAAAAAAAAjY/4uLeptikfA8/s1600-h/image%5B93%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gzDXH_sNI/AAAAAAAAAjc/MugCHm9Hfvw/image_thumb%5B47%5D.png?imgmax=800" width="75" height="67" /&gt;&lt;/a&gt; XML Extract : this is the core component of this transformation. It can read an XML file, parse it and process it, based on your XPath query. 
Let’s have a look at the configuration.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gzEc6ivBI/AAAAAAAAAjg/v5SU3ljhKbY/s1600-h/image%5B103%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gzFAB6wqI/AAAAAAAAAjk/f5bMxMQvNts/image_thumb%5B51%5D.png?imgmax=800" width="604" height="321" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;The link to the XML file has to be entered in the main tab. The second tab is more sensitive : here we have to specify an XPath for the document. In our case, the XPath must be : &lt;strong&gt;/gesmes:Envelope/*[name()='Cube']/*[name()='Cube']/*[name()='Cube']&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gzFpcmQ3I/AAAAAAAAAjo/5gz9H-sG-8g/s1600-h/image%5B111%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gzGNqC9QI/AAAAAAAAAjs/cJoCj1ZnjKk/image_thumb%5B55%5D.png?imgmax=800" width="598" height="318" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;The last tab (Fields) is also very important : we need to indicate the XPath attributes in order to reach the elements we need. 
In our case, &lt;strong&gt;@currency&lt;/strong&gt;, &lt;strong&gt;@rate&lt;/strong&gt; and &lt;strong&gt;@time&lt;/strong&gt; are mandatory.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5gzG_ACSRI/AAAAAAAAAjw/AhvYTVebBtY/s1600-h/image%5B107%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gzHb26lMI/AAAAAAAAAj0/tLUUNSOrYbs/image_thumb%5B53%5D.png?imgmax=800" width="603" height="320" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gzHnAMkzI/AAAAAAAAAj4/ycylldhxs-o/s1600-h/image%5B96%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gzI6tYcII/AAAAAAAAAj8/4UZwH9yRor4/image_thumb%5B48%5D.png?imgmax=800" width="57" height="71" /&gt;&lt;/a&gt; &lt;strong&gt;RATES :&lt;/strong&gt; the XML file is directly written into a target database / table. No need for custom transformation here. The target data looks like the previous example.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gzJNTznFI/AAAAAAAAAkA/8bEAl2saCSU/s1600-h/image%5B99%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gzJi4h1OI/AAAAAAAAAkE/yCpK1zYhqo0/image_thumb%5B49%5D.png?imgmax=800" width="179" height="72" /&gt;&lt;/a&gt; &lt;strong&gt;Map ECB currency code / country :&lt;/strong&gt; we still need to create a tiny dimension with the couples :currency code / currency full name. Same process as the previous example. 
Luckily, the currency codes are a worldwide standard and we can reuse the step from the previous transformation.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy7Mp4LsI/AAAAAAAAAkI/PD8JxYGEPdM/s1600-h/image%5B115%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5gy71oAeII/AAAAAAAAAkM/rqMtINIFSJw/image_thumb%5B57%5D.png?imgmax=800" width="192" height="195" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gzK0u5cDI/AAAAAAAAAkQ/c1a10NPEueY/s1600-h/image%5B114%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gzLekJFHI/AAAAAAAAAkU/yrl6ohQoOIw/image_thumb%5B56%5D.png?imgmax=800" width="84" height="78" /&gt;&lt;/a&gt; &lt;strong&gt;CURRENCIES :&lt;/strong&gt; finally, the currency dimension is written into its target database / table.&lt;/p&gt;&lt;p&gt;The rates are on the left while the currencies – unchanged – are on the right. Remember : only one day of data is available with the XML file. 
You will maybe notice that the date format is different compared to the RSS data (yyyy-MM-dd versus dd/MM/yy) but this is something you can easily manage if necessary.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gzLu-oUYI/AAAAAAAAAkY/kqFCjtxwl1M/s1600-h/image%5B118%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gzMPd4UMI/AAAAAAAAAkc/CuRDEQLA-NU/image_thumb%5B58%5D.png?imgmax=800" width="226" height="240" /&gt;&lt;/a&gt; &lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5gy-c8Zz5I/AAAAAAAAAjA/agMC8ZeZ-j8/s1600-h/image%5B76%5D.png"&gt;&lt;img style="BORDER-RIGHT-WIDTH: 0px; DISPLAY: inline; BORDER-TOP-WIDTH: 0px; BORDER-BOTTOM-WIDTH: 0px; BORDER-LEFT-WIDTH: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5gy-37RjVI/AAAAAAAAAjE/WMjMF3VjOFM/image_thumb%5B38%5D.png?imgmax=800" width="125" height="244" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;/p&gt;&lt;p&gt;Well, this was a quick and handy way to reach some official currency data and process it in Kettle. Of course, you can easily customize and optimize these jobs. 
&lt;/p&gt;&lt;p&gt;If you have trouble running these examples, feel free to reach me and I will provide you with the transformation files.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2211743881635720388?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2211743881635720388/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2211743881635720388' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2211743881635720388'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2211743881635720388'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/currency-repository-with-kettle.html' title='Currency repository with kettle'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/S5gyuIcFBxI/AAAAAAAAAhM/vdVhFvcLJE0/s72-c/image_thumb%5B1%5D.png?imgmax=800' height='72' width='72'/><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1988159114301389115</id><published>2010-03-09T07:01:00.001-08:00</published><updated>2010-12-15T08:04:03.531-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Cloud Computing'/><title type='text'>Send file to S3 with cool GUI</title><content 
type='html'>Hi all,&lt;br /&gt;A cool way to manage your S3 assets : &lt;a href="http://s3fm.com/" title="http://s3fm.com/"&gt;http://s3fm.com/&lt;/a&gt;. You can create / edit / delete buckets as well as upload and download files.&lt;br /&gt;I’m currently writing some code that could be used in Kettle for sending and retrieving files to/from S3. Soon to come, I’m still working on it.&lt;br /&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Zi2wgRtjI/AAAAAAAAAhA/zU_NzHfEktc/s1600-h/image%5B8%5D.png"&gt;&lt;img alt="image" border="0" height="407" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Zi3v7qgmI/AAAAAAAAAhE/gCCoqIjjMvc/image_thumb%5B4%5D.png?imgmax=800" style="border-bottom: 0px; border-left: 0px; border-right: 0px; border-top: 0px; display: inline;" title="image" width="636" /&gt;&lt;/a&gt; &lt;br /&gt;In this example, I’m sending a flat file from a US public data set (the Consumer Expenditure Survey), available on Amazon, that will later be used for data processing and stats in order to validate a stream.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1988159114301389115?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/1988159114301389115/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1988159114301389115' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1988159114301389115'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1988159114301389115'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/send-file-to-s3-with-cool-gui.html' title='Send file to S3 with cool GUI'/><author><name>Vincent 
Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh3.ggpht.com/_hTlcWbt-BP4/S5Zi3v7qgmI/AAAAAAAAAhE/gCCoqIjjMvc/s72-c/image_thumb%5B4%5D.png?imgmax=800' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3603648430317803695</id><published>2010-03-08T16:23:00.001-08:00</published><updated>2010-03-22T09:23:02.067-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Sending tweets with Kettle !</title><content type='html'>&lt;p&gt; &lt;/p&gt;&lt;p&gt;Hi all !&lt;/p&gt;&lt;p&gt;Recently, while meeting clients, I was speaking a lot about pervasive BI, operational BI, real time BI …. well, all these “new” trends in BI. On top of that, I’m currently delivering some mobile BI features for one of my clients, based on Cognos tools.&lt;/p&gt;&lt;p&gt;Then I thought to myself : “Why not send *intelligent* tweets with an ETL tool ?”. &lt;/p&gt;&lt;p&gt;Let’s imagine we are a very successful company and we want to tweet about how fantastic our sales are. Or imagine that, for any reason, you regularly gather some data and put it on Twitter for public release (I’m doing this with some weather probe data from the family farm in the south, while I’m living in Paris …).&lt;/p&gt;&lt;p&gt;Ok, just ask and it’s done.&lt;/p&gt;&lt;h3&gt;Sending tweets with Kettle&lt;/h3&gt;&lt;p&gt;The process is the following : &lt;/p&gt;&lt;ul&gt;&lt;li&gt;Query a table to retrieve the number of items sold and the total amount of sales. 
For simplicity you can also generate rows in Kettle.&lt;/li&gt;&lt;li&gt;Build the tweet and the command that will send it,&lt;/li&gt;&lt;li&gt;Send the tweet.&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;I will use one transformation and one job. The transformation will retrieve the data from the table, then send this data (1 row, 2 columns) to the job. This job will build the tweet and send it to Twitter. Easy.&lt;/p&gt;&lt;h3&gt;The transformation&lt;/h3&gt;&lt;p&gt;First, a &lt;strong&gt;Table input&lt;/strong&gt;. This step holds the SQL query to retrieve the data I need (1 row, 2 columns) : &lt;/p&gt;&lt;p&gt;&lt;strong&gt;&lt;em&gt;Select sum(sold_items), sum(amount) from sales_facts&lt;/em&gt;&lt;/strong&gt;&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5WU4ApA1iI/AAAAAAAAAeQ/hDe0QL_uBTc/s1600-h/image%5B3%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: block; FLOAT: none; MARGIN-LEFT: auto; BORDER-TOP: 0px; MARGIN-RIGHT: auto; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5WU4uuxkZI/AAAAAAAAAeU/Jyz0CFoifAU/image_thumb%5B1%5D.png?imgmax=800" width="280" height="107" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;Then, the &lt;strong&gt;Set Variables&lt;/strong&gt; step will capture the query results and map them to variables. These variables will be accessible in the job. 
Use the button &lt;strong&gt;Get Fields&lt;/strong&gt; for simplicity.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5WU5DijPLI/AAAAAAAAAec/6UUR1ZSniAM/s1600-h/image%5B8%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: block; FLOAT: none; MARGIN-LEFT: auto; BORDER-TOP: 0px; MARGIN-RIGHT: auto; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5WU5lULFMI/AAAAAAAAAeg/pCG1UBkZYsA/image_thumb%5B4%5D.png?imgmax=800" width="520" height="246" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;h3&gt;The Job&lt;/h3&gt;&lt;p&gt;The job has 3 steps : the usual &lt;strong&gt;Start&lt;/strong&gt; step, the &lt;strong&gt;Transformation&lt;/strong&gt; step linked to the transformation we just created above and a &lt;strong&gt;Shell&lt;/strong&gt; step. Have a look.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5WU54bgWsI/AAAAAAAAAek/ydBINYe9a4w/s1600-h/image%5B12%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: block; FLOAT: none; MARGIN-LEFT: auto; BORDER-TOP: 0px; MARGIN-RIGHT: auto; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5WU6Qjq__I/AAAAAAAAAeo/feXDirBT1wo/image_thumb%5B6%5D.png?imgmax=800" width="420" height="126" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;Nothing special here, except I checked the option “&lt;strong&gt;Copy previous results to args&lt;/strong&gt;” inside the transformation settings.&lt;/p&gt;&lt;p&gt;Then, we have the Shell step. This step will call a famous linux utility : &lt;a href="http://curl.haxx.se/"&gt;cURL&lt;/a&gt;. &lt;em&gt;cURL is a command line tool for transferring data with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS, FILE, IMAP, SMTP, POP3 and RTSP. 
curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos...), file transfer resume, proxy tunneling.&lt;/em&gt;&lt;/p&gt;&lt;p&gt;I recommend using cURL under Linux. The Windows release is a pain in the neck (installation issues, dll issues, issue with libeay32.dll, etc ….).&lt;/p&gt;&lt;p&gt;The command to send a tweet with cURL is quite easy : &lt;strong&gt;curl --basic --user yourtwitteraccount:yourtwitterpasswd --data status="Your tweet" &lt;/strong&gt;&lt;a href="http://twitter.com/statuses/update.xml"&gt;&lt;strong&gt;http://twitter.com/statuses/update.xml&lt;/strong&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;You can build this command with the Shell step with the following configuration : &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5WU678aN6I/AAAAAAAAAes/wecjDfdbwEU/s1600-h/image%5B20%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: inline; BORDER-TOP: 0px; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5WU7oj9QJI/AAAAAAAAAew/0eR73vCo84k/image_thumb%5B10%5D.png?imgmax=800" width="444" height="455" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;u&gt;Some explanations here : &lt;/u&gt;&lt;/p&gt;&lt;p&gt;I’m not calling cURL directly; I use a wrapper script file instead. I had some issues calling the cURL executable directly with /usr/bin/curl. Using a wrapper script file seems more appropriate. Look at my file below. 
Parameter $2 has to be enclosed in double quotes ; useful if you need to send tweets with more than one word ;) &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5WU8MMF3UI/AAAAAAAAAe0/tUieou10FDE/s1600-h/image%5B24%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: block; FLOAT: none; MARGIN-LEFT: auto; BORDER-TOP: 0px; MARGIN-RIGHT: auto; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5WU9Bu5N6I/AAAAAAAAAe4/YJ17rctNdVk/image_thumb%5B12%5D.png?imgmax=800" width="393" height="164" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;As you can see, my curl wrapper file expects 3 parameters, and these 3 parameters are passed in the &lt;strong&gt;Fields&lt;/strong&gt; zone of the &lt;strong&gt;Shell&lt;/strong&gt; step window. I obfuscated my Twitter account, in red. Here is the detail for these parameters : &lt;/p&gt;&lt;ul&gt;&lt;li&gt;$1 : The Twitter username and password with the syntax [username:password],&lt;/li&gt;&lt;li&gt;$2 : The tweet message, with double quotes,&lt;/li&gt;&lt;li&gt;$3 : The endpoint : the Twitter destination aka &lt;a href="http://twitter.com/statuses/update.xml"&gt;http://twitter.com/statuses/update.xml&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;In my example, I will send a tweet saying : “We sold [total of items sold] items for a total amount of [total amount of sales] USD.”&lt;/p&gt;&lt;h3&gt;Sending the tweet !&lt;/h3&gt;&lt;p&gt;Let’s save the work and start the job. You will see a verbose XML command output, this is a good sign. 
Have a look below : &lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5WU96DvogI/AAAAAAAAAe8/yLYAYQY2S74/s1600-h/image%5B33%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: block; FLOAT: none; MARGIN-LEFT: auto; BORDER-TOP: 0px; MARGIN-RIGHT: auto; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5WU-fvpYaI/AAAAAAAAAfA/ayM8MdLS78U/image_thumb%5B17%5D.png?imgmax=800" width="513" height="269" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;h3&gt;Is my tweet really published ?&lt;/h3&gt;&lt;p&gt;Yes, of course it is ! Connect to your Twitter profile, and check for the tweet.&lt;/p&gt;&lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5WU-1HfpXI/AAAAAAAAAfE/2TeBNyQtnbA/s1600-h/image%5B34%5D.png"&gt;&lt;img style="BORDER-BOTTOM: 0px; BORDER-LEFT: 0px; DISPLAY: block; FLOAT: none; MARGIN-LEFT: auto; BORDER-TOP: 0px; MARGIN-RIGHT: auto; BORDER-RIGHT: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5WU_gvR-CI/AAAAAAAAAfI/ARolVqq1YRw/image_thumb%5B18%5D.png?imgmax=800" width="546" height="333" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;h3&gt;The goodies&lt;/h3&gt;&lt;p&gt;The transformation can be downloaded &lt;a href="http://dl.free.fr/qiYKfIby5"&gt;HERE&lt;/a&gt;.&lt;/p&gt;&lt;p&gt;The job can be downloaded &lt;a href="http://dl.free.fr/qGzFMHA8Q"&gt;HERE&lt;/a&gt;.&lt;/p&gt;&lt;p&gt; &lt;/p&gt;&lt;p&gt;Enjoy and find new usage for this !!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3603648430317803695?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3603648430317803695/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3603648430317803695' title='3 Comments'/><link rel='edit' 
type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3603648430317803695'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3603648430317803695'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/sending-tweets-with-kettle.html' title='Sending tweets with Kettle !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh5.ggpht.com/_hTlcWbt-BP4/S5WU4uuxkZI/AAAAAAAAAeU/Jyz0CFoifAU/s72-c/image_thumb%5B1%5D.png?imgmax=800' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8779081202712108672</id><published>2010-03-05T07:57:00.001-08:00</published><updated>2010-12-06T13:36:31.734-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Amazon SimpleDB data loading with Kettle !!</title><content type='html'>  &lt;p&gt;Hi all !&lt;/p&gt;  &lt;p align="justify"&gt;I have been thinking for a few days about feeding Amazon SimpleDB with an ETL tool like Kettle / PDI.   &lt;br /&gt;Well, it's done. I have a working prototype. It’s a “quick and dirty” prototype, of course, but it works. I hope we will soon have an official Kettle plugin for that.    &lt;br /&gt;&lt;/p&gt;  &lt;h3&gt;Requirements&lt;/h3&gt;  &lt;p align="justify"&gt;You have to be familiar with AWS, EC2 and SimpleDB. Of course you need a valid account on Amazon Web Services. If you want to learn more about SimpleDB, click &lt;a href="http://aws.amazon.com/simpledb/"&gt;HERE&lt;/a&gt;. 
You can play with SimpleDB through a graphical interface before starting the hard stuff, click here for the &lt;a href="http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1137&amp;amp;categoryID=149"&gt;ScratchPad&lt;/a&gt; (don’t forget to browse the JavaScript source code, there is a lot to learn here !).&lt;/p&gt;  &lt;p align="justify"&gt;You need to know how to use Kettle, the famous data integration tool from Pentaho. To learn more about Kettle, follow this &lt;a href="http://wiki.pentaho.com/display/EAI/Latest+Pentaho+Data+Integration+%28aka+Kettle%29+Documentation"&gt;link&lt;/a&gt;. To discover the full Pentaho BI solution, click &lt;a href="http://www.pentaho.com/"&gt;here&lt;/a&gt;. I recommend trying &lt;a href="http://www.pentaho.com/products/try_bi_suite.php?fotm=y"&gt;Pentaho BI Suite Enterprise Edition&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;The process&lt;/h3&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Epzh7PLYI/AAAAAAAAAc4/Wt9YVdjFW5Q/s1600-h/image%5B9%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5Ep0oBW_uI/AAAAAAAAAc8/Ugbk7txGe4E/image_thumb%5B7%5D.png?imgmax=800" width="632" height="339" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p align="justify"&gt;First you have to know how SimpleDB is organized and how it works.   &lt;br /&gt;To the developer, SimpleDB does not look like the traditional relational databases he is used to working with. Instead of thinking in terms of tables and columns, you have to take a different approach : data is organized within Domains, which are similar to an Excel tab. Then, inside a domain, data is stored as Attribute/Value pairs. XML guys won’t be surprised by this storage method.    &lt;br /&gt;Let's first have a look at a typical relational table. 
Just a reminder ;) &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep1VykhyI/AAAAAAAAAdA/GjB3C32x0XA/s1600-h/image%5B14%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5Ep2JZXpkI/AAAAAAAAAdE/XeRQiTGnq1w/image_thumb%5B10%5D.png?imgmax=800" width="633" height="132" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Now, let’s see what your data will look like once stored inside SimpleDB. A bit of XML now. As you can see, this extract represents the first row of the relational table shown above. This row is composed of an item name (strictly speaking it is not one, but for convenience think of it as the primary key) and attributes. Each attribute is made of a &lt;strong&gt;Name&lt;/strong&gt; and an associated &lt;strong&gt;Value&lt;/strong&gt;.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5Ep2t9EKMI/AAAAAAAAAdI/5R3CF2_z9LA/s1600-h/image%5B23%5D.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep3SL8DxI/AAAAAAAAAdM/7KCGXH_6V-o/image_thumb%5B15%5D.png?imgmax=800" width="338" height="575" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;See the difference ? That's the Amazon SimpleDB API. I'm pretty sure that the data, at a low level, ends up stored in a relational schema somewhere... But for the developer, this is the way it must be done. &lt;/p&gt;  &lt;p&gt;Okay, okay. But how do I transform my relational structure into something that the SimpleDB API will accept and understand ? We have two challenges here : transformation and sending. Ok, go for it.&lt;/p&gt;  &lt;h3&gt;The Mapping !&lt;/h3&gt;  &lt;p&gt;Here is my transformation, done with Kettle. Pretty simple, huh ? 
Let’s go into detail now…&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5Ep325bRcI/AAAAAAAAAdQ/mtAmuLPZyRY/s1600-h/image%5B27%5D.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep4kPPnSI/AAAAAAAAAdU/jhR6mYPf4Ws/image_thumb%5B17%5D.png?imgmax=800" width="636" height="340" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;First you have a CSV file input. This input data will be reformatted to build Name/Value pairs, and these pairs will then be concatenated into a valid URL. Once signed, this URL will be sent to the Amazon API and the data will be inserted into the domain (previously created). You can see, in my transformation, a File Output step : I sometimes use it for debugging. In our example, it made it easy for me to view and analyse the generated URL in a text editor (here, the link to it is disabled).&lt;/p&gt;  &lt;p&gt;In my example I will use a typical CSV file as the data source (based on the same relational table shown above). Here is my flat file, with ; as the separator. &lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: arial; font-size: 78%"&gt;ID;Category;Subcat;Name;Color;Size;Make;Model;Year     &lt;br /&gt;Item_01;Clothes;Sweater;Cathair Sweater;Siamese;Small, Medium, Large;Nike;Swoosh;2003      &lt;br /&gt;Item_02;Clothes;Pants;Designer Jeans;Paisley Acid Wash;30x32, 32x32, 32x34;Trusardi;BigButt;2005      &lt;br /&gt;Item_03;Clothes;Pants;Sweatpants;Blue, Yellow, Pink;Large;Diesel;Steel;2006, 2007      &lt;br /&gt;Item_04;Car Parts;Engine;Turbos;Pink;Medium;Audi;S4;2000, 2001, 2002      &lt;br /&gt;Item_05;Car Parts;Emissions;O2 Sensor;Black;Small;Audi;S4;2000, 2001, 2002&lt;/span&gt;&lt;/p&gt;    &lt;h3&gt;The JScript Code !&lt;/h3&gt;  &lt;p&gt;I confess : I only wrote 5% of the JScript code. Let me explain. 
When you subscribe to Amazon SimpleDB, you can download the official API, written in Java, and use it to create, manage and populate your SimpleDB domain. Java is very useful of course, but I was looking for JScript in order to put everything into Kettle. So I downloaded &lt;strong&gt;Amazon SimpleDB ScratchPad&lt;/strong&gt;. This is a nice utility that allows you to play with SimpleDB without coding, using just the mouse. Looking into this application's directories, you can find all the JScript source code needed ! My work then consisted of porting the ScratchPad code into a Kettle JScript step, with some adjustments.&lt;/p&gt;  &lt;p&gt;This code is a bit too long to be shown here, so click &lt;a href="http://dl.free.fr/tTkTMbYCz"&gt;HERE&lt;/a&gt; to download it. Let’s have a high-level overview of the JScript layout.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep5A4hFuI/AAAAAAAAAdY/gv_b2dlkwhM/s1600-h/image%5B31%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep5msLMxI/AAAAAAAAAdc/PeRIfHqQDzQ/image_thumb%5B19%5D.png?imgmax=800" width="631" height="363" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The process is very simple : each row is split into &lt;strong&gt;Name/Value&lt;/strong&gt; pairs (URL building &amp;amp; URL formatting routines), and these pairs are then concatenated into a valid URL (URL concatenation). 
This URL is then signed (HMAC-SHA1) and sent to the HTTP client step.&lt;/p&gt;  &lt;p&gt;Here is the basic code to create the URL : &lt;/p&gt;  &lt;p&gt;&lt;span style="font-size: 78%"&gt;var URL2POST = &amp;quot;https://sdb.amazonaws.com&amp;quot;&lt;/span&gt;    &lt;br /&gt;&lt;span style="font-size: 78%"&gt;+ &amp;quot;?SignatureVersion=1&amp;amp;Action=&amp;quot; + &amp;quot;PutAttributes&amp;quot;     &lt;br /&gt;+ &amp;quot;&amp;amp;Version=&amp;quot; + encodeURIComponent(&amp;quot;2009-04-15&amp;quot;)      &lt;br /&gt;+ &amp;quot;&amp;amp;DomainName=&amp;quot; + encodeURIComponent('MyStore')      &lt;br /&gt;+ &amp;quot;&amp;amp;ItemName=&amp;quot; + encodeURIComponent(ID)      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.1.Name=Category&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.1.Value=&amp;quot; + encodeURIComponent(Category)      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.2.Name=Subcat&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.2.Value=&amp;quot; + encodeURIComponent(Subcat)      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.3.Name=Name&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.3.Value=&amp;quot; + encodeURIComponent(Name)      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.4.Name=Color&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.4.Value=&amp;quot; + encodeURIComponent(Color)      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.5.Name=Size&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.5.Value=&amp;quot; + encodeURIComponent(Size)      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.6.Name=Make&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.6.Value=&amp;quot; + encodeURIComponent(Make)      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.7.Name=Model&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.7.Value=&amp;quot; + encodeURIComponent(Model)      &lt;br /&gt;+ 
&amp;quot;&amp;amp;Attribute.8.Name=Year&amp;quot;      &lt;br /&gt;+ &amp;quot;&amp;amp;Attribute.8.Value=&amp;quot; + encodeURIComponent(Year)      &lt;br /&gt;+ &amp;quot;&amp;amp;Timestamp=&amp;quot; + timestamp      &lt;br /&gt;+ &amp;quot;&amp;amp;AWSAccessKeyId=&amp;quot; + encodeURIComponent(accesskey);&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;Note that in my JScript code, &lt;strong&gt;I didn’t write any loop to go through all the source columns&lt;/strong&gt;. As it is a quick proof of concept, based on a fixed set of columns, I used one line of code for each column in order to create the Name/Value pairs. If you look closely at the Amazon ScratchPad source code, you will see a loop in the function “generateSignedURL”. &lt;strong&gt;This is how things should be done, of course !&lt;/strong&gt;&lt;/p&gt;  &lt;p&gt;The final URL looks like this one : &lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep6aosEuI/AAAAAAAAAdg/yciyB_4h8hY/s1600-h/image%5B36%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5Ep6wQFgUI/AAAAAAAAAdk/gQrkwrHUblw/image_thumb%5B22%5D.png?imgmax=800" width="638" height="116" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;p&gt;Let’s look at it in more detail : &lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;The endpoint : &lt;a href="https://sdb.amazonaws.com/"&gt;https://sdb.amazonaws.com&lt;/a&gt; &lt;/li&gt;    &lt;li&gt;The SignatureVersion, always 1 for me. &lt;/li&gt;    &lt;li&gt;The Action needed, in our case PutAttributes, in order to load data into the domain. &lt;/li&gt;    &lt;li&gt;The Version : the SimpleDB API version targeted by the request, 2009-04-15 here. &lt;/li&gt;    &lt;li&gt;The DomainName : MyStore, in my case. You can create yours easily. &lt;/li&gt;    &lt;li&gt;The ItemName : Item_01 corresponding to my primary key. 
&lt;/li&gt;    &lt;li&gt;Then you have all the Name/Value pairs : attribute names and attribute values. &lt;/li&gt;    &lt;li&gt;A timestamp : calculated by a JScript function. &lt;/li&gt;    &lt;li&gt;Your AWS Access Key. Mine is obfuscated in the example above. &lt;/li&gt;    &lt;li&gt;Your Signature : the request signed with your secret AWS key (HMAC-SHA1), as seen above. Obfuscated here again.&lt;/li&gt; &lt;/ul&gt;  &lt;h3&gt;Security&lt;/h3&gt;  &lt;p&gt;Let’s talk about these AWS Access Keys and the Signature. In my proof of concept, these keys are stored in clear text in my JScript. Of course, this is not recommended. I'll let you work out a more secure way to handle them (parameters, repository …).&lt;/p&gt;  &lt;h3&gt;Let’s send it to Amazon !&lt;/h3&gt;  &lt;p&gt;Pretty easy now : each row is sent to an HTTP client step, using a JScript variable called URL2POST. This step sends the URL to Amazon SimpleDB and the row is inserted into your domain.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5Ep7cw6yGI/AAAAAAAAAdo/6uyKFjpftWM/s1600-h/image%5B44%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep7z-WaVI/AAAAAAAAAds/lb9ogL9u2ho/image_thumb%5B26%5D.png?imgmax=800" width="563" height="279" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;For the moment, I have had no time to handle the return code from the Amazon API, but it would be very easy since Amazon sends you back an XML message like the one below in case of success. 
In case of failure, the message is self-explanatory.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5Ep8DZEIlI/AAAAAAAAAdw/cmGcu8R4_yU/s1600-h/image%5B48%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep8i4XdQI/AAAAAAAAAd0/VMtJU7FqtVU/image_thumb%5B28%5D.png?imgmax=800" width="577" height="138" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;h3&gt;The goodies !&lt;/h3&gt;  &lt;p&gt;You can find the Kettle transformation &lt;a href="http://www.decisionsystems-studio.fr/Downloads/Feed_SimpleDB.ktr"&gt;HERE&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;You can find the JScript &lt;a href="http://www.decisionsystems-studio.fr/Downloads/JScript.txt"&gt;HERE&lt;/a&gt;.&lt;/p&gt;  &lt;p&gt;You can find my little flat file &lt;a href="http://www.decisionsystems-studio.fr/Downloads/FlatData.csv"&gt;HERE&lt;/a&gt;.&lt;/p&gt;  &lt;h3&gt;How to be sure the data is in ?&lt;/h3&gt;  &lt;p&gt;Pretty easy. Start the Amazon ScratchPad utility, enter your access key and secret key, go to GetAttributes in the API drop down menu and fill in the Domain Name and one Item Name. Have a look here. 
Note : my keys are obfuscated here again.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5Ep9lzwwQI/AAAAAAAAAd4/0XBYnJgj-Uo/s1600-h/image%5B56%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5Ep-ZUq18I/AAAAAAAAAd8/gnjMLLWeevE/image_thumb%5B32%5D.png?imgmax=800" width="434" height="365" /&gt;&lt;/a&gt; Hit the “Invoke Request” button, and see your data.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5Ep-5uYAOI/AAAAAAAAAeA/hi67s2MT8S8/s1600-h/image%5B60%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep_6NyrII/AAAAAAAAAeE/sKCWlZ0T_KY/image_thumb%5B34%5D.png?imgmax=800" width="416" height="411" /&gt;&lt;/a&gt; There is another way to check your data. You can write a SQL-like query to see all the data stored in a given domain. Here again, with the ScratchPad, go to “Select” in the API drop down menu. Then enter “select * from MyStore” in the Select Expression field. 
Hit the Invoke Request button, and you will see all your data.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5EqAr7XIuI/AAAAAAAAAeI/2LIzHC0r_XM/s1600-h/image%5B65%5D.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5EqBBBAdnI/AAAAAAAAAeM/yMPKFepm9jA/image_thumb%5B37%5D.png?imgmax=800" width="426" height="359" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;The output will look like this (and continues for each item …).&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5Ep2t9EKMI/AAAAAAAAAdI/5R3CF2_z9LA/s1600-h/image%5B23%5D.png"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5Ep3SL8DxI/AAAAAAAAAdM/7KCGXH_6V-o/image_thumb%5B15%5D.png?imgmax=800" width="338" height="575" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;h3&gt;That’s nice, but what for ?&lt;/h3&gt;  &lt;p&gt;Imagine you have, like me, to think about storing emails or a call center knowledge base in the cloud. You have messages, and you have headers. Why not store the headers in SimpleDB and the message bodies in S3 ? That’s a good solution. In that case, SimpleDB will handle a few attributes while the heavy data will be stored in S3, with the help of any third party database (open source or not). Of course, you have to manage the link between the S3 data and the SimpleDB headers, but that’s another story …&lt;/p&gt;  &lt;h3&gt;More to come&lt;/h3&gt;  &lt;p&gt;Please give me some feedback on this article. I’m currently working on something more reliable and more professional. 
If I have time, I will try to write a Kettle plugin.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8779081202712108672?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8779081202712108672/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8779081202712108672' title='22 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8779081202712108672'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8779081202712108672'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/03/amazon-simpledb-data-loading-with.html' title='Amazon SimpleDB data loading with Kettle !!'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/S5Ep0oBW_uI/AAAAAAAAAc8/Ugbk7txGe4E/s72-c/image_thumb%5B7%5D.png?imgmax=800' height='72' width='72'/><thr:total>22</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-462165790366183602</id><published>2010-02-01T03:33:00.000-08:00</published><updated>2010-02-01T03:39:29.162-08:00</updated><title type='text'>FOSDEM !!!!</title><content type='html'>Don't miss it !&lt;br /&gt;&lt;br /&gt;&lt;a href="http://www.fosdem.org/"&gt;&lt;img alt="FOSDEM, the Free and Open Source Software Developers' European 
Meeting" src="http://www.fosdem.org/promo/fosdem" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;ULB Campus Solbosch&lt;br /&gt;Avenue Franklin D. Roosevelt, 50&lt;br /&gt;1050 Bruxelles&lt;br /&gt;&lt;br /&gt;&lt;p align="left"&gt;&lt;a href="http://fosdem.org/maps/campus"&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 271px; DISPLAY: block; HEIGHT: 313px; CURSOR: hand" border="0" alt="" src="http://fosdem.org/maps/campus" /&gt;&lt;/a&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-462165790366183602?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/462165790366183602/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=462165790366183602' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/462165790366183602'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/462165790366183602'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/02/fosdem.html' title='FOSDEM !!!!'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8491096260501562490</id><published>2010-01-27T06:59:00.001-08:00</published><updated>2010-01-27T07:45:56.954-08:00</updated><title type='text'>Sparklines for Excel !</title><content type='html'>Hi all,&lt;br 
/&gt;&lt;br /&gt;Today I want to talk about Sparklines for Excel, "a set of free user defined functions for Excel to create sparklines".&lt;br /&gt;Sparklines ? You know, those little graphics, often cell-sized, as seen on a lot of financial / stock exchange / scientific screens.&lt;br /&gt;&lt;br /&gt;Well, Fabrice Rimlinger is a smart French guy who coded these sparklines for Excel. VBA code, for all Excel flavors.&lt;br /&gt;&lt;br /&gt;Have a look at his blog and download the macros. It is worth a try !&lt;br /&gt;&lt;a href="http://sparklines-excel.blogspot.com/"&gt;http://sparklines-excel.blogspot.com/&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://img.chandoo.org/v/l/sales-data-dashboard-alex-kerin-1-excel.png"&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 736px; DISPLAY: block; HEIGHT: 540px; CURSOR: hand" border="0" alt="" src="http://img.chandoo.org/v/l/sales-data-dashboard-alex-kerin-1-excel.png" /&gt;&lt;/a&gt; &lt;/p&gt;&lt;p&gt;&lt;span style="font-family:georgia;"&gt;And now, my own personal use of Sparklines : graphically showing source table sizes for my current data warehouse project in Paris. I'm using the treemap component : the size of each square is calculated from the table size. Coloring is done with a dynamic color scale.&lt;/span&gt; &lt;/p&gt;&lt;p&gt;&lt;br /&gt;&lt;/p&gt;&lt;p&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 501px; DISPLAY: block; HEIGHT: 316px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5431438437657563058" border="0" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/S2BYe2f7T7I/AAAAAAAAAcs/9wKXUIhYj94/s400/MySparklines.bmp" /&gt; &lt;/p&gt;&lt;p&gt;Nice, isn't it ? And you know what ? These sparklines can be used with an OLAP Excel frontend : PALO, TM1 ... your choice... 
&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8491096260501562490?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8491096260501562490/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8491096260501562490' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8491096260501562490'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8491096260501562490'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2010/01/sparklines-for-excel.html' title='Sparklines for Excel !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_hTlcWbt-BP4/S2BYe2f7T7I/AAAAAAAAAcs/9wKXUIhYj94/s72-c/MySparklines.bmp' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-6268541533651200325</id><published>2009-12-10T09:26:00.000-08:00</published><updated>2009-12-10T10:26:39.662-08:00</updated><title type='text'>Scripts for the EC2 !</title><content type='html'>&lt;div&gt; &lt;div&gt; &lt;p&gt;Hi all,&lt;/p&gt; &lt;p&gt;I’m currently deep into EC2, mixing cloud computing and BI. Very interesting, awesome. 
As you know, Amazon released two interesting features a few days ago : &lt;/p&gt; &lt;ul&gt; &lt;li&gt;you can store an instance image in EBS, not in S3. You will only pay for EBS. &lt;/li&gt; &lt;li&gt;you can start and stop an instance, instead of just launching it and terminating it. The stop feature halts the instance and stops the money counter. Once you decide to start it again, you will find your machine back with all the customization you did before. Note : you have to use an Elastic IP in order to keep your external IP address unchanged. &lt;/li&gt;&lt;/ul&gt;&lt;br&gt; &lt;p&gt;As I said, I’m now deep into EC2, making prototypes around open source BI. That’s why I started to build my own tool box. Today I want to share two scripts with you. These scripts were written to start and stop instances from EBS backed AMIs : &lt;/p&gt; &lt;ul&gt; &lt;li&gt;stop instance script : use it to stop a specific instance (previously started). I added some extra information about the instance to be stopped. Look at the command output below. &lt;/li&gt;&lt;/ul&gt;&lt;img style="text-align: center; margin: 0px auto 10px; width: 400px; display: block; height: 198px; cursor: hand" id="BLOGGER_PHOTO_ID_5413667056765252946" border="0" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/SyE1gVq6NVI/AAAAAAAAAcc/0B8ybF5mIqc/s400/stop_instance.JPG"&gt; &lt;ul&gt; &lt;li&gt;start instance script : use it to start a specific instance (previously stopped). I added some extra information about the instance to be started. Look at the command output below. 
&lt;/li&gt;&lt;/ul&gt;&lt;img style="text-align: center; margin: 0px auto 10px; width: 400px; display: block; height: 208px; cursor: hand" id="BLOGGER_PHOTO_ID_5413667267723465394" border="0" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/SyE1snjRErI/AAAAAAAAAck/7LSE4EZIN0U/s400/start_instance.JPG"&gt;&lt;br&gt; &lt;p&gt;&lt;/p&gt; &lt;p&gt;&lt;/p&gt; &lt;p&gt;Here is the stop script : &lt;/p&gt;&lt;pre style="border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; padding-bottom: 5px; line-height: 14px; background-color: #eee; padding-left: 5px; width: 97.46%; padding-right: 5px; font-family: andale mono, lucida console, monaco, fixed, monospace; height: 1053px; color: #000000; font-size: 10px; overflow: auto; border-top: #999999 1px dashed; border-right: #999999 1px dashed; padding-top: 5px"&gt;&lt;code&gt;&lt;p&gt;#!/bin/bash&lt;br&gt;#Script by Vincent Teyssier-12/2009 &lt;p&gt;speeder_spinner()&lt;br&gt;{&lt;br&gt;PROC=$1;COUNT=0&lt;br&gt;while [ -d /proc/$PROC ];do&lt;br&gt;echo -ne '/\b' ; sleep 0.05&lt;br&gt;echo -ne '-\b' ; sleep 0.05&lt;br&gt;echo -ne '\\\b' ; sleep 0.05&lt;br&gt;echo -ne '|\b' ; sleep 0.05&lt;br&gt;done&lt;br&gt;} &lt;p&gt;get_ami_infos()&lt;br&gt;{&lt;br&gt;X=0 &lt;p&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for ELEMENT in `ec2-describe-instances $INSTANCE`&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MY_ARRAY[$X]=$ELEMENT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ((X = X + 1))&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; done &lt;p&gt;echo " "&lt;br&gt;echo "-- STOPPING INSTANCE --"&lt;br&gt;echo -e "Instance :""\t\033[1m$INSTANCE\033[0m"&lt;br&gt;echo -e "From AMI :""\t${MY_ARRAY[6]}"&lt;br&gt;echo -e "Desc&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; 
:""\t${MY_ARRAY[3]}"&lt;br&gt;echo -e "Config&amp;nbsp;&amp;nbsp; :""\t${MY_ARRAY[10]}"&lt;br&gt;echo -e "OS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :""\t${MY_ARRAY[13]}"&lt;br&gt;echo "------------------------"&lt;br&gt;echo " "&lt;br&gt;} &lt;p&gt;get_stopping_result()&lt;br&gt;{&lt;br&gt;# Stop the EBS backed instance&lt;br&gt;Y=0&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for ELEMENT2 in `ec2-stop-instances $INSTANCE`&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MY_ARRAY2[$Y]=$ELEMENT2&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ((Y = Y + 1))&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; done &lt;p&gt;echo " "&lt;br&gt;echo "-- RESULT FROM STOPPING COMMAND --"&lt;br&gt;echo -e "Instance&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :""\t${MY_ARRAY2[1]}"&lt;br&gt;echo -e "Initial state :""\t\033[1m${MY_ARRAY2[2]}\033[0m"&lt;br&gt;echo -e "New state&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :""\t\033[1m${MY_ARRAY2[3]}\033[0m"&lt;br&gt;echo "----------------------------------"&lt;br&gt;echo " "&lt;br&gt;} &lt;p&gt;get_instance_status()&lt;br&gt;{&lt;br&gt;status=${MY_ARRAY2[3]} &lt;p&gt;while [ "$status" == "${MY_ARRAY2[3]}" ]&lt;br&gt;do&lt;br&gt;#echo $status&lt;br&gt;status=`ec2-describe-instances $INSTANCE | cut -f6 | sed -n 2p`&lt;br&gt;done &lt;p&gt;echo " "&lt;br&gt;echo "-- INSTANCE STATUS --"&lt;br&gt;echo -e "Instance :""\t$INSTANCE"&lt;br&gt;echo -e "Status&amp;nbsp;&amp;nbsp; :""\t\033[1m$status\033[0m"&lt;br&gt;echo "------------------------"&lt;br&gt;echo " "&lt;br&gt;} &lt;p&gt;#MAIN&lt;br&gt;INSTANCE=$1 #get params, more to come&lt;br&gt;echo "Please wait, gathering infos "&lt;br&gt;get_ami_infos &amp;amp;&lt;br&gt;speeder_spinner $! 
&lt;p&gt;echo "Please wait, stopping the instance "&lt;br&gt;get_stopping_result &amp;amp;&lt;br&gt;speeder_spinner $! &lt;p&gt;echo "Please wait, doing the job "&lt;br&gt;get_instance_status &amp;amp;&lt;br&gt;speeder_spinner $! &lt;p&gt;exit &lt;/p&gt;&lt;/font&gt;&lt;/pre&gt;&lt;br&gt;&lt;p&gt;&lt;/code&gt;And here is the start script : &lt;/p&gt;&lt;pre style="border-bottom: #999999 1px dashed; border-left: #999999 1px dashed; padding-bottom: 5px; line-height: 14px; background-color: #eee; padding-left: 5px; width: 97.46%; padding-right: 5px; font-family: andale mono, lucida console, monaco, fixed, monospace; height: 1094px; color: #000000; font-size: 10px; overflow: auto; border-top: #999999 1px dashed; border-right: #999999 1px dashed; padding-top: 5px"&gt;&lt;code&gt;&lt;p&gt;#!/bin/bash&lt;br&gt;#Script by Vincent Teyssier-12/2009 &lt;p&gt;speeder_spinner()&lt;br&gt;{&lt;br&gt;PROC=$1;COUNT=0&lt;br&gt;while [ -d /proc/$PROC ];do&lt;br&gt;echo -ne '/\b' ; sleep 0.05&lt;br&gt;echo -ne '-\b' ; sleep 0.05&lt;br&gt;echo -ne '\\\b' ; sleep 0.05&lt;br&gt;echo -ne '|\b' ; sleep 0.05&lt;br&gt;done&lt;br&gt;} &lt;p&gt;get_ami_infos()&lt;br&gt;{&lt;br&gt;X=0&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for ELEMENT in `ec2-describe-instances $INSTANCE`&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MY_ARRAY[$X]=$ELEMENT&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ((X = X + 1))&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; done&lt;br&gt;echo " "&lt;br&gt;echo "-- STARTING INSTANCE --"&lt;br&gt;echo -e "Instance :""\t\033[1m$INSTANCE\033[0m"&lt;br&gt;echo -e "From AMI :""\t${MY_ARRAY[6]}"&lt;br&gt;echo -e "Desc&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :""\t${MY_ARRAY[3]}"&lt;br&gt;echo -e "Config&amp;nbsp;&amp;nbsp; 
:""\t${MY_ARRAY[10]}"&lt;br&gt;echo -e "OS&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :""\t${MY_ARRAY[13]}"&lt;br&gt;echo "------------------------"&lt;br&gt;echo " "&lt;br&gt;} &lt;p&gt;get_starting_result()&lt;br&gt;{&lt;br&gt;# Start the EBS-backed instance&lt;br&gt;Y=0&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp; for ELEMENT2 in `ec2-start-instances $INSTANCE`&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; do&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; MY_ARRAY2[$Y]=$ELEMENT2&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; ((Y = Y + 1))&lt;br&gt;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; done&lt;br&gt;echo " "&lt;br&gt;echo "-- RESULT FROM STARTING COMMAND --"&lt;br&gt;echo -e "Instance&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :""\t${MY_ARRAY2[1]}"&lt;br&gt;echo -e "Initial state :""\t${MY_ARRAY2[2]}"&lt;br&gt;echo -e "New state&amp;nbsp;&amp;nbsp;&amp;nbsp;&amp;nbsp; :""\t\033[1m${MY_ARRAY2[3]}\033[0m"&lt;br&gt;echo "----------------------------------"&lt;br&gt;echo " "&lt;br&gt;} &lt;p&gt;get_instance_status()&lt;br&gt;{&lt;br&gt;status=${MY_ARRAY2[3]}&lt;br&gt;#last_status=${MY_ARRAY2[3]} &lt;p&gt;while [ "$status" == "${MY_ARRAY2[3]}" ]&lt;br&gt;do&lt;br&gt;#echo $status&lt;br&gt;status=`ec2-describe-instances $INSTANCE | cut -f6 | sed -n 2p`&lt;br&gt;done &lt;p&gt;echo " "&lt;br&gt;echo "-- INSTANCE STATUS --"&lt;br&gt;echo -e "Instance :""\t$INSTANCE"&lt;br&gt;echo -e "Status&amp;nbsp;&amp;nbsp; :""\t\033[1m$status\033[0m"&lt;br&gt;echo "------------------------"&lt;br&gt;echo " "&lt;br&gt;} &lt;p&gt;#MAIN&lt;br&gt;INSTANCE=$1 #get params, more to come&lt;br&gt;echo "Please wait, gathering infos"&lt;br&gt;get_ami_infos &amp;amp;&lt;br&gt;speeder_spinner $! 
&lt;p&gt;echo "Please wait, starting the instance"&lt;br&gt;get_starting_result &amp;amp;&lt;br&gt;speeder_spinner $! &lt;p&gt;echo "Please wait, updating status"&lt;br&gt;get_instance_status &amp;amp;&lt;br&gt;speeder_spinner $!&lt;br&gt;exit &lt;/p&gt;&lt;/font&gt;&lt;/pre&gt;&lt;br&gt;&lt;p&gt;&lt;/code&gt;&lt;code&gt;&lt;font face="Verdana"&gt;Enjoy and give me feed back about your EC2 adventures.&lt;/font&gt;&lt;/code&gt;&lt;/p&gt;&lt;/div&gt;&lt;/div&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-6268541533651200325?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/6268541533651200325/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=6268541533651200325' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6268541533651200325'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6268541533651200325'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/12/scripts-for-ec2.html' title='Scripts for the EC2 !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/SyE1gVq6NVI/AAAAAAAAAcc/0B8ybF5mIqc/s72-c/stop_instance.JPG' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-6565632717143712948</id><published>2009-12-04T03:33:00.001-08:00</published><updated>2009-12-04T03:33:12.905-08:00</updated><title type='text'>It’s not open source, but it’s cloud related</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;Microsoft SQL Server 2008 + analytic functions are available on the cloud. Nicely done, I have to admit. Speed has still to be improved.&lt;/p&gt;  &lt;p&gt;If you want to have a look and play with some analytics, datamining etc … click &lt;a href="http://www.sqlserverdatamining.com/cloud/"&gt;here&lt;/a&gt;. Datasets are already available for playing. It’s Public access, no credentials. You can also play with it by using an Excel plugin.&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/SxjzcJ28wiI/AAAAAAAAAcM/VsU0lUsk23w/s1600-h/image%5B4%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/SxjzcuZVPmI/AAAAAAAAAcQ/H69UvNsRg30/image_thumb%5B2%5D.png?imgmax=800" width="445" height="288" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/SxjzdZw-cWI/AAAAAAAAAcU/eXw8UYTleeQ/s1600-h/image%5B9%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: block; float: none; margin-left: auto; border-top: 0px; margin-right: auto; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/Sxjzd5ARpEI/AAAAAAAAAcY/HchoXL3YKzo/image_thumb%5B5%5D.png?imgmax=800" width="448" height="290" /&gt;&lt;/a&gt;&lt;/p&gt;  &lt;p&gt;Have a nice play.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-6565632717143712948?l=open-bi.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/6565632717143712948/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=6565632717143712948' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6565632717143712948'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6565632717143712948'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/12/its-not-open-source-but-its-cloud.html' title='It’s not open source, but it’s cloud related'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh4.ggpht.com/_hTlcWbt-BP4/SxjzcuZVPmI/AAAAAAAAAcQ/H69UvNsRg30/s72-c/image_thumb%5B2%5D.png?imgmax=800' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4649077864967501429</id><published>2009-12-04T02:10:00.001-08:00</published><updated>2009-12-04T02:10:51.891-08:00</updated><title type='text'>New and very interesting features on EC2 !!</title><content type='html'>&lt;p&gt;Hi all,&lt;/p&gt;  &lt;p&gt;I received this notification from Amazon this morning. 
New sexy features like booting from EBS snapshots and a new API to create images (no more command line !).&lt;/p&gt;  &lt;p&gt;Please read below (part of the Amazon communication).&lt;/p&gt;  &lt;p&gt;&lt;b&gt;Amazon EC2 Boot from Amazon EBS&lt;/b&gt;     &lt;br /&gt;Amazon EC2 has also announced the ability to boot instances directly from Amazon EBS snapshots, providing significantly increased flexibility in how customers can manage their instances. You can still save an Amazon Machine Image (AMI) in an Amazon S3 bucket and boot it from the local instance store, but you can now also choose to save AMIs as Amazon EBS snapshots and boot directly from an Amazon EBS volume. When an instance is booted from an Amazon EBS snapshot, the root partition of the instance is created on an Amazon EBS volume. Instances booted from Amazon EBS volumes can be stopped and later restarted, preserving any of the state that is saved to your volume and allowing you to modify some properties of your instances while it is stopped. For example, you can change your instance size or update the kernel it is using, or attach your root partition to a different running instance, making it easier to do debugging when you are creating new boot images. When booting from an Amazon EBS volume, AMIs and root partitions are no longer limited to 10GB, but can be up to 1TB in size, enabling significantly more complex images. Additionally, you are not charged for stopped instance hours and you will only incur charges for your Amazon EBS volumes while your instance is stopped, allowing you to reduce your Amazon EC2 costs when you do not need your instances running. Customers can now use a newly launched API that makes it easy to bundle images without using the command line tools, and can also take advantage of the fact that the content of an Amazon EBS volume is available to the instance immediately on volume creation which can lead to much faster instance boot times. 
For more details on this new addition to Amazon EC2, please see the &lt;a href="http://www.amazon.com/gp/r.html?R=CW4KWOE2FY3J&amp;amp;C=3K3RFVIMK021I&amp;amp;H=ZAMWYNMNGEMZ5ZMOYSQGYODYCMKA&amp;amp;T=C&amp;amp;U=http%3A%2F%2Fec2-downloads.s3.amazonaws.com%2FBootFromEBSGSGGuide.pdf"&gt;Boot from Amazon EBS Feature Guide&lt;/a&gt;.&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4649077864967501429?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4649077864967501429/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4649077864967501429' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4649077864967501429'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4649077864967501429'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/12/new-and-very-interesting-features-on.html' title='New and very interesting features on EC2 !!'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4655029932587344114</id><published>2009-12-01T13:14:00.001-08:00</published><updated>2009-12-01T13:14:43.461-08:00</updated><title type='text'>Pentaho Solutions book by Roland Bouman</title><content type='html'>&lt;p&gt;&lt;font face="Arial"&gt;Hi all 
!&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;Today’s book review is about “&lt;strong&gt;Pentaho Solutions – Business Intelligence and Data Warehousing with Pentaho and MySQL&lt;/strong&gt;”, written by Roland Bouman and Jos Van Dongen. Thanks to Roland and Wiley for my own copy.&lt;/font&gt;&lt;/p&gt;  &lt;p align="center"&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/SxWHQBZObNI/AAAAAAAAAbU/hS4qrHOgAyk/s1600-h/Pentaho_Solutions_Cover%5B3%5D.jpg"&gt;&lt;font face="Arial"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="484326 cover.indd" border="0" alt="484326 cover.indd" src="http://lh6.ggpht.com/_hTlcWbt-BP4/SxWHQnkmxrI/AAAAAAAAAbY/TBm4g1tynuQ/Pentaho_Solutions_Cover_thumb%5B1%5D.jpg?imgmax=800" width="223" height="277" /&gt;&lt;/font&gt;&lt;/a&gt;&lt;font face="Arial"&gt;ISBN: 978-0-470-48432-6 &lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;&lt;strong&gt;Roland&lt;/strong&gt; is an IT expert whose experience ranges from web application development and business process analysis to business intelligence. He worked for Inter Access, MySQL AB and Sun Microsystems, and now works for Strukton Rail. He co-authored the &lt;/font&gt;&lt;a href="http://store.vervante.com/c/v/595352502.html"&gt;&lt;font face="Arial"&gt;MySQL Cluster 5.1 Certification Study Guide&lt;/font&gt;&lt;/a&gt;&lt;font face="Arial"&gt;. Please have a look at his &lt;/font&gt;&lt;a href="http://rpbouman.blogspot.com/"&gt;&lt;font face="Arial"&gt;blog&lt;/font&gt;&lt;/a&gt;&lt;font face="Arial"&gt;.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;&lt;strong&gt;Jos&lt;/strong&gt; has been a BI expert for more than 15 years. He is also a well-known author and presenter. After a long career in BI, he created his own consulting business in 1998 : &lt;a href="http://www.tholis.com"&gt;Tholis Consulting&lt;/a&gt;. 
He also covers BI developments for the Dutch Database Magazine.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;This book is a really impressive piece of work, I mean it. I’m quite used to reading technical books about BI, datamining, data management, etc … (and sometimes bored by them), but this one has its own sound, a familiar sound. It’s like having Roland looking over your shoulder for a private master class on Pentaho tools. Within a few minutes, you have a complete Pentaho system running on your server/workstation and ready for testing, developing, trying features … well, doing anything you always wanted to do with Pentaho. Roland and Jos have succeeded in the challenge of writing a book for different kinds of readers : &lt;/font&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;font face="Arial"&gt;&lt;strong&gt;The complete newbie&lt;/strong&gt; in BI who wants to learn the basics of data warehousing and decision support, and learn more about BI processes, tools, approaches, data modelling, datamining … with Pentaho tools. 
&lt;/font&gt;&lt;/li&gt;    &lt;li&gt;&lt;font face="Arial"&gt;&lt;strong&gt;The expert in BI&lt;/strong&gt; architecture design and implementation, who wants to learn how to deploy a Pentaho solution, how to set up a complete ETL hub with PDI, create PDI clusters … &lt;/font&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;font face="Arial"&gt;Let’s have a look at the book’s agenda : &lt;/font&gt;&lt;/p&gt;  &lt;ol&gt;   &lt;li&gt;&lt;strong&gt;&lt;font face="Arial"&gt;Getting started with Pentaho&lt;/font&gt;&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;&lt;font face="Arial"&gt;Basics, from first installation to a complete and detailed view of the Pentaho stack.&lt;/font&gt; &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;&lt;font face="Arial"&gt;Dimensional modelling and datawarehouse design&lt;/font&gt;&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;&lt;font face="Arial"&gt;Learn (or refresh your knowledge of) the dimensional modelling used in datawarehouses.&lt;/font&gt; &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;&lt;font face="Arial"&gt;ETL and data integration&lt;/font&gt;&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;&lt;font face="Arial"&gt;Discover PDI and learn how to develop transformations, for beginners and advanced users alike.&lt;/font&gt; &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt;    &lt;li&gt;&lt;strong&gt;&lt;font face="Arial"&gt;Business Intelligence Applications&lt;/font&gt;&lt;/strong&gt;       &lt;ul&gt;       &lt;li&gt;&lt;font face="Arial"&gt;Learn everything about metadata management, scheduling, bursting, dashboards, OLAP and datamining.&lt;/font&gt; &lt;/li&gt;     &lt;/ul&gt;   &lt;/li&gt; &lt;/ol&gt;  &lt;p&gt;&lt;font face="Arial"&gt;This book will help you to :&amp;#160; &lt;/font&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;font face="Arial"&gt;Discover Pentaho concepts and approach. 
&lt;/font&gt;&lt;/li&gt;    &lt;li&gt;&lt;font face="Arial"&gt;Refresh your knowledge of dimensional modeling and data warehouse design (Kimball’s star schemas).&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font face="Arial"&gt;Develop data transformations, jobs, decision streams with Pentaho Data Integration (PDI, aka Kettle). &lt;/font&gt;&lt;/li&gt;    &lt;li&gt;&lt;font face="Arial"&gt;Use advanced and powerful PDI features like variables, remote execution and clustering.&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font face="Arial"&gt;Design and deploy complete reports / charts using Pentaho Report Designer.&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font face="Arial"&gt;Take advantage of the OLAP engine and create flexible pivot tables with typical drill up/drill down features using Pentaho Analysis Services.&lt;/font&gt; &lt;/li&gt;    &lt;li&gt;&lt;font face="Arial"&gt;Search for patterns in your data and take your first steps in datamining using Pentaho data mining.&lt;/font&gt; &lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;font face="Arial"&gt;I recommend this book to anyone involved in Business Intelligence and Open Source. If you are thinking about trying some Pentaho software (as a new project or alongside your current, proprietary and expensive BI stack), this book is definitely for you. You will stop spending time reading forums and tech reviews and looking for feedback (frequent pains with open source …) and will directly access the knowledge you need. If you are currently developing an open source BI project with Pentaho, this book will help you to implement the whole stack with best practices and deliver a world class BI solution for a low TCO.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;If you are new to Business Intelligence, you will be pleased to discover a book you can easily understand thanks to Roland and Jos’s talent for popularizing datawarehouse fundamentals and technology. 
This book will go along with you and ease your work on BI and Pentaho, the most exciting open source BI suite available today.&lt;/font&gt;&lt;/p&gt;  &lt;p&gt;&lt;font face="Arial"&gt;You can buy this book on Amazon at an attractive price, &lt;/font&gt;&lt;a href="http://www.amazon.com/Pentaho-Solutions-Business-Intelligence-Warehousing/dp/0470484322"&gt;&lt;font face="Arial"&gt;here&lt;/font&gt;&lt;/a&gt;&lt;font face="Arial"&gt;.&lt;/font&gt;&lt;/p&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4655029932587344114?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4655029932587344114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4655029932587344114' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4655029932587344114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4655029932587344114'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/12/pentaho-solutions-book-by-roland-bouman.html' title='Pentaho Solutions book by Roland Bouman'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://lh6.ggpht.com/_hTlcWbt-BP4/SxWHQnkmxrI/AAAAAAAAAbY/TBm4g1tynuQ/s72-c/Pentaho_Solutions_Cover_thumb%5B1%5D.jpg?imgmax=800' height='72' 
width='72'/><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1939733381959425102</id><published>2009-11-23T04:43:00.000-08:00</published><updated>2010-03-09T01:34:48.339-08:00</updated><title type='text'>PDI clusters – Part 1 : How to build a simple PDI cluster.</title><content type='html'>&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 100%"&gt;&lt;strong&gt;Hi all !&lt;/strong&gt;&lt;/span&gt;     &lt;br /&gt;    &lt;br /&gt;&lt;/span&gt;&lt;span style="font-family: arial; font-size: 85%"&gt;I would like to start a collection of posts dedicated to PDI / Kettle clustering.    &lt;br /&gt;After surfing the web, I noticed that a lot of people are asking how to build PDI clusters and how to test and deploy them in a production environment. There are also a lot of questions about Carte usage. So, I will try to write some tutorials about this fantastic feature offered by PDI. &lt;/span&gt;  &lt;br /&gt;&lt;span style="font-family: arial"&gt;At this point, I want to recommend a book : “&lt;strong&gt;Pentaho Solutions – Business Intelligence and Data Warehousing with Pentaho and MySQL&lt;/strong&gt;”, written by Roland Bouman and Jos Van Dongen. This book is a fantastic source of knowledge about Pentaho and will help you understand the Pentaho ecosystem and tools. My complete review of this book is &lt;/span&gt;&lt;a href="http://open-bi.blogspot.com/2009/12/pentaho-solutions-book-by-roland-bouman.html"&gt;&lt;span style="font-family: arial"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt;.    &lt;br /&gt;    &lt;br /&gt;&lt;u&gt;&lt;span style="font-size: 100%"&gt;&lt;strong&gt;Agenda &lt;/strong&gt;&lt;/span&gt;&lt;/u&gt;&lt;/span&gt;  &lt;br /&gt;  &lt;ul&gt;   &lt;ul&gt;     &lt;ul&gt;       &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #000000; font-size: 85%"&gt;How to build a simple PDI cluster (1 master, 2 slaves). 
This post.&lt;/span&gt; &lt;/span&gt;&lt;/li&gt;        &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #000000; font-size: 85%"&gt;How to build a simple PDI server on Amazon Cloud Computing (EC2).&lt;/span&gt; &lt;/span&gt;&lt;/li&gt;        &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #000000; font-size: 85%"&gt;How to build a PDI cluster on Amazon Cloud Computing (EC2).&lt;/span&gt; &lt;/span&gt;&lt;/li&gt;        &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #000000; font-size: 85%"&gt;How to build a dynamic PDI cluster on Amazon Cloud Computing (EC2).&lt;/span&gt; &lt;/span&gt;&lt;/li&gt;     &lt;/ul&gt;   &lt;/ul&gt; &lt;/ul&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;This first post is about building a simple PDI cluster, composed of 1 master and 2 slaves, in a virtualized environment (vmware).        &lt;br /&gt;After this article, you will be able to build your PDI cluster and play with it on a simple laptop or desktop (3 GB of RAM is a must-have).         &lt;br /&gt;        &lt;br /&gt;&lt;/span&gt;&lt;u&gt;&lt;span style="font-size: 100%"&gt;&lt;b&gt;Why PDI clustering ?&lt;/b&gt; &lt;/span&gt;&lt;/u&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;Imagine you have to run some very complex transformations and finally load a huge amount of data into your target warehouse.      &lt;br /&gt;You have two options to handle this task : &lt;/span&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;SCALE UP&lt;/strong&gt; : Build a single powerful PDI server with a lot of RAM and CPU. This single server (let’s call it an ETL hub) will handle all the work by itself. &lt;/span&gt;&lt;/li&gt;    &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;SCALE OUT&lt;/strong&gt; : Create an array of smaller servers. Each of them will handle a small part of the work. 
&lt;/span&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;Clustering is scaling out. You divide the global workload and distribute it across many nodes; these smaller tasks are then processed in parallel (or near parallel). The overall performance is limited by the slowest node of your cluster.      &lt;br /&gt;In PDI, a cluster is composed of :&lt;/span&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;ONE MASTER :&lt;/strong&gt; this node acts like a conductor, assigning the sub-tasks to the slaves and merging the results coming back from the slaves when the sub-tasks are done. &lt;/span&gt;&lt;/li&gt;    &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;SLAVES :&lt;/strong&gt; from 1 to many. The slaves are the nodes that really do the job : they process the tasks and then send the results back to the master for reconciliation. &lt;/span&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;Let's have a look at this diagram. You can see the typical architecture around a PDI cluster : data sources, the master, the registered slaves and the target warehouse. The more PDI slaves you add, the more parallelism you get and, in general, the better the performance.      
&lt;br /&gt;      &lt;br /&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/SwqEQBKjwHI/AAAAAAAAAaY/UVvqAQ5fZB0/s1600/PDI+cluster+general.png"&gt;&lt;span style="font-family: arial"&gt;&lt;img style="width: 459px; display: block; float: none; height: 281px; margin-left: auto; cursor: hand; margin-right: auto" id="BLOGGER_PHOTO_ID_5407279713337196658" border="0" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/SwqEQBKjwHI/AAAAAAAAAaY/UVvqAQ5fZB0/s400/PDI+cluster+general.png" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt;      &lt;br /&gt;      &lt;br /&gt;&lt;span style="font-size: 100%"&gt;&lt;b&gt;&lt;u&gt;The virtual cluster&lt;/u&gt;&lt;/b&gt; &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;Let's build our first virtual cluster now. &lt;/span&gt;&lt;span style="font-size: 85%"&gt;First, you will need vmware or virtual box (or virtual PC from Ms). I use vmware, so from now I will speak about vmware only, but you can transpose easily.&lt;/span&gt; I decided to use Suse Enterprise Linux 11 for these virtual machines. 
It is a personal choice, but you can do the same with Fedora, Ubuntu, etc …       &lt;br /&gt;      &lt;br /&gt;&lt;span style="font-size: 85%"&gt;Let's build 3 virtual machines : &lt;/span&gt;&lt;/span&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;&lt;strong&gt;The Master&lt;/strong&gt; : Suse Enterprise Linux 11 - this machine will host the PDI programs, the PDI repository and a mysql database with phpmyadmin (optional).&lt;/span&gt; &lt;/span&gt;&lt;/li&gt;    &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;&lt;strong&gt;Slave 1&lt;/strong&gt; : Suse Enterprise Linux 11 - this machine will host the PDI programs and will run carte.&lt;/span&gt; &lt;/span&gt;&lt;/li&gt;    &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;&lt;strong&gt;Slave 2&lt;/strong&gt; : Suse Enterprise Linux 11 - this machine will host the PDI programs and will run carte.&lt;/span&gt; &lt;/span&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;As you can see below, the three virtual machines are located on the same subnet, using fixed IP addresses ranging from &lt;strong&gt;192.168.77.128 (Master) to 192.168.77.130 (Slave 2)&lt;/strong&gt;. &lt;/span&gt;&lt;span style="font-size: 85%"&gt;On the vmware side, I used a &amp;quot;host only&amp;quot; network connection. 
You have to be able to ping your master from the two slaves, ping the two slaves from the master and also ping the three virtual machines from your host.&lt;/span&gt; &lt;span style="font-size: 85%"&gt;The easiest way is to disable the firewall on each Suse machine because we don't need security for this exercise.&lt;/span&gt; &lt;/span&gt;&lt;/p&gt; &lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/SwqI86U6WRI/AAAAAAAAAag/MUhRiTPwbtg/s1600/PDI+cluster+configs.png"&gt;&lt;span style="font-family: arial"&gt;&lt;img style="width: 434px; display: block; float: none; height: 309px; margin-left: auto; cursor: hand; margin-right: auto" id="BLOGGER_PHOTO_ID_5407284882642196754" border="0" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/SwqI86U6WRI/AAAAAAAAAag/MUhRiTPwbtg/s400/PDI+cluster+configs.png" width="446" height="318" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt; &lt;/span&gt;  &lt;p&gt;&lt;u&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;&lt;span style="font-size: 100%"&gt;The Master configuration&lt;/span&gt;&lt;/strong&gt; &lt;/span&gt;&lt;/u&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;As I said, the Master virtual machine is hosting PDI, a mysql database and the PDI repository. But let's have a closer look at the internal configuration, especially the Carte program config files.      &lt;br /&gt;According to the Pentaho wiki, Carte is &amp;quot;a simple web server that allows you to execute transformations and jobs remotely&amp;quot;. Carte is a major component when building clusters because this program acts as a kind of middleware between the Master and the Slave servers : the slaves register themselves with the Master, notifying it that they are ready to receive tasks to process. On top of that, you can reach the Carte web service to remotely monitor, start and stop transformations / jobs. 
You can learn more about Carte from the &lt;/span&gt;&lt;a href="http://wiki.pentaho.com/display/EAI/Carte+User+Documentation"&gt;&lt;span style="font-family: arial"&gt;Pentaho wiki&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt;.      &lt;br /&gt;      &lt;br /&gt;The picture below explains the registration process between slaves and a master. &lt;/span&gt;&lt;/p&gt; &lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/SxbZbPrjEgI/AAAAAAAAAbc/uhzXxOlySvM/s1600-h/Master%20Slave%20registration%5B5%5D.png"&gt;&lt;span style="font-family: arial"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Master Slave registration" border="0" alt="Master Slave registration" src="http://lh5.ggpht.com/_hTlcWbt-BP4/SxbZbzJnAFI/AAAAAAAAAbg/u66YYKEOv1M/Master%20Slave%20registration_thumb%5B3%5D.png?imgmax=800" width="454" height="339" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt;    &lt;br /&gt;    &lt;br /&gt;On the Master, two files are very important. They are XML configuration files, self-explanatory and easy to read : &lt;/span&gt;  &lt;ul&gt;   &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;Repositories.xml&lt;/strong&gt; : your master must have a valid repositories.xml file, updated with all the information about your repository connection (hosted on the Master for this example). See below for my config file. &lt;/span&gt;&lt;/li&gt;    &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;Carte xml configuration file&lt;/strong&gt; : located in /pwd/, this file contains only one section, which defines the cluster master (ip, port, credentials). In the /pwd/ directory, you will find some example configuration files. Pick one, for instance the one labelled &amp;quot;8080&amp;quot;, and apply the changes described below. 
I will keep the port 8080 for communication between the Master and the two Slaves. See below for my config file. &lt;/span&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #cc0000"&gt;Repositories.xml on Master&lt;/span&gt;&lt;/span&gt;&lt;/p&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #cc0000"&gt;&lt;/span&gt;&lt;/span&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5YVaOD5r4I/AAAAAAAAAfQ/opyruckrVs8/s1600-h/image%5B8%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5YVa9wUJ_I/AAAAAAAAAfU/QklpYOPqZiE/image_thumb%5B4%5D.png?imgmax=800" width="637" height="513" /&gt;&lt;/a&gt;&amp;#160; &lt;br /&gt;&lt;span style="font-family: arial; color: #cc0000; font-size: 85%"&gt;Carte xml configuration file on Master&lt;/span&gt;     &lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5YVbu1THPI/AAAAAAAAAfY/Csq3Wz8xuBU/s1600-h/image%5B16%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVcLy-T-I/AAAAAAAAAfc/CjV1fH7A0io/image_thumb%5B10%5D.png?imgmax=800" width="631" height="212" /&gt;&lt;/a&gt; &lt;/p&gt;  &lt;br /&gt;&lt;span style="font-size: 85%"&gt;&lt;u&gt;&lt;span style="font-family: arial; font-size: 100%"&gt;&lt;strong&gt;The Slave configuration&lt;/strong&gt;&lt;/span&gt;&lt;/u&gt;&lt;/span&gt;   &lt;br /&gt;  &lt;br /&gt;&lt;font size="2"&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;As I said, the two Slave virtual machines are hosting PDI. 
Now let's have a look at how to configure the same important files we changed for the Master.&lt;/span&gt;&lt;/span&gt; &lt;/font&gt;  &lt;p&gt;&lt;/p&gt;  &lt;ul&gt;   &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;Repositories.xml&lt;/strong&gt;: your slave must have a valid repositories.xml file, updated with all the information about your repository (hosted on the Master for this example). See below for my config file. &lt;/span&gt;      &lt;br /&gt;&lt;/li&gt;    &lt;li&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;Carte xml configuration file&lt;/strong&gt;: located in /pwd/, this file contains two sections: the master section and the slave section. In the /pwd/ directory, you will find some example configuration files. Pick one, for instance the &amp;quot;8080&amp;quot; one, and apply the changes described below. Note that the default user and password for Carte are cluster / cluster. Here again the file is self-explanatory, see below for my config file. &lt;/span&gt;&lt;/li&gt; &lt;/ul&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #ff0000"&gt;Repositories.xml on Slave1 and Slave2:        &lt;br /&gt;&lt;/span&gt;Same as for the Master, see above.       
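&lt;br /&gt;&lt;br /&gt;Since the configuration files in this post are only shown as screenshots, here is a rough text sketch of what a slave-side Carte configuration can look like. It is modelled on the sample files found in the /pwd/ directory, so the element names are an assumption to verify against your own PDI release; the server names (master1, slave1) are hypothetical, and the addresses are the ones reported for the Master (.128) and Slave1 (.129) in the console outputs later in this post.&lt;br /&gt;&lt;br /&gt;&lt;code&gt;&amp;lt;!-- sketch only: layout assumed from the sample Carte files, names are hypothetical --&amp;gt;&lt;br /&gt;
&amp;lt;slave_config&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;lt;!-- master section: where this slave registers itself --&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;lt;masters&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;slaveserver&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;name&amp;gt;master1&amp;lt;/name&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;hostname&amp;gt;192.168.77.128&amp;lt;/hostname&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;port&amp;gt;8080&amp;lt;/port&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;username&amp;gt;cluster&amp;lt;/username&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;password&amp;gt;cluster&amp;lt;/password&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;master&amp;gt;Y&amp;lt;/master&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;/slaveserver&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;lt;/masters&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;lt;report_to_masters&amp;gt;Y&amp;lt;/report_to_masters&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;lt;!-- slave section: this machine's own address, not localhost --&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;lt;slaveserver&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;name&amp;gt;slave1&amp;lt;/name&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;hostname&amp;gt;192.168.77.129&amp;lt;/hostname&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;port&amp;gt;8080&amp;lt;/port&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;username&amp;gt;cluster&amp;lt;/username&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;password&amp;gt;cluster&amp;lt;/password&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;#160;&amp;#160;&amp;lt;master&amp;gt;N&amp;lt;/master&amp;gt;&lt;br /&gt;
&amp;#160;&amp;#160;&amp;lt;/slaveserver&amp;gt;&lt;br /&gt;
&amp;lt;/slave_config&amp;gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;The first block tells the slave where to find the Master, and the second block describes the slave itself; for Slave2, only the name and address in the second block would change.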
&lt;br /&gt;      &lt;br /&gt;&lt;span style="color: #cc0000"&gt;Carte xml configuration file on Slave1 (note: the address is 192.168.77.129, don’t write “localhost” for Slave1)&lt;/span&gt; &lt;/span&gt;    &lt;br /&gt;&lt;/p&gt; &lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5YVckztSsI/AAAAAAAAAfg/BMkreYQ_1e4/s1600-h/image%5B20%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVdcrRGyI/AAAAAAAAAfk/NUqllmFxisw/image_thumb%5B12%5D.png?imgmax=800" width="626" height="348" /&gt;&lt;/a&gt;   &lt;br /&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #cc0000; font-size: 85%"&gt;Carte xml configuration file on Slave2 (note: the address is 192.168.77.130, don’t write “localhost” for Slave2)&lt;/span&gt; &lt;/span&gt;  &lt;br /&gt;  &lt;br /&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5YVeO0ZjNI/AAAAAAAAAfo/FhFA8TPWnlg/s1600-h/image%5B24%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5YVfKkBKhI/AAAAAAAAAfs/MslXdxW8UGs/image_thumb%5B14%5D.png?imgmax=800" width="626" height="370" /&gt;&lt;/a&gt;   &lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;strong&gt;&lt;u&gt;&lt;span style="font-family: arial; font-size: 100%"&gt;Starting everything&lt;/span&gt;&lt;/u&gt;&lt;/strong&gt;   &lt;br /&gt;  &lt;br /&gt;  &lt;p&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;Now it is time to start the programs. I assume you have already started MySQL and your PDI repository is active and reachable by PDI.&lt;/span&gt; Working with a repository hosted on a relational database is recommended. Let's start Carte on the Master first. 
The command is quite simple: &lt;em&gt;./carte.sh [xml config file]&lt;/em&gt;.&lt;/span&gt;     &lt;br /&gt;&lt;/p&gt; &lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5YVfaeDQWI/AAAAAAAAAfw/ruvfjcLsWZ4/s1600-h/image%5B28%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5YVgND38GI/AAAAAAAAAf0/kyunnHvyufo/image_thumb%5B16%5D.png?imgmax=800" width="627" height="142" /&gt;&lt;/a&gt;   &lt;br /&gt;&lt;span style="font-family: arial; font-size: 85%"&gt;This output means that your Master is running and that a listener is active on the Master's IP address, on port 8080.&lt;/span&gt; &lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;Now let's start the two slaves. Here again, the command is simple: &lt;em&gt;./carte.sh [xml config file]&lt;/em&gt;. &lt;/span&gt;&lt;span style="font-size: 85%"&gt;In the output for Slave1 below, you can see that Carte has now registered Slave1 (192.168.77.129) with the master server. Everything is working fine so far.&lt;/span&gt;&lt;/span&gt;   &lt;br /&gt;  &lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVg9aDWsI/AAAAAAAAAf4/cJgJxewGilc/s1600-h/image%5B32%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVh0fntfI/AAAAAAAAAf8/6t6Ctxg1U_c/image_thumb%5B18%5D.png?imgmax=800" width="634" height="151" /&gt;&lt;/a&gt;   &lt;br /&gt;&lt;span style="font-family: arial"&gt;Finally, the output for Slave2 below shows that Carte has now registered Slave2 (192.168.77.130) with the master server. Here again, everything is fine so far. 
&lt;/span&gt;  &lt;br /&gt;  &lt;br /&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YViR9FvDI/AAAAAAAAAgA/hsqsN7SovbE/s1600-h/image%5B38%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5YVjSizQPI/AAAAAAAAAgE/gh7XMV11AU8/image_thumb%5B22%5D.png?imgmax=800" width="629" height="157" /&gt;&lt;/a&gt; &lt;code&gt;   &lt;br /&gt;&lt;span style="font-family: arial; font-size: 85%"&gt;At this point, we have a working Master and two registered slaves (Slave1 and Slave2) waiting to receive tasks from the Master. It is now time to create the cluster and a PDI transformation (and a job to run it). Let's go for it.&lt;/span&gt;     &lt;br /&gt;    &lt;br /&gt;    &lt;br /&gt;    &lt;p&gt;&lt;strong&gt;&lt;u&gt;&lt;span style="font-family: arial; font-size: 100%"&gt;PDI configuration&lt;/span&gt;&lt;/u&gt;&lt;/strong&gt;&lt;/p&gt;   &lt;span style="font-family: arial"&gt;&lt;span style="color: #000000; font-size: 85%"&gt;First we have to declare the slaves previously created and started. That's pretty easy. Select the Explorer mode in the left pane. 
Left-click on the &amp;quot;Slave &lt;/span&gt;&lt;span style="color: #000000; font-size: 85%"&gt;server&amp;quot; folder: this will pop up a new window in which you declare Slave1 as shown below.&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: arial; color: #000000; font-size: 85%"&gt;      &lt;br /&gt;      &lt;br /&gt;      &lt;br /&gt;      &lt;p&gt;&lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/SwvwFk1mQhI/AAAAAAAAAa4/M_WnVzOco3Y/s1600/Slave+server+declaration.bmp"&gt;&lt;span style="font-family: arial; font-size: 85%"&gt;&lt;img style="width: 400px; display: block; float: none; height: 171px; margin-left: auto; cursor: hand; margin-right: auto" id="BLOGGER_PHOTO_ID_5407679756166906386" border="0" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/SwvwFk1mQhI/AAAAAAAAAa4/M_WnVzOco3Y/s400/Slave+server+declaration.bmp" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial; font-size: 85%"&gt;&amp;#160;&lt;/span&gt;&lt;span style="font-size: 85%"&gt;&lt;span style="font-family: arial"&gt;           &lt;br /&gt;            &lt;br /&gt;&lt;span style="color: #000000"&gt;Repeat &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: arial"&gt;&lt;span style="font-size: 85%"&gt;&lt;span style="color: #000000"&gt;the same operation for the other servers (the Master and both Slaves must be declared) in order to have 3 registered servers, as in the picture above. Don't forget to type the right port (we have been working with 8080 since the beginning of this exercise).              &lt;br /&gt;              &lt;br /&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="color: #000000; font-size: 85%"&gt;Now we have to declare the cluster. Right-click on the cluster folder (next folder) and choose New. This will pop up a new window in which you fill in the cluster parameters: just type a new name for your cluster and then click on the &amp;quot;select servers&amp;quot; button. Now choose your three servers and click OK. 
You will then notice that your cluster is created (Master and Slaves), as shown below.&lt;/span&gt;&lt;/span&gt;         &lt;br /&gt;&lt;/p&gt;      &lt;p&gt;&lt;a href="http://1.bp.blogspot.com/_hTlcWbt-BP4/SwvzMEarusI/AAAAAAAAAbA/zHDIlhTa8wg/s1600/Cluster+declaration.bmp"&gt;&lt;span style="font-family: arial; font-size: 85%"&gt;&lt;img style="width: 400px; display: block; float: none; height: 175px; margin-left: auto; cursor: hand; margin-right: auto" id="BLOGGER_PHOTO_ID_5407683166258051778" border="0" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/SwvzMEarusI/AAAAAAAAAbA/zHDIlhTa8wg/s400/Cluster+declaration.bmp" /&gt;&lt;/span&gt;&lt;/a&gt; &lt;/p&gt;      &lt;br /&gt;      &lt;br /&gt;      &lt;br /&gt;      &lt;p&gt;&lt;span style="font-size: 100%"&gt;&lt;span style="color: #000000"&gt;&lt;span style="font-family: arial"&gt;&lt;strong&gt;&lt;u&gt;Creating a job for testing the cluster&lt;/u&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;span&gt;&lt;code&gt;&lt;span style="color: #000000; font-size: 85%"&gt;&lt;span style="font-family: arial"&gt;For this exercise, I won't create a job but will use an existing one created by Matt Casters. This transformation is very interesting: it only reads data from a flat file and writes statistics for each slave (rows/sec, throughput ...) to a target flat file. You can download this transformation here, the job here and the flat file &lt;/span&gt;&lt;a href="http://pagesperso-orange.fr/botools/lineitem.zip"&gt;&lt;span style="font-family: arial"&gt;here&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt; (21 MB zipped).&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/span&gt;&lt;/span&gt;&lt;span style="font-family: arial"&gt; &lt;/span&gt;          &lt;br /&gt;          &lt;br /&gt;&lt;/span&gt;&lt;span&gt;&lt;code&gt;&lt;span style="font-family: arial"&gt;&lt;span style="color: #000000; font-size: 85%"&gt;I assume you know how to link a transformation into a job. 
Don't forget to change the flat file location for the source (/your_path/lineitem.tbl) and for the destination (/your_path/out_read_lineitems). Then, for each of the first four steps, right-click and assign the cluster (the one you named previously) to the step. You will see the caption “Cx2” at the top right of each icon. &lt;/span&gt;&lt;span style="color: #000000; font-size: 85%"&gt;There is nothing else to change. Here is a snapshot of the contextual menu when assigning the cluster to the transformation steps (my PDI release is in French, so you have to look for “Clustering” instead of “Partitionnement”).&lt;/span&gt;&lt;/span&gt;&lt;/code&gt;&lt;/span&gt;         &lt;br /&gt;        &lt;br /&gt;        &lt;br /&gt;&lt;/p&gt;      &lt;p&gt;&lt;span style="font-family: arial; color: #000000; font-size: 85%"&gt;&lt;code&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/SxbZcVdH58I/AAAAAAAAAbk/Bjd8fVqWVPc/s1600-h/Clusteringsteps1.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Clustering steps" border="0" alt="Clustering steps" src="http://lh5.ggpht.com/_hTlcWbt-BP4/SxbZdCwCUMI/AAAAAAAAAbo/ZqNd3eg4fNg/Clusteringsteps_thumb1.png?imgmax=800" width="267" height="284" /&gt;&lt;/a&gt;&lt;/code&gt;&lt;/span&gt;&lt;/p&gt;      &lt;br /&gt;      &lt;p&gt;&lt;span&gt;&lt;code&gt;&lt;span style="font-family: arial; color: #000000; font-size: 85%"&gt;Have a look at the transformation below. The caption “Cx2” at the top right of the first four icons means you have assigned your cluster to run these steps. 
By contrast, the JavaScript step “calc elapsed time” won’t run on the cluster but on the Master only.&lt;/span&gt;&lt;/code&gt;&lt;/span&gt;         &lt;br /&gt;&lt;/p&gt;      &lt;p&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/Swv1bmD0rJI/AAAAAAAAAbI/tcm5SS0kjSQ/s1600/Transformation.bmp"&gt;&lt;img style="width: 400px; display: block; float: none; height: 172px; margin-left: auto; cursor: hand; margin-right: auto" id="BLOGGER_PHOTO_ID_5407685632010267794" border="0" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/Swv1bmD0rJI/AAAAAAAAAbI/tcm5SS0kjSQ/s400/Transformation.bmp" /&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt;          &lt;br /&gt;&lt;code&gt;&lt;span style="font-family: arial; font-size: 85%"&gt;Now have a look at the job (calling the transformation above). This is a typical job, involving a start step and the “execute transformation” step. We will start this job with Kitchen later.&lt;/span&gt;&lt;/code&gt;&lt;/span&gt;         &lt;br /&gt;&lt;/p&gt;      &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/SxbZdjXMwvI/AAAAAAAAAbs/WvFDoxaGlC8/s1600-h/MainJob3.png"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Main Job" border="0" alt="Main Job" src="http://lh5.ggpht.com/_hTlcWbt-BP4/SxbZeBDiLuI/AAAAAAAAAbw/JLUydtK_tSQ/MainJob_thumb3.png?imgmax=800" width="426" height="185" /&gt;&lt;/a&gt; &lt;/p&gt;      &lt;br /&gt;      &lt;p&gt;&lt;span style="font-size: 100%"&gt;&lt;strong&gt;&lt;u&gt;Running everything&lt;/u&gt;&lt;/strong&gt; &lt;/span&gt;        &lt;br /&gt;        &lt;br /&gt;&lt;span style="font-size: 85%"&gt;Now it is time to run the job/transformation we made. First we will see how to run the transformation within Spoon, the PDI GUI. 
Then we will see how to run the job (containing the transformation) with Kitchen in the Linux console and how to interpret the console output.&lt;/span&gt; &lt;/p&gt;      &lt;p&gt;&lt;span style="font-size: 85%"&gt;First, how to start the transformation within Spoon. Simply click on the green play symbol. The following window will appear on your screen. Once again, my screen is in French, sorry for that. All you have to do is click on the top right button to select the clustered execution (“Exécution en grappe” in French). I suppose you are already quite familiar with that screen so I won’t explain it any further.&lt;/span&gt;         &lt;br /&gt;&lt;/p&gt;      &lt;p&gt;&lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/SxbZeo6cJ-I/AAAAAAAAAb0/qF5kIvCAPqU/s1600-h/Start_Transformation3.jpg"&gt;&lt;img style="border-right-width: 0px; display: block; float: none; border-top-width: 0px; border-bottom-width: 0px; margin-left: auto; border-left-width: 0px; margin-right: auto" title="Start_Transformation" border="0" alt="Start_Transformation" src="http://lh3.ggpht.com/_hTlcWbt-BP4/SxbZfFTke6I/AAAAAAAAAb4/4rMDJqOLJYQ/Start_Transformation_thumb1.jpg?imgmax=800" width="400" height="325" /&gt;&lt;/a&gt;&lt;/p&gt;      &lt;br /&gt;      &lt;p&gt;&lt;span style="font-size: 85%"&gt;Then you can run the transformation. 
Let’s have a look at the Spoon trace (don’t forget to display your output window in PDI, and select the Trace tab).&lt;/span&gt;         &lt;br /&gt;        &lt;br /&gt;&lt;/p&gt;     &lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVkp3EoMI/AAAAAAAAAgI/m5GXxdBJtps/s1600-h/image%5B42%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh4.ggpht.com/_hTlcWbt-BP4/S5YVmaQYqEI/AAAAAAAAAgM/8toXODECL1w/image_thumb%5B24%5D.png?imgmax=800" width="627" height="494" /&gt;&lt;/a&gt; &lt;code&gt;&lt;/code&gt;      &lt;br /&gt;&lt;span style="font-family: arial"&gt;This trace is fairly simple. First we can see that the Master (ip .128) found its two slaves (ip .129 and ip .130) and that the connection is working well. The Master and the two Slaves communicate throughout the process. As soon as the two Slaves have finished their work, we receive a notification (“All transformations in the cluster have finished”), then we can read a small summary (number of rows).&lt;/span&gt;       &lt;br /&gt;      &lt;br /&gt;&lt;span style="font-family: arial"&gt;Let’s have a look at the Master command line (remember we started Carte from the Linux command line). For the Master, we have a very short output. The red lines are familiar to you now: they correspond to the Carte startup we did a few minutes ago. 
Have a look at the green lines below: these lines were printed out by Carte while the cluster was processing the job.&lt;/span&gt;       &lt;p&gt;&lt;/p&gt;     &lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5YVmw4MXiI/AAAAAAAAAgQ/uwPe_j1Tlvo/s1600-h/image%5B46%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVnnhUz-I/AAAAAAAAAgU/IiWj05AwtbI/image_thumb%5B26%5D.png?imgmax=800" width="634" height="215" /&gt;&lt;/a&gt; &lt;code&gt;       &lt;br /&gt;&lt;/span&gt;        &lt;p&gt;&lt;/p&gt;        &lt;p&gt;&lt;span style="font-family: arial; color: #000000"&gt;Let’s have a look at the Slave 1 output. Here again, the red lines come from the Carte startup. The green lines are interesting: you can see Slave 1 receiving its portion of the job to run … and how it processed it, reading rows in packets of 50000. You can also notice the names of the steps that were processed by Slave 1 in cluster mode: lineitem.tbl (reading the flat file), current_time (catching the current time), min/max time and slave_name. If you remember well, these steps were flagged with a “Cx2” at the top right corner of their icon (see below) when you assigned your cluster to the transformation steps. 
&lt;/span&gt;          &lt;br /&gt;          &lt;br /&gt;&lt;/p&gt;        &lt;p&gt;&lt;span style="color: #000000"&gt;&lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/SxbZflOeBtI/AAAAAAAAAb8/0w03LB1pjHQ/s1600-h/Slaveicons4.jpg"&gt;&lt;span style="font-family: arial"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="Slave icons" border="0" alt="Slave icons" src="http://lh6.ggpht.com/_hTlcWbt-BP4/SxbZgJSf3oI/AAAAAAAAAcA/BElW69oGTUc/Slaveicons_thumb2.jpg?imgmax=800" width="147" height="56" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt; &lt;/span&gt;&lt;/span&gt;          &lt;br /&gt;          &lt;br /&gt;&lt;/p&gt;       &lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5YVoIU4G4I/AAAAAAAAAgY/x5XPBnA_oqQ/s1600-h/image%5B56%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5YVpZlypOI/AAAAAAAAAgc/CznDHqP7-NE/image_thumb%5B32%5D.png?imgmax=800" width="642" height="452" /&gt;&lt;/a&gt; &lt;code&gt;         &lt;br /&gt;          &lt;p&gt;&lt;/p&gt;          &lt;p&gt;&lt;span style="font-family: arial; color: #000000"&gt;The output for Slave 2, displayed below, is very similar to that of Slave 1. &lt;/span&gt;            &lt;br /&gt;&lt;/p&gt;         &lt;a href="http://lh3.ggpht.com/_hTlcWbt-BP4/S5YVq7PpzZI/AAAAAAAAAgg/aPTcHizQJR8/s1600-h/image%5B55%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVr_SvoHI/AAAAAAAAAgk/KLoYQiHhi_4/image_thumb%5B31%5D.png?imgmax=800" width="638" height="499" /&gt;&lt;/a&gt; &lt;code&gt;           &lt;br /&gt;            &lt;p&gt;&lt;/p&gt;            &lt;p&gt;&lt;span style="font-family: arial; color: #000000"&gt;That’s a lot of fun to do! 
Once you have started Carte and created your cluster, you are ready to execute the job. You will then see your Linux console printing information while the job is being executed by your slaves. This post is about understanding and creating the whole PDI cluster mechanism, so I won’t talk about optimization for the moment.&lt;/span&gt;&lt;/p&gt;            &lt;br /&gt;            &lt;p&gt;&lt;span style="font-size: 100%"&gt;&lt;span style="font-family: arial; color: #000000"&gt;&lt;strong&gt;&lt;u&gt;Hey, what’s the purpose of my transformation?&lt;/u&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/span&gt;               &lt;br /&gt;              &lt;br /&gt;&lt;span style="font-family: arial; color: #000000"&gt;As I said before, this transformation only reads records from a flat file (lineitem.tbl) and computes performance statistics for every slave, such as rows/sec and throughput. The last step of your transformation will create a flat file containing these stats. Have a look at it.&lt;/span&gt; &lt;/p&gt;           &lt;a href="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVsuM0JZI/AAAAAAAAAgo/p7fUYIXHx6U/s1600-h/image%5B61%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh5.ggpht.com/_hTlcWbt-BP4/S5YVtekB_4I/AAAAAAAAAgs/rFiF7_30B-Y/image_thumb%5B35%5D.png?imgmax=800" width="638" height="136" /&gt;&lt;/a&gt; &lt;code&gt;             &lt;br /&gt;              &lt;p&gt;&lt;span style="font-family: arial; color: #000000"&gt;Once formatted with a spreadsheet tool, the stats will look like this.&lt;/span&gt; &lt;/p&gt;              &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/SxbZgqlbanI/AAAAAAAAAcE/oMpHggarXgg/s1600-h/Stat%20file%5B6%5D.png"&gt;&lt;span style="font-family: arial"&gt;&lt;img style="border-right-width: 0px; display: inline; border-top-width: 0px; border-bottom-width: 0px; border-left-width: 0px" title="Stat file" border="0" alt="Stat file" 
src="http://lh5.ggpht.com/_hTlcWbt-BP4/SxbZhZ81aCI/AAAAAAAAAcI/59anCmUH8Oc/Stat%20file_thumb%5B6%5D.png?imgmax=800" width="727" height="105" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family: arial"&gt; &lt;/span&gt;&lt;/p&gt;              &lt;p&gt;&lt;span style="font-family: arial; color: #000000"&gt;Don’t pay too much attention to the start_time and end_time timestamps: the clocks were not set up on my three virtual machines, hence they are not in sync. You will also notice that, in the example above, the performance of the two slaves is not homogeneous. That’s normal: don’t forget I’m currently working on a virtualized environment built on a workstation, and this tutorial is limited to demonstrating how to create and configure a PDI cluster. No optimization was taken into account at this stage. On a fully optimized cluster, you will have (almost) homogeneous performance.&lt;/span&gt;                 &lt;br /&gt;                &lt;br /&gt;&lt;span style="font-family: arial"&gt;&lt;span&gt;&lt;span style="font-size: 100%"&gt;&lt;span style="color: #000000"&gt;&lt;strong&gt;&lt;u&gt;Running with the Linux console&lt;/u&gt;&lt;/strong&gt; &lt;/span&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;                &lt;br /&gt;                &lt;br /&gt;&lt;span style="font-family: arial; color: #000000"&gt;If you want to execute your job from the Linux command line, no problem. Kitchen is here for you. Here is the syntax for a job execution. Note: VMWARE-SLES10-32_Repo is my PDI repository running on the Master. 
I’m sure you are already familiar with the other parameters.&lt;/span&gt; &lt;/p&gt;             &lt;code&gt;               &lt;p&gt;&lt;a href="http://lh4.ggpht.com/_hTlcWbt-BP4/S5YVtpxHDTI/AAAAAAAAAgw/ZzRCLL_IBhs/s1600-h/image%5B76%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh6.ggpht.com/_hTlcWbt-BP4/S5YVufb-niI/AAAAAAAAAg0/5xOKsBD5ieo/image_thumb%5B42%5D.png?imgmax=800" width="642" height="91" /&gt;&lt;/a&gt;&amp;#160;&lt;span style="font-family: arial; color: #000000"&gt;&lt;font size="2"&gt;&lt;/font&gt;&lt;/span&gt;&lt;/p&gt;                &lt;p&gt;&lt;span style="font-family: arial; color: #000000"&gt;&lt;font size="2"&gt;To execute your transformation, use Pan. Here is the typical command.&lt;/font&gt;&lt;/span&gt;&lt;span&gt;&lt;span&gt; &lt;/span&gt;&lt;/span&gt;&lt;span&gt;&lt;/span&gt;&lt;code&gt;&lt;/code&gt;                  &lt;p&gt;&lt;a href="http://lh6.ggpht.com/_hTlcWbt-BP4/S5YVu4ukSSI/AAAAAAAAAg4/4n0_Is7yxpY/s1600-h/image%5B75%5D.png"&gt;&lt;img style="border-bottom: 0px; border-left: 0px; display: inline; border-top: 0px; border-right: 0px" title="image" border="0" alt="image" src="http://lh3.ggpht.com/_hTlcWbt-BP4/S5YVvYJPg_I/AAAAAAAAAg8/F_mwMRbEGUY/image_thumb%5B41%5D.png?imgmax=800" width="642" height="71" /&gt;&lt;/a&gt;                     &lt;br /&gt;&lt;/span&gt;&lt;span&gt;&lt;span style="font-size: 100%"&gt;&lt;span style="font-family: arial; color: #000000"&gt;&lt;strong&gt;&lt;u&gt;Conclusion and … what’s next?&lt;/u&gt;&lt;/strong&gt;&lt;/span&gt;&lt;/span&gt;&lt;/span&gt;                     &lt;br /&gt;                    &lt;br /&gt;&lt;span style="font-family: arial; color: #000000"&gt;Well, I hope you found here some explanations and solutions for creating a basic PDI cluster. You can create more than 2 slaves if you want, the process is the same. 
Don’t forget to add these new slaves to the cluster definition in Spoon. As I said, no particular attention was given to optimization. This will be the topic of another post in the near future. Feel free to contact me if you need further explanations about this post or if you want to add some useful comments, I will answer with pleasure.&lt;/span&gt;&lt;/p&gt;                  &lt;p&gt;&lt;span style="font-family: arial; color: #000000"&gt;The next post will be about creating the same architecture, with … let’s say 3 or 4 slaves, in the Amazon Cloud Computing infrastructure. It will be a good time to speak about cloud computing in general (pros, cons, architecture …). &lt;/span&gt;&lt;/p&gt;                  &lt;br /&gt;                  &lt;br /&gt;                  &lt;br /&gt;&lt;/span&gt;                  &lt;br /&gt;&lt;/p&gt;             &lt;/code&gt;&lt;/code&gt;&lt;/code&gt;&lt;/code&gt;&lt;/code&gt;&lt;/span&gt;&lt;/code&gt;  &lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1939733381959425102?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/1939733381959425102/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1939733381959425102' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1939733381959425102'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1939733381959425102'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/11/hi-all-i-would-like-to-start-serie-of.html' title='PDI clusters – Part 1 : How to build a simple PDI cluster.'/><author><name>Vincent 
Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_hTlcWbt-BP4/SwqEQBKjwHI/AAAAAAAAAaY/UVvqAQ5fZB0/s72-c/PDI+cluster+general.png' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-227917824847371731</id><published>2009-11-18T08:26:00.000-08:00</published><updated>2009-11-18T08:57:04.268-08:00</updated><title type='text'>Spatial, you said spatial ?</title><content type='html'>&lt;span style="font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Spatially enabled data warehouses, geospatial datatypes, geospatial BI ... etc ... there has been a lot of buzz around this topic over the past year. Maybe the next BI holy grail! &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Geospatial data helps users to visualise how the usual business data (customer, products, time ...) are impacted by geography.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;For instance, a lot of telcos are building spatially enabled analysis and reporting. Just imagine some advanced analytics like: how is my customer using his cellphone, when is he using his cellphone and WHERE is he using it? Now imagine the answer displayed on a map instead of classical dashboards and spreadsheets. 
Amazing, no ?&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;p&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 300px; DISPLAY: block; HEIGHT: 187px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5405486237791265602" border="0" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/SwQlGAHPo0I/AAAAAAAAAaA/tCrL_kLkYnE/s400/bi_map.gif" /&gt; &lt;/p&gt;&lt;p&gt;&lt;span style="font-size:85%;"&gt;Have a look at this web site: &lt;/span&gt;&lt;a href="http://www.spatialytics.org/projects/geokettle/"&gt;&lt;span style="font-size:85%;"&gt;Spatialytics&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt; &lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;You will find some very interesting explanations about geospatial data warehousing and also a spatially enabled version of Kettle: &lt;strong&gt;GeoKettle&lt;/strong&gt;. There is also a spatially enabled release of Mondrian, called &lt;strong&gt;GeoMondrian&lt;/strong&gt;. According to Spatialytics, GeoMondrian is the first open source &lt;strong&gt;SOLAP&lt;/strong&gt; player. &lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;/p&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 364px; DISPLAY: block; HEIGHT: 95px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5405486388301999778" border="0" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/SwQlOwzzLqI/AAAAAAAAAaI/clV0KrKVb_o/s400/GeoKettle.JPG" /&gt;I started to test GeoKettle. 
Quite interesting.&lt;/span&gt; &lt;span style="font-size:85%;"&gt;Of course you need some spatial data at hand to play with, but this should not be an issue according to Franklin (1992 - &lt;em&gt;An introduction to Geographic Information Systems : linking maps to databases&lt;/em&gt;) : &lt;strong&gt;"About 80% of all data stored in corporate databases has a spatial component".&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Have a look at your data and become spatially enabled!&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-227917824847371731?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/227917824847371731/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=227917824847371731' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/227917824847371731'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/227917824847371731'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/11/spatial-you-said-spatial.html' title='Spatial, you said spatial ?'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_hTlcWbt-BP4/SwQlGAHPo0I/AAAAAAAAAaA/tCrL_kLkYnE/s72-c/bi_map.gif' height='72' 
width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3245668979120270234</id><published>2009-11-16T23:52:00.001-08:00</published><updated>2009-11-16T23:58:21.832-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Readings'/><title type='text'>Another reading ...</title><content type='html'>&lt;span style="font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Another interesting read (at least for me) about BI and Open Source.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;This one comes from Claudia Imhoff. I'm usually not a great fan of hers or of her views, but I think she wrote quite a good white paper here.&lt;/span&gt;&lt;br /&gt;&lt;p&gt;&lt;a href="http://pagesperso-orange.fr/botools/Imhoff_Open_Sesame.pdf"&gt;&lt;span style="font-size:85%;"&gt;BI and Open Source&lt;/span&gt;&lt;/a&gt;&lt;/p&gt;&lt;p&gt;&lt;span style="font-size:85%;"&gt;Happy reading.&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3245668979120270234?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3245668979120270234/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3245668979120270234' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3245668979120270234'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3245668979120270234'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/11/another-reading.html' title='Another reading ...'/><author><name>Vincent 
Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-224917512533426054</id><published>2009-11-16T12:22:00.000-08:00</published><updated>2009-11-16T23:58:08.832-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Readings'/><title type='text'>Massive but agile : very good article from Forrester</title><content type='html'>&lt;span style="font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;I just finished reading this study from Forrester.&lt;br /&gt;Massive but Agile, the next generation Enterprise Datawarehouse.&lt;br /&gt;Very good reading on how to make a big elephant move like a ballet dancer ;)&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;a href="http://pagesperso-orange.fr/botools/Forrester_Massive_But_Agile.pdf"&gt;&lt;span style="font-size:85%;"&gt;Forrester Massive but Agile&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;br /&gt;Enjoy and ... 
discuss if you want.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-224917512533426054?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/224917512533426054/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=224917512533426054' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/224917512533426054'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/224917512533426054'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/11/massive-but-agile-very-good-article.html' title='Massive but agile : very good article from Forrester'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1641461079949124011</id><published>2009-11-14T13:14:00.000-08:00</published><updated>2009-11-14T13:28:55.050-08:00</updated><title type='text'>Cleaning strings</title><content type='html'>&lt;span style="font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;A JavaScript snippet, useful for a string manipulation toolbox.&lt;br /&gt;It puts the first character in uppercase and the others in lowercase. 
Example : Vincent instead of vincent.&lt;br /&gt;This code is specific to Kettle (Input.getString) but can be used in Talend with little change. &lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;br /&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 437px; DISPLAY: block; HEIGHT: 210px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5404073582422068050" border="0" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/Sv8gSoxXC1I/AAAAAAAAAZ4/EIw0u-Auihg/s400/PDI_javascript_1.bmp" /&gt; &lt;pre style="BORDER-BOTTOM: #999999 1px dashed; BORDER-LEFT: #999999 1px dashed; PADDING-BOTTOM: 5px; LINE-HEIGHT: 14px; BACKGROUND-COLOR: #eee; PADDING-LEFT: 5px; WIDTH: 100%; PADDING-RIGHT: 5px; FONT-FAMILY: Andale Mono, Lucida Console, Monaco, fixed, monospace; COLOR: #000000; FONT-SIZE: 12px; OVERFLOW: auto; BORDER-TOP: #999999 1px dashed; BORDER-RIGHT: #999999 1px dashed; PADDING-TOP: 5px"&gt;&lt;p&gt;&lt;code&gt;//First letter in uppercase, others in lowercase&lt;br /&gt;&lt;br /&gt;var c = Input.getString().substr(0,1);&lt;br /&gt;if (parseInt(Input.getString().length)==1)&lt;/code&gt;&lt;/p&gt;&lt;p&gt;&lt;code&gt;&lt;/code&gt;&lt;code&gt;{&lt;br /&gt;var cc = upper(c);&lt;br /&gt;}&lt;/code&gt;&lt;code&gt;&lt;br /&gt;else&lt;br /&gt;{&lt;br /&gt;var cc = upper(c) + lower(Input.getString().slice(1));&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;/p&gt;&lt;/pre&gt;&lt;p&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1641461079949124011?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/1641461079949124011/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1641461079949124011' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1641461079949124011'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1641461079949124011'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/11/cleaning-strings.html' title='Cleaning strings'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/Sv8gSoxXC1I/AAAAAAAAAZ4/EIw0u-Auihg/s72-c/PDI_javascript_1.bmp' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8973458800656507790</id><published>2009-11-14T11:48:00.000-08:00</published><updated>2009-11-14T12:03:00.106-08:00</updated><title type='text'>My new NAS</title><content type='html'>&lt;span style="font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;I'm currently working on personal projects around BI and I needed a little NAS to store everything.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Finally, last week, I chose the Digitus one. A really nice piece of hardware. 
I think Digitus is a German company.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Let's have a closer look : &lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Dual SATA disks,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Raid 1 (mirror),&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Embedded HTTP and FTP server,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Multiple filesystems (ext, ntfs, fat ...),&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;File sharing,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Ethernet,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Torrent features (continue to download from torrents when your PC is off),&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Dual USB2 plugs and special features to copy from usb to internal disks.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="font-size:85%;"&gt;I also bought two 1 TB SATA disks. These two disks are now in Raid 1 (mirror) in order to secure all my data.&lt;/span&gt;&lt;/p&gt;&lt;span style="font-size:85%;"&gt;Speed is good, both for reading and writing. I just tried to open and use a vmware virtual machine from this NAS and everything worked well. 
The NAS is provided with an external PSU and is not very noisy (ok to stand on a corner of my desk).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Here are some pics.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 320px; DISPLAY: block; HEIGHT: 240px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5404051169754304146" border="0" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/Sv8L6DE9TpI/AAAAAAAAAZo/d3ES9vRUNNw/s320/Pictures+881.jpg" /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;p align="left"&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 240px; DISPLAY: block; HEIGHT: 320px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5404050798329738114" border="0" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/Sv8LkbacM4I/AAAAAAAAAZg/IQdu4qWzzpc/s320/Pictures+880.jpg" /&gt;&lt;/p&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; WIDTH: 240px; DISPLAY: block; HEIGHT: 320px; CURSOR: hand" id="BLOGGER_PHOTO_ID_5404050591081905890" border="0" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/Sv8LYXWtPuI/AAAAAAAAAZY/NpRhw3owyzM/s320/Pictures+879.jpg" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8973458800656507790?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8973458800656507790/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8973458800656507790' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8973458800656507790'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8973458800656507790'/><link rel='alternate' type='text/html' 
href='http://open-bi.blogspot.com/2009/11/my-new-nas.html' title='My new NAS'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/Sv8L6DE9TpI/AAAAAAAAAZo/d3ES9vRUNNw/s72-c/Pictures+881.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-7469619004228149511</id><published>2009-11-13T09:00:00.000-08:00</published><updated>2009-11-14T11:42:26.689-08:00</updated><title type='text'>Quick list of ETL tools</title><content type='html'>&lt;span style="font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;A lot of work these days ... I'm back.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Soon to come : a complete overview of my current BI work on the &lt;strong&gt;Amazon Cloud (EC2)&lt;/strong&gt;. 
&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;For now, a quick list of - more or less - free ETL tools.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;PDI (Kettle) from Pentaho : &lt;/span&gt;&lt;a href="http://www.pentaho.com/"&gt;&lt;span style="font-size:78%;"&gt;http://www.pentaho.com&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Talend Open Studio : &lt;/span&gt;&lt;a href="http://www.talend.com/"&gt;&lt;span style="font-size:78%;"&gt;http://www.talend.com&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;KETL : &lt;/span&gt;&lt;a href="http://sourceforge.net/projects/ketl/"&gt;&lt;span style="font-size:78%;"&gt;http://sourceforge.net/projects/ketl/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Apatar : &lt;/span&gt;&lt;a href="http://www.apatar.com/"&gt;&lt;span style="font-size:78%;"&gt;http://www.apatar.com&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Pequel : &lt;/span&gt;&lt;a href="http://sourceforge.net/projects/pequel/"&gt;&lt;span style="font-size:78%;"&gt;http://sourceforge.net/projects/pequel/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Jasper ETL (= Talend Open Studio) : &lt;/span&gt;&lt;a href="http://www.jaspersoft.com/jasperetl"&gt;&lt;span style="font-size:78%;"&gt;http://www.jaspersoft.com/jasperetl&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Clover ETL : &lt;/span&gt;&lt;a href="http://www.cloveretl.com/"&gt;&lt;span style="font-size:78%;"&gt;http://www.cloveretl.com/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-size:85%;"&gt;Octopus : &lt;/span&gt;&lt;a href="http://octopus.enhydra.org/"&gt;&lt;span style="font-size:78%;"&gt;http://octopus.enhydra.org/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span 
style="font-size:85%;"&gt;Benetl : &lt;/span&gt;&lt;a href="http://www.benetl.net/"&gt;&lt;span style="font-size:78%;"&gt;http://www.benetl.net/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-7469619004228149511?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/7469619004228149511/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=7469619004228149511' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7469619004228149511'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7469619004228149511'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/11/quick-list-of-etl-tools.html' title='Quick list of ETL tools'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5542243315544451690</id><published>2009-03-11T11:35:00.000-07:00</published><updated>2009-03-11T11:43:50.878-07:00</updated><title type='text'>Free/open source DB Modelling tools</title><content type='html'>&lt;span style="font-size:85%;"&gt;&lt;span style="font-family:arial;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;A quick overview of free and/or 
open source DB modelling tools.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;I use them quite often, depending on the work I have to do and features I need.&lt;/span&gt;&lt;br /&gt;&lt;/span&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;ARGO UML : &lt;/span&gt;&lt;a href="http://www.argouml.org/"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://www.argouml.org/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;AZZURI : &lt;/span&gt;&lt;a href="http://www.azzurri.jp/en/software/clay/index.jsp"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://www.azzurri.jp/en/software/clay/index.jsp&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;DB DESIGNER 4 : &lt;/span&gt;&lt;a href="http://www.fabforce.net/dbdesigner4/"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://www.fabforce.net/dbdesigner4/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;DIA : &lt;/span&gt;&lt;a href="http://www.gnome.org/gnome-office/dia.shtml"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://www.gnome.org/gnome-office/dia.shtml&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;DRUID : &lt;/span&gt;&lt;a href="http://druid.sourceforge.net/"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://druid.sourceforge.net/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;TCM : &lt;/span&gt;&lt;a href="http://wwwhome.cs.utwente.nl/~tcm/"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://wwwhome.cs.utwente.nl/~tcm/&lt;/span&gt;&lt;/a&gt;&lt;/li&gt;&lt;/ul&gt;&lt;p&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Feel free to send a comment if you use another tool.&lt;/span&gt;&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' 
src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5542243315544451690?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/5542243315544451690/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5542243315544451690' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5542243315544451690'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5542243315544451690'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/03/db-modelling-tools.html' title='Free/open source DB Modelling tools'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8703719520736559139</id><published>2009-02-13T14:48:00.000-08:00</published><updated>2009-02-14T00:57:39.070-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Kettle : transforming number into hh:mm:ss</title><content type='html'>&lt;div&gt;&lt;span style="font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Another little code for closing the week.&lt;br /&gt;I'm gathering data coming out of PABX systems. 
Time is stored as a number of seconds.&lt;br /&gt;Sometimes you just don't want to see 3662 seconds but 01:01:02, in [hh:mm:ss] format.&lt;br /&gt;&lt;br /&gt;Be careful, hh:mm:ss is a string format : you won't be able to do calculations or apply math functions on it anymore.&lt;/span&gt; &lt;/div&gt;&lt;div&gt; &lt;/div&gt;&lt;img id="BLOGGER_PHOTO_ID_5302574469749245746" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 400px; CURSOR: hand; HEIGHT: 250px; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/SZaHYtB1hzI/AAAAAAAAAX0/BHuh9GRvB3w/s400/PDI_time_conversion.JPG" border="0" /&gt; &lt;p&gt;&lt;span style="font-size:85%;"&gt;JavaScript code (sorry for the poor Blogspot code formatting) :&lt;br /&gt;&lt;/span&gt;&lt;/p&gt;&lt;pre style="BORDER-RIGHT: #999999 1px dashed; PADDING-RIGHT: 5px; BORDER-TOP: #999999 1px dashed; PADDING-LEFT: 5px; FONT-SIZE: 12px; PADDING-BOTTOM: 5px; OVERFLOW: auto; BORDER-LEFT: #999999 1px dashed; WIDTH: 100%; COLOR: #000000; LINE-HEIGHT: 14px; PADDING-TOP: 5px; BORDER-BOTTOM: #999999 1px dashed; FONT-FAMILY: Andale Mono, Lucida Console, Monaco, fixed, monospace; BACKGROUND-COLOR: #eee"&gt;&lt;code&gt;var time_nb = time.getNumber();&lt;br /&gt;&lt;br /&gt;var hh,mm,ss;&lt;br /&gt;&lt;br /&gt;var TimeFormat;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;// Split the seconds; Math.floor keeps hh and mm as whole numbers&lt;br /&gt;&lt;br /&gt;ss = time_nb % 60;&lt;br /&gt;&lt;br /&gt;mm = Math.floor(time_nb / 60);&lt;br /&gt;&lt;br /&gt;hh = Math.floor(mm / 60);&lt;br /&gt;&lt;br /&gt;mm = mm % 60;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;// Pad each part to two digits&lt;br /&gt;&lt;br /&gt;if(hh&amp;lt;10)(hh="0" + hh);&lt;br /&gt;&lt;br /&gt;if(mm&amp;lt;10)(mm="0" + mm);&lt;br /&gt;&lt;br /&gt;if(ss&amp;lt;10)(ss="0" + ss);&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;TimeFormat = hh + ":" + mm + ":" + ss;&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8703719520736559139?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link 
rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8703719520736559139/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8703719520736559139' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8703719520736559139'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8703719520736559139'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/02/kettle-transforming-number-into-hhmmss.html' title='Kettle : transforming number into hh:mm:ss'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_hTlcWbt-BP4/SZaHYtB1hzI/AAAAAAAAAX0/BHuh9GRvB3w/s72-c/PDI_time_conversion.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4492096957403223568</id><published>2009-02-13T09:31:00.001-08:00</published><updated>2009-02-14T00:58:55.159-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Email spelling check with Kettle !</title><content type='html'>&lt;div&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Today, I had to check email spelling across some very large files before loading into database.&lt;br /&gt;Kettle helped me here, with some javascript, regular expression and a conditional branch to route the good and bad emails.&lt;br /&gt;Below is the transformation :&lt;br /&gt;&lt;img 
id="BLOGGER_PHOTO_ID_5302574848226477794" style="DISPLAY: block; MARGIN: 0px auto 10px; WIDTH: 400px; CURSOR: hand; HEIGHT: 224px; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/SZaHuu91xuI/AAAAAAAAAX8/zmOXO9lxUPM/s400/PDI_email_check.JPG" border="0" /&gt; &lt;p&gt;And the javascript code ....&lt;/p&gt;&lt;pre style="BORDER-RIGHT: #999999 1px dashed; PADDING-RIGHT: 5px; BORDER-TOP: #999999 1px dashed; PADDING-LEFT: 5px; FONT-SIZE: 12px; PADDING-BOTTOM: 5px; OVERFLOW: auto; BORDER-LEFT: #999999 1px dashed; WIDTH: 100%; COLOR: #000000; LINE-HEIGHT: 14px; PADDING-TOP: 5px; BORDER-BOTTOM: #999999 1px dashed; FONT-FAMILY: Andale Mono, Lucida Console, Monaco, fixed, monospace; BACKGROUND-COLOR: #eee"&gt;&lt;code&gt;//Javascript with regular expressions to test email spelling&lt;br /&gt;&lt;br /&gt;// Code by Vincent Teyssier&lt;br /&gt;&lt;br /&gt;var email_string = email.getString();&lt;br /&gt;&lt;br /&gt;var pattern=/^([a-zA-Z0-9_\.\-])+\@(([a-zA-Z0-9\-])+\.)+([a-zA-Z0-9]{2,4})+$/;&lt;br /&gt;&lt;br /&gt;if(pattern.test(email_string))&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;var Emails_status = "good";&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;else&lt;br /&gt;&lt;br /&gt;{&lt;br /&gt;&lt;br /&gt;var Emails_status = "bad";&lt;br /&gt;&lt;br /&gt;}&lt;br /&gt;&lt;br /&gt;&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4492096957403223568?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4492096957403223568/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4492096957403223568' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4492096957403223568'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4492096957403223568'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/02/hi-all-today-i-had-to-check-email.html' title='Email spelling check with Kettle !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_hTlcWbt-BP4/SZaHuu91xuI/AAAAAAAAAX8/zmOXO9lxUPM/s72-c/PDI_email_check.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1086090390541465597</id><published>2009-01-31T09:40:00.000-08:00</published><updated>2009-01-31T09:45:13.805-08:00</updated><title type='text'>Database comparison</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;I want to start a large post dedicated to relational database comparison.&lt;br /&gt;I will focus on &lt;strong&gt;Open Source&lt;/strong&gt;, of course, and will try to do something graphical.&lt;br /&gt;For the moment, have a look at this very interesting &lt;a href="http://en.wikipedia.org/wiki/Comparison_of_relational_database_management_systems"&gt;wiki&lt;/a&gt; page, which shows some very large comparison matrices.&lt;br /&gt;Worth a look !&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1086090390541465597?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://open-bi.blogspot.com/feeds/1086090390541465597/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1086090390541465597' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1086090390541465597'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1086090390541465597'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2009/01/database-comparison.html' title='Database comparison'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4151276097463143791</id><published>2008-11-09T10:40:00.000-08:00</published><updated>2008-11-09T10:51:35.539-08:00</updated><title type='text'>Hiring again !!</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;I'm hiring again for a CDI contract in France : a Business Objects developer.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Mission :&lt;/strong&gt;&lt;br /&gt;Reporting and BI development for Belgian clients and day-to-day support for internal clients.&lt;br /&gt;Mainly Business Objects.&lt;br /&gt;Development is 70% and day-to-day support is 30%.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Activity :&lt;/strong&gt;&lt;br /&gt;Call center.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Location :&lt;/strong&gt;&lt;br /&gt;Paris, Malakoff. 
With travel to Belgium.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Profile :&lt;/strong&gt;&lt;br /&gt;Junior, with 2 years of experience using BO in BI projects.&lt;br /&gt;Good knowledge of BO 6.5.x, Webi, Infoview, Designer, Reporter and Supervisor.&lt;br /&gt;Databases : Oracle 9i (development)&lt;br /&gt;ETL : BO DataIntegrator (BODI XI) is a must-have, or else significant experience with another ETL tool (Powercenter, Datastage ...).&lt;br /&gt;VB, .Net ... would be nice.&lt;br /&gt;&lt;br /&gt;If you want to apply / meet me, please reach me here (remove NOSPAM) :&lt;br /&gt;&lt;a href="mailto:vincent.teyssierNOSPAM@sitel.com"&gt;vincent.teyssierNOSPAM@sitel.com&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4151276097463143791?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4151276097463143791/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4151276097463143791' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4151276097463143791'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4151276097463143791'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/11/hiring-again.html' title='Hiring again !!'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4951319620990469068</id><published>2008-04-18T06:21:00.000-07:00</published><updated>2008-04-18T06:25:14.273-07:00</updated><title type='text'>Hiring !</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;I'm looking for an &lt;strong&gt;ESSBASE&lt;/strong&gt; consultant for a 15-day mission in Madrid.&lt;br /&gt;The objectives are:&lt;br /&gt;- Analyse and diagnose the existing architecture,&lt;br /&gt;- Add multi-currency features to 2 cubes,&lt;br /&gt;- Train the current administrator.&lt;br /&gt;Spoken English is mandatory.&lt;br /&gt;If you want to apply, please reach me here (remove NOSPAM) : &lt;a href="mailto:vincent.teyssierNOSPAM@sitel.com"&gt;vincent.teyssierNOSPAM@sitel.com&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4951319620990469068?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4951319620990469068/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4951319620990469068' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4951319620990469068'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4951319620990469068'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/04/hiring.html' title='Hiring !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image 
rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8626841962260886558</id><published>2008-03-25T10:25:00.000-07:00</published><updated>2008-04-19T03:29:00.087-07:00</updated><title type='text'>Publishing data from datawarehouse to HTTPS</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;Last week, I had to publish a data file to an HTTPS location for one of our clients here in France.&lt;br /&gt;Easy to do with a GUI, but how do you automate it and embed it in a DTS or ETL process under Windows? Not easy.&lt;br /&gt;What did I use? &lt;strong&gt;Open source cURL&lt;/strong&gt;!&lt;br /&gt;Here is the recipe I followed.&lt;br /&gt;&lt;br /&gt;1 - Install OpenSSL on your machine (a Windows build, of course).&lt;br /&gt;2 - Get cURL, a famous Unix / Linux tool. Take a Windows build with SSL support and forget the Cygwin version (yuck).&lt;br /&gt;3 - The certificate part, the trickiest:&lt;br /&gt;Windows / IE use certificates in P12 (PKCS12) format, which differs from the PEM format that cURL uses on Unix.&lt;br /&gt;So you have to convert the certificate and extract the user key file, the user certificate file and the CA file.&lt;br /&gt;To do the conversion, use the following OpenSSL commands. 
I did it on a Linux box :&lt;br /&gt;&lt;br /&gt;First, extract the userkey.pem from the certificate :&lt;br /&gt;&lt;span style="font-size:85%;"&gt;vince@fc: openssl pkcs12 -nocerts -in cert.p12 -out userkey.pem&lt;br /&gt;Enter Import Password: (insert your certificate password)&lt;br /&gt;MAC verified OK&lt;br /&gt;Enter PEM pass phrase: (insert your PEM pass phrase)&lt;br /&gt;Verifying - Enter PEM pass phrase: (re-enter your PEM pass phrase)&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Then extract the usercert.pem file:&lt;br /&gt;&lt;span style="font-size:85%;"&gt;vince@fc: openssl pkcs12 -clcerts -nokeys -in cert.p12 -out usercert.pem&lt;br /&gt;Enter Import Password: (insert your certificate password)&lt;br /&gt;MAC verified OK&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;Finally, extract the ca.pem file :&lt;br /&gt;&lt;span style="font-size:85%;"&gt;vince@fc: openssl pkcs12 -cacerts -nokeys -in cert.p12 -out ca.pem&lt;br /&gt;Enter Import Password: (insert your certificate password)&lt;br /&gt;MAC verified OK&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;4 - Then you can type your cURL command :&lt;br /&gt;&lt;span style="font-size:85%;"&gt;curl -u toto:tata -E certif.pem:passphrase --cacert ca.pem --proxy-ntlm -U FRLAR01\toto:tata -x proxyserver:8080 &lt;/span&gt;&lt;a href="https://213.41.176.116/quix"&gt;&lt;span style="font-size:85%;"&gt;https://213.41.176.116/quix&lt;/span&gt;&lt;/a&gt;&lt;span style="font-size:85%;"&gt; -v -T D:\FLATFILES\Client\Export\File_to_export.txt&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;Explanation of the command :&lt;br /&gt;curl is the command, of course&lt;br /&gt;-u is for user:password to the HTTPS site. 
Here I anonymised it, of course.&lt;br /&gt;-E is the newly converted client certificate in PEM format, followed by its passphrase&lt;br /&gt;--cacert is the CA certificate in PEM format&lt;br /&gt;--proxy-ntlm tells cURL to authenticate to the proxy with NTLM&lt;br /&gt;-U is the credentials for the proxy. Here again I anonymised the data&lt;br /&gt;-x is the proxy:port address&lt;br /&gt;https://... is the destination HTTPS address&lt;br /&gt;-v is for verbose mode. Useful.&lt;br /&gt;-T is the file to transfer. Here the file is named File_to_export.txt and is a text data file&lt;br /&gt;&lt;br /&gt;As you can see, I chose to specify the proxy on each command instead of setting up the proxy on my server. This way I don't need any ISA proxy client on my data warehouse server, and by using a direct proxy connection only when needed, I save some server resources.&lt;br /&gt;&lt;br /&gt;I hope this will help the community with any future HTTPS data push.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8626841962260886558?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8626841962260886558/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8626841962260886558' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8626841962260886558'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8626841962260886558'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/03/publishing-data-from-datawarehouse-to.html' title='Publishing data from datawarehouse to HTTPS'/><author><name>Vincent 
Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5076676582921939631</id><published>2008-03-20T13:21:00.000-07:00</published><updated>2009-04-06T12:10:10.073-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BOBatchConverter'/><title type='text'>BOBatchConverter release 2.2</title><content type='html'>&lt;a href="http://3.bp.blogspot.com/_hTlcWbt-BP4/R-LKVC7GmKI/AAAAAAAAAQQ/XC4D6iK7dEY/s1600-h/BobatchConvertorLOGO.jpg"&gt;&lt;img id="BLOGGER_PHOTO_ID_5179924984340322466" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/R-LKVC7GmKI/AAAAAAAAAQQ/XC4D6iK7dEY/s200/BobatchConvertorLOGO.jpg" border="0" /&gt;&lt;/a&gt;Hi all,&lt;br /&gt;&lt;br /&gt;After some very interesting feed back from users, I made a new release of the BOBatchConverter. We are now in version 2.2.&lt;br /&gt;What was done / fixed :&lt;br /&gt;&lt;div&gt;&lt;li&gt;&lt;div align="justify"&gt;&lt;span style="font-size:85%;"&gt;handling pathnames with spaces ... 
like C:\BO DOCS\Report v2.rep,&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="justify"&gt;&lt;span style="font-size:85%;"&gt;mailing process was hardened,&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="justify"&gt;&lt;span style="font-size:85%;"&gt;new syntax : you have to prefix your command line arguments with the token "--" like --user or --REFRESH,&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="justify"&gt;&lt;span style="font-size:85%;"&gt;you can use ; to separate the different emails in the profile manager (previous was a stupid ",").&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;div&gt;&lt;br /&gt;Please, find the release &lt;a href="http://pagesperso-orange.fr/botools/Release2.2.zip"&gt;here&lt;/a&gt;.&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5076676582921939631?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/5076676582921939631/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5076676582921939631' title='5 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5076676582921939631'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5076676582921939631'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/03/bobatchconverter-release-22.html' title='BOBatchConverter release 2.2'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/R-LKVC7GmKI/AAAAAAAAAQQ/XC4D6iK7dEY/s72-c/BobatchConvertorLOGO.jpg' height='72' width='72'/><thr:total>5</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-975730934626910631</id><published>2008-03-11T14:17:00.001-07:00</published><updated>2008-03-20T13:32:51.660-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BOBatchConverter'/><title type='text'>BOBatchConverter V2 : now more reliable process management.</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;Here is new release of my Business Objects converter utility : BOBatchConverter&lt;br /&gt;As usual, you can use this command line tool as a Broadcast Agent (BCA) replacement : refresh, convert to any format (XLS, PDF, CSV, HTML), publish to WebIntelligence and / or send by mail.&lt;br /&gt;This new feature offers a more reliable process management. 
You can now run several and concurrent instances of BoBatchConverter.&lt;br /&gt;&lt;br /&gt;You can download it &lt;a href="http://www.decisionsystems-studio.fr/Progs/BOBatchConvertorV2.zip"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;span style="color:#ff0000;"&gt;WARNING : the release 2.2 is available with major fixes.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-975730934626910631?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/975730934626910631/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=975730934626910631' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/975730934626910631'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/975730934626910631'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/03/bobatchconverter-v2-now-more-reliable.html' title='BOBatchConverter V2 : now more reliable process management.'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-7091638510087844823</id><published>2008-02-19T08:52:00.000-08:00</published><updated>2009-12-03T13:08:30.204-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='ColorMyTail tail unix linux NET log'/><title type='text'>COLORmyTail : Unix Tail for 
Windows !</title><content type='html'>&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto; DISPLAY: block; CURSOR: hand" id="BLOGGER_PHOTO_ID_5168735318872843906" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/R7sJYytuZoI/AAAAAAAAANY/wJ7EO_6rY9Q/s200/ColorMyTailLOGO2.bmp" /&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Today I am sharing a little program I coded in .NET and named ColorMyTail.&lt;br /&gt;&lt;br /&gt;You all know the tail command on Unix: a great command for following a log file in real time.&lt;br /&gt;I decided to code my own for the Windows platform because this command does not exist there.&lt;br /&gt;&lt;br /&gt;But I also decided to add some extra value: colorization!&lt;br /&gt;&lt;br /&gt;In short, my tail command works the same as on Unix, but colors can be applied to lines according to keywords.&lt;br /&gt;Example: red when a log line contains ERROR, or green when the line contains SUCCESS.&lt;br /&gt;The keyword / color associations are defined in a Formating.ini file.&lt;br /&gt;&lt;br /&gt;Very useful for monitoring sensitive logs on Windows platforms.&lt;br /&gt;Caution: there are some problems with Vista.&lt;br /&gt;The install package is &lt;a href="http://pagesperso-orange.fr/botools/ColorMyTail.zip"&gt;here&lt;/a&gt;.&lt;br /&gt;&lt;br /&gt;Below are some snapshots. 
The first one is really running in some prod environment for file routing, the second is to illustrate the full colorization.&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; DISPLAY: block; CURSOR: hand" id="BLOGGER_PHOTO_ID_5168757244680890034" border="0" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/R7sdVCtuZrI/AAAAAAAAANw/j4WaWSKr-tA/s320/ColorMyTailSnap1.jpg" /&gt; &lt;img style="TEXT-ALIGN: center; MARGIN: 0px auto 10px; DISPLAY: block; CURSOR: hand" id="BLOGGER_PHOTO_ID_5168757395004745410" border="0" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/R7sddytuZsI/AAAAAAAAAN4/uNjvnQV3Flg/s320/ColorMyTailSnap2.jpg" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-7091638510087844823?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/7091638510087844823/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=7091638510087844823' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7091638510087844823'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7091638510087844823'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/02/hi-all-today-i-will-give-little-program.html' title='COLORmyTail : Unix Tail for Windows !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail 
xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/R7sJYytuZoI/AAAAAAAAANY/wJ7EO_6rY9Q/s72-c/ColorMyTailLOGO2.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8709160777602803784</id><published>2008-02-18T00:50:00.000-08:00</published><updated>2008-02-18T00:56:09.492-08:00</updated><title type='text'>HALOGEN : olap4j viewer for Mondrian</title><content type='html'>Hi all,&lt;br /&gt;&lt;br /&gt;Look at this promising HALOGEN web page.&lt;br /&gt;An olap viewer that plugs to Mondrian or XMLA.&lt;br /&gt;Still in early dev stages, but promising.&lt;br /&gt;&lt;br /&gt;&lt;a href="http://code.google.com/p/halogen/"&gt;http://code.google.com/p/halogen/&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8709160777602803784?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8709160777602803784/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8709160777602803784' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8709160777602803784'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8709160777602803784'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/02/halogen-olap4j-viewer-for-mondrian.html' title='HALOGEN : olap4j viewer for Mondrian'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2071941677134974963</id><published>2008-02-18T00:01:00.000-08:00</published><updated>2008-02-18T00:47:46.511-08:00</updated><title type='text'>FOSDEM !</title><content type='html'>&lt;a href="http://www.fosdem.org/"&gt;&lt;img alt="I’m going to FOSDEM, the Free and Open Source Software Developers’ European Meeting" src="http://www.fosdem.org/promo/going-to" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Don't forget FOSDEM : &lt;a href="http://www.fosdem.org/2008/"&gt;http://www.fosdem.org/2008/&lt;/a&gt;&lt;br /&gt;The Free and Open Source Software Developers' European Meeting will take place in Brussels (Belgium) on 23 and 24 February.&lt;br /&gt;&lt;br /&gt;Matt Casters (Kettle / PDI) will be giving a talk about Kettle.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2071941677134974963?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2071941677134974963/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2071941677134974963' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2071941677134974963'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2071941677134974963'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/02/fosdem.html' title='FOSDEM !'/><author><name>Vincent 
Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4803526786675449426</id><published>2008-02-17T23:48:00.001-08:00</published><updated>2008-02-18T00:47:16.071-08:00</updated><title type='text'>Talend Open Studio 2.3</title><content type='html'>&lt;span style="font-family:arial;font-size:78%;"&gt;&lt;/span&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Hi all,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Talend Open Studio 2.3 is now available for download here : &lt;/span&gt;&lt;a href="http://www.talend.com/download.php?src=HomepageTechNews"&gt;&lt;span style="font-family:arial;"&gt;http://www.talend.com/download.php?src=HomepageTechNews&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Currently in testing on my infrastructure : great product.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Will come back soon with more feed back.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;img id="BLOGGER_PHOTO_ID_5168226116140164690" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/R7k6RStuZlI/AAAAAAAAAM8/Z9RI2mzH8Mc/s320/talend.bmp" border="0" /&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4803526786675449426?l=open-bi.blogspot.com' alt='' 
/&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4803526786675449426/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4803526786675449426' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4803526786675449426'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4803526786675449426'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2008/02/talend-open-studio-23.html' title='Talend Open Studio 2.3'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/R7k6RStuZlI/AAAAAAAAAM8/Z9RI2mzH8Mc/s72-c/talend.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-4515481875297742228</id><published>2007-05-30T06:59:00.000-07:00</published><updated>2008-02-18T00:48:01.475-08:00</updated><title type='text'>Kettle V3.0</title><content type='html'>&lt;span style="font-family:arial;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;I'm happy to come back on my blog (I had hard job these days, so no more time to write) and post a message for Kettle V3.0 announcement !&lt;br /&gt;V3.0 is a major breakthrough, according to Matt Casters "we’re aiming for a strict separation of data and metadata".&lt;br /&gt;Performance gain ? Around 15-20% !! 
Some transformations even show a 5x speedup!&lt;br /&gt;Nice job, guys.&lt;br /&gt;Here you can find a quick speedup comparison made during regression tests against version 2.5.&lt;br /&gt;&lt;/span&gt;&lt;a href="http://kettle.pentaho.org/svn/Kettle/trunk/experimental_test/org/pentaho/di/run/RunResults-Matt-20070516.txt"&gt;&lt;span style="font-family:arial;"&gt;V2.5 - V3.0 Speedup comparison&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:78%;"&gt; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-4515481875297742228?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/4515481875297742228/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=4515481875297742228' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4515481875297742228'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/4515481875297742228'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/05/kettle-v30.html' title='Kettle V3.0'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3148947394834419874</id><published>2007-04-12T05:18:00.000-07:00</published><updated>2008-02-18T00:48:18.753-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Kettle'/><title type='text'>New Look &amp; Feel for Kettle 2.5.0 !!</title><content type='html'>&lt;span style="font-family:arial;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Just a quick message to show you how wonderful the new Kettle 2.5.0 GUI looks!&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Nice colors, nice menus, more professional screens.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;"&gt;Still not as nice as Talend Studio, but getting closer ...&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;&lt;img id="BLOGGER_PHOTO_ID_5052515591592125250" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/Rh4kKUL4o0I/AAAAAAAAAGs/e2TkwX2bqqA/s320/Kettle25.bmp" border="0" /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3148947394834419874?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3148947394834419874/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3148947394834419874' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3148947394834419874'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3148947394834419874'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/04/new-look-feel-for-kettle-250.html' 
title='New Look &amp; Feel for Kettle 2.5.0 !!'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_hTlcWbt-BP4/Rh4kKUL4o0I/AAAAAAAAAGs/e2TkwX2bqqA/s72-c/Kettle25.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3363642493934455095</id><published>2007-03-27T04:53:00.000-07:00</published><updated>2008-02-18T00:48:36.113-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Music'/><title type='text'>No BI for today ... just great music !</title><content type='html'>&lt;span style="font-family:arial;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Today I'm a bit tired.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Yesterday, I went to see TOTO in concert, in Paris.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Great moment, great music, great band, great Lukather ...&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;It was really cool to forget BI, servers, data, ETL ... 
for a while and be transported back to the 80's.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Here is a nice shot of the concert.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Arial;font-size:78%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;img id="BLOGGER_PHOTO_ID_5046571880602461074" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/RgkGYzTI45I/AAAAAAAAAGg/QZgHwFLMCjU/s320/toto.jpg" border="0" /&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3363642493934455095?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3363642493934455095/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3363642493934455095' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3363642493934455095'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3363642493934455095'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/03/no-bi-for-today-just-great-music.html' title='No BI for today ... 
just great music !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/RgkGYzTI45I/AAAAAAAAAGg/QZgHwFLMCjU/s72-c/toto.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1433127251222596111</id><published>2007-03-26T06:07:00.000-07:00</published><updated>2008-02-18T00:48:51.974-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Storage'/><title type='text'>My personnal storage system</title><content type='html'>&lt;span style="font-family:arial;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;I've just received my new storage system : a nice Promise UltraTrak RM4000.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Loaded with &lt;strong&gt;1,5 To&lt;/strong&gt; disk space in raid5 ... 
coooool.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Now I can work with a really nice amount of data for ETL processes, DB optimization and OLAP cubes (and maybe some MP3s too ...).&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;"&gt;Here is a snapshot of the system.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;&lt;img id="BLOGGER_PHOTO_ID_5046220036881572738" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/RgfGYzTI44I/AAAAAAAAAGY/UBVwEaJevu8/s400/promise.jpg" border="0" /&gt;&lt;br /&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1433127251222596111?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/1433127251222596111/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1433127251222596111' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1433127251222596111'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1433127251222596111'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/03/my-personnal-storage-system.html' title='My personal storage system'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_hTlcWbt-BP4/RgfGYzTI44I/AAAAAAAAAGY/UBVwEaJevu8/s72-c/promise.jpg' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-3430584501549875421</id><published>2007-03-26T03:22:00.000-07:00</published><updated>2008-02-18T00:49:10.331-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Talend'/><title type='text'>Talend Open Studio 2.0.0 new features</title><content type='html'>&lt;a href="http://1.bp.blogspot.com/_hTlcWbt-BP4/RgegpzTI43I/AAAAAAAAAGQ/JBPVG1w7upg/s1600-h/talend-open-data-solution.gif"&gt;&lt;img id="BLOGGER_PHOTO_ID_5046178547497493362" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/RgegpzTI43I/AAAAAAAAAGQ/JBPVG1w7upg/s400/talend-open-data-solution.gif" border="0" /&gt;&lt;/a&gt; &lt;div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Hi all,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Today, let's see what's happening with Talend Open Studio (TOS) 2.0.0.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;According to talend.com, "Java Generation language is the new core feature added to Milestone 2 of Talend Open Studio v2.0 release. 
A number of Perl connectors now have their Java counterparts."&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Great to see Java is now available!&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;More new features: &lt;/span&gt;&lt;/div&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;New specific MySQL and Oracle components,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;MySQL "Bulk" components,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;tFileFetch component,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;tRowGenerator2 component,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;Technical components: tFor &amp;amp; tSleep to implement a loop!&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;tAggregateRow component enhancements for better aggregation,&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;span style="font-family:arial;"&gt;External components, to link to your own production.&lt;/span&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;Read more here: &lt;/span&gt;&lt;a href="http://www.talend.com/products/whats-new.htm"&gt;&lt;span style="font-family:arial;"&gt;http://www.talend.com/products/whats-new.htm&lt;/span&gt;&lt;/a&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-3430584501549875421?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/3430584501549875421/comments/default' title='Post Comments'/><link rel='replies' type='text/html' 
href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=3430584501549875421' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3430584501549875421'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/3430584501549875421'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/03/talend-open-studio-200-new-features.html' title='Talend Open Studio 2.0.0 new features'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_hTlcWbt-BP4/RgegpzTI43I/AAAAAAAAAGQ/JBPVG1w7upg/s72-c/talend-open-data-solution.gif' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2130225914568947051</id><published>2007-03-22T08:50:00.000-07:00</published><updated>2007-03-22T08:55:17.768-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mondrian'/><title type='text'>A complete Kettle + Mondrian feedback</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;After more than 4 months spent on Kettle and Mondrian, I wrote a review on the French developer network.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;This review is written in French and you can find it here: &lt;/span&gt;&lt;a 
href="http://www.developpez.net/forums/showthread.php?t=253349"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://www.developpez.net/forums/showthread.php?t=253349&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Please feel free to reply and share your personal feedback.&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2130225914568947051?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2130225914568947051/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2130225914568947051' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2130225914568947051'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2130225914568947051'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/03/complete-kettle-mondrian-feed-back.html' title='A complete Kettle + Mondrian feedback'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-8341659783884351437</id><published>2007-03-12T08:28:00.000-07:00</published><updated>2007-03-12T08:59:42.749-07:00</updated><category scheme='http://www.blogger.com/atom/ns#' 
term='Kettle'/><title type='text'>My Unix way to parallelize PDI/Kettle jobs ...</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Today I want to share some code I used in order to parallelize jobs, at OS level (for me, it's Solaris).&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;My needs are: &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- run 2 jobs in parallel, each on a separate engine,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- run a third job when the first 2 parallel jobs are both complete.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;I decided to design 2 separate jobs, then manage the execution with simple bash scripting using temp files as flags.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Here is the final code (an extract from a larger sh file): &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;img id="BLOGGER_PHOTO_ID_5041062775399476306" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/RfVz4rIhbFI/AAAAAAAAAGA/eoHFvvcPZm0/s400/script.JPG" border="0" /&gt;&lt;/span&gt;&lt;br /&gt;&lt;ul&gt;&lt;li&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;First 2 blue lines: the 2 jobs to run in parallel, each command creating a flag file.&lt;/span&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Red lines: a flag check to detect whether both jobs are complete (= presence of flag file).&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;li&gt;&lt;div align="left"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Last blue line: the third job to run after the completion of the parallel 
ones.&lt;/span&gt;&lt;/div&gt;&lt;/li&gt;&lt;/ul&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-8341659783884351437?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/8341659783884351437/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=8341659783884351437' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8341659783884351437'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/8341659783884351437'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/03/my-unix-way-to-parallelize-jobs.html' title='My Unix way to parallelize PDI/Kettle jobs ...'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/RfVz4rIhbFI/AAAAAAAAAGA/eoHFvvcPZm0/s72-c/script.JPG' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2248209901760819385</id><published>2007-02-26T01:29:00.000-08:00</published><updated>2007-02-26T01:43:00.395-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Calling a DB procedure within PDI/Kettle</title><content type='html'>&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Hi 
all,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Today, I want to share an interesting trick to run a DB procedure in PDI/Kettle.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;But where's the trick, since there is a dedicated step to run such a procedure?&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;I discovered the following: when trying to run a DB procedure with PDI/Kettle using this dedicated step alone, nothing happens: the procedure does not run and no error is logged.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt; &lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;The answer is: first, you need to add a "Row generation" step and link it to your "Call DB Procedure" step.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;But hey, don't forget to write "1" in the Limit field for the "Call DB Procedure", otherwise the procedure will run more than once!!!&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;I don't know if this is a real issue, but anyway who cares? 
The job is finally done !!&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Here is the snapshot.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;a href="http://3.bp.blogspot.com/_hTlcWbt-BP4/ReKquCZaZ2I/AAAAAAAAAF0/s377OSlU9Ms/s1600-h/CallDBProc.JPG"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;img id="BLOGGER_PHOTO_ID_5035775041247799138" style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/ReKquCZaZ2I/AAAAAAAAAF0/s377OSlU9Ms/s400/CallDBProc.JPG" border="0" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2248209901760819385?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2248209901760819385/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2248209901760819385' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2248209901760819385'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2248209901760819385'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/02/calling-db-procedure-within-pdikettle.html' title='Calling a DB procedure within PDI/Kettle'/><author><name>Vincent 
Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/ReKquCZaZ2I/AAAAAAAAAF0/s377OSlU9Ms/s72-c/CallDBProc.JPG' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2458385078537105264</id><published>2007-02-19T07:01:00.000-08:00</published><updated>2007-02-19T07:30:10.790-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mondrian'/><title type='text'>Interesting tip for Mondrian with XML/A</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;A few days ago, I discovered an interesting tip when using Mondrian with XML/A.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;For now, I don't know whether this issue is documented (or whether it is a real issue ...).&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;Let me explain: when using Mondrian + XML/A on a schema containing dimensions with "name" properties in the hierarchy, schema validation fails, and so does query execution.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;Example of faulty code, in XML/A mode: &lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/RdnA6IjgbbI/AAAAAAAAAFM/EkW_cSBABRk/s1600-h/faulty.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5033266163524595122" style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" 
src="http://4.bp.blogspot.com/_hTlcWbt-BP4/RdnA6IjgbbI/AAAAAAAAAFM/EkW_cSBABRk/s400/faulty.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Now, the working code. &lt;strong&gt;Note : replace Hierarchy name with Hierarchy caption ...&lt;/strong&gt;&lt;/span&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/RdnBDojgbcI/AAAAAAAAAFU/gGh6pelZcww/s1600-h/Good.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5033266326733352386" style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/RdnBDojgbcI/AAAAAAAAAFU/gGh6pelZcww/s400/Good.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-size:85%;"&gt;Just keep in mind I'm using Mondrian in XML/A mode on a Sun Sparc server with Solaris 8 and SunONE as application server (Oracle 8i as database).&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2458385078537105264?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2458385078537105264/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2458385078537105264' title='3 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2458385078537105264'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2458385078537105264'/><link rel='alternate' type='text/html' 
href='http://open-bi.blogspot.com/2007/02/interesting-tip-for-mondrian-with-xmla.html' title='Interesting tip for Mondrian with XML/A'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_hTlcWbt-BP4/RdnA6IjgbbI/AAAAAAAAAFM/EkW_cSBABRk/s72-c/faulty.JPG' height='72' width='72'/><thr:total>3</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2375594673353793715</id><published>2007-02-15T00:44:00.000-08:00</published><updated>2007-02-26T01:29:02.479-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Another cool new feature for PDI / Kettle !!</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Here is a screenshot of a long-awaited feature for PDI / Kettle.&lt;br /&gt;This is an advanced error handling feature!! A step further toward Data Quality Management in PDI!&lt;br /&gt;Have a look at the transformation below to see how to implement the process. 
Very simple.&lt;br /&gt;Learn more on &lt;/span&gt;&lt;a href="http://www.ibridge.be/"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Matt Casters' Data Integration Blog&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;.&lt;/span&gt;&lt;br /&gt;&lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/RdQew4jgbWI/AAAAAAAAAEc/OVoNCKzeVOE/s1600-h/error-handling-top.bmp"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;img id="BLOGGER_PHOTO_ID_5031680508843552098" style="FLOAT: left; MARGIN: 0px 10px 10px 0px; CURSOR: hand" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/RdQew4jgbWI/AAAAAAAAAEc/OVoNCKzeVOE/s400/error-handling-top.bmp" border="0" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2375594673353793715?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2375594673353793715/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2375594673353793715' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2375594673353793715'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2375594673353793715'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/02/another-cool-new-feature-for-pdi-kettle.html' title='Another cool new feature for PDI / Kettle !!'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_hTlcWbt-BP4/RdQew4jgbWI/AAAAAAAAAEc/OVoNCKzeVOE/s72-c/error-handling-top.bmp' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-6786783078160729559</id><published>2007-02-08T05:28:00.000-08:00</published><updated>2007-02-08T05:43:34.189-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Marketae'/><title type='text'>That's no BI, but have a look at it ...</title><content type='html'>&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/RcsoWojgbUI/AAAAAAAAAEI/mvhVDno3gpQ/s1600-h/marketae.JPG"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;img id="BLOGGER_PHOTO_ID_5029157778197867842" style="DISPLAY: block; MARGIN: 0px auto 10px; CURSOR: hand; TEXT-ALIGN: center" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/RcsoWojgbUI/AAAAAAAAAEI/mvhVDno3gpQ/s320/marketae.JPG" border="0" /&gt;&lt;/span&gt;&lt;/a&gt; &lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Today I want to introduce you to &lt;/span&gt;&lt;a href="http://www.marketae.com"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;marketae&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;.&lt;br /&gt;Marketae will soon become one of the major Web 2.B players.&lt;br 
/&gt;Did I say Web 2.B? Well, that stands for &lt;strong&gt;Web To Business&lt;/strong&gt;!&lt;br /&gt;The goal of marketae is to &lt;strong&gt;create, manage and develop preferred links between a buyer and a seller&lt;/strong&gt;. See it as a specialized marketplace (a kind of clever mix of eBay, Viadeo and LinkedIn), where it will be possible to rate your business partners.&lt;br /&gt;&lt;br /&gt;&lt;strong&gt;Christophe Vigliano&lt;/strong&gt; is the founder of Marketae, with whom I had the pleasure of working during our SITEL adventure.&lt;br /&gt;&lt;br /&gt;Feel free to visit &lt;/span&gt;&lt;a href="http://www.marketae.com"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;marketae&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;. Official opening in a few days.&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-6786783078160729559?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/6786783078160729559/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=6786783078160729559' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6786783078160729559'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/6786783078160729559'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/02/thats-no-bi-but-have-look-at-it.html' title='That&apos;s no BI, but have a look at it ...'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' 
width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_hTlcWbt-BP4/RcsoWojgbUI/AAAAAAAAAEI/mvhVDno3gpQ/s72-c/marketae.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2096297359600643040</id><published>2007-01-29T06:25:00.000-08:00</published><updated>2007-01-29T07:09:38.941-08:00</updated><title type='text'>A complete fact capture job</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt; &lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Here is a job used to capture fact data on a daily basis.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Data comes from a source table where: &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- time is stored in columns, 1 column for 1 day = 31 columns for a month, from 1 to 31,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- indicators are stored in rows: an indicator code for each indicator type.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;This source structure is not mine; the challenge of integrating and managing it is ;)&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;Here is the column list of the source table: &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;HOTEL_CODE,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span 
style="font-family:arial;font-size:78%;"&gt;INDICATOR_CODE,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;YEAR,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;MONTH,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;ACTIVITY_TYPE,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;CONCEPT_CODE,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;DAY1&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;DAY31&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;Here is the column list of my destination table: &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;ID,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;HOTEL_CODE,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;TIME_KEY,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;YEAR,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;MONTH,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;INDICATOR1&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;...&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;INDICATOR12&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:78%;"&gt;TIMESTAMP&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;The job begins with a GetVariable step that captures the new timestamp for new data from a previous job used to calculate the 
gap.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Then a first normalization step turns the "31 columns for a month" layout into a single column named "Duration" (because we deal with duration data). At this point, I chose to store the intermediate data in a temporary table, for performance reasons. Then, you can see a fork to load two types of tables :&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- an aggregated one, with denormalized data for indicators (each of the 12 indicator codes becomes a column),&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- a normalized one, used to produce very specific reports back in the intranet application.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;The job works well in the production environment. The SORT operation is a bit slow, but nothing that endangers our timing.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;For more information, please feel free to contact me : &lt;a href="mailto:vteyssier@decisionsystems-studio.fr"&gt;mailto:vteyssier@decisionsystems-studio.fr&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;br /&gt;&lt;div&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/Rb4NpklxsEI/AAAAAAAAADw/XYk0EqZE5HE/s1600-h/data_jour.JPG"&gt;&lt;/a&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/Rb4NpklxsEI/AAAAAAAAADw/XYk0EqZE5HE/s1600-h/data_jour.JPG"&gt;&lt;/a&gt;&lt;br /&gt;&lt;a href="http://3.bp.blogspot.com/_hTlcWbt-BP4/Rb4N2UlxsFI/AAAAAAAAAD4/3OQAtcnyXuw/s1600-h/data_jour.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5025469461083304018" style="CURSOR: hand" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/Rb4N2UlxsFI/AAAAAAAAAD4/3OQAtcnyXuw/s320/data_jour.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span 
style="font-family:Arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;br /&gt;&lt;br /&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2096297359600643040?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/2096297359600643040/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2096297359600643040' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2096297359600643040'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2096297359600643040'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/01/complete-fact-capture-job.html' title='A complete fact capture job'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/Rb4N2UlxsFI/AAAAAAAAAD4/3OQAtcnyXuw/s72-c/data_jour.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5734843895548850004</id><published>2007-01-19T02:29:00.000-08:00</published><updated>2007-01-23T03:30:15.666-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>A complete dimension update job with PDI</title><content 
type='html'>&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;/span&gt;&lt;div&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt; &lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Here is my dimension update job.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;A lot of processes here, as you can see. Works great under PDI 2.4 and I still rely on target database functions to better manage timestamps and sequences.&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;&lt;/span&gt; &lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/RbXxn0lxsBI/AAAAAAAAADM/D5BTC17_BuI/s1600-h/dimensions.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5023186625835937810" style="CURSOR: hand" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/RbXxn0lxsBI/AAAAAAAAADM/D5BTC17_BuI/s320/dimensions.JPG" border="0" /&gt;&lt;/a&gt;&lt;/span&gt;&lt;/div&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5734843895548850004?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/5734843895548850004/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5734843895548850004' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5734843895548850004'/><link rel='self' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5734843895548850004'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/01/hi-all-here-is-my-dimension-update-job.html' title='A complete dimension update job with PDI'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_hTlcWbt-BP4/RbXxn0lxsBI/AAAAAAAAADM/D5BTC17_BuI/s72-c/dimensions.JPG' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-5614143553873596252</id><published>2007-01-09T12:24:00.000-08:00</published><updated>2007-01-10T00:36:00.472-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Kettle / PDI release 2.4 !</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;I just received my &lt;b&gt;2.4 release of Kettle / PDI&lt;/b&gt; (&lt;/span&gt;&lt;a href="www.pentaho.org"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Pentaho&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;).&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;It looks quite exciting: have a look at these nice new features (sorry, in French).&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;I'm going to do some testing as soon as tomorrow on a table with more than 22 million rows ...&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/RaP6d8fLbLI/AAAAAAAAAB4/XniKdehNb5I/s1600-h/kettle1.JPG"&gt;&lt;img 
id="BLOGGER_PHOTO_ID_5018129802180914354" style="CURSOR: hand" alt="" src="http://4.bp.blogspot.com/_hTlcWbt-BP4/RaP6d8fLbLI/AAAAAAAAAB4/XniKdehNb5I/s200/kettle1.JPG" border="0" /&gt;&lt;/a&gt; &lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/RaP6pcfLbMI/AAAAAAAAACA/0xlbyi_-6H0/s1600-h/kettle2.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5018129999749409986" style="CURSOR: hand" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/RaP6pcfLbMI/AAAAAAAAACA/0xlbyi_-6H0/s200/kettle2.JPG" border="0" /&gt;&lt;/a&gt; &lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/RaP6xcfLbNI/AAAAAAAAACI/zJF61lncMKg/s1600-h/kettle3.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5018130137188363474" style="CURSOR: hand" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/RaP6xcfLbNI/AAAAAAAAACI/zJF61lncMKg/s200/kettle3.JPG" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-5614143553873596252?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/5614143553873596252/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=5614143553873596252' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5614143553873596252'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/5614143553873596252'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/01/kettle-pdi-release-24.html' title='Kettle / PDI release 2.4 !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' 
src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://4.bp.blogspot.com/_hTlcWbt-BP4/RaP6d8fLbLI/AAAAAAAAAB4/XniKdehNb5I/s72-c/kettle1.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-900810846093851570</id><published>2007-01-08T08:46:00.000-08:00</published><updated>2007-01-08T09:20:53.934-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='BOBatchConverter'/><title type='text'>BOBatchConverter : release 1.4</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Here is the latest release of &lt;b&gt;BOBatchConverter&lt;/b&gt;. This tool lets you &lt;b&gt;run&lt;/b&gt;, &lt;b&gt;refresh&lt;/b&gt;, &lt;b&gt;convert&lt;/b&gt;, &lt;b&gt;post&lt;/b&gt;, &lt;b&gt;send&lt;/b&gt;, &lt;b&gt;email&lt;/b&gt; and &lt;b&gt;publish&lt;/b&gt; a single Business Objects report or several reports at once. &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Each action can be done with specific profiles (emailing ...) 
and I also provide a GUI to manage your profiles (stored as XML files).&lt;br /&gt;It is a command-line tool that runs under XP or Server 2003 and can be used in conjunction with a good scheduler to act like a BCA (BCA is the BroadCast Agent, the batch mode of Business Objects).&lt;br /&gt;Sure, Business Objects is not free, but this tool, which I have been developing in .NET since last October, is.&lt;br /&gt;&lt;br /&gt;Here is the link to download the archive, with documentation on how to use it : &lt;/span&gt;&lt;a href="http://www.decisionsystems-studio.fr/BOBatchConverter_V1.4.zip"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;b&gt;BOBatchConverter&lt;/b&gt;&lt;/span&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;br /&gt;Here are 3 screenshots : the profile manager, the XML file and the console output.&lt;/span&gt;&lt;br /&gt;&lt;br /&gt;&lt;a href="http://1.bp.blogspot.com/_hTlcWbt-BP4/RaJ8BcfLbII/AAAAAAAAABQ/yvmDsvfimfU/s1600-h/Profile_Manager_Exemple.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5017709299112832130" style="CURSOR: hand" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/RaJ8BcfLbII/AAAAAAAAABQ/yvmDsvfimfU/s200/Profile_Manager_Exemple.JPG" border="0" /&gt;&lt;/a&gt; &lt;a href="http://3.bp.blogspot.com/_hTlcWbt-BP4/RaJ8H8fLbJI/AAAAAAAAABY/efOJu0WNZUM/s1600-h/XML_MailProfile_Exemple.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5017709410781981842" style="CURSOR: hand" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/RaJ8H8fLbJI/AAAAAAAAABY/efOJu0WNZUM/s200/XML_MailProfile_Exemple.JPG" border="0" /&gt;&lt;/a&gt; &lt;a href="http://1.bp.blogspot.com/_hTlcWbt-BP4/RaJ8ScfLbKI/AAAAAAAAABg/93d-3GRZ4a4/s1600-h/Snapshot_Output.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5017709591170608290" style="CURSOR: hand" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/RaJ8ScfLbKI/AAAAAAAAABg/93d-3GRZ4a4/s200/Snapshot_Output.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;br /&gt;&lt;br /&gt;&lt;span 
style="font-family:arial;font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-900810846093851570?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/900810846093851570/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=900810846093851570' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/900810846093851570'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/900810846093851570'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/01/bobatchconverter-release-14.html' title='BOBatchConverter : release 1.4'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://1.bp.blogspot.com/_hTlcWbt-BP4/RaJ8BcfLbII/AAAAAAAAABQ/yvmDsvfimfU/s72-c/Profile_Manager_Exemple.JPG' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-2050964361196141970</id><published>2007-01-08T07:41:00.000-08:00</published><updated>2007-01-10T07:20:41.498-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='JRubik'/><title type='text'>Mondrian + JRubik : a good team</title><content type='html'>&lt;span 
style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Here is (first picture) a screenshot of a nice cube analysis done with &lt;b&gt;Mondrian&lt;/b&gt; in conjunction with &lt;b&gt;JRubik&lt;/b&gt;, a very nice MDX editor. In the second picture, you can see the same analysis rendered with &lt;b&gt;JPivot&lt;/b&gt; within Internet Explorer.&lt;br /&gt;As you can see, with little code, things can look quite good !&lt;br /&gt;&lt;br /&gt;This analysis contains 2 time series (A and A-1) and 2 calculated members (the difference between A and A-1, and Evolution). As you can see, the time period is hardcoded in the MDX code for the moment; that's because user parameters will be refreshed at run time directly in the JSP (JPivot architecture).&lt;br /&gt;Some captions are hidden for confidentiality.&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;a href="http://4.bp.blogspot.com/_hTlcWbt-BP4/RaJmzMfLbFI/AAAAAAAAAAw/c_pO4xhkYVE/s1600-h/JRubik.JPG"&gt;&lt;/a&gt;&lt;a href="http://1.bp.blogspot.com/_hTlcWbt-BP4/RaUD1Elxr9I/AAAAAAAAACc/tpSnBzUtiT4/s1600-h/JRubik.JPG"&gt;&lt;/a&gt;&lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/RaUEAUlxr-I/AAAAAAAAACk/PS45pBVhgBo/s1600-h/JRubik.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5018421763347951586" style="CURSOR: hand" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/RaUEAUlxr-I/AAAAAAAAACk/PS45pBVhgBo/s320/JRubik.JPG" border="0" /&gt;&lt;/a&gt;&lt;a href="http://1.bp.blogspot.com/_hTlcWbt-BP4/RaJovcfLbGI/AAAAAAAAAA8/bMjrpa4-_sQ/s1600-h/Mondrian_web.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5017688099154259042" style="CURSOR: hand" alt="" src="http://1.bp.blogspot.com/_hTlcWbt-BP4/RaJovcfLbGI/AAAAAAAAAA8/bMjrpa4-_sQ/s320/Mondrian_web.JPG" border="0" /&gt;&lt;/a&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-2050964361196141970?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://open-bi.blogspot.com/feeds/2050964361196141970/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=2050964361196141970' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2050964361196141970'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/2050964361196141970'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/01/mondrian-jrubik-good-team.html' title='Mondrian + JRubik : a good team'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://2.bp.blogspot.com/_hTlcWbt-BP4/RaUEAUlxr-I/AAAAAAAAACk/PS45pBVhgBo/s72-c/JRubik.JPG' height='72' width='72'/><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-7720089383325627742</id><published>2007-01-07T02:59:00.000-08:00</published><updated>2007-01-07T12:08:48.458-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Kettle'/><title type='text'>Kettle / PDI new version to be released !</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Here is a link to a Flash animation showing one of the finest new features of Kettle / PDI, soon to be released (2.4).&lt;br /&gt;You will be able to load, execute and monitor transformations AND jobs within Spoon !&lt;br /&gt;&lt;br /&gt;The Flash demo is here, from Matt Casters' blog : &lt;/span&gt;&lt;a 
href="http://www.kettle.be/swf/new%20240%20trans%20and%20jobs.htm"&gt;&lt;b&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Flash new feature&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;br /&gt;Matt Casters' blog on data integration : &lt;/span&gt;&lt;a href="http://www.ibridge.be"&gt;&lt;b&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;http://www.ibridge.be&lt;/span&gt;&lt;/b&gt;&lt;/a&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;br /&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;More on the new Kettle / PDI 2.4 release : &lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- remote transformation execution and monitoring,&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;- clustered environment for running transformations.&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;Kettle / PDI now allows you to "scale beyond single server processing and into the massive parallel world" (cf Pentaho).&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;&lt;/span&gt;&lt;br /&gt;&lt;span style="font-family:Arial;font-size:85%;"&gt;The release is expected during the second half of January.&lt;/span&gt;&lt;span style="font-size:0;"&gt; &lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-7720089383325627742?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://open-bi.blogspot.com/feeds/7720089383325627742/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=7720089383325627742' title='0 Comments'/><link rel='edit' type='application/atom+xml' 
href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7720089383325627742'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/7720089383325627742'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/01/kettle-pdi-new-version-to-be-released.html' title='Kettle / PDI new version to be released !'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-1617710662356917059</id><published>2007-01-02T08:46:00.000-08:00</published><updated>2007-01-02T12:41:07.234-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Mondrian'/><title type='text'>A cool feature of Mondrian Schema</title><content type='html'>&lt;span style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;br /&gt;&lt;br /&gt;Today I tried to write SQL embedded in a Mondrian schema. 
This feature is really handy when it comes to designing very small dimensions on the fly, without creating any object in the database.&lt;br /&gt;Use it for small dimensions with few distinct values.&lt;br /&gt;&lt;/span&gt;&lt;div&gt;&lt;span style="font-family:arial;"&gt;&lt;span style="font-size:85%;"&gt;&lt;/span&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Here is the XML code I used today : &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;a href="http://3.bp.blogspot.com/_hTlcWbt-BP4/RZqTkq6QE-I/AAAAAAAAAAg/sSaaohIWWxI/s1600-h/MondrianSchema+sql.JPG"&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;img id="BLOGGER_PHOTO_ID_5015483393233654754" style="WIDTH: 350px; CURSOR: hand; HEIGHT: 95px" height="103" alt="" src="http://3.bp.blogspot.com/_hTlcWbt-BP4/RZqTkq6QE-I/AAAAAAAAAAg/sSaaohIWWxI/s320/MondrianSchema%2Bsql.JPG" width="345" border="0" /&gt;&lt;/span&gt;&lt;/a&gt;&lt;br /&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Then you can access the members of your dimension with simple MDX expressions : &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;[FAMILLE DE SERVICE].[Tous Services], &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;[FAMILLE DE SERVICE].[Tous Services].[Bar], &lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;[FAMILLE DE SERVICE].[Tous Services].[Buanderie] ...&lt;br /&gt;&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Very nice.&lt;/span&gt;&lt;/div&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/1890171231785089767-1617710662356917059?l=open-bi.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' 
href='http://open-bi.blogspot.com/feeds/1617710662356917059/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=1890171231785089767&amp;postID=1617710662356917059' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1617710662356917059'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/1890171231785089767/posts/default/1617710662356917059'/><link rel='alternate' type='text/html' href='http://open-bi.blogspot.com/2007/01/cool-feature-of-mondrian-schema.html' title='A cool feature of Mondrian Schema'/><author><name>Vincent Teyssier</name><uri>http://www.blogger.com/profile/16528540800692703553</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='21' height='32' src='http://4.bp.blogspot.com/-_xemsn06UxQ/TiWcnZ-nIxI/AAAAAAAAA0E/IC9ep3Em8qQ/s220/Me_color_small.jpg'/></author><media:thumbnail xmlns:media='http://search.yahoo.com/mrss/' url='http://3.bp.blogspot.com/_hTlcWbt-BP4/RZqTkq6QE-I/AAAAAAAAAAg/sSaaohIWWxI/s72-c/MondrianSchema%2Bsql.JPG' height='72' width='72'/><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-1890171231785089767.post-9035310418343468937</id><published>2007-01-01T09:42:00.000-08:00</published><updated>2007-01-01T10:15:20.317-08:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Data Management'/><title type='text'>Free Data Generator</title><content type='html'>&lt;a href="http://2.bp.blogspot.com/_hTlcWbt-BP4/RZlLNq6QE8I/AAAAAAAAAAM/Cnhgj4PJb7U/s1600-h/datagenerator.JPG"&gt;&lt;img id="BLOGGER_PHOTO_ID_5015122358282752962" style="FLOAT: right; MARGIN: 0px 0px 10px 10px; CURSOR: hand" alt="" src="http://2.bp.blogspot.com/_hTlcWbt-BP4/RZlLNq6QE8I/AAAAAAAAAAM/Cnhgj4PJb7U/s200/datagenerator.JPG" border="0" /&gt;&lt;/a&gt;&lt;br /&gt;&lt;div&gt;&lt;span 
style="font-family:arial;font-size:85%;"&gt;Hi all,&lt;/span&gt;&lt;/div&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;We sometimes need realistic data to test an ETL process, a DB procedure, a special report or anything else involving data management.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;Here is a nice, free data generator coded by Benjamin Keen, a talented coder from Vancouver.&lt;/span&gt;&lt;/div&gt;&lt;br /&gt;&lt;div&gt;&lt;span style="font-family:arial;font-size:85%;"&gt;This tool uses MySQ
