Monday, 8 March 2010

Sending tweets with Kettle !

Hi all !

Recently, while meeting clients, I was speaking a lot about pervasive BI, operational BI, real time BI …. well all these “new” tendancies about BI. On top of that, I’m currently delivering some mobile BI features for one of my clients, based on Cognos tools.

Then I thought to myself : “Why not sending *intelligent* tweets with an ETL tool ?”.

Let’s imagine we are a very successfull company and we want to tweet on how fantastic are our sales. Or imagine that, for any reason, you regularly gather some data and put it on twitter for public release (I’m doing this with some weather probe data from the family farm in the south, while I’m living in Paris …).

Ok, just ask and it’s done.

Sending tweets with Kettle

The process is the following :

  • Query a table to retrieve the nb of items sold and the total amount of sales. For simplicicty you can also generate rows in kettle.
  • Build the tweet and the command that will send it,
  • Send the tweet.

I will use one transformation and one job. The transformation will retrieve the data from the table, then send this data (1 row, 2 columns) to the job. This job will build the tweet and send it to twitter. Easy.

The transformation

First, a Table input. This step holds the SQL query to retrieve the data I need (1 row, 2 colunms) :

Select sum(sold_items), sum(amount) from sales_facts


Then, the Set Variables step will capture and map the query results with variables. These variables will be accessible into the job. Use the button Get Fields for simplicity.


The Job

The job has 3 steps : the usual Start step, the Transformation step linked to the transformation we just created above and a Shell step. Have a look.


Nothing special here, except I checked the option “Copy previous results to args” inside the transformation settings.

Then, we have the Shell step. This step will call a famous linux utility : cURL. cURL is a command line tool for transferring data with URL syntax, supporting FTP, FTPS, HTTP, HTTPS, SCP, SFTP, TFTP, TELNET, DICT, LDAP, LDAPS, FILE, IMAP, SMTP, POP3 and RTSP. curl supports SSL certificates, HTTP POST, HTTP PUT, FTP uploading, HTTP form based upload, proxies, cookies, user+password authentication (Basic, Digest, NTLM, Negotiate, kerberos...), file transfer resume, proxy tunneling.

I recommand using cURL under Linux. The windows release is a pain in the neck (installation issues, dll issues, issue with libeay32.dll, etc ….).

The command to send a tweet with cURL is quite easy : curl --basic --user yourtwitteraccount:yourtwitterpasswd --data status=”Your tweet” http://twitter.com/statuses/update.xml

You can build this command with the Shell step with the following configuration :


Some explanations here :

I’m not calling cURL but I use a wrapper script file. I had some issues calling directly the cURL executable with /usr/bin/curl. Using a wrapper script file seems more appropriate. Look at my file below. Parameter $2 has to be embrassed with double quotes ; usefull if you need to send tweets with more than one word ;)


As you can see, my curl wrapper file is waiting for 3 params, and these 3 params are passed in the Fields zone of the Shell window step. I obfuscated my Twitter account, in red. Here is the detail for these parameters :

  • $1 : The Twitter username and password with the syntax [username:password],
  • $2 : The tweet message, with double quotes,
  • $3 : The endpoint : the Twitter destination aka http://twitter.com/statuses/update.xml

In my exemple, I will send a tweet saying : “We sold [total of items sold] items for a total amount of [total amount of sales] USD.”

Sending the tweet !

Let’s save the work and start the job. You will see a verbose XML command output, this is good sign. Have a look below :


Is my tweet really published ?

Yes, of course it is ! Connect to your Twitter profile, and check for the tweet.


The goodies

The transformation can be downloaded HERE.

The job can be downloaded HERE.

Enjoy and find new usage for this !!


Dan said...

Can't you just connect directly to the twitter webservice from kettle and post it directly without using curl?

Vincent Teyssier said...

Really I don't know ...
I didn't want to get into Twitter API and I had cURL under my hand. I'm a lazy guy.I guess this is possible. Also some Twitter javascript APIs are available on the net and can do the job with minor changes. Example : http://www.geekdaily.net/2007/08/31/twitter-javascript-api-the-beginning/

Matt Casters said...

Kettle 4 will have the UDJC as well as the jtwitter library on board. Sending twitter messages from Kettle is very easy that way actually. At the same time I can't bring myself to write an actual "Twitter Output" step. Somehow that would seem like the wrong thing to do :-)

Rahul Trivedi said...

Nice Implementation, I also tried it on windows system But here it is not working. It give 401 'Bad authentication' error, where as when I run the cURL directly in cmd then it work.
does any body have any Idea why it is not working via Kettle step...
also tried a java script approach using UDJC step but the problem remains same.
My authentication tokens are OK for twitter, as I tested cURL made by these tokens manually in cmd, it is working as I want.
->I am using Twitter streaming API with Oauth 1.1 version, with both "GET" and "POST" method.

Please Please Help... Spent over a week on this now...
I have no way out of it now...
Please Help...

Thanks and Regards,
Rahul Trivedi