Hi all !
For a few days now I have been thinking about feeding Amazon SimpleDB with an ETL tool like Kettle / PDI.
Well, it's done. I have a working prototype. It’s a “quick and dirty” prototype of course but it works. I hope we will soon have an official Kettle plugin for that.
Requirements
You have to be familiar with Amazon AWS, EC2 and SimpleDB. Of course you need a valid account on Amazon Web Services. If you want to learn more about SimpleDB, click HERE. You can play with SimpleDB through a graphical interface before starting the hard stuff : click here for the ScratchPad (don't forget to browse the JavaScript source code, there is a lot to learn there !).
You need to know how to use Kettle, the famous data integration tool from Pentaho. To learn more about Kettle, follow this link. To discover the full Pentaho BI solution, click here. I recommend that you discover the Pentaho BI Suite Enterprise Edition.
The process
First you have to know how SimpleDB is organized and how it works.
For the developer, SimpleDB does not look like the traditional relational databases he is used to working with. Instead of thinking in terms of tables and columns, you have to take a different approach : data is organized within Domains, which are similar to an Excel tab. Then, inside a domain, data is stored as Attribute/Value pairs. XML guys won't be surprised by this storage method.
Let's first have a look at a typical relational table. Just a reminder ;)
Now, let's see what your data will look like once stored inside SimpleDB. A bit of XML now. As you can see, this extract represents the first line of the relational table shown above. This row is composed of an item name (for convenience, think of it as the primary key, even if that is not strictly true) and attributes. Each attribute is made of a Name and an associated Value.
See the difference ? That's the Amazon SimpleDB API. I'm pretty sure that the data, at a low level, ends up stored in a relational schema somewhere... But for the developer, this is the way it must be done.
Okay, okay. But how do I transform my relational structure into something that will be received and understood by the SimpleDB API ? We have two challenges here : transformation and sending. Ok, go for it.
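To make the difference concrete, here is a minimal sketch in plain JavaScript of how one relational row can be turned into an item name plus a list of Name/Value attributes. The helper name rowToItem and the shape of the returned object are assumptions made for illustration; they are not part of the SimpleDB API.

```javascript
// Hypothetical helper: turn one relational row into a SimpleDB-style item.
// The first column is used as the item name, the others become attributes.
function rowToItem(columns, values) {
  var item = { itemName: values[0], attributes: [] };
  for (var i = 1; i < columns.length; i++) {
    item.attributes.push({ name: columns[i], value: values[i] });
  }
  return item;
}

var item = rowToItem(
  ["ID", "Category", "Subcat", "Name"],
  ["Item_01", "Clothes", "Sweater", "Cathair Sweater"]
);
console.log(JSON.stringify(item, null, 2));
```

The item name plays the role of the key, and every other column becomes one attribute of that item.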
The Mapping !
Here is my transformation, done with Kettle. Pretty simple, uh ? Let's go into detail now…
First you have a CSV file input. This input data is reformatted to build Name/Value pairs, and these pairs are then concatenated into a valid URL. Once signed, this URL is sent to the Amazon API and the data is inserted into the domain (created beforehand). You can also see a File Output in my transformation : I sometimes use it for debugging. In our example, it made it easy for me to inspect and analyse the generated URL in a notepad (here, the link is not activated).
In my example I will use a typical CSV file as the data source (based on the same relational table shown above). Here is my flat file, with ; as the separator.
ID;Category;Subcat;Name;Color;Size;Make;Model;Year
Item_01;Clothes;Sweater;Cathair Sweater;Siamese;Small, Medium, Large;Nike;Swoosh;2003
Item_02;Clothes;Pants;Designer Jeans;Paisley Acid Wash;30x32, 32x32, 32x34;Trusardi;BigButt;2005
Item_03;Clothes;Pants;Sweatpants;Blue, Yellow, Pink;Large;Diesel;Steel;2006, 2007
Item_04;Car Parts;Engine;Turbos;Pink;Medium;Audi;S4;2000, 2001, 2002
Item_05;Car Parts;Emissions;O2 Sensor;Black;Small;Audi;S4;2000, 2001, 2002
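In Kettle the CSV file input step does the parsing for you. Outside Kettle, reading this flat file could look like the sketch below. The helper name parseFlatFile is hypothetical, and the sketch assumes no quoting or escaping of the ; separator, which is enough for this file.

```javascript
// Two data lines of the flat file above, embedded as a string for the sketch.
var flatFile = [
  "ID;Category;Subcat;Name;Color;Size;Make;Model;Year",
  "Item_01;Clothes;Sweater;Cathair Sweater;Siamese;Small, Medium, Large;Nike;Swoosh;2003",
  "Item_04;Car Parts;Engine;Turbos;Pink;Medium;Audi;S4;2000, 2001, 2002"
].join("\n");

// Hypothetical helper: split on line breaks, then on ";".
// The header line gives the field names of each returned row object.
function parseFlatFile(text) {
  var lines = text.split(/\r?\n/).filter(function (line) {
    return line.length > 0;
  });
  var header = lines[0].split(";");
  return lines.slice(1).map(function (line) {
    var fields = line.split(";");
    var row = {};
    header.forEach(function (col, i) {
      row[col] = fields[i];
    });
    return row;
  });
}

var rows = parseFlatFile(flatFile);
console.log(rows[0].ID + " -> " + rows[0].Size);
```

Note that a value such as "Small, Medium, Large" stays a single string here, exactly as it is in the flat file.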
The JScript Code !
I confess : I only wrote 5% of the JScript code. Let me explain. When you subscribe to Amazon SimpleDB, you can download the official API, written in Java, and use it to create, manage and populate your SimpleDB domain. Java is very useful of course, but I was looking for JScript in order to put everything into Kettle. So I downloaded the Amazon SimpleDB ScratchPad. This is a nice utility that lets you play with SimpleDB without coding, using just the mouse. When looking into this application's directories, you can find all the JScript source code needed ! My work then consisted of porting the ScratchPad code into a Kettle JScript step, with some adjustments.
This code is a bit long to be shown here, so click HERE to download it. Let's have a high-level overview of the JScript layout.
The process is very simple : each row is cut into Name/Value pairs (URL building & URL formatting routines), and these pairs are then concatenated into a valid URL (URL concatenation). This URL is then signed (SHA-1 hash algo) and sent to the HTTP client step.
Here is the basic code to create the URL :
var URL2POST = "https://sdb.amazonaws.com"
+ "?SignatureVersion=1&Action=" + "PutAttributes"
+ "&Version=" + encodeURIComponent("2009-04-15")
+ "&DomainName=" + encodeURIComponent('MyStore')
+ "&ItemName=" + encodeURIComponent(ID)
+ "&Attribute.1.Name=Category"
+ "&Attribute.1.Value=" + encodeURIComponent(Category)
+ "&Attribute.2.Name=Subcat"
+ "&Attribute.2.Value=" + encodeURIComponent(Subcat)
+ "&Attribute.3.Name=Name"
+ "&Attribute.3.Value=" + encodeURIComponent(Name)
+ "&Attribute.4.Name=Color"
+ "&Attribute.4.Value=" + encodeURIComponent(Color)
+ "&Attribute.5.Name=Size"
+ "&Attribute.5.Value=" + encodeURIComponent(Size)
+ "&Attribute.6.Name=Make"
+ "&Attribute.6.Value=" + encodeURIComponent(Make)
+ "&Attribute.7.Name=Model"
+ "&Attribute.7.Value=" + encodeURIComponent(Model)
+ "&Attribute.8.Name=Year"
+ "&Attribute.8.Value=" + encodeURIComponent(Year)
+ "&Timestamp=" + timestamp
+ "&AWSAccessKeyId=" + encodeURIComponent(accesskey);
Note that in my JScript code, I didn't write any loop to go through all the source columns. As it is a quick proof of concept, based on a fixed data structure, I used one line of code for each column in order to create the Name/Value pairs. If you look closely at the Amazon ScratchPad source code, you will see a loop in the function "generateSignedURL". This is how things should be done of course !
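A loop-based version could look like the sketch below. This is not the ScratchPad code: the helper name buildPutAttributesUrl, its parameters and the placeholder access key are assumptions made for illustration. Unlike the code above, it also URL-encodes the timestamp, and it leaves out the signature.

```javascript
// Hypothetical helper: build the unsigned PutAttributes URL with a loop.
// 'attributes' is an array of { name, value } pairs in column order.
function buildPutAttributesUrl(domain, itemName, attributes, timestamp, accessKey) {
  var url = "https://sdb.amazonaws.com"
    + "?SignatureVersion=1&Action=PutAttributes"
    + "&Version=" + encodeURIComponent("2009-04-15")
    + "&DomainName=" + encodeURIComponent(domain)
    + "&ItemName=" + encodeURIComponent(itemName);
  for (var i = 0; i < attributes.length; i++) {
    var n = i + 1; // attribute numbering starts at 1, as in the code above
    url += "&Attribute." + n + ".Name=" + encodeURIComponent(attributes[i].name)
         + "&Attribute." + n + ".Value=" + encodeURIComponent(attributes[i].value);
  }
  url += "&Timestamp=" + encodeURIComponent(timestamp)
       + "&AWSAccessKeyId=" + encodeURIComponent(accessKey);
  return url;
}

var url = buildPutAttributesUrl(
  "MyStore",
  "Item_01",
  [
    { name: "Category", value: "Clothes" },
    { name: "Size", value: "Small, Medium, Large" }
  ],
  "2009-04-15T12:00:00Z",
  "AKIDEXAMPLE" // placeholder, not a real access key
);
console.log(url);
```

With this layout, adding a column to the source file only means adding one entry to the attributes array.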
The final URL looks like this one :
Let's look at it in more detail :
- The endpoint : https://sdb.amazonaws.com
- The SignatureVersion, always 1 for me.
- The Action needed, in our case PutAttributes, in order to load data into the domain.
- The Version, always 2009-04-15 : it identifies the version of the SimpleDB API that the request targets.
- The DomainName : MyStore, in my case. You can create yours easily.
- The ItemName : Item_01 corresponding to my primary key.
- Then you have all the Name/Value pairs : attribute names and attribute values.
- A timestamp : calculated by a JScript function.
- Your AWS Access Key. Mine is obfuscated in the example above.
- Your Signature : it is computed from the request with your secret AWS Access key and the SHA-1 hash algo, as seen above. Obfuscated here again.
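The timestamp and signature steps can be sketched as follows. This is not the ScratchPad code, and it runs on Node.js (its built-in crypto module is not available inside a Kettle JScript step, where the ScratchPad SHA-1 routines do this job). The helper names and the placeholder keys are assumptions, and the signing recipe reflects my understanding of SignatureVersion 1 (HMAC-SHA1 over the sorted parameters, Base64 encoded), so check it against the Amazon documentation before relying on it.

```javascript
var crypto = require("crypto"); // Node.js built-in module

// ISO 8601 UTC timestamp without milliseconds, e.g. 2009-04-15T12:00:00Z
function awsTimestamp(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, "Z");
}

// Hypothetical helper for SignatureVersion 1, as I understand the scheme:
// sort the parameters by name ignoring case, concatenate name + value
// without separators, then HMAC-SHA1 with the secret key and Base64-encode.
function signV1(params, secretKey) {
  var names = Object.keys(params).sort(function (a, b) {
    var la = a.toLowerCase();
    var lb = b.toLowerCase();
    return la < lb ? -1 : la > lb ? 1 : 0;
  });
  var toSign = names.map(function (name) {
    return name + params[name];
  }).join("");
  return crypto.createHmac("sha1", secretKey).update(toSign, "utf8").digest("base64");
}

var params = {
  Action: "PutAttributes",
  AWSAccessKeyId: "AKIDEXAMPLE", // placeholder, not a real access key
  DomainName: "MyStore",
  ItemName: "Item_01",
  SignatureVersion: "1",
  Timestamp: awsTimestamp(new Date(Date.UTC(2009, 3, 15, 12, 0, 0))),
  Version: "2009-04-15"
};
var signature = signV1(params, "SECRETEXAMPLE"); // placeholder secret key
console.log("&Signature=" + encodeURIComponent(signature));
```

The signature is computed on the raw parameter values and only URL-encoded when it is appended to the request.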
Security
Let's talk about these AWS Access Keys and Signature. In my proof of concept, these keys are stored in clear text in my JScript. Of course, this is not recommended. I'll let you imagine a more secure way to handle them (parameters, repository …).
Let’s send it to Amazon !
Pretty easy now : each row is sent to an HTTP client step, using a JScript variable called URL2POST. This step sends the URL to Amazon SimpleDB and the row is inserted into your domain.
For the moment, I have had no time to handle the return code from the Amazon API, but it's very easy since Amazon sends you back an XML message like the one below in case of success. In case of failure, the message is self-explanatory.
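Handling the return message could start from a sketch like this one. The helper name parsePutResponse is hypothetical, and the element names (PutAttributesResponse, Code, Message) as well as the two sample messages are assumptions based on my recollection of the 2009-04-15 responses, so compare them with what the HTTP client step really returns.

```javascript
// Hypothetical helper: classify the XML body returned by the HTTP client step.
// Element names are assumed, not taken from the post; verify them first.
function parsePutResponse(xml) {
  if (xml.indexOf("<PutAttributesResponse") !== -1) {
    return { ok: true, code: null, message: null };
  }
  var code = /<Code>([^<]*)<\/Code>/.exec(xml);
  var message = /<Message>([^<]*)<\/Message>/.exec(xml);
  return {
    ok: false,
    code: code ? code[1] : null,
    message: message ? message[1] : null
  };
}

// Illustrative messages only, written by hand for this sketch.
var okXml = "<PutAttributesResponse><ResponseMetadata>"
  + "<RequestId>example</RequestId></ResponseMetadata></PutAttributesResponse>";
var errorXml = "<Response><Errors><Error><Code>NoSuchDomain</Code>"
  + "<Message>The specified domain does not exist.</Message></Error></Errors></Response>";

var okResult = parsePutResponse(okXml);
var errorResult = parsePutResponse(errorXml);
console.log(okResult.ok, errorResult.code, errorResult.message);
```

In a transformation, the failed rows could then be routed to a separate output for a retry.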
The goodies !
You can find the Kettle transformation HERE.
You can find the Jscript HERE.
You can find my little flat file HERE.
How to be sure the data is in ?
Pretty easy. Start the Amazon ScratchPad utility, enter your access key and secret key, go to GetAttributes in the API drop-down menu and fill in the Domain Name and one Item Name. Have a look here. Note : my keys are obfuscated here again.
Hit the "Invoke Request" button, and see your data.
There is another way to check your data. You can write a SQL-like query to see all the data stored in a given domain. Here again, with the ScratchPad, go to "Select" in the API drop-down menu. Then enter "select * from MyStore" in the Select Expression field. Hit the Invoke Request button, and you will see all your data.
The output will look like this one (continues for each Item …).
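The same query can also be sent as a request URL rather than through the ScratchPad. The sketch below builds the unsigned part of a Select request; the helper name buildSelectUrl and the placeholder access key are assumptions, and a Signature parameter still has to be appended, as for PutAttributes.

```javascript
// Hypothetical helper: build the unsigned Select URL for a query expression.
function buildSelectUrl(selectExpression, timestamp, accessKey) {
  return "https://sdb.amazonaws.com"
    + "?SignatureVersion=1&Action=Select"
    + "&Version=" + encodeURIComponent("2009-04-15")
    + "&SelectExpression=" + encodeURIComponent(selectExpression)
    + "&Timestamp=" + encodeURIComponent(timestamp)
    + "&AWSAccessKeyId=" + encodeURIComponent(accessKey);
}

var selectUrl = buildSelectUrl(
  "select * from MyStore",
  "2009-04-15T12:00:00Z",
  "AKIDEXAMPLE" // placeholder, not a real access key
);
console.log(selectUrl);
```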
That’s nice, but what for ?
Imagine you have, like me, to think about storing emails or a call center knowledge base in the cloud. You have messages, and you have headers. Why not store the headers in SimpleDB and the message bodies in S3 ? That's a good solution. In that case, SimpleDB will handle a few attributes while the heavy data will be stored in S3, with the help of any third party database (open source or not). Of course, you have to manage the link between S3 data and SimpleDB headers, but that's another story …
More to come
Please give me feedback on this article. I'm currently working on something more reliable and more professional. If I have time, I will try to write a Kettle plugin.
100 comments:
Hi!
Cool, way to go :)
Hello Vincent,
A really interesting article; the JavaScript code is indeed pretty rough... ;-)
The plugin idea is a very good one !
A quick question : have you been able to test the "S3 CSV Input" step, which also looks very interesting ?
Sylvain
http://www.osbi.fr
Hello Sylvain,
I still haven't had time to test the S3 CSV step. I will do it quite soon since I have some business ideas behind that ...
Thanks for your post.
Vincent
Hi, I have been using the Kettle / PDI tool and I think it's good, but I want to try something else and I am very interested in the tool you are developing. I hope to use it someday soon!
Just wondering, did you write a kettle plugin for this? If not, I am thinking about doing this.
Vincent: Cool way to work with SimpleDB.
I tried downloading the transformations and the JavaScript, but when I click on the links it says the file does not exist.
Can you please give a link where I can download it from?
Hi,
Seems the download is not available yet. I will fix this.
Vincent
Vincent: Can you please fix the download? So I can take advantage of what you have written?
Hi, the downloads are back !
Enjoy and feel free to contact me for anything else.
hi,
This is pretty interesting. I also need to know how to import data from SimpleDB into Pentaho. Please provide me with the information as soon as possible.
Thanks in advance
Hello
A great article to read :)
But is there a way to do it "the other" way : reading data from SimpleDB so that Pentaho can use it ?
As far as I can see, Pentaho doesn't support adding SimpleDB as a DB connection yet.
Is there a way to do it?
We've just released a free plugin to read from Amazon SimpleDB, have a look at http://www.cloud2land.com/ for more details, that should make things a little bit easier.
I started the Amazon Scratchpad utility, but the "Invoke Request" button does not appear. Do you know why?