I’m terribly late with this article, initially scheduled for January 2011 … sorry. It may be a bit outdated by now, but I’m publishing it anyway …
Let’s talk about EC2 cloud computing, Talend, Postgresql and JasperServer. Basic setup.
You already know all the pros and cons of cloud computing, so I won’t go over them. As for me, I love cloud computing and use it every day because of these particular advantages :
- Scalability : scale up or down any instance, according to your needs,
- Flexibility : create your own instances, boot them, create quick sandboxes, replicate data …
- Pay per use : you pay for what you use (cpu, storage, security …),
- Opex, no capex !
First issue
Let’s imagine we have a single server hosting Postgresql. No big deal as long as we use this instance in a simple way : I can start my instance, host data on a persistent EBS volume, connect to it and stop it whenever I want. By using Elastic IPs, I can assign a “fixed” IP address to this server and easily set up a connection string. Note on 16/12/2010 : Amazon is now offering a DNS service.
Now let’s imagine we need a typical BI architecture (tiers) : one ETL (Talend or Pentaho of course !), a Postgresql database in the middle and Jaspersoft for reporting.
That’s a bit more complex because we need our Postgresql server to allow connections from the ETL and from the reporting tool. On top of that, we want to fully leverage all cloud computing features : stop the servers when they are not used, boot them when the service is needed, maybe change their network properties ... In the end, we want this to be fully automated, working without any human action such as changing the connection strings or starting/stopping the servers …
Let’s have a look at a little diagram now. As you can see, our architecture is now up and running. We are also using Elastic IPs for each server, which is mandatory for the following demonstration. IPs are fake.
How to read Public DNS, Private DNS and Elastic IPs on AWS EC2 ?
Imagine we have an instance running. This instance has an Elastic IP which is 46.52.186.25 and the private IP address is 11.235.33.6.
The Private DNS name is : ip-11-235-33-6.eu-west-1.compute.internal
The Public DNS name is : ec2-46-52-186-25.eu-west-1.compute.amazonaws.com
You see the relationship ?
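To make the relationship explicit, here is a minimal sketch that rebuilds both names from the addresses. It assumes the naming pattern and the eu-west-1 region shown in the example above; the function names are mine, and other regions may use a slightly different pattern.

```shell
#!/bin/sh
# Region taken from the example above; adjust it for your own instances.
REGION="eu-west-1"

# Private DNS name: dots of the private IP become dashes, prefixed with "ip-".
private_dns_name() {
    echo "ip-$(echo "$1" | tr '.' '-').$REGION.compute.internal"
}

# Public DNS name: same idea with the Elastic IP, prefixed with "ec2-".
public_dns_name() {
    echo "ec2-$(echo "$1" | tr '.' '-').$REGION.compute.amazonaws.com"
}

private_dns_name 11.235.33.6
public_dns_name 46.52.186.25
```

Run with the two fake addresses above, it prints the same two names as the example.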
Ok, now, how do you think we will configure the Postgresql server to allow connections from the ETL server and from the reporting server ? Easy, here is one answer :
- By making the ETL server and the reporting server point to Postgresql. For that, we will use this nice little Elastic IP we previously set up for the Postgresql server because it’s soooo easy to do it that way …
- By writing the ETL server Elastic IP and the reporting server Elastic IP into the Postgresql pg_hba.conf of course … because here again it feels soooo easy and natural to do so.
- Don’t forget to open the corresponding ports in your security groups (see picture above).
Jasper server connection screen : Postgresql database <===> Jasperserver
Talend client connection screen : your client <===> Talend server
Talend server connection screen : Talend server <===> Postgresql database
Then we write the Elastic IPs into pg_hba.conf like this, in order to allow the Talend server and JasperServer to connect to the Postgresql database. This is a basic pg_hba.conf; I encourage you to add stronger authentication.
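For readers who cannot see the screenshot, here is a rough sketch of what such a fragment can look like. It writes a sample to a scratch file, not to the live pg_hba.conf. The two addresses are placeholders from the documentation range 203.0.113.0/24 and the file name is hypothetical. The label lines and the `host all all <ip>/32 md5` format are the ones the update script further down relies on.

```shell
#!/bin/sh
# Scratch location for the sample (hypothetical path, not the live file).
SAMPLE_HBA="${TMPDIR:-/tmp}/pg_hba.sample.conf"

# Each server gets a label line followed by its host line.
# 203.0.113.10 and 203.0.113.20 stand in for the two Elastic IPs.
cat > "$SAMPLE_HBA" <<'EOF'
# Talend server connexion
host all all 203.0.113.10/32 md5
# JasperServer connexion
host all all 203.0.113.20/32 md5
EOF

cat "$SAMPLE_HBA"
```

Keeping one label line directly above each host line is what later allows a script to find and rewrite the right entry.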
We are done. Don’t forget to adjust the security groups like this :
- Talend Server : allow 8080, allow 22
- Postgresql Server : allow 5432, allow 22
- Jasperserver : allow 80 (or 443 if https), allow 22
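If you prefer scripts to clicks, the same rules can be expressed on the command line. The sketch below is only a dry run : it prints the calls instead of executing them. It assumes the `aws` command line tool, which this article does not otherwise use, and the group names and source CIDR are hypothetical placeholders to replace with your own.

```shell
#!/bin/sh
# Placeholder source range (documentation block), replace with your own.
SOURCE_CIDR="203.0.113.0/24"

# Print one ingress rule: $1 = security group name, $2 = TCP port to open.
print_ingress_rule() {
    echo "aws ec2 authorize-security-group-ingress --group-name $1 --protocol tcp --port $2 --cidr $SOURCE_CIDR"
}

# Ports follow the list above; group names are hypothetical.
print_all_rules() {
    for port in 8080 22; do print_ingress_rule talend-server "$port"; done
    for port in 5432 22; do print_ingress_rule postgresql-server "$port"; done
    for port in 80 22; do print_ingress_rule jasperserver "$port"; done
}

print_all_rules
```

Review the printed commands first; in practice you will probably want a much narrower source range for ports 22 and 5432.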
But wait … that’s not the right way to do it ! By using Elastic IPs to set up communication between the servers, we just created a weird monster that makes the traffic go OUT of the cloud and come BACK INTO the cloud. Don’t forget you are paying for that. Look at this diagram.
First solution
The best practice is to avoid Elastic IPs for network traffic between servers that are hosted inside the EC2 cloud. Instead, use EC2 internal addresses.
Ok, but … wait a minute.
- How do I retrieve the internal address from inside EC2 ?
For instance, if you query your ETL server from your Postgresql server by using the famous host command, you will have :
You see what you have to do ? Replace all Elastic IPs, except for your Talend client, with internal IPs. That way, your internal data won’t leave the cloud, as shown below.
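The lookup itself can be sketched in a few lines. This is a minimal example, assuming `host` prints its usual "has address" line; the function names are mine, and the canned line reuses the fake addresses from the DNS example earlier in this article.

```shell
#!/bin/sh
# Keep only the address from `host` output read on stdin.
# Lines without "has address" (errors, aliases) are dropped.
parse_host_output() {
    sed -n 's/.*has address //p' | head -n 1
}

# Resolve a public DNS name; when run from inside EC2, the answer
# is expected to be the internal address of the instance.
internal_ip_of() {
    host "$1" | parse_host_output
}

# Demonstration on a canned line, so no network access is needed.
echo "ec2-46-52-186-25.eu-west-1.compute.amazonaws.com has address 11.235.33.6" | parse_host_output
```

On a real instance you would call `internal_ip_of` with the public DNS name of the server you want to reach.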
After switching to internal addressing, the connection screens will look like this :
Jasper server connection screen : Postgresql database <===> Jasperserver
Talend server connection screen : Talend server <===> Postgresql database
Second issue
Well, ok, we solved our first issue : using internal addresses between the ETL server and the Postgresql server. But I can see two other issues :
- Postgresql still does not accept DNS names in pg_hba.conf ! Only IP addresses are allowed, so we can’t ask Postgresql and pg_hba.conf to resolve the DNS names for us.
- What if I decide to reboot the ETL server or the reporting server ? These internal addresses are nice, but they change each time I reboot / restart a server in EC2. So how do I keep my Postgresql pg_hba.conf updated with frequently changing addresses ?
Second solution
No, there is still no support for DNS entries in pg_hba.conf. I know this is a long-awaited feature, at least by me. But, unless I’m wrong (tell me), writing a DNS name in pg_hba.conf won’t work and the server won’t start.
We need to find a way to update pg_hba.conf with the current EC2 internal addresses of the ETL server and the reporting server. Easy, we will use a bit of shell code here. This script will retrieve the internal IP address of each server (ETL and JasperServer) by using the host command and will write this address into pg_hba.conf by using some sed or awk. Then, by using a SIGHUP, the Postgresql server will apply the new address configuration.
Nothing complex, but success relies on good timing.
Note here : I created an ORCHESTRATOR, a specialized instance in EC2, to monitor all my servers. This orchestrator will run this kind of script as soon as it detects any change in the internal addressing schema. This ORCHESTRATOR will be detailed in a future article (I made several public presentations, and a lot of people seem interested …).
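Until that article exists, here is a rough idea of the "detect a change" part. This is not the ORCHESTRATOR itself, only a sketch under my own assumptions : the last known address of each server is kept in a small state file, and the update is triggered only when the resolved address differs. The state directory and the function name are hypothetical.

```shell
#!/bin/sh
# Hypothetical directory holding the last address seen for each server.
STATE_DIR="${STATE_DIR:-${TMPDIR:-/tmp}/ec2-addresses}"

# Succeeds (and records the new value) when the address differs from the
# stored one; fails when nothing changed or when the address is empty.
address_changed() {
    name=$1
    current=$2
    if [ -z "$current" ]; then
        return 1
    fi
    mkdir -p "$STATE_DIR"
    previous=$(cat "$STATE_DIR/$name" 2>/dev/null || true)
    if [ "$current" = "$previous" ]; then
        return 1
    fi
    echo "$current" > "$STATE_DIR/$name"
    return 0
}

# Example with a fake address: calling it again with the same value
# reports no change, so the update step would be skipped.
if address_changed etl 11.235.33.6; then
    echo "etl address changed, pg_hba.conf needs an update"
fi
```

A scheduler such as cron could run this check every minute and launch the update script only when it succeeds.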
And now the shell script. It asks for the internal address of each server, finds that server's label line in pg_hba.conf and rewrites the line right after it. For that, you must keep your file tidy : labels are needed.
################################
#                              #
#       IP address lookup      #
#                              #
################################
# POSTGRES (DATABASE) Server
# Public DNS : ec2-12-345-678-999.eu-west-1.compute.amazonaws.com
PG_HBA=/mnt/postgres/data/pg_hba.conf
# TALEND (ETL) Server
# Resolved from inside EC2, the public DNS name returns the internal address
ETL_SERVER=$(host ec2-11-222-33-444.eu-west-1.compute.amazonaws.com | sed -n 's/.*has address //p' | head -n 1)
# JASPER (BI & reports) Server
JASPER_SERVER=$(host ec2-22-33-444-555.eu-west-1.compute.amazonaws.com | sed -n 's/.*has address //p' | head -n 1)
# Stop here if a lookup failed, rather than writing an empty address
if [ -z "$ETL_SERVER" ] || [ -z "$JASPER_SERVER" ]; then
    echo "Address lookup failed, pg_hba.conf left untouched" >&2
    exit 1
fi
# Echoing all
echo ""
echo "################## EC2 Addresses Update ######################"
echo "Will update EC2 Talend Server address with : $ETL_SERVER"
echo "Will update EC2 Jasper Server address with : $JASPER_SERVER"
echo ""
# Find the label line, then replace the line right after it (Talend Server)
TALEND_NB=$(grep -n "Talend server connexion" "$PG_HBA" | head -n 1 | cut -d":" -f1)
[ -n "$TALEND_NB" ] && sed -i "$((TALEND_NB+1)) s%.*%host all all $ETL_SERVER/32 md5%" "$PG_HBA"
# Same for Jasper Server
JASPER_NB=$(grep -n "JasperServer connexion" "$PG_HBA" | head -n 1 | cut -d":" -f1)
[ -n "$JASPER_NB" ] && sed -i "$((JASPER_NB+1)) s%.*%host all all $JASPER_SERVER/32 md5%" "$PG_HBA"
# Ask Postgresql to re-read pg_hba.conf (sends SIGHUP); run as the postgres user
pg_ctl reload -D /mnt/postgres/data
The end
Having a small (or even big) BI architecture up and running in EC2 is not a big deal. Having it properly set up, in order not to pay extra fees, is something different and needs some basic thinking beforehand. The addressing issue, which is technically simple, can have a negative impact on your project if you don’t manage it from the start.
I recommend that any AWS / EC2 user (BI or not) create their own admin tools and scripts, based on the various available APIs, in order to :
- reduce reaction time,
- be fully independent,
- save time (graphical tools are nice but need clicks, clicks and clicks …)