I recently had to manage all the backups for my PostgreSQL instances in the cloud. For all these cloud-based backup jobs, I use a three-way approach. You may find it a bit redundant, but I don't like surprises.
- Database dump to S3 : daily, depending on criticality …
- EC2 instance snapshot : daily or weekly, it depends … I like knowing I can keep several generations of the instance image, even if I do not change it very often these days …
- EBS volume snapshot : hourly or daily, depending on criticality.
For the moment, let's focus on the database dump to S3. My process is a simple shell script that will :
- Connect to PostgreSQL,
- Start a database dump (the database name is passed as a script argument),
- Split the dump into several pieces (the chunk size is passed as a script argument),
- Send the dump files to S3.
Tools of the trade : pg_dump and s3cmd
I'm using a wonderful tool called s3cmd, which you can find here. It allows you to send, retrieve and manage data on Amazon S3, and it also offers bucket management, GPG encryption and HTTPS transfers. A very good command line tool that will find its place in your EC2 toolbox.
Installing s3cmd is quite easy, and when running s3cmd --configure you will be prompted for your S3 account credentials and a few other parameters. You can later find all of this in a file called .s3cfg in your home directory.
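The whole configuration step boils down to :
s3cmd --configure    # prompts for your access key, secret key and a few other options
cat ~/.s3cfg         # the answers are saved in this file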
The commands are quite simple :
- Send a file to S3 : s3cmd put file_to_send your_s3_endpoint
- List buckets : s3cmd ls
- List the content of a bucket : s3cmd ls bucket_name, where bucket_name has the form s3://bucket_name
- Retrieve a file from a bucket : s3cmd get s3://bucket_name/file_name.
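For instance, a quick round trip with placeholder bucket and file names looks like this :
s3cmd put backup.sql.gz s3://my-backups/
s3cmd ls s3://my-backups/
s3cmd get s3://my-backups/backup.sql.gz restored.sql.gz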
Pg_dump is the regular backup client application for PostgreSQL databases. Its syntax, simplified here, is : pg_dump dbname > outfile. You can learn more about pg_dump here. For this example, I will use the following command :
- pg_dump db_name | gzip | split -b 50m - dump_file
- pg_dump : the command,
- db_name : the name of the database you want to dump,
- gzip : compress the dump on the fly,
- split : cut the dump into several pieces,
- -b 50m : split size = 50 MB,
- "-" : tells split to read the dump from its standard input,
- dump_file : the prefix of the dump files (split will create a collection of files with suffixes like aa, ab … appended to this prefix),
- Add -v to pg_dump if you want a verbose log.
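When the day comes to restore one of these dumps, the pieces just need to be concatenated in order, decompressed and fed back to psql, along these lines :
# restore into an existing (empty) database : concatenate, decompress, replay
cat dump_file* | gunzip | psql db_name    # shell globbing expands the pieces in alphabetical, hence correct, order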
Process prerequisites
First, I created an internal account for my PostgreSQL database. This account is only allowed to connect from the database server itself (localhost) : all passwords are disabled for it, only "trust" is present in pg_hba.conf, and it was granted select rights. Then, I created a dedicated directory on the database server to hold the dump files. This directory only holds the dump files for a very limited time : they are deleted after being pushed to S3 with s3cmd. I recommend storing these temporary dump files on a separate, dedicated EBS volume.
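As a rough sketch of that pg_hba.conf setup (backup_ro is a placeholder role name, adapt it to your own install), the relevant entry could look like this :
# TYPE   DATABASE   USER        METHOD
local    all        backup_ro   trust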
Last, we need a minimal setup on the S3 side. Create the appropriate bucket hierarchy for your needs. Mine is quite simple for this backup process :
- 1st level : /postgresql-dumps
- 2nd level : the product name : "pilotroi", the name of my company's product (we have several products)
- 3rd level : /database_name, one bucket per database. This is the first parameter I will pass to the shell script. For flexibility, I use the exact same name as the postgres database I want to dump. This allows me to build my bucket endpoint like :
- s3://postgresql-dumps/pilotroi/$BASE/
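Note that only the top-level bucket actually has to be created : the deeper levels are just key prefixes that appear as soon as the first file is uploaded. With s3cmd :
s3cmd mb s3://postgresql-dumps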
Custom script
Below is my custom script. A lot of "echo", hum ? As I said, I don't like surprises and prefer having verbose logfiles. This script is run from cron (a sample crontab entry is given after the script) and needs two parameters :
- Database name : the name of the database you need to dump,
- Splits : the size of the chunks you want to generate. For instance, 50m generates files of 50 MB max.
#!/bin/bash
##################################
#                                #
#  POSTGRES DUMPS FOR PILOTROI   #
#  Vincent Teyssier              #
#  19/11/2010                    #
#                                #
##################################
echo "******************************************************"
echo "Database dump is starting at : " `date`
echo "******************************************************"
PG_DUMP="/usr/lib/postgresql/8.4/bin/pg_dump"
S3PUT="/mnt/postgres/s3tools/s3cmd-1.0.0-rc1/s3cmd"
BASE=$1
SPLITS=$2
DUMP_FILE="/mnt/postgres/dumps/$BASE.gz"
S3ENDPOINT="s3://postgresql-dumps/pilotroi/$BASE/"
echo "*****************************"
echo "Parameters : "
echo "Base : " $BASE
echo "Splits : " $SPLITS
echo "S3 Endpoint : " $S3ENDPOINT
echo "*****************************"
echo "Dump started at " `date`
su - postgres -c "$PG_DUMP $BASE | gzip | split -b $SPLITS - $DUMP_FILE"
echo "Dump ended at " `date`
echo "*****************************"
echo "Send to S3 started at " `date`
$S3PUT put $DUMP_FILE* $S3ENDPOINT
echo "Send ended at " `date`
echo "*****************************"
echo "Deleting local dump files"
rm /mnt/postgres/dumps/$BASE.gz*
echo "******************************************************"
echo "Database dump is finished at : " `date`
echo "******************************************************"
Process output
The script produces a verbose, timestamped log : every step (dump, transfer to S3, local cleanup) is echoed with its start and end time.
Data is now on S3
Using the S3 Explorer Firefox plugin, you can see that all our files are now stored in their dedicated bucket on S3.
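You can also check directly from the command line with s3cmd, pointing it at the endpoint built by the script :
s3cmd ls s3://postgresql-dumps/pilotroi/db_name/    # db_name being the database you just dumped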
As usual, drop me a message to learn more if needed.