Open BI: Postgresql dumps and storage on S3

Hi all,
I recently had to manage all my backups from Postgresql instances on the cloud. For all these cloud-based backup jobs, I’m using a three way approach. Maybe you think it’s a bit redundant, but I don’t like surprises.

Database dump to S3 : daily, depends on criticity …
EC2 instance snapshot : daily or weekly, depends … I like to know I can keep several generation of instance image, even if I do not change them quite often now …
EBS volume snapshot : hourly or daily, depends on criticity,

Today, I want to share with you what I did as first backup process : database dump to S3. Of course, you need to have knowledge of Amazon S3 (as well as an account) for that. In the next coming articles, I will show you what I created in order to automate and manage all EC2 instances & volume snapshots.
For the moment let’s go back to db dump to s3. My process is a simple shell script that will :

Connect to Postgresql,
Start a database dump (database name is passed as script argument),
Split the dump files into several pieces (chunk size is passed as script argument),
Send the dump files into S3.

Note : your postgresql server (from which you want to dump) does not have to be hosted on AWS/EC2. This process also works with on premises architecture.

Tools of the trade : pgdump and S3CMD

I’m using a wonderful tool called s3cmd, you can find it here. This tool allows you to send, retrieve and manage data on Amazon S3. It also offers bucket management, GPG encryption and https transfer. A very good command line tool that will find its place in your EC2 toolbox.
Installing s3cmd is quite easy and, when using s3cmd –config you will be prompted with your S3 account credentials and other params. You can later retrieve this data into the file called .s3cmd.
The commands are quite simple :

Send file to S3 : s3cmd put file_to_send your_s3_endpoint
List buckets : s3cmd ls
List content of a bucket : s3cmd ls “Bucket_name” : Bucket_name will have the form : s3://bucket_name
Retrieve a file from a bucket : s3cmd get s3://“Bucket_name”/”file_name”.

As you can see, really easy usage and fits perfectly in a shell script.

Pg_dump is the regular backup client application for Posgresql database. Its syntax, simplified here, is : pg_dump “dbname” > outfile. You can learn more about pg_dump here. For this current example, I will use the following command :

pg_dump db_name | gzip | split -b 50m – dump_file
- pg_dump : the command,
- db_name : the db name you want to dump,
- gzip : create gzipped archives,
- split : split the dump into several pieces,
- -b 50m : split size = 50 Mo
- dump_file : the dump file (the split option will create a collection with extensions like gzaa, gzab …).
Add –v if you want a verbose log.

Process prerequisites

First, I created an internal account for my Postgresql database. This account has only rights to connect from the database server itself (or localhost). All passwords were disabled for this account, only “trust” is present in pg_hba.conf. Select rights were also granted.
Then, I created a dedicated directory on the database server to hold the dump files. This directory will only hold the dump files for very limited time, they will be deleted after been pushed on S3 with s3cmd. I recommend to store these temp dump files on a separate and dedicated EBS volume.
Last, we need a minimum setup on S3 side. Create the appropriate bucket hierarchy you need. Mine is quite simple for this backup process :

1st level : /postgresql-dumps
- 2nd level : product name : “pilotroi”, the product name of my company (we have several products)
  - 2nd level : /database name, one bucket by database. This is the second parameter I will pass to the shell script. For flexibility, I use the exact same name as for the postgres database I wan to dump. This allows me to build my bucket endpoint like :
    - s3://postgresql-dumps/pilotroi/$BASE/"

Custom script

Below is my custom script. A lot of “echo”, hum ? As I said, I don’t like surprises and prefer having verbose logfiles. This script is cronted and needs two parameters :

Database name : the name of the db you need to dump.
Splits : the size of the chunks you want to generate. For instance, 50m is for generating files having a size of 50 Mo max.

Call the script like that, using two params ($1 and $2) : sh s3_pg_dump.sh client_base 50m

#!/bin/bash

################################## 
#                                # 
# POSTGRES DUMPS FOR PILOTROI    # 
# Vincent Teyssier               # 
# 19/11/2010                     # 
#                                # 
##################################

echo "******************************************************" 
echo "Database dump is starting at : " `date` 
echo "******************************************************"

PG_DUMP="/usr/lib/postgresql/8.4/bin/pg_dump" 
S3PUT="/mnt/postgres/s3tools/s3cmd-1.0.0-rc1/s3cmd"

BASE=$1 
SPLITS=$2 
DUMP_FILE="/mnt/postgres/dumps/$BASE.gz" 
S3ENDPOINT="s3://postgresql-dumps/pilotroi/$BASE/"

echo "*****************************" 
echo "Parameters : " 
echo "Base : " $BASE 
echo "Splits : " $SPLITS 
echo "S3 Endpoint : " $S3ENDPOINT 
echo "*****************************"

echo "Dump started at " `date` 
su - postgres -c "$PG_DUMP $BASE | gzip | split -b $SPLITS - $DUMP_FILE" 
echo "Dump ended at " `date` 
echo "*****************************" 
echo "Send to S3 started at " `date` 
$S3PUT put $DUMP_FILE* $S3ENDPOINT 
echo "Send ended at " `date` 
echo "*****************************" 
echo "Deleting local dump files" 
rm /mnt/postgres/dumps/$BASE.gz*

echo "******************************************************" 
echo "Database dump is finished at : " `date` 
echo "******************************************************"

Process output

This script file will produce the following output.

Data is now on S3

Using S3 Explorer Firefox plugin, you can see all our files are now stored in their dedicated bucket on S3.

As usual, drop me a message to learn more if needed.

5 comments:

Anonymous said...: О! Este es un maravilloso recurso útil que usted está proporcionando y que le dan gratis ausente en el precio. Me encanta ver los sitios web, que comprenden el valor de proporcionar un recurso de calidad útiles de forma gratuita. Se? S el viejo lo que va, vuelve el programa.; 5/12/10 02:37
Anonymous said...: Je veux juste dire que c'est un super blog vous êtes arrivé ici! J'ai fait le tour de pas mal de temps, mais a finalement décidé de montrer mon appréciation de votre travail! Thumbs up, et continuer!; 15/12/10 13:40
Anonymous said...: Hey, I attempted to email you pertaining to this post but aren?t able to reach you. Please e-mail me when get a moment. Thanks..; 22/1/11 07:11
Anonymous said...: Thank you, nice job! This was the stuff I had to have.; 15/2/11 13:16
Anonymous said...: Very good post. I am going through a few of these issues as well..; 20/1/13 23:20

Open BI

Monday, 22 November 2010

Postgresql dumps and storage on S3

Tools of the trade : pgdump and S3CMD

Process prerequisites

Custom script

Process output

Data is now on S3

5 comments:

Who am I ?

Open Source BI Tools

Open Source BI Sites

Articles

Popular Posts