Open BI: Deploy MongoDB replica set

Hi all,
Today, I’m going to summarize some easy steps to create a MongoDB replica set. This won’t be a very detailed article, only a quick reminder of how to do it. Nor will it be a tutorial on how to install MongoDB, since that is pretty easy and the web already has tons of tutorials about it.
This article is part of a series about scaling MongoDB. The series will cover all the typical steps I had to go through over the past year :

Step 1 : creating a replica set from a single machine server (dev / test environment)
- this current article
Step 2 : creating a replica set on 2 or more servers (small infra / integration …)
- coming soon …
Step 3 : scaling mongo with sharding (high availability production environment)
- to come a bit later …

Quick reminder

MongoDB ?

MongoDB is an open source document database, part of the NoSQL paradigm. Its main features are :

document oriented storage,
full index support
replication and high availability
auto sharding
map/reduce
gridFS

Document oriented storage ?

Data is not stored as rows with a fixed schema. It is stored as documents (JSON documents), and each document can have its own schema. We talk about schema-less documents.

Our scenario

Imagine : we have a single AWS/EC2 server running a single MongoDB instance. For some good reason, we want to deploy a replica set, for instance to plug in an ElasticSearch river (article coming soon !) that feeds a search index.
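Let’s go from THIS (figure: one server running a single MongoDB instance) …

… to THIS (figure: the same server running a two-member replica set) …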

Let’s go for it

First, I assume you have MongoDB up and running on a Linux box, as a single mongod process. Here is what we are going to do :

1 - Stop the single running instance
2 - Duplicate the configuration file
3 - Update the configuration files
4 - Prepare the filesystem for MongoDB secondary instance
5 - Restart primary and secondary MongoDB instances
6 - Set up and activate the replication
7 - Play with it

1 - Let’s stop this currently running instance :

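ps aux | grep mongod[b] : will show the mongod pid ([pid])
- Trick : the pattern mongod[b] matches "mongodb" (for instance in the config file path on the mongod command line) but not the grep command itself, so grep does not show up in its own output. Quote the pattern ('mongod[b]') to keep your shell from expanding the brackets.
kill -2 [pid] : sends SIGINT, which shuts the mongod process down cleanly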
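If pgrep is available on your box (it usually ships with procps, but that is an assumption about your system), the lookup and the kill can be combined. This is only a sketch of the same idea, not the one true way to stop MongoDB; the result variable is just there for illustration :

```shell
# Look up mongod by exact process name. pgrep never lists itself,
# so no grep trick is needed here.
pids=$(pgrep -x mongod 2>/dev/null || true)

if [ -z "$pids" ]; then
    result="no mongod process found"
else
    # SIGINT (signal 2) asks mongod for a clean shutdown.
    # $pids is left unquoted on purpose: there may be several PIDs.
    kill -2 $pids
    result="sent SIGINT to: $pids"
fi
echo "$result"
```

Careful : this stops every mongod on the machine, which is what we want here since only one instance is running.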

2 - Duplicate the configuration file called mongodb.conf :

mongodb.conf is located here : /etc/mongodb.conf
cp mongodb.conf mongodb1.conf : will create a local copy, and allow you to have 2 mongod instances : mongodb (initial=primary) and mongodb1 (new=secondary)
Give the copy the appropriate rights, depending on your installation (giving it the same rights as the initial file is easy and quick for a simple deployment).

3 - Update the configuration files :

make the following changes in mongodb1.conf, in order to create a completely new mongodb ecosystem :
- dbpath=[path to the directory where the data will be written]
- logpath=[path to the file where the logs will be written]
- port = 27018 : the first instance is running on 27017, so adding 1 and using 27018 is easy for your second instance
- replSet=rs1 : name for your replica set
- nojournal=true : optional, disables journaling

Now you have 2 configuration files : mongodb.conf (original one) and mongodb1.conf (newly created). Don’t forget to add the following to your original mongodb.conf file :

replSet=rs1 : name for your replica set. This name should be the same in the two configuration files.

Let’s have a quick overview of the two configuration files (simplified) :
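The original screenshot is not reproduced here, so here is a rough sketch of what the two files could look like. It writes them to a scratch directory instead of /etc so that nothing gets overwritten. The primary’s dbpath and logpath and both log file names are my assumptions (keep whatever your original file already contains); the secondary’s directories are the ones created in step 4.

```shell
# Write both (simplified) configuration files into a scratch directory.
# Review them, then copy them to /etc yourself.
CONF_DIR=$(mktemp -d)

# Primary : the original file plus the replSet line.
# dbpath and logpath below are assumed values, not taken from the article.
cat > "$CONF_DIR/mongodb.conf" <<'EOF'
dbpath=/mnt/mongodb/mongodb
logpath=/mnt/mongodb/mongodb/logs/mongodb.log
port=27017
replSet=rs1
EOF

# Secondary : its own data directory, log file and port, same replSet name.
cat > "$CONF_DIR/mongodb1.conf" <<'EOF'
dbpath=/mnt/mongodb/mongodb1
logpath=/mnt/mongodb/mongodb1/logs/mongodb1.log
port=27018
replSet=rs1
nojournal=true
EOF

# Sanity check : both files must name the same replica set,
# so this should print a single replSet line.
grep -h '^replSet=' "$CONF_DIR/mongodb.conf" "$CONF_DIR/mongodb1.conf" | sort -u
```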

4 - Prepare the filesystem for the new instance

Of course, you need to create the new directories for your secondary (new) instance. That means, according to the picture above, creating :

/mnt/mongodb/mongodb1
/mnt/mongodb/mongodb1/logs
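Both directories can be created in one go. In this small sketch, MONGO_BASE is a name I made up so that you can try it in a scratch directory first; set it to /mnt/mongodb to match the paths above. The "mongodb" user in the commented-out chown line is also an assumption about your installation.

```shell
# MONGO_BASE is a hypothetical variable : it defaults to a scratch directory.
# Use MONGO_BASE=/mnt/mongodb to create the real directories.
MONGO_BASE=${MONGO_BASE:-$(mktemp -d)}

# -p creates missing parent directories and does not fail if they already exist.
mkdir -p "$MONGO_BASE/mongodb1/logs"

# mongod must be able to write there. If it runs as a dedicated user
# (often "mongodb", which is an assumption), hand the tree over to it :
# chown -R mongodb:mongodb "$MONGO_BASE/mongodb1"

ls -d "$MONGO_BASE/mongodb1" "$MONGO_BASE/mongodb1/logs"
```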

5 - Restart primary and secondary MongoDB instances

Easy. Here is the simple command line to start each instance :

For primary : mongod --fork --rest --config /etc/mongodb.conf
For secondary : mongod --fork --rest --config /etc/mongodb1.conf

Using --fork will allow you to start mongod as a background task.
Then you should see your processes running from a simple top -c, like this (only one process here but you should have two) :

6 - Set up and activate the replication

Now it’s time to setup the replication process. For that purpose we will now connect to the primary mongo instance ! Here is the complete process.

mongo
rs.initiate()
rs.conf()
{
    "_id" : "rs1",
    "version" : 1,
    "members" : [
        {
            "_id" : 0,
            "host" : "127.0.0.1:27017"
        }
    ]
}
rs.add("127.0.0.1:27018")

{ "ok" : 1 }

mongo will start the mongo shell for the primary instance; the default port is 27017. Note that if you want to connect to the secondary instance, you need to specify the port like : mongo --port 27018
rs.initiate() will start the replica set configuration.
rs.conf() will print the replica set configuration. Here we can see we have a replica set named "rs1", having only one member "_id" = 0 (which is the current primary instance).
rs.add("127.0.0.1:27018") will add a new member to the "rs1" replica set. You can see I named the new member with a string composed of the localhost IP address and the port. Sometimes, you may need to use the hostname instead of the IP, especially if you work on AWS.
- As soon as the rs.add command has been processed successfully, the answer says "ok" : 1

Now let’s check we have a replica set up and running. Let’s type rs.conf() once again. It should display the output below, which now lists both members.

rs.conf()
{
    "_id" : "rs1",
    "version" : 2,
    "members" : [
        {
            "_id" : 0,
            "host" : "127.0.0.1:27017"
        },
        {
            "_id" : 1,
            "host" : "127.0.0.1:27018"
        }
    ]
}
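rs.conf() only shows the configuration. To see which member is currently PRIMARY and which is SECONDARY, rs.status() is more telling. Here is a small sketch that prints one line per member from the command line. It assumes the legacy mongo shell used in this article is on your PATH, and show_members is just a helper name I made up :

```shell
# Print "host state" for every member of the replica set
# reachable on the given port.
show_members() {
    port=$1
    if ! command -v mongo >/dev/null 2>&1; then
        echo "mongo shell not found"
        return 1
    fi
    # rs.status() returns a members array; each entry has a name and a stateStr.
    mongo --quiet --port "$port" --eval \
        'rs.status().members.forEach(function (m) { print(m.name + " " + m.stateStr); })'
}

# Ask the first instance (27017) for the state of the whole set.
show_members 27017 || true
```

Right after rs.add, the new member may go through a startup or recovering state for a little while before it shows up as SECONDARY.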

7 - Play with it

Now you can play with the replication. Just create a database and a collection, and insert some data/documents. Then connect to the secondary instance and tadam … your data has been replicated. In the mongo shell, run rs.slaveOk() on the secondary first, otherwise it refuses reads. If you are working on AWS, don’t forget to whitelist port 27018 in order to have access to the secondary instance.
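As a quick smoke test, the whole round trip can be scripted. This is only a sketch : it assumes the legacy mongo shell is on your PATH and that both instances run on the ports used above. The database name "demo", the collection name "items" and the function name are invented for the example.

```shell
PRIMARY_PORT=27017
SECONDARY_PORT=27018

replication_smoke_test() {
    if ! command -v mongo >/dev/null 2>&1; then
        echo "mongo shell not found"
        return 1
    fi
    # Write one document through the primary.
    mongo --quiet --port "$PRIMARY_PORT" --eval \
        'db.getSiblingDB("demo").items.insert({ msg: "hello replica set" })'
    # Give replication a moment to catch up.
    sleep 2
    # Read it back from the secondary; rs.slaveOk() allows reads on this connection.
    mongo --quiet --port "$SECONDARY_PORT" --eval \
        'rs.slaveOk(); printjson(db.getSiblingDB("demo").items.findOne())'
}

replication_smoke_test || true
```

If the second command prints the document you just inserted, replication works.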

Conclusion

This was a VERY simple way to create a MongoDB replica set on a single machine. Of course, this is not suitable for production environments, since both members go down together with the machine. Consider this setup for prototyping, training or small development environments. This was just the beginning of the journey.

In the next chapters, I will explain how to :

Add an Arbiter : this is highly recommended !! The arbiter holds no data but votes in elections, so the replica set can still elect a primary when a member goes down.
Adjust the priority of a replica set member,
Create and use a replica set across several instances
I’m also writing some text about setting up an ElasticSearch River to feed ElasticSearch indexes from a Mongodb database.

2 comments:

Dharshan said...: Thanks for the tips. One important thing that I would call out is that it is important to distribute your replica sets across availability zones for high availability. Some other thoughts when deploying on AWS - http://blog.mongodirector.com/10-questions-to-ask-and-answer-when-hosting-mongodb-on-aws/; 1/4/14 11:48
Anonymous said...: You should also elaborate on
- removing a replica
- switching a failed replica
- configuring client to load-balance between replica
- adding a new replica in a live set

Thanks; 2/4/14 01:44

Tuesday, 1 April 2014
