Monday, 27 January 2014

ElasticSearch index migration

Hi all,
I’ve been working with ElasticSearch for a year now. It’s a great distributed index and search engine, based on Apache Lucene. Behind the scenes, ElasticSearch is a document-oriented datastore with a schemaless model and a RESTful API, offering high availability on real-time data. You can read more about it here.

I’ve been loading data from relational and NoSQL (MongoDB) databases, using both the API and the bulk features. Another interesting way of loading data from a NoSQL database is the river. Loading data is fine, but you will probably face the challenge of synchronising / replicating data from one environment to another (from PROD to DEV for instance, to allow development on fresh and meaningful data).
I was working on that challenge when I finally found something really good: the ElasticSearch Exporter!

How to easily move / copy indexes from one cluster/machine to another?

Ravi Gairola developed and released the ElasticSearch Exporter. This small script is available on GitHub here.
ElasticSearch Exporter will allow you, with a single shell command, to:
  • Export to ElasticSearch or (compressed) flat files
  • Recreate mappings on the target
  • Filter source data by query
  • Specify the scope as type, index or whole cluster
  • Sync index settings along with existing mappings
  • Run in test mode without modifying any data

Install and usage

ElasticSearch Exporter needs Node.js (v0.10 minimum) with the following modules: nomnom and colors. Both can be installed with npm.
Let’s go, install Node.js (Ubuntu):
sudo apt-get update
sudo apt-get install -y python-software-properties python g++ make
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs

Let’s add the needed modules:

npm install colors
npm install nomnom
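
Before going further, it can be worth checking that the installed Node.js really meets the v0.10 minimum. Here is a minimal sketch; the helper name node_version_ok is my own, and it assumes the binary is available as node on your PATH (on some Ubuntu setups it is called nodejs instead):

```shell
# Return success if a Node.js version string (e.g. "v0.10.25") is at least 0.10.
# node_version_ok is a hypothetical helper name, not part of any tool.
node_version_ok() {
  version="${1#v}"            # strip the leading "v"
  major="${version%%.*}"      # text before the first dot
  rest="${version#*.}"        # text after the first dot
  minor="${rest%%.*}"         # text before the next dot
  [ "$major" -gt 0 ] || [ "$minor" -ge 10 ]
}

# Only query the local install when node is actually on the PATH.
if command -v node >/dev/null 2>&1; then
  if node_version_ok "$(node --version)"; then
    echo "node version OK"
  else
    echo "node is too old, v0.10 or later is needed"
  fi
fi
```

If the check reports an old version, installing from the PPA as above should bring in a recent enough one.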

Download and unpack ElasticSearch Exporter (master from GitHub):

wget https://github.com/mallocator/Elasticsearch-Exporter/archive/master.zip
unzip master.zip
cd Elasticsearch-Exporter-master

Start migrating an index from machine A to machine B:

node exporter.js -a <source_ip> -p 1201 -i source_idx_name -b <dest_ip> -q 1201 -j dest_idx_name
  • -a : Source IP (machine A)
  • -p : Source port (machine A)
  • -i : Source index (machine A)
  • -b : Destination IP (machine B)
  • -q : Destination port (machine B)
  • -j : Destination index (machine B)
You will see some progress lines (warning: they can scroll a long way down your shell window) and a summary at the end:

Processed 118100 of 119923 entries (98%)
Processed 118200 of 119923 entries (99%)
Processed 118300 of 119923 entries (99%)
Processed 118400 of 119923 entries (99%)
Processed 118500 of 119923 entries (99%)
Processed 118600 of 119923 entries (99%)
Processed 118700 of 119923 entries (99%)
Processed 118800 of 119923 entries (99%)
Processed 118900 of 119923 entries (99%)
Processed 119000 of 119923 entries (99%)
Number of calls:    2430
Fetched Entries:    119923 documents
Processed Entries:    119923 documents
Source DB Size:        119923 documents
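
If the progress lines flood your terminal, one option is to filter them out and keep only the summary. This is a small sketch that assumes the progress lines go to standard output and always start with "Processed", as in the sample above; summary_only is a name I made up for the example:

```shell
# Drop the per-batch progress lines and keep everything else (the summary).
# summary_only is a hypothetical helper, not an option of the exporter itself.
summary_only() {
  grep -v '^Processed [0-9]* of [0-9]* entries'
}

# Possible usage, with the same arguments as your migration command:
#   node exporter.js <arguments> | summary_only
```

You lose the live progress indication this way, so it is probably best kept for scripted or scheduled runs.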

Then you are done: your index has been copied from machine A to machine B.

Of course, there are plenty of other options; have a look at the GitHub page.

Easy, simple, efficient and free. Just as we love it.
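
To double-check the result, you can compare the document counts on both sides with the ElasticSearch _count endpoint. The sketch below assumes curl is installed and that both nodes are reachable over plain HTTP; the function names and all host, port and index arguments are placeholders of mine, not something the exporter provides:

```shell
# Extract the numeric "count" field from an ElasticSearch _count response.
extract_count() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p'
}

# Ask one node for the document count of one index.
# usage: index_count HOST PORT INDEX
index_count() {
  curl -s "http://$1:$2/$3/_count" | extract_count
}

# Compare source and destination counts (all six arguments are placeholders).
# usage: compare_counts SRC_HOST SRC_PORT SRC_INDEX DST_HOST DST_PORT DST_INDEX
compare_counts() {
  src=$(index_count "$1" "$2" "$3")
  dst=$(index_count "$4" "$5" "$6")
  if [ -n "$src" ] && [ "$src" = "$dst" ]; then
    echo "OK: $src documents on both sides"
  else
    echo "MISMATCH: source=$src destination=$dst"
    return 1
  fi
}
```

Matching counts do not prove that every document is identical, but it is a quick sanity check after a copy. Freshly indexed documents may need a moment (or a refresh) before they show up in the count.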

A big thanks to Ravi Gairola.

1 comment:

R. Flores said...

What advantages over ElasticSearch tool named stream2es: https://github.com/elasticsearch/stream2es ?