I’ve been working with ElasticSearch for a year now. It’s a great distributed index and search engine, based on Apache Lucene. Behind the scenes, ElasticSearch is a document-oriented datastore with a schemaless model and a RESTful API, offering high availability on real-time data. You can read more about it here.
I’ve been loading data from relational and NoSQL (MongoDB) databases, using both the API and the bulk features. Another interesting way of loading data from a NoSQL database is the river. Loading data is fine, but you will probably face the challenge of synchronising / replicating data from one environment to another (from PROD to DEV for instance, to allow development on fresh and meaningful data).
I was working on that challenge when I finally found something really good : the ElasticSearch Exporter !
How to easily move / copy indexes from one cluster/machine to another ?
Ravi Gairola developed and released the ElasticSearch Exporter. This small script is available on GitHub here. ElasticSearch Exporter will allow you, with only a single line of shell, to :
- Export to ElasticSearch or (compressed) flat files
- Recreate mapping on target
- Filter source data by query
- Specify scope as type, index or whole cluster
- Sync Index settings along with existing mappings
- Run in test mode without modifying any data
Install and usage
ElasticSearch Exporter needs Node.js (v0.10 minimum) with the following modules : nomnom, colors. They can be installed with npm. Let’s go, install node (Ubuntu) :
sudo apt-get update
sudo apt-get install -y python-software-properties python g++ make
sudo add-apt-repository ppa:chris-lea/node.js
sudo apt-get update
sudo apt-get install nodejs
Let’s add some needed modules :
npm install colors
npm install nomnom
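Before going further, it can be worth checking that the node you just installed meets the v0.10 minimum. Here is a minimal sketch of such a check. The helper name node_version_ok is made up for illustration, and it only looks at the major and minor numbers reported by node --version :

```shell
#!/bin/sh
# Return 0 when a node version string such as "v0.10.25" is at least v0.10.
# node_version_ok is an illustrative name; it is not part of the exporter.
node_version_ok() {
    version=${1#v}          # strip the leading "v"
    major=${version%%.*}    # text before the first dot
    rest=${version#*.}      # text after the first dot
    minor=${rest%%.*}       # text before the next dot
    if [ "$major" -gt 0 ]; then
        return 0
    fi
    [ "$minor" -ge 10 ]
}

# Only query node when it is actually installed.
if command -v node >/dev/null 2>&1; then
    if node_version_ok "$(node --version)"; then
        echo "node version is recent enough"
    else
        echo "node is older than v0.10, please upgrade"
    fi
fi
```

If nothing is printed, node is not on the PATH at all, which usually means the install step did not complete.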
Then download and unpack the exporter :
wget https://github.com/mallocator/Elasticsearch-Exporter/archive/master.zip
unzip master.zip
cd Elasticsearch-Exporter-master
Now copy an index from machine A to machine B :
node exporter.js -a 12.23.45.67 -p 1201 -i source_idx_name -b 34.54.23.13 -q 1201 -j dest_idx_name
- -a : Source IP (machine A)
- -p : Source port (machine A)
- -i : Source index (machine A)
- -b : Destination IP (machine B)
- -q : Destination port (machine B)
- -j : Destination index (machine B)
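When several indexes have to go from PROD to DEV, typing the command by hand for each one gets error-prone. The sketch below prints one exporter command per index so they can be reviewed before anything is copied. The helper name print_export_cmds, the variable names and the second index name are hypothetical, and I am assuming that -q is the destination port option (check the exporter README for your version) :

```shell
#!/bin/sh
# Placeholder hosts and ports from the example above; adapt them to your machines.
SRC_HOST=12.23.45.67
SRC_PORT=1201
DST_HOST=34.54.23.13
DST_PORT=1201

# print_export_cmds is a hypothetical helper: it prints one exporter command
# per index name given as argument, keeping the same name on the destination.
# Assumption: -q is the destination port option (check the exporter README).
print_export_cmds() {
    for index in "$@"; do
        printf 'node exporter.js -a %s -p %s -i %s -b %s -q %s -j %s\n' \
            "$SRC_HOST" "$SRC_PORT" "$index" "$DST_HOST" "$DST_PORT" "$index"
    done
}

# Dry run: only prints the commands, nothing is copied.
print_export_cmds source_idx_name another_idx_name
```

Once the printed commands look right, piping them to sh from the Elasticsearch-Exporter-master directory should run them one after the other.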
(…)
Processed 118100 of 119923 entries (98%)
Processed 118200 of 119923 entries (99%)
Processed 118300 of 119923 entries (99%)
Processed 118400 of 119923 entries (99%)
Processed 118500 of 119923 entries (99%)
Processed 118600 of 119923 entries (99%)
Processed 118700 of 119923 entries (99%)
Processed 118800 of 119923 entries (99%)
Processed 118900 of 119923 entries (99%)
Processed 119000 of 119923 entries (99%)
Number of calls: 2430
Fetched Entries: 119923 documents
Processed Entries: 119923 documents
Source DB Size: 119923 documents
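In a scripted copy, nobody is there to read that summary, so it can help to check it automatically. Here is a rough sketch that reads the exporter output and succeeds only when the "Processed Entries" and "Source DB Size" counts match. The helper name check_export_summary is hypothetical, and it assumes plain, uncoloured output with one summary item per line, as in the sample above :

```shell
#!/bin/sh
# Read exporter output on stdin and succeed only when the
# "Processed Entries" and "Source DB Size" counts are equal.
# check_export_summary is a hypothetical helper, assuming the summary
# lines look exactly like the sample output shown in this post.
check_export_summary() {
    awk '
        /^Processed Entries:/ { processed = $3 }   # third word is the count
        /^Source DB Size:/    { total = $4 }       # fourth word is the count
        END {
            if (processed == "" || total == "") exit 2   # summary missing
            if (processed + 0 != total + 0) exit 1       # counts differ
        }
    '
}

# Demo on the summary lines from the run above.
check_export_summary <<'EOF' && echo "copy looks complete"
Number of calls: 2430
Fetched Entries: 119923 documents
Processed Entries: 119923 documents
Source DB Size: 119923 documents
EOF
```

In a real run the exporter output would be piped into check_export_summary in place of the sample text. An exit status of 1 means the counts differ and 2 means no summary was found.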
Then you are done : your index has been copied from machine A to machine B.
Of course, there are plenty of other options, have a look at the GitHub page.
Easy, simple, efficient and free. Just the way we like it.
1 comment:
What advantages over ElasticSearch tool named stream2es: https://github.com/elasticsearch/stream2es ?