Easy Backup and Restore for Your Elasticsearch Entities


Elasticsearch is a great search/discovery engine from elastic.co. It is a simple to set up (what isn't, nowadays, thanks to Docker?), feature-rich, Apache Lucene based search engine, in many ways similar to Apache Solr.

Traditionally, Solr and ES have mostly been used to give your data more visibility through added search and browse capabilities. Often this is done by indexing already existing application data into the search engine. In this kind of setup it might not be a big deal whether the index data is backed up or not, because it is possible to rebuild the index from the master data when a bug in your application deletes the index data.

Recently, my gut tells me, the trend seems to be moving towards a leaner approach where people use ES not only as a search engine but also as the primary data storage layer for business applications. I am not surprised: both Solr and ES are fast compared to many other data stores around, and they are feature-rich, ridiculously scalable data slicing and dicing platforms that have been production ready for ages now.

In this blog post I am going to present a simple, yet in many cases perfectly usable, way to back up your index data to a remote server.

Elasticsearch offers a nice snapshot/restore API that can be used to take backups of the index data. Out of the box it can write backups to a Shared File System Repository. Additional plugins provide support for storing backups in AWS S3, HDFS and Azure. We're going to use the Shared File System Repository since it allows us to use any ssh-accessible remote box as a backup server. I am also going to use Docker here, purely for convenience.

Pull in the ES container image:

docker pull elasticsearch:latest

Start it (the SYS_ADMIN capability and the /dev/fuse device are needed so that we can mount an sshfs filesystem inside the container later):

docker run --cap-add SYS_ADMIN --device /dev/fuse -d -p 9200:9200 -p 9300:9300 elasticsearch

You should be able to see the Elasticsearch server running as a container:

[sam@localhost srvr]$ docker ps
CONTAINER ID   IMAGE             COMMAND                CREATED         STATUS         PORTS                                            NAMES
16f87173f7e9   elasticsearch:1   "/docker-entrypoint.   4 seconds ago   Up 3 seconds   0.0.0.0:9200->9200/tcp, 0.0.0.0:9300->9300/tcp   romantic_bell
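
You can also check that ES itself answers on its HTTP port; a GET on the root should return a small JSON banner with the version and cluster name:

[sam@localhost srvr]$ http http://127.0.0.1:9200/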

The data storage is now ready to be used. Let's index some content with httpie:

[sam@localhost srvr]$ http PUT http://127.0.0.1:9200/myapp/users/sam id=sam firstName=Sami lastName=Siren password=secret
[sam@localhost srvr]$ http PUT http://127.0.0.1:9200/myapp/users/foo id=foo firstName=Foo lastName=Bar password=secret
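
By default ES refreshes an index about once per second, so new documents become searchable almost immediately. If you want to be certain before querying, you can force a refresh explicitly:

[sam@localhost srvr]$ http POST http://127.0.0.1:9200/myapp/_refresh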

After indexing, the data is searchable:

[sam@localhost srvr]$ http http://127.0.0.1:9200/myapp/users/_search?q=bar
HTTP/1.1 200 OK
Content-Length: 277
Content-Type: application/json; charset=UTF-8

{
    "_shards": {
        "failed": 0,
        "successful": 5,
        "total": 5
    },
    "hits": {
        "hits": [
            {
                "_id": "foo",
                "_index": "myapp",
                "_score": 0.2169777,
                "_source": {
                    "firstName": "Foo",
                    "id": "foo",
                    "lastName": "Bar",
                    "password": "secret"
                },
                "_type": "users"
            }
        ],
        "max_score": 0.2169777,
        "total": 1
    },
    "timed_out": false,
    "took": 4
}

And the entities can also be retrieved by id:

[sam@localhost srvr]$ http http://127.0.0.1:9200/myapp/users/sam
HTTP/1.1 200 OK
Content-Length: 161
Content-Type: application/json; charset=UTF-8

{
    "_id": "sam",
    "_index": "myapp",
    "_source": {
        "firstName": "Sami",
        "id": "sam",
        "lastName": "Siren",
        "password": "secret"
    },
    "_type": "users",
    "_version": 13,
    "found": true
}

Excellent. Now let's set up the shared filesystem that we will use for backups. First we attach to the running container:

[sam@localhost srvr]$ docker exec -it 16f87173f7e9 bash

Next, install the ssh client, sshfs and vim in the container:

root@16f87173f7e9:/# apt-get update
root@16f87173f7e9:/# apt-get install openssh-client sshfs vim

Generate an RSA key pair:

root@16f87173f7e9:/# ssh-keygen

Deploy the generated public key to the remote server:

root@16f87173f7e9:/# ssh-copy-id es@x.x.x.x
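
Before going further, it is worth checking that key-based login now works without a password prompt (using the same es user and x.x.x.x placeholder as above):

root@16f87173f7e9:/# ssh es@x.x.x.x echo ok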

Edit /etc/fstab to contain:

es@x.x.x.x:/home/es /backup fuse.sshfs noauto,x-systemd.automount,_netdev,users,idmap=user,IdentityFile=/root/.ssh/id_rsa,allow_other,reconnect 0 0

Create a mount point for the backup directory inside the container and mount it:

root@16f87173f7e9:/# mkdir /backup
root@16f87173f7e9:/# mount /backup
root@16f87173f7e9:/# chmod a+w /backup/
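
A quick sanity check that the remote filesystem is actually mounted and writable:

root@16f87173f7e9:/# df -h /backup
root@16f87173f7e9:/# touch /backup/ping && rm /backup/ping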

Now the Docker container is fully up to speed and we can exit the container shell. Next we register a snapshot repository in ES so that it knows where to store backups.

[sam@localhost srvr]$ http PUT http://127.0.0.1:9200/_snapshot/backups type=fs settings:='{"location":"/backup/userbackup","compress":true}'
HTTP/1.1 200 OK
Content-Length: 21
Content-Type: application/json; charset=UTF-8

{
    "acknowledged": true
}
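
You can read the repository definition back to verify that it was registered with the intended settings:

[sam@localhost srvr]$ http http://127.0.0.1:9200/_snapshot/backups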

And create a backup from the current state:

[sam@localhost srvr]$ http PUT http://127.0.0.1:9200/_snapshot/backups/snapshot_1?wait_for_completion=true
HTTP/1.1 200 OK
Content-Length: 314
Content-Type: application/json; charset=UTF-8

{
    "snapshot": {
        "duration_in_millis": 16147,
        "end_time": "2015-05-22T20:26:19.838Z",
        "end_time_in_millis": 1432326379838,
        "failures": [],
        "indices": [
            "myapp"
        ],
        "shards": {
            "failed": 0,
            "successful": 5,
            "total": 5
        },
        "snapshot": "snapshot_1",
        "start_time": "2015-05-22T20:26:03.691Z",
        "start_time_in_millis": 1432326363691,
        "state": "SUCCESS"
    }
}

List all available snapshots:

http http://127.0.0.1:9200/_snapshot/backups/_all
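
Snapshots are incremental, so taking them often is cheap. When an old snapshot is no longer needed, it can be removed through the same API:

http DELETE http://127.0.0.1:9200/_snapshot/backups/snapshot_1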

Now that the index data is safely transferred to an external box, we can delete an entity from our data storage:

http DELETE http://localhost:9200/myapp/users/foo

…and verify it's gone:

[sam@localhost srvr]$ http get http://localhost:9200/myapp/users/foo
HTTP/1.1 404 Not Found
Content-Length: 60
Content-Type: application/json; charset=UTF-8

{
    "_id": "foo",
    "_index": "myapp",
    "_type": "users",
    "found": false
}

The restore procedure starts with closing the index:

[sam@localhost srvr]$ http POST localhost:9200/myapp/_close
HTTP/1.1 200 OK
Content-Length: 21
Content-Type: application/json; charset=UTF-8

{
    "acknowledged": true
}

And then we can restore the data from the backup we took earlier:

[sam@localhost srvr]$ http POST http://127.0.0.1:9200/_snapshot/backups/snapshot_1/_restore
HTTP/1.1 200 OK
Content-Length: 17
Content-Type: application/json; charset=UTF-8

{
    "accepted": true
}
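
Note that the restore call returns as soon as the request is accepted. Just like the snapshot call, it accepts wait_for_completion=true if you prefer to block until the data is back:

[sam@localhost srvr]$ http POST http://127.0.0.1:9200/_snapshot/backups/snapshot_1/_restore?wait_for_completion=true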

And after opening the index again:

[sam@localhost srvr]$ http POST localhost:9200/myapp/_open
HTTP/1.1 200 OK
Content-Length: 21
Content-Type: application/json; charset=UTF-8

{
    "acknowledged": true
}

…we can verify that the data is indeed there:

[sam@localhost srvr]$ http get http://localhost:9200/myapp/users/foo
HTTP/1.1 200 OK
Content-Length: 157
Content-Type: application/json; charset=UTF-8

{
    "_id": "foo",
    "_index": "myapp",
    "_source": {
        "firstName": "Foo",
        "id": "foo",
        "lastName": "Bar",
        "password": "secret"
    },
    "_type": "users",
    "_version": 1,
    "found": true
}

As we can see, the ES snapshot/restore functionality is pretty sleek and easy to use. I recommend reading through the documentation as it contains a lot more details and options than I have covered here.
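
If you want to take this one step further, the snapshot call is trivial to automate. Here is a minimal sketch of a nightly cron job on the host, assuming the backups repository from above and curl; snapshot names must be unique, hence the date suffix, and % has to be escaped in crontab entries:

0 3 * * * curl -s -XPUT "http://127.0.0.1:9200/_snapshot/backups/snapshot_$(date +\%Y\%m\%d)?wait_for_completion=true"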
