
Configuring Elasticsearch to Use S3 for Snapshots

March 2, 2017 / August 18th, 2022

As data platforms continue to expand and transform, one thing never seems to change: everyone still wants a backup copy of their data! However the technology evolves, you still need access to backups, whether to restore to a local development environment, to keep a copy for safekeeping or compliance, or for various other use cases. This walkthrough will show you how to leverage the S3 repository plugin with your ObjectRocket for Elasticsearch instance.

Snapshot Components

Elasticsearch snapshots consist of three main components: a repository, snapshot(s), and a unique snapshot name. A repository contains the details about where and how snapshots get stored. Your default nightly ObjectRocket backups use type: fs, while S3 backups use type: s3. Each type has a slightly different settings structure. Here’s an example snippet of a repository:

GET /_snapshot?pretty
...
{
  "s3_repository" : {
    "type" : "s3",
    "settings" : {
      "bucket" : "MYBUCKETNAME",
      "server_side_encryption" : "false",
      "region" : "us-east-1",
      "compress" : "false"
    }
  }
}
...
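To tell the nightly fs repositories apart from S3 ones in a script, the response above can be grouped by its type field. This is a minimal Python sketch that assumes the response has already been fetched as text; the nightly_backups entry and its location are made up for illustration and do not come from a real cluster.

```python
import json

# Sample GET /_snapshot response. "s3_repository" mirrors the example above;
# "nightly_backups" and its location are hypothetical, for illustration only.
response_text = """
{
  "s3_repository": {
    "type": "s3",
    "settings": {"bucket": "MYBUCKETNAME", "region": "us-east-1"}
  },
  "nightly_backups": {
    "type": "fs",
    "settings": {"location": "/hypothetical/backup/path"}
  }
}
"""


def repositories_by_type(repositories):
    """Map each repository type (e.g. "fs", "s3") to a sorted list of names."""
    grouped = {}
    for name, definition in repositories.items():
        grouped.setdefault(definition["type"], []).append(name)
    return {repo_type: sorted(names) for repo_type, names in grouped.items()}


grouped = repositories_by_type(json.loads(response_text))
print(grouped)
```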

Each repository can contain multiple snapshots, which are listed in an array:

GET /_snapshot/s3_repository/_all?pretty
[... 
  {
    "snapshot" : "20170208225601",
    "uuid" : "t6R6jxLJTIueQizv9clJYg",
    "version_id" : 5010499,
    "version" : "5.1.1",
    "indices" : [ ".triggered_watches", ".watch_history-2016.10.26", "elastalert_status", "coffee-2016.10.301", ".kibana", "coffee-2016.10.305", "coffee-2016.10.304", "coffee-2016.10.303", "coffee-2016.10.302", ".watches" 
    ],
    "state" : "SUCCESS",
    "start_time" : "2017-02-09T06:56:01.191Z",
    "start_time_in_millis" : 1486623361191,
    "end_time" : "2017-02-09T06:56:12.179Z",
    "end_time_in_millis" : 1486623372179,
    "duration_in_millis" : 10988,
    "failures" : [ ],
    "shards" : {
      "total" : 57,
      "failed" : 0,
      "successful" : 57
    }
  }]

This is important to keep in mind for all _snapshot operations, as you’ll need to reference the correct repository and snapshot name. From our example above, you would use the "snapshot" value, 20170208225601, as the unique identifier.
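If you want to pick that identifier out of the listing in a script, something like the following Python sketch could work. It assumes you have already parsed the array of snapshot entries into a list of dictionaries; the second sample entry (a later FAILED snapshot) is hypothetical and only there to show the filtering.

```python
def latest_successful_snapshot(snapshots):
    """Return the name of the most recently finished SUCCESS snapshot, or None."""
    successful = [s for s in snapshots if s.get("state") == "SUCCESS"]
    if not successful:
        return None
    newest = max(successful, key=lambda s: s["end_time_in_millis"])
    return newest["snapshot"]


# The first entry is trimmed from the listing above; the second is hypothetical.
sample_snapshots = [
    {
        "snapshot": "20170208225601",
        "state": "SUCCESS",
        "end_time_in_millis": 1486623372179,
    },
    {
        "snapshot": "hypothetical_failed_snapshot",
        "state": "FAILED",
        "end_time_in_millis": 1486709772179,
    },
]

print(latest_successful_snapshot(sample_snapshots))
```

The function skips anything that did not finish with state SUCCESS, so a newer failed or partial snapshot is never chosen as a restore candidate.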

Sending Backups to S3

Getting your S3 repository set up with Elasticsearch is a relatively easy process. There are only a few prerequisites to sending your backups to your S3 bucket: install the repository-s3 plugin, ensure your cluster can reach S3 externally, and have proper credentials (bucket name, access key, and secret key) for S3. If you have an ObjectRocket Elasticsearch instance, the first two of these should be in place by default, and you’ll need to “Bring Your Own Creds” for the third. If you wish to restrict what the Elasticsearch snapshot process can do, you can create a Custom Policy with the AWS IAM console. The Policy Document would need to look like the following (replacing MYBUCKETNAME):

{
  "Statement": [
    {
      "Action": [
        "s3:ListBucket",
        "s3:GetBucketLocation",
        "s3:ListBucketMultipartUploads",
        "s3:ListBucketVersions"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::MYBUCKETNAME"
      ]
    },
    {
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:DeleteObject",
        "s3:AbortMultipartUpload",
        "s3:ListMultipartUploadParts"
      ],
      "Effect": "Allow",
      "Resource": [
        "arn:aws:s3:::MYBUCKETNAME/*"
      ]
    }
  ],
  "Version": "2012-10-17"
}
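If you manage more than one bucket, substituting MYBUCKETNAME by hand gets error-prone. The Python sketch below generates the same policy document for a given bucket name. The function name snapshot_policy is just an illustrative choice, and the output still has to be pasted into (or otherwise submitted to) IAM yourself.

```python
import json


def snapshot_policy(bucket):
    """Build the policy document shown above for one bucket name."""
    return {
        "Statement": [
            {
                # Bucket-level permissions apply to the bucket ARN itself.
                "Action": [
                    "s3:ListBucket",
                    "s3:GetBucketLocation",
                    "s3:ListBucketMultipartUploads",
                    "s3:ListBucketVersions",
                ],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # Object-level permissions apply to everything under the bucket.
                "Action": [
                    "s3:GetObject",
                    "s3:PutObject",
                    "s3:DeleteObject",
                    "s3:AbortMultipartUpload",
                    "s3:ListMultipartUploadParts",
                ],
                "Effect": "Allow",
                "Resource": [f"arn:aws:s3:::{bucket}/*"],
            },
        ],
        "Version": "2012-10-17",
    }


policy = snapshot_policy("MYBUCKETNAME")
print(json.dumps(policy, indent=2))
```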

With the prerequisites out of the way, the first step is to create your S3 repository:

PUT /_snapshot/s3_repository
{
  "type": "s3",
  "settings": {
    "bucket": "MYBUCKETNAME",
    "region": "us-east-1",
    "access_key": "KEY",
    "secret_key": "SECRET"
  }
}
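The same request can be prepared from a script. Here is a minimal Python sketch using only the standard library; it builds the PUT request but does not send it. The base URL https://localhost:9200 is a placeholder, and any authentication your instance requires is not shown and would need to be added.

```python
import json
import urllib.request


def build_create_repository_request(base_url, repository, bucket, region,
                                    access_key, secret_key):
    """Prepare (but do not send) the PUT request that registers an S3 repository."""
    body = {
        "type": "s3",
        "settings": {
            "bucket": bucket,
            "region": region,
            "access_key": access_key,
            "secret_key": secret_key,
        },
    }
    return urllib.request.Request(
        url=f"{base_url}/_snapshot/{repository}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Placeholder values: replace the URL, bucket, and credentials with your own.
request = build_create_repository_request(
    "https://localhost:9200", "s3_repository",
    "MYBUCKETNAME", "us-east-1", "KEY", "SECRET",
)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request)
```

Keeping the request construction separate from sending makes it easy to inspect the URL and body before anything touches the cluster.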

Once the repository has been created, you should be able to perform all of the standard _snapshot operations. If you wish to take a new snapshot, simply hit the following endpoint and define your SNAPSHOT_NAME:

PUT /_snapshot/s3_repository/SNAPSHOT_NAME?wait_for_completion=false
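One way to keep snapshot names unique is to derive them from the current time. The Python sketch below assumes a YYYYMMDDHHMMSS naming scheme, which resembles the 20170208225601 name in the earlier listing but is only a guess at how that name was produced; the base URL is a placeholder.

```python
from datetime import datetime


def snapshot_name(moment):
    """Format a datetime as a 14-digit name such as 20170208225601."""
    return moment.strftime("%Y%m%d%H%M%S")


def snapshot_url(base_url, repository, name, wait_for_completion=False):
    """Build the endpoint to PUT in order to start a snapshot."""
    flag = "true" if wait_for_completion else "false"
    return f"{base_url}/_snapshot/{repository}/{name}?wait_for_completion={flag}"


name = snapshot_name(datetime.now())
print(snapshot_url("https://localhost:9200", "s3_repository", name))
```

With wait_for_completion=false the request returns as soon as the snapshot has been started, so a script has to poll for the result instead of blocking on the call.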

Since snapshots can sometimes take a while to run, you may want to check up on the status of an in-flight snapshot. This endpoint shows more details about any currently running snapshot or restore operations:

GET /_snapshot/_status
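The status output can be condensed into one line per running snapshot. The Python sketch below assumes the response carries a snapshots array whose entries include snapshot, repository, state, and a shards_stats object with done and total counts; verify this against the actual output of your Elasticsearch version. The sample data is hypothetical.

```python
def summarize_status(status):
    """Return one progress line per snapshot in a parsed _status response."""
    lines = []
    for entry in status.get("snapshots", []):
        stats = entry.get("shards_stats", {})
        done = stats.get("done", 0)
        total = stats.get("total", 0)
        # Integer percentage; guard against a zero shard count.
        percent = 100 * done // total if total else 0
        lines.append(
            f'{entry["repository"]}/{entry["snapshot"]}: '
            f'{entry["state"]} ({done}/{total} shards, {percent}%)'
        )
    return lines


# Hypothetical response for a snapshot that is one third of the way through.
sample_status = {
    "snapshots": [
        {
            "snapshot": "20170208225601",
            "repository": "s3_repository",
            "state": "STARTED",
            "shards_stats": {"done": 19, "total": 57},
        }
    ]
}

for line in summarize_status(sample_status):
    print(line)
```

An empty list from summarize_status means the response reported nothing in flight.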

Now let’s say “hypothetically” we deleted all of our indexes right after our last snapshot completed and quickly regretted our decision. To restore all indexes from an S3 snapshot, perform the following:

POST /_snapshot/s3_repository/SNAPSHOT_NAME/_restore

If you need to selectively restore indexes, you’ll want to add a request body that names them:

POST /_snapshot/s3_repository/SNAPSHOT_NAME/_restore
{
  "indices": "myindex_1,myindex_2",
  "ignore_unavailable": true
}
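When the list of indexes comes from a script, the request body can be assembled rather than typed out. A small Python sketch follows; restore_body is an illustrative helper name, not part of any client library.

```python
import json


def restore_body(indices, ignore_unavailable=True):
    """Build the JSON body for a selective restore from a list of index names."""
    if not indices:
        raise ValueError("at least one index name is required")
    return {
        # The restore API takes index names as one comma-separated string.
        "indices": ",".join(indices),
        "ignore_unavailable": ignore_unavailable,
    }


body = restore_body(["myindex_1", "myindex_2"])
print(json.dumps(body))
```

Rejecting an empty list up front avoids accidentally sending a body with an empty indices string.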

It is worth pointing out that only one snapshot or restore operation can run on a cluster at a time. Additionally, snapshots have a slight performance impact on your cluster, so please ensure your backup policy is not too aggressive! As always, if you run into any issues with these steps using your ObjectRocket Elasticsearch instance, feel free to reach out to us at support@objectrocket.com.