There’s probably been a time when you said to yourself, “I wish Elasticsearch did _____ by itself.” Luckily for you, Curator fills in a lot of those blanks. Curator can remove or close old indexes, create and modify aliases, control your snapshots, and handle many other handy tasks. We’ll highlight a few key uses of Curator here, but feel free to read through the official documentation if you’re looking for additional guidance.
Curator at a High Level
Curator has been called a number of different things over time, but the goal has always been the same: an external tool that automates the cleanup of Elasticsearch indexes. More functions have been added over the years, but it remains a very lightweight tool for managing the contents of a cluster.
Curator can be installed external to the Elasticsearch cluster on any system that can connect to an Elasticsearch client node. There are two main ways to run it:
- The curator command, which runs a sequence of actions defined in a YAML “action file”
- The curator_cli command, which runs a single action against the cluster
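For example, the two invocation styles look like this (the file paths and hostname here are illustrative, not defaults):

```shell
# Run a sequence of actions defined in a YAML action file
curator --config /etc/curator/curator.yml /etc/curator/actions.yml

# Run a single action straight from the command line,
# e.g. list the indexes Curator can see on the cluster
curator_cli --host es-client.example.com --port 9200 show_indices
```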
Whether running from an action file or running a single action on the command line, telling Curator what to do consists of three main components:
- Action: The action is the command to run against the index(es). Actions are things like close, delete, open, snapshot, etc.
- Filters: Filters define which indexes to perform the action on. Indexes can be filtered by age, name, aliases they have, whether they’re opened or closed, etc. Filters can be chained together and the action is performed against indexes that match all filters.
- Options: Options are settings that modify the way an action is performed. Some are specific to certain actions, while others, like wait_for_completion, are general.
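With curator_cli, all three components appear right on the command line: the action is the subcommand, the filters are passed as a JSON list, and options become flags. A sketch (the hostname and filter values are illustrative):

```shell
# Action: delete_indices
# Option: --ignore_empty_list (don't error if nothing matches)
# Filter: indexes whose creation date is more than 14 days old
curator_cli --host es-client.example.com delete_indices \
  --ignore_empty_list \
  --filter_list '[{"filtertype":"age","source":"creation_date","direction":"older","unit":"days","unit_count":14}]'
```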
Once you have Curator configured, you can run it from the command line, but it has no built-in scheduling. Any recurring runs of Curator have to be handled by an external task scheduler, such as cron.
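For example, a crontab entry like the following (the paths are illustrative) would run an action file once a day at 2 AM:

```shell
# m h dom mon dow  command
0 2 * * * /usr/local/bin/curator --config /etc/curator/curator.yml /etc/curator/actions.yml
```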
Curator in the ObjectRocket UI
In the ObjectRocket platform, we’ve integrated Curator configuration into our UI and manage running Curator behind the scenes. In its current iteration, you’re able to close, delete, or create indexes at a desired interval based on age and name-pattern filters. We are continuously adding new features, so if you need something that isn’t there, let us know.
To help illustrate this, I’ve created an instance and set up our Twitter connector to capture data for any #coffee hashtags. Every day, this connector creates a new time-series index following the naming convention coffee-*. As the days go by, disk usage has grown significantly, heap usage is creeping higher, and there are loads of indexes named coffee-*.
While I could continue to scale my instance vertically or horizontally, I’ve determined that coffee hashtags older than 14 days are no longer relevant to me. To avoid paying for additional capacity or resources, I’ve decided to set up Curator through the UI to automatically remove any coffee-* indexes that are older than 14 days.
What you see in the UI is that:
- I’ve named this task “Coffee cleanup”
- The selected action is to delete indexes (vs. just closing them)
- I’ve set the filters to only include indexes named coffee-* that are older than 14 days
- The task is set to run every 5 minutes, though I could also schedule it to run once a day or less often
- I don’t want to create a new index when we take the close/delete action, since the connector/Logstash will automatically do that
Once this has been set up, you don’t have to do anything else; our automation takes care of the rest behind the scenes.
We’re continuing to build out more Curator actions and filters, so stay tuned to our documentation page and this blog for future feature releases!
Running Curator Yourself
The ObjectRocket UI doesn’t (yet) expose every feature of Curator. If you need one that we haven’t implemented, here’s how you can run Curator yourself.
You can replicate the task above with the following steps:
- Install the right version of Curator on any system that can reach a client node for your cluster
- Create a configuration file that sets the hosts, port, SSL, and HTTP auth settings
- Create an action file that uses the delete action with the age and pattern filters
- Set up a recurring task to run Curator on a schedule using cron, etc.
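The client configuration file (step 2) is a small YAML document of its own. A minimal sketch, assuming a TLS-enabled cluster with HTTP basic auth (the hostname and credentials are placeholders):

```yaml
client:
  hosts:
    - es-client.example.com
  port: 9200
  use_ssl: True
  http_auth: username:password
  timeout: 30
logging:
  loglevel: INFO
  logformat: default
```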
When all is said and done, you will need an action file similar to:
```yaml
actions:
  1:
    action: delete_indices
    description: "Coffee Cleanup"
    options:
      ignore_empty_list: True
      disable_action: False
    filters:
    # Leave Kibana's own indexes alone
    - filtertype: kibana
      exclude: True
    # Only indexes created more than 14 days ago...
    - filtertype: age
      source: creation_date
      direction: older
      unit: days
      unit_count: 14
    # ...whose names start with coffee-
    - filtertype: pattern
      kind: prefix
      value: coffee-
```
You can set the disable_action option to True if you want to skip an action without removing it from the file, or run curator with the --dry-run flag to see what an action would have done without actually performing it.
Why Use It?
Hopefully by now you’ve started to see the power of Curator and how it can greatly aid in managing your instance. Curator offers a whole host of benefits, but the main ones are:
- Easy automation: Curator gives you an easy way to automate management of your cluster without having to write your own tools or perform these tasks manually.
- Space: Curator lets you tune and maintain your data retention policy, so you avoid paying for and managing a larger cluster than you need.
- Performance: Curator can help you manage aliases and narrow down the indexes included in queries. Rather than querying across all of your timestamped indexes, you can have Curator maintain an alias of just the index timeframes you care about and use that alias to narrow your searches.
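As an illustration of that last point, an alias action can keep an alias pointed at only the most recent week of data, so older indexes drop out of the alias on each run (the alias name and filter values here are illustrative):

```yaml
actions:
  1:
    action: alias
    description: "Keep 'coffee-recent' pointed at the last 7 days"
    options:
      name: coffee-recent
      warn_if_no_indices: True
    add:
      # Add coffee-* indexes created within the last 7 days
      filters:
      - filtertype: pattern
        kind: prefix
        value: coffee-
      - filtertype: age
        source: creation_date
        direction: younger
        unit: days
        unit_count: 7
    remove:
      # Drop coffee-* indexes older than 7 days from the alias
      filters:
      - filtertype: pattern
        kind: prefix
        value: coffee-
      - filtertype: age
        source: creation_date
        direction: older
        unit: days
        unit_count: 7
```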
As an example, when testing new product features we increase our logging frequency, and Curator has helped us reduce index sprawl and keep our clusters healthy.
To review, we’ve talked about how Curator can help you manage your indexes, how you can configure and run Curator either on the ObjectRocket platform or on your own, and the main benefits of Curator. The bottom line is that it’s a dead simple tool that helps you manage your data with minimal overhead, and the more you use it, the more applications you’ll find.