Today, I'm excited to announce that we're starting to roll out our RocketScale™ autoscaling technology to all ObjectRocket for Elasticsearch customers. We will be rolling out in phases, so some of our users may see the feature pop up before others. Once the rollout is complete, data nodes will be automatically added to your Elasticsearch clusters when they're short on capacity.
Uh... I thought you already did that?
Well, sort of. On the ObjectRocket platform, we've always relied on our support team to make sure that your cluster status stays in the green. Sometimes that would require adding a new data node to help a full cluster recover, so we would perform that addition manually for you. This relied on monitoring that looked for clusters in trouble and alerted our support team to help.
RocketScale not only automates the whole process, but it can also be more proactive to add capacity before the cluster ends up in trouble. It is also fully configurable, so you can have full control over how and when you would like new data nodes added to your cluster to meet your growth needs.
How it works
By default, all Elasticsearch clusters on the ObjectRocket platform are created with RocketScale enabled and set to scale at 85% usage. This means that if any data node hovers above 85% utilization for 10 minutes or more (both values are configurable), a new data node will be added and you will be notified of the addition.
Once enabled on your cluster, the settings for your instance(s) are configurable in the ObjectRocket UI by clicking the "RocketScale" button at the top of your instance details screen. For starters, it's super easy to configure and consists of two main settings:
- RocketScale Enabled: This checkbox allows you to turn RocketScale on/off for the instance
- Disk Usage Threshold: When any data node exceeds this % of storage used for an extended time, it will trigger adding a new node
However, if you want a little more control, you can tweak RocketScale even further by clicking on Advanced Settings, which will allow you to adjust three more values:
- Number of Minutes in Violation (Default = 10 minutes) - The time that a node must be over the Disk Usage Threshold before a new node is added
- Max Node Limit (Default = 12 nodes) - If the cluster size has exceeded this limit, RocketScale will not add any more nodes.
- Cooldown time (Default = 600 seconds) - The amount of time to wait between checks of disk usage.
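To make the interaction between these settings concrete, here is a minimal Python sketch of the decision described above. It is not RocketScale's actual implementation: the class name, the function name, and the one-sample-per-minute input format are all assumptions made for illustration, and the sketch treats reaching the Max Node Limit as the point where scaling stops.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class RocketScaleSettings:
    """Hypothetical container mirroring the settings listed above."""
    enabled: bool = True
    disk_usage_threshold: float = 85.0  # percent of storage used
    minutes_in_violation: int = 10
    max_node_limit: int = 12
    cooldown_seconds: int = 600  # wait between disk usage checks


def should_add_node(
    settings: RocketScaleSettings,
    usage_by_node: Dict[str, List[float]],
    node_count: int,
) -> bool:
    """Return True if a new data node would be added under these settings.

    Assumption: usage_by_node maps a node name to its disk usage samples
    (percent), one sample per minute, most recent last.
    """
    if not settings.enabled:
        return False
    # Assumption: a cluster already at the limit is not scaled further.
    if node_count >= settings.max_node_limit:
        return False
    window = settings.minutes_in_violation
    for samples in usage_by_node.values():
        recent = samples[-window:]
        # The node must stay above the threshold for the whole window.
        if len(recent) == window and all(
            pct > settings.disk_usage_threshold for pct in recent
        ):
            return True
    return False
```

With the defaults, a three-node cluster where one node has reported 90% usage for the last ten minutes would trigger a new node, while a single dip below 85% inside that window would not. A caller would run this check once per Cooldown time interval.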
Best Practices and Notes
We've built this feature to enable us to be even more responsive to the health of your cluster and to give you more control over how your instances are scaled. We've chosen defaults that will help keep your cluster out of red status and make sure that we're not scaling too aggressively, but there are some other best practices and notes you should be aware of.
If you don't want us to add nodes automatically
That's fine and you can just turn RocketScale off, but we still may manually add nodes if your cluster goes into a red status to help you avoid downtime. If you'd like to lock your cluster at a certain size with no exceptions, feel free to contact support and we can discuss your options and the implications.
If you're concerned about the growth of your cluster, perhaps a better solution would be to use Curator to keep your data set in check. Then, you can leave RocketScale enabled to avoid any trouble if you do see growth.
If your utilization is not even across data nodes
If you have one data node that's very full and the rest are not, that could be a sign that there's an issue with the size of your shards, your cluster settings, or your cluster plan size. RocketScale will detect most of these scenarios and alert us internally rather than adding a new data node, so one of our engineers can investigate the large imbalance in node utilization. Unless we can completely fix it on our end, we'll usually reach out to you when we get these alerts to propose solutions specific to your cluster.
This is common among customers that started small with the default number of shards (5) and have grown considerably without reindexing and/or modifying shard count. You end up with massive shards that are difficult to place, and therefore you need an empty node to fit a shard even if you still have capacity on other nodes.
If this is the case for you, we will reach out to you to discuss the options. However, if you'd like to discuss this with us proactively, feel free to reach out to ObjectRocket Support. We'll give you guidance on a good indexing strategy, Curator configuration, number of shards, and plan size to ensure you can minimize unused capacity.
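The placement problem is easy to see with a little arithmetic. The sketch below is purely illustrative: the function name and the sizes are made up, and it ignores everything Elasticsearch itself considers when allocating shards. It only shows that total free space can be plentiful while no single node can hold a large shard.

```python
from typing import Dict


def placement_report(shard_size_gb: float, free_gb_by_node: Dict[str, float]) -> dict:
    """Compare one shard's size against the free space on each node."""
    total_free = sum(free_gb_by_node.values())
    # A shard lives on a single node, so only per-node free space matters.
    nodes_that_fit = [
        name for name, free in free_gb_by_node.items() if free >= shard_size_gb
    ]
    return {
        "total_free_gb": total_free,
        "nodes_that_fit": nodes_that_fit,
        "needs_new_node": not nodes_that_fit,
    }


# Hypothetical cluster: 270 GB free in total, but a 120 GB shard fits nowhere.
report = placement_report(120, {"node-a": 80, "node-b": 90, "node-c": 100})
print(report["needs_new_node"])  # True
```

Smaller shards avoid this: the same 270 GB of free space could absorb several 40 GB shards without an extra node.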
Tuning the defaults
Our default settings should meet most clusters' needs, and our support team will actively tune these to meet your cluster's needs, but here are some example scenarios where you'd want to tune them:
- Quickly growing clusters
You may have a cluster that can grow very quickly with a spike of activity. You can tune the Number of Minutes in Violation and Cooldown time down a bit to ensure that we are able to respond very quickly.
- Larger clusters
You may have a cluster with a large number of data nodes beyond our default max of 12 nodes for RocketScale. We generally advise customers to go up a plan size and average fewer than 12 nodes per cluster, but if you have a specific case where more nodes are desirable, you can raise the default for Max Node Limit to ensure we still scale your cluster as you grow.
- Large shards
If you've got an index with very large shards (>20% of node capacity per shard), we will generally recommend a plan resize and potentially reindexing. However, if for some reason you need to keep that layout, you can tune the Disk Usage Threshold down a bit to make sure you always have enough space to reallocate a shard.
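For the large-shard case, one rough way to pick a lower Disk Usage Threshold is to leave at least one shard's worth of headroom on each node. The helper below is a hypothetical rule of thumb, not an official ObjectRocket formula; the function name and the example sizes are assumptions.

```python
def suggested_threshold(
    node_capacity_gb: float,
    largest_shard_gb: float,
    default_threshold: float = 85.0,
) -> float:
    """Suggest a Disk Usage Threshold (percent) that leaves room for one shard.

    Rule of thumb (an assumption): scale before free space on a node drops
    below the size of the largest shard, and never go above the default.
    """
    shard_pct = 100.0 * largest_shard_gb / node_capacity_gb
    return min(default_threshold, 100.0 - shard_pct)


# A 250 GB shard on 1000 GB nodes takes 25% of a node, so scale at 75%.
print(suggested_threshold(1000, 250))  # 75.0
```

Shards smaller than 15% of node capacity leave the default of 85% unchanged under this rule, which is consistent with only needing to tune the threshold for unusually large shards.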
This is a huge new feature for us and just an intro to what we want to do with RocketScale in the future. Check out our official RocketScale documentation, and as always send any feedback our way at firstname.lastname@example.org .