ObjectRocket

ObjectRocket for Elasticsearch – Our Architecture

Anybody who looks at the ObjectRocket for Elasticsearch page should (hopefully) notice that we mention dedicated containers and our high-performance hardware environment a number of times. Last week we posted a performance comparison between ourselves and a couple of the other Elasticsearch services to give you some insight into the results of our design. Now that you’ve seen the results, I thought it was about time that we walked through the architecture, where we designed for performance, and the other considerations we had to keep in mind.

In this blog I’ll provide a quick overview of the resources you get with every ObjectRocket for Elasticsearch cluster, the function of those resources, and why we’ve designed things the way we have. First up are the different roles that make up an Elasticsearch cluster.

Node Roles

If you’ve only run Elasticsearch on a local machine for testing purposes, you may not have realized that it has the ability baked in to split out various roles to independent hosts. By default, Elasticsearch will run every role on each host, but this can be configured per host as you grow. This is covered in pretty good detail on the Node page in the official Elasticsearch docs, but here is a summary of the roles we use on ObjectRocket:

What’s in an ObjectRocket for Elasticsearch Instance?

When you create an “Instance” in the ObjectRocket UI, what you’re actually getting is an 11-node cluster.

That’s right: 11 nodes minimum, for every plan size, each in its own container on a different host. We split these hosts up four ways:

Why?

With all of that being said, I’d like to address a few questions about why we designed the service this way:

Why so many nodes?

The big reasons are performance, scale, and stability. When you pick your plan size, we want to make sure that all of that RAM and CPU go straight to search performance. If we had left coordinating or master workload on the data nodes, you could end up with resource contention in your cluster that could drive down overall performance while still leaving some nodes underutilized.

This also helps the cluster scale better, because the dedicated client and master nodes are sized to support many more data nodes than you may start with. This makes scaling as easy as adding a data node.

Finally, dedicated nodes for these different traffic types lead to a more stable cluster. You’ve got HA for every function of the Elasticsearch and Kibana cluster. You also don’t have to worry about a search traffic spike on an extra busy node choking out master functions. This is best practice in many production Elasticsearch clusters, and that’s our target at ObjectRocket.
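To make the role split a little more concrete, here is a minimal Python sketch of how the three Elasticsearch node types could be described using the Elasticsearch 5.x-era settings node.master, node.data and node.ingest. The role names, the helper function and the exact values are assumptions for illustration only, not our actual production configuration.

```python
# Illustrative only: per-role settings using Elasticsearch 5.x setting names.
# The values are assumptions, not ObjectRocket's real configuration.
NODE_ROLE_SETTINGS = {
    # Dedicated masters only manage cluster state.
    "master": {"node.master": True, "node.data": False, "node.ingest": False},
    # Data nodes hold shards and do the heavy lifting for search and indexing.
    "data": {"node.master": False, "node.data": True, "node.ingest": False},
    # Client nodes coordinate requests and (assumed here) run ingest pipelines.
    "client": {"node.master": False, "node.data": False, "node.ingest": True},
}


def to_yaml_lines(settings):
    """Render a settings dict as lines suitable for elasticsearch.yml."""
    return [f"{key}: {str(value).lower()}" for key, value in settings.items()]


for role, settings in NODE_ROLE_SETTINGS.items():
    print(f"# {role} node")
    print("\n".join(to_yaml_lines(settings)))
```

Because each node type has exactly one heavy job, a spike in one kind of work does not compete for RAM and CPU with another.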

Why do I need three dedicated master nodes?

When you’re talking clustering, three’s the magic number. One leaves you with a single point of failure, two leaves you at risk of split brain. Three is the smallest number that makes sense for a stable cluster.
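The arithmetic behind that can be sketched in a few lines of Python. The function names are made up for this example; the majority rule itself is the one older Elasticsearch versions expose through the discovery.zen.minimum_master_nodes setting.

```python
def quorum(master_eligible: int) -> int:
    """Smallest strict majority of master-eligible nodes."""
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int) -> int:
    """Masters that can be lost while a majority can still be formed."""
    return master_eligible - quorum(master_eligible)


for nodes in (1, 2, 3):
    print(
        f"{nodes} master(s): quorum={quorum(nodes)}, "
        f"can lose {tolerated_failures(nodes)}"
    )
```

With one or two masters, losing a single node leaves no majority (or, if you lower the quorum to one on a two-node setup, both halves of a network partition can elect their own master). Three is the first size where one master can fail and the remaining two still agree.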

What about the four clients?

Back when we created this layout, ingest nodes did not exist, so these nodes managed security and acted as coordinating nodes. We start with so many for redundancy and to ensure that, in ingest scenarios, you have plenty of endpoints to spread your traffic across. There’s very little queueing on each coordinating node, so having multiple nodes to spread traffic to helps avoid the problem of a coordinating node backing up.

This also creates an opportunity in use cases where you want a dedicated endpoint for one part of your application to avoid congestion, while the rest of the application uses the other endpoints. Most Elasticsearch clients handle multiple endpoints out of the box, so the only configuration required on your end is dropping all of the client hostnames into your connection code.
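As a rough picture of what "spreading traffic" means, here is a small Python sketch of round-robin selection across four client endpoints. The hostnames are placeholders, and real Elasticsearch client libraries already do something like this for you once you hand them the list of hosts; this is only meant to show the idea.

```python
from itertools import cycle

# Placeholder hostnames: substitute the client hostnames for your own instance.
CLIENT_ENDPOINTS = [
    "client-0.example.com:9200",
    "client-1.example.com:9200",
    "client-2.example.com:9200",
    "client-3.example.com:9200",
]


def round_robin(endpoints):
    """Return an iterator that hands out endpoints in turn, forever."""
    return cycle(endpoints)


picker = round_robin(CLIENT_ENDPOINTS)

# Six consecutive requests touch every endpoint before any is reused.
first_six = [next(picker) for _ in range(6)]
for endpoint in first_six:
    print(endpoint)
```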

Now, with the addition of ingest nodes in Elasticsearch 5.x, there are new ways to split up these clients. All client nodes are capable of accepting ingest pipelines and traffic, but you can choose to point your application at only a subset of them. This gives you a nice easy way of giving some dedicated horsepower to your ingest pipelines, while not bogging down the rest of your traffic.
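One way to picture that split is to carve the client hostnames into two pools in your application's configuration. The sketch below is only an illustration: the hostnames, the split_pools helper and the one-versus-three split are all assumptions, and the right ratio depends on your workload.

```python
# Placeholder hostnames: substitute the client hostnames for your own instance.
CLIENT_ENDPOINTS = [
    "client-0.example.com:9200",
    "client-1.example.com:9200",
    "client-2.example.com:9200",
    "client-3.example.com:9200",
]


def split_pools(endpoints, ingest_count):
    """Reserve the first `ingest_count` endpoints for ingest traffic."""
    if not 0 < ingest_count < len(endpoints):
        raise ValueError("both pools need at least one endpoint")
    return {
        "ingest": endpoints[:ingest_count],
        "search": endpoints[ingest_count:],
    }


pools = split_pools(CLIENT_ENDPOINTS, ingest_count=1)
print("ingest pipelines ->", pools["ingest"])
print("everything else  ->", pools["search"])
```

Your ingest code would then connect only to the hosts in the ingest pool, and the rest of the application to the hosts in the search pool.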

How does this setup impact security?

One of the other advantages of this arrangement is that we’re able to keep all of the data on internal private networks. Only the client nodes and Kibana nodes are accessible from the public internet, AWS Direct Connect, or Rackspace ServiceNet. Both the Kibana and client nodes manage their own set of access control lists (ACLs) and user authentication to ensure that our service is secure at the edge. All of the data nodes and master nodes are isolated on private internal networks.
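For readers who haven't worked with ACLs before, the basic check is simple: is the caller's IP address inside one of the allowed networks? Here is a minimal Python sketch of that idea using the standard ipaddress module. The networks listed are documentation-only example ranges, and this is not how our edge enforcement is actually implemented.

```python
import ipaddress

# Example allow list using documentation-only address ranges (assumed values).
ALLOWED_NETWORKS = [
    ipaddress.ip_network("203.0.113.0/24"),    # e.g. an office network
    ipaddress.ip_network("198.51.100.17/32"),  # e.g. a single app server
]


def is_allowed(client_ip, acl=ALLOWED_NETWORKS):
    """Return True if client_ip falls inside any network on the ACL."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in acl)


for ip in ("203.0.113.42", "198.51.100.17", "192.0.2.1"):
    print(ip, "allowed" if is_allowed(ip) else "denied")
```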

Conclusions

If you’ve read the whole blog, I applaud you and thank you for your time. The bottom line is that our goal at ObjectRocket is for everyone to have a fast, stable cluster that can scale. Elasticsearch has a lot of the tools to enable that baked in, but we’ve tweaked where we can by putting together a cluster design that makes Elasticsearch and Kibana fly for you and scale easily as you grow. You focus on your app, we’ve got the data.
