We hear it a lot: one of the most painful aspects of database maintenance is upgrades. It can be scary to upgrade to the latest version of Elasticsearch, or any datastore. The Elastic Stack is updated more regularly than most datastores, and each release brings major new features, bug fixes, and enhancements. When evaluating an upgrade to a newer version, you may have some features you’d like to add and some nagging bugs you’d like to squash. Beyond that, though, your app is working, you don’t want to mess with a good thing, and so you stay on the version that works. You get farther and farther behind, and it becomes even harder to move to the latest version. We talk to many businesses in this situation: they use early versions of Elasticsearch, they’ve fallen really far behind, and they feel stuck.
There are a lot of reasons it can be painful to upgrade your Elasticsearch instances:
- Downtime: Minor version upgrades can generally complete without downtime, but if you’re moving across major versions, at a minimum you’ll need to do a full cluster restart, which brings down the cluster, or you may need to reindex your data, which can lead to even more complications.
- Rewriting code: Across versions of Elasticsearch, it’s not uncommon for major APIs to change, for features to be deprecated, or for field and mapping formats to change. From 1.x to 5.x especially, there have been major changes in data types, scripting, field naming conventions, percolators, and more. If you’re using any of those features, there’s a good chance some of your queries will need to change.
- Deprecated features: You’ve worked hard to get Elasticsearch to do exactly what you want it to do. It can be overwhelming to think about losing features that work so well for you today. For example, with Elasticsearch 6.0, the “_all” field was deprecated and indices were limited to a single document type. There are some workarounds, but those features are widely used and are now going away.
Why it makes sense to upgrade your Elasticsearch clusters
As with any software, it’s important to keep Elasticsearch up to date so that you can take advantage of new features and bug fixes.
- Stability: As stable as the 1.7 version has been for many customers, we’ve seen a number of issues over the years with how this version of Elasticsearch (and earlier) handles exceptions. These exceptions can cause Elasticsearch clusters to fall over despite all of the built-in HA features. We’ve also learned firsthand that recovery is far more problematic on older clusters.
- Features: It’s impressive how often the Elastic Stack is updated with new features and functionality. Many customers want to take advantage of the new scripting capabilities or ingest functionality of Elasticsearch 5.0. Kibana has also seen big changes and enhancements over the years that are extremely beneficial for customers.
- Support: The 1.x versions of Elasticsearch have already been out of official support for almost a year, and support for the 2.x versions ends in the next few months. At ObjectRocket, we generally support Elasticsearch instances well beyond when they’re officially supported, but bug fixes from the community fall off after official support ceases.
Why we recommend upgrading to Elasticsearch 5.x
At this point in time we usually recommend an upgrade to 5.x. Why do we recommend 5.x?
- It’s very stable.
- It has a number of important performance, stability, and feature improvements over previous versions.
- It will be supported for a long time.
- There is a migration path from 5.6 to 6.x via a rolling restart.
We handle data migration and assist with Elasticsearch version upgrades for all of our customers. If you choose to do it yourself, there are a number of major changes between 1.x/2.x and 5.x that can break an existing 1.x/2.x installation upon upgrade. Here are a few tips and tricks from our experience upgrading customers.
The Migration Plugin
The first thing you’ll want to do when investigating an upgrade of your cluster is to download and install the elasticsearch-migration plugin. The migration plugin checks mappings, index settings, and Lucene segment versions against changes between versions. The plugin highlights what needs to be fixed or any deprecated features that are being used before starting the migration process. It does NOT actually fix the issues, though. It also doesn’t check for index templates, so you’ll need to review all of Elasticsearch’s breaking changes documentation before making any of the required changes manually on the new version.
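If you want to try it, the plugin installs like any other site plugin on a 1.x/2.x cluster. The commands below are a sketch; the exact release archive depends on your Elasticsearch version, so the path is a placeholder for the matching download from the elastic/elasticsearch-migration GitHub releases:

```shell
# Install the migration plugin (substitute the release zip that matches
# your Elasticsearch version)
./bin/plugin install file:///path/to/elasticsearch-migration-<version>.zip

# Then open the Cluster Checkup UI in a browser:
#   http://localhost:9200/_plugin/elasticsearch-migration/
```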
Version 2.x will support an index created in Elasticsearch 1.x, and Elasticsearch 5.x can support an index created in 2.x, but you must reindex if you’d like to use an index created before 2.0 in a 5.x or later cluster. The same pattern continues in 6.0. In the event you do need to reindex, there are a few things to know.
Although there are ways to reindex manually in earlier versions of Elasticsearch, the Reindex API, introduced in Elasticsearch 2.3, is a life saver when it comes to reindexing. The reindex API allows you to reindex with a single API call and customize how the data is reindexed.
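As a sketch, a minimal reindex request looks something like the following; the index names are placeholders:

```shell
# Copy every document from one index into another with a single API call
curl -XPOST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": { "index": "old_index" },
  "dest":   { "index": "new_index" }
}'
```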
For an upgrade to 2.x, though not required, this provides an easy path to get your pre-2.0 indexes current. You can:
- Upgrade your older cluster to 2.3 or later, via either a full cluster restart or a snapshot/restore
- Use the Reindex API to recreate each index, taking advantage of the later version of Lucene and preparing you for 5.x
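The snapshot/restore leg of that path can be sketched roughly like this; the repository location, names, and hosts are placeholders, and both clusters need access to the same repository:

```shell
# On the old cluster: register a shared filesystem repository and snapshot
curl -XPUT "localhost:9200/_snapshot/upgrade_repo" -d'
{ "type": "fs", "settings": { "location": "/mnt/es_backups/upgrade_repo" } }'

curl -XPUT "localhost:9200/_snapshot/upgrade_repo/pre_upgrade?wait_for_completion=true"

# On the 2.3+ cluster: register the same repository, then restore
curl -XPOST "localhost:9200/_snapshot/upgrade_repo/pre_upgrade/_restore"
```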
When upgrading to 5.x, Elasticsearch has a few more tools to make reindexing a little more seamless. An extremely useful feature available in Elasticsearch 5.x is “reindex from remote”, which lets you build a new index in one cluster from the data in another cluster, and the best part is that it works across versions of Elasticsearch with minimal downtime. In this case, you can:
- Create an empty 5.x/6.x cluster
- Use reindex from remote to pull data from your older cluster and reindex it in the new cluster
- Repoint the application that uses Elasticsearch to the new cluster
- Once services are no longer writing to the old cluster, do a second pass of reindex from remote to bring in any documents that were created during or after the initial reindex
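A sketch of a reindex-from-remote call, run against the new cluster; the hostnames and index names are placeholders, and the old cluster must first be whitelisted via `reindex.remote.whitelist` in the new cluster’s elasticsearch.yml:

```shell
# elasticsearch.yml on the new cluster needs something like:
#   reindex.remote.whitelist: "oldhost:9200"
curl -XPOST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "remote": { "host": "http://oldhost:9200" },
    "index": "old_index"
  },
  "dest": { "index": "new_index" }
}'
```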
Upgrading Elasticsearch is almost never simple. One big gotcha is changes in what’s allowed in fields between versions of Elasticsearch, which can cause docs to be dropped and the reindexing to fail. For example, Elasticsearch 1.x was extremely lenient about what a field could be named: you could have fields in the source named the same as Elasticsearch meta fields like “id” and “type”, and you could use “.”s in field names, which is now handled differently. We’ve seen differences in allowable field values between versions as well.
The first way to deal with these is to use a script or the ingest pipeline functionality to rename, remove, or recast these field values to something usable in the newer version. Both ways are possible and it really just depends on where you’re most comfortable, be it the ingest pipeline or the various scripting languages ES supports.
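To make the idea concrete, here is a sketch in Python of the kind of field cleanup a reindex script or ingest pipeline would perform. The function name, the set of reserved names, and the renaming scheme are all illustrative assumptions, not part of any Elasticsearch API:

```python
# Hypothetical sketch of reindex-time field cleanup. In practice you would
# express this same logic in an ingest pipeline or a Painless/Groovy script.

RESERVED = {"_id", "_type", "_index", "_score"}  # meta-field names to avoid

def clean_document(source):
    """Rename fields that collide with meta fields and replace dots,
    which newer Elasticsearch versions handle differently than 1.x."""
    cleaned = {}
    for name, value in source.items():
        if name in RESERVED:
            name = "orig" + name          # e.g. "_id" -> "orig_id"
        name = name.replace(".", "_")     # e.g. "user.name" -> "user_name"
        cleaned[name] = value
    return cleaned

doc = {"_id": "abc123", "user.name": "kim", "count": 7}
print(clean_document(doc))
# {'orig_id': 'abc123', 'user_name': 'kim', 'count': 7}
```

The same rename/remove/recast decisions apply whichever mechanism you pick; the ingest pipeline just expresses them as processors instead of code.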
The second way is to filter out the field names that you don’t want to reindex using the “_source” and “exclude” settings in the reindex API.
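As a sketch, the reindex source accepts search-style `_source` filtering, so an exclude list looks roughly like this; the field and index names are placeholders:

```shell
# Drop problem fields during the copy instead of rewriting them
curl -XPOST "localhost:9200/_reindex" -H 'Content-Type: application/json' -d'
{
  "source": {
    "index": "old_index",
    "_source": { "excludes": [ "problem.field", "another_bad_field" ] }
  },
  "dest": { "index": "new_index" }
}'
```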
Upgrades are painful, but we do this all the time. As you can see, upgrading Elasticsearch takes planning. When you work with ObjectRocket, we make sure your upgrade won’t fail and help ensure that there aren’t inconsistencies in your data. While Elasticsearch offers tools, we can handle all of the upgrade details for you.
Want to play around with a free trial of Elasticsearch 6 with Kibana? Get started and let us know if you have any questions.